Classification benchmarks and optimization with the Result Matrix
To provide a complete and detailed analysis of classification quality, the Project Builder enables you to test the classification against a benchmark document set or the Classification Training document set. A benchmark provides more information than a preliminary classification test.
A benchmark document set is a set of documents that is separate from both the training documents and any test document set. Each document in a benchmark set has an assigned class. When you generate a benchmark, the classifier compares the assigned class to the classification result based on project-level and class-level classification settings.
You can generate a classification benchmark for all classification settings in a project. This includes all classification settings on the Project Settings - Classification tab, class details, all classifiers, any classification instructions, and any classification script events.
You can also generate a classification benchmark for a specific classifier configured in your project. This means that if you have a layout classifier, a top-level content classifier (Adaptive Feature Classifier), and subtree classifiers, you can generate a benchmark for each. This enables you to compare classifier results and modify classification settings as needed.
You cannot run a benchmark against a normal custom test set because there is no "Assigned Class" value in a test set, and this value is needed for comparison against the classification result. If necessary, convert a custom test set into a benchmark set to generate a benchmark.
When you want to generate a classification or separation benchmark, first store the assigned class structure on disk using the Sort Documents on Disk by Class setting from the Benchmark Sets shortcut menu.
Running a benchmark generates the Result Matrix and displays a grid of the correctly and incorrectly classified documents by class. The Result Matrix provides statistics about unclassified documents and about recall and precision.
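As an illustration only, the following Python sketch shows how a grid of this kind, along with recall and precision per class, can be derived from pairs of assigned class and classification result. It is not the Project Builder implementation. The function name result_matrix, the sample class names, and the use of None for an unclassified document are all hypothetical, and the recall and precision formulas are the standard definitions, which are assumed here rather than taken from the product.

```python
from collections import Counter


def result_matrix(pairs):
    """Build a simple result grid from (assigned_class, result_class) pairs.

    Hypothetical helper for illustration. A result_class of None stands
    for a document that the classifier left unclassified.
    """
    # Each key is one grid cell: (assigned class, classification result).
    counts = Counter(pairs)
    classes = sorted(
        {a for a, _ in counts} | {r for _, r in counts if r is not None}
    )

    stats = {}
    for c in classes:
        correct = counts[(c, c)]
        assigned = sum(n for (a, _), n in counts.items() if a == c)
        predicted = sum(n for (_, r), n in counts.items() if r == c)
        stats[c] = {
            # Assumed standard definitions:
            # recall = correct / documents assigned to the class
            # precision = correct / documents classified as the class
            "recall": correct / assigned if assigned else None,
            "precision": correct / predicted if predicted else None,
        }

    unclassified = sum(n for (_, r), n in counts.items() if r is None)
    return counts, stats, unclassified


# Made-up sample data: (assigned class, classification result)
sample = [
    ("Invoice", "Invoice"),
    ("Invoice", "Invoice"),
    ("Invoice", "Order"),
    ("Order", "Order"),
    ("Order", None),
]
counts, stats, unclassified = result_matrix(sample)
for name, values in stats.items():
    print(name, values)
print("Unclassified:", unclassified)
```

In this sketch, an unclassified document lowers recall for its assigned class but does not affect the precision of any class, because it was not classified as anything.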
For the best results for a project that includes child classes or a complex class hierarchy, select "Classification Benchmark (All Classification Settings)" when generating a classification benchmark for a benchmark set.
Any results that are overridden by exceptions appear in a darker color than the rest of the results. For example, if an added exception sets a group of unclassified documents to valid, the color becomes dark green.
To optimize classification quality with the Result Matrix, you can do the following:
- Generate a classification benchmark for all project classification settings
- Generate a classification benchmark for a specific classifier
- Save classification benchmark results
- Compare classification benchmark results
- Add classification benchmark exceptions