Classification benchmarks and optimization with the Result Matrix

To provide a complete and detailed analysis of classification quality, the Project Builder enables you to test the classification against a benchmark document set or the Classification Training document set. A benchmark provides more information than a preliminary classification test.

A benchmark document set is a set of documents that is separate from both the training documents and the test documents. Each document in a benchmark set has an assigned class. When you generate a benchmark, each document is classified according to the project-level and class-level classification settings, and the classification result is compared with the document's assigned class.
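The comparison step can be pictured with the minimal Python sketch below. It is not the Project Builder implementation. The outcome labels ("correct", "incorrect", "unclassified"), the `compare` function, and the sample documents are hypothetical. The sketch only shows that every benchmark document needs an assigned class so that its classification result has something to be checked against.

```python
from typing import Optional

def compare(assigned: str, result: Optional[str]) -> str:
    """Label one benchmark document's outcome (hypothetical categories)."""
    if result is None:
        return "unclassified"  # the classifier returned no class
    return "correct" if result == assigned else "incorrect"

# Hypothetical benchmark set: (document name, assigned class, classification result)
benchmark = [
    ("doc1.tif", "Invoice", "Invoice"),
    ("doc2.tif", "Invoice", "Order"),
    ("doc3.tif", "Order", None),
]

outcomes = {name: compare(assigned, result) for name, assigned, result in benchmark}
print(outcomes)
```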

You can generate a classification benchmark for all classification settings in a project. This includes all classification settings on the Project Settings - Classification tab, class details, all classifiers, any classification instructions, and any classification script events.

You can also generate a classification benchmark for a specific classifier configured in your project. This means that if you have a layout classifier, a top-level content classifier (Adaptive Feature Classifier), and subtree classifiers, you can generate a benchmark for each. This enables you to compare classifier results and modify classification settings as needed.

You cannot run a benchmark against a normal custom test set because there is no "Assigned Class" value in a test set, and this value is needed for comparison against the classification result. If necessary, convert a custom test set into a benchmark set to generate a benchmark.

Before you generate a classification or separation benchmark, store the assigned class structure on disk by selecting Sort Documents on Disk by Class from the Benchmark Sets shortcut menu.
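One way to think about a class structure stored on disk is a folder per class, where a document's location encodes its assigned class. The Python sketch below assumes a hypothetical layout of `root/<ClassName>/<document>`, with nested folders standing for child classes. The actual layout written by Sort Documents on Disk by Class may differ, and `assigned_classes` is an illustrative helper, not a product API.

```python
import tempfile
from pathlib import Path

def assigned_classes(root):
    """Map each document path to an assigned class taken from its folders.

    Assumes a hypothetical layout root/<ClassName>/<document>; nested
    subfolders are joined with '/' to represent child classes.
    Files directly under root get None (no assigned class).
    """
    root = Path(root)
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            result[rel.as_posix()] = "/".join(rel.parts[:-1]) or None
    return result

# Build a small hypothetical benchmark folder tree and read it back.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["Invoice/doc1.tif", "Invoice/Domestic/doc2.tif", "Order/doc3.tif"]:
        p = Path(tmp, rel)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(b"")
    classes = assigned_classes(tmp)

print(classes)
```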

Running a benchmark generates the Result Matrix, a grid that shows the correctly and incorrectly classified documents by class. The Result Matrix also provides statistics on unclassified documents, recall, and precision.
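A result matrix of this kind is essentially a confusion matrix. The Python sketch below counts (assigned class, result class) pairs and derives recall and precision with their standard definitions. Recall for a class is the share of documents assigned to that class that were classified into it. Precision is the share of documents classified into that class whose assigned class matches. The `UNCLASSIFIED` marker and the sample pairs are hypothetical, and the exact way the Result Matrix counts unclassified documents in these statistics may differ.

```python
from collections import Counter

UNCLASSIFIED = "<unclassified>"  # hypothetical marker for documents with no result

def result_matrix(pairs):
    """Count (assigned class, result class) pairs; None results become UNCLASSIFIED."""
    return Counter((assigned, result or UNCLASSIFIED) for assigned, result in pairs)

def recall(matrix, cls):
    # Share of documents assigned to cls that were classified as cls.
    total = sum(n for (a, _), n in matrix.items() if a == cls)
    return matrix[(cls, cls)] / total if total else 0.0

def precision(matrix, cls):
    # Share of documents classified as cls whose assigned class is cls.
    total = sum(n for (_, r), n in matrix.items() if r == cls)
    return matrix[(cls, cls)] / total if total else 0.0

pairs = [
    ("Invoice", "Invoice"), ("Invoice", "Invoice"), ("Invoice", "Order"),
    ("Invoice", None), ("Order", "Order"), ("Order", "Invoice"),
]
m = result_matrix(pairs)
print(recall(m, "Invoice"), precision(m, "Invoice"))
```

In this sample, four documents are assigned to Invoice and two of them are classified correctly, so Invoice recall is 0.5. Three documents are classified as Invoice, two of them correctly, so Invoice precision is two thirds.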

For the best results for a project that includes child classes or a complex class hierarchy, select "Classification Benchmark (All Classification Settings)" when generating a classification benchmark for a benchmark set.

Any results that are overridden by exceptions appear in a darker color than the rest of the results. For example, if an added exception sets a group of unclassified documents to valid, the color of those results becomes dark green.
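Conceptually, an exception reclassifies a cell of the result matrix as acceptable. The Python sketch below models this under a hypothetical assumption: an exception is an (assigned class, result class) pair, and documents in that cell count as correct. It is an illustration only, not how Project Builder stores or applies exceptions; `effective_correct` and the sample counts are made up for the example.

```python
from collections import Counter

UNCLASSIFIED = "<unclassified>"  # hypothetical marker for documents with no result

def effective_correct(matrix, exceptions):
    """Count correct documents, treating exception cells as valid.

    matrix: Counter of (assigned class, result class) -> document count
    exceptions: set of (assigned class, result class) cells marked valid
    """
    correct = 0
    for (assigned, result), n in matrix.items():
        if assigned == result or (assigned, result) in exceptions:
            correct += n
    return correct

matrix = Counter({
    ("Invoice", "Invoice"): 8,
    ("Invoice", UNCLASSIFIED): 2,
    ("Order", "Invoice"): 1,
})
# Hypothetical exception: accept unclassified Invoice documents as valid.
exceptions = {("Invoice", UNCLASSIFIED)}
print(effective_correct(matrix, set()), effective_correct(matrix, exceptions))
```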

To optimize classification quality with the Result Matrix, you can do the following:

Generate a classification benchmark for all project classification settings
Generate a classification benchmark for a specific classifier
Save classification benchmark results
Compare classification benchmark results
Add classification benchmark exceptions
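Comparing saved benchmark results amounts to looking at how per-class statistics change between runs, for example after you adjust classification settings. The Python sketch below assumes each saved run is reduced to a hypothetical dictionary of class name to recall value. It does not read the product's saved benchmark format. Classes that appear in only one run are skipped.

```python
def compare_runs(before, after):
    """Return the per-class change in a metric between two benchmark runs.

    before, after: dicts mapping class name -> metric value (e.g. recall).
    Classes missing from either run are skipped.
    """
    return {cls: round(after[cls] - before[cls], 4)
            for cls in sorted(before.keys() & after.keys())}

# Hypothetical recall values from two saved benchmark runs
run_a = {"Invoice": 0.80, "Order": 0.90, "Letter": 0.70}
run_b = {"Invoice": 0.85, "Order": 0.88}
delta = compare_runs(run_a, run_b)
print(delta)
```

A positive value means the class improved in the second run, and a negative value points to a class worth examining in the Result Matrix.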
