Classification and the order of processing

During classification, the classifiers are processed in a specific order. This order depends on which classifiers are enabled and whether there are training documents present. Training documents are used for layout and content classification only.

If all classifiers are enabled and training documents are present, the sequence is as follows:

1. Layout classification
2. Content classification
3. Generative AI classification

If a classifier is disabled, or if layout or content classification has no training documents, that step is skipped. Classification stops as soon as a classifier returns a confident result. For example, if layout classification returns a confident result, neither of the subsequent classifiers is executed. If all classifiers fail, the document is marked as invalid and processed accordingly.

Layout classification runs first. Because the Layout Classifier does not rely on recognition results, any classes that use layout classification can be classified before recognition is performed.

If layout classification is unsuccessful, or if a class does not use layout classification or has no layout training documents, recognition is performed and content classification is run. Content classification includes the top-level content classifier, subtree classifiers if enabled, and instruction classifiers if configured.

Finally, if generative AI classification is enabled, it runs only when both layout and content classification fail.
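The processing order described above can be sketched as a simple cascade. This is an illustrative model only, not the product's API: the names Step, Result, classify_document, and the "INVALID" marker are hypothetical, and each classifier is reduced to a function that either returns a result or None.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    """Hypothetical classification result."""
    class_name: str
    confident: bool


@dataclass
class Step:
    """Hypothetical model of one classifier in the cascade."""
    name: str
    classify: Callable[[str], Optional[Result]]
    enabled: bool = True
    needs_training: bool = False      # true for layout and content classification
    has_training_docs: bool = False

    def runnable(self) -> bool:
        # A step is skipped if it is disabled, or if it needs
        # training documents and none are present.
        return self.enabled and (self.has_training_docs or not self.needs_training)


def classify_document(document: str, steps: List[Step]) -> Tuple[str, List[str]]:
    """Run the steps in order and stop at the first confident result.

    Returns the chosen class name and the names of the steps that ran.
    """
    executed: List[str] = []
    for step in steps:
        if not step.runnable():
            continue
        executed.append(step.name)
        result = step.classify(document)
        if result is not None and result.confident:
            return result.class_name, executed
    # Every runnable classifier failed: the document is marked as invalid.
    return "INVALID", executed


# Example: layout classification fails, content classification succeeds,
# so generative AI classification never runs.
steps = [
    Step("layout", lambda d: None, needs_training=True, has_training_docs=True),
    Step("content", lambda d: Result("Invoice", True),
         needs_training=True, has_training_docs=True),
    Step("generative_ai", lambda d: Result("Other", True)),
]
print(classify_document("sample document", steps))
# ('Invoice', ['layout', 'content'])
```

The sketch leaves out recognition, which in the description above is performed only when content classification is needed.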

Child classes

For projects with child classes, hierarchical rules are evaluated for every document to determine whether additional classification refinement is needed. These rules are applied automatically and cannot be disabled, because they ensure that the parent-to-child relationship does not hinder the classification result. The final classification result depends on both the hierarchical rules and the classification configuration.

For example, the "Single child wins over its parent" hierarchical rule applies when a parent class and a related child class have the same confidence after layout and content classification are performed. In that case, the rule overrides the parent class in favor of the child class, and the classification result for the document is the child class.

Similarly, the "Parent represents competing children" rule applies when related child classes have the same confidence. In that case, the classification result is the parent class.
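The two rules above can be illustrated with a small tie-breaking sketch. It is a simplified model under stated assumptions, not the product's implementation: resolve_tie and the parent_of mapping are hypothetical names, "same confidence" is modeled as exact equality of the top scores, and any tie not covered by the two rules returns None.

```python
from typing import Dict, Optional


def resolve_tie(confidences: Dict[str, float],
                parent_of: Dict[str, Optional[str]]) -> Optional[str]:
    """Pick a class among the top-scoring candidates (hypothetical helper).

    confidences maps class name to confidence.
    parent_of maps class name to its parent class, or None for a top-level class.
    """
    if not confidences:
        return None
    top = max(confidences.values())
    tied = [name for name, score in confidences.items() if score == top]
    if len(tied) == 1:
        return tied[0]  # no tie, nothing to resolve

    # "Single child wins over its parent": a parent and one of its
    # children share the top confidence, so the child is chosen.
    if len(tied) == 2:
        first, second = tied
        if parent_of.get(first) == second:
            return first
        if parent_of.get(second) == first:
            return second

    # "Parent represents competing children": children of the same
    # parent share the top confidence, so the parent is chosen.
    parents = {parent_of.get(name) for name in tied}
    if len(parents) == 1:
        parent = parents.pop()
        if parent is not None:
            return parent

    return None  # tie not covered by this sketch


# Hypothetical class hierarchy: Invoice has two child classes.
parent_of = {"Invoice": None, "Invoice_A": "Invoice", "Invoice_B": "Invoice"}

print(resolve_tie({"Invoice": 0.8, "Invoice_A": 0.8}, parent_of))    # Invoice_A
print(resolve_tie({"Invoice_A": 0.7, "Invoice_B": 0.7}, parent_of))  # Invoice
```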
