Class Details

The class details are displayed in the Details window whenever you select a class in the Project Tree.

At the top right of the Details window, a button is available to open the Script Code window. The icon on this button shows a different number of lines depending on whether script code already exists.

The class details offer a number of options that you can use to configure the class.

Classification

This group has the following options:

Train this class for layout classification

If selected, any documents belonging to this class in the Classification Training Set are used to train the project for layout classification. This can be done manually during configuration, or automatically during production. This ensures that documents of this class are easily recognized based on their layout. This option is selected by default.

This is available only for projects created using the Classification Group template.

Train this class for content classification

If selected, any documents belonging to this class in the Classification Training Set are used to train the project for content classification. This can be done manually during configuration, or automatically during production. This ensures that documents of this class are easily recognized based on their content. This option is selected by default.

Monitor this class for Classification Online Learning

Note This option is relevant for Classification Groups only. Even if this option is enabled for Extraction Groups, its value has no effect on the project.

This option is enabled only when either the Train this class for layout classification option or the Train this class for content classification option is selected here, and the Use Classification Online Learning option is selected on the Advanced Online Learning Options window.

If this option is selected, training documents are used to improve classification results over time. Training documents can be added manually during project configuration and they can be collected automatically during production. This option is cleared by default.

Valid classification result

If this option is selected, the class can be used as the result of the classification step. If this option is not selected, documents cannot be assigned to this class by the classification process. This option is selected by default.

Prohibiting the class from becoming the classification result might be useful for classes that are inserted as base classes for the sole purpose of defining common fields and common extraction methods.

If a class meets the classification criteria but is prohibited from becoming the classification result, its parent, if there is one, is used as the classification result. If there is no parent, the document is not classified.
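
The fallback described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `DocClass` type and the `resolve_result` function are hypothetical names, and the sketch does not model the case where the parent is itself prohibited, because the text above does not describe it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocClass:
    """Hypothetical stand-in for a project class."""
    name: str
    valid_result: bool = True            # the "Valid classification result" option
    parent: Optional["DocClass"] = None  # None means the class has no parent


def resolve_result(matched: DocClass) -> Optional[str]:
    """Return the class name used as the result, or None if unclassified."""
    if matched.valid_result:
        return matched.name
    # The class is prohibited: use the parent, if there is one.
    if matched.parent is not None:
        return matched.parent.name
    # No parent: the document is not classified.
    return None
```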

Available for manual classification

If Available for manual classification is selected, the class name is included in the class list. Otherwise, the class name is excluded from the list and the user cannot assign it as the classification result. This option is selected by default.

Note If a document is classified as a non-visible class, this class is included in the list of classes for that document.

Redirect to class

This option makes it possible to replace the classification result. If set, reclassification is done exactly once for each document, and cannot be chained, even if several redirections are defined. The value for this option is set to <No redirection> by default.

If a document is placed in this class as a final result, and a redirection option is specified, then the specified class becomes the final result with the same confidence as the original result for the original class.

This option is useful if a number of different forms all belong to one logical class, for example, change of address forms. In this example, there could be a separate subclass for each document type, such as for multilingual documents. If there is no need to perform any special actions with these forms, they can be redirected to the logical class for address changes.

If you use redirection along with class-level classification thresholds that are configured on the Advanced tab and a document is redirected to another class, the configured settings for the redirected class are used to determine whether the classification result is confident or unconfident.
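
The redirection rules can be illustrated with a short sketch. The function name, the dictionary layout, and the class names are assumptions made for this example only. The sketch shows that a redirect is followed exactly once, that the confidence is carried over unchanged, and that the threshold of the redirected class decides whether the result is confident.

```python
from typing import Dict, Optional, Tuple


def apply_redirect(result_class: str,
                   confidence: float,
                   redirects: Dict[str, Optional[str]]) -> Tuple[str, float]:
    """Replace the final result by its redirect target, if one is set.

    A value of None (or a missing key) stands for <No redirection>.
    """
    target = redirects.get(result_class)
    if target is None:
        return result_class, confidence
    # Redirection is done exactly once: a redirect defined on the
    # target class is deliberately not followed.
    return target, confidence


# Hypothetical configuration: two chained redirects and class-level thresholds (%).
redirects = {"AddressChange_DE": "AddressChange", "AddressChange": "Archive"}
thresholds = {"AddressChange_DE": 50.0, "AddressChange": 85.0, "Archive": 10.0}

final_class, final_conf = apply_redirect("AddressChange_DE", 80.0, redirects)
# The settings of the redirected class are used for the confidence check.
confident = final_conf > thresholds[final_class]
```

In this example the document ends up in `AddressChange` with its original confidence of 80, which is unconfident against that class's threshold of 85 even though it would have been confident in the original class.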

Subtree Classification

This group has the following options:

Enable subtree classification

If this option is selected, and this class is a valid classification result, then a second classification step is started for the complete child class tree using the confidence and distance values defined for the subtree classification. Furthermore, hierarchical rules, such as Single child wins over parent, are applied. This additional step is called subtree classification. This option is cleared by default.

For the purposes of subtree classification, you can set different confidence and distance values. These different values make it possible to get more highly differentiated classification results than would be possible with a single classification step.

Typically, for the first classification step you would use either adaptive feature classification or layout classification. Instruction classification is normally the best choice for subtree classification.

By using subtree classification, you can also combine layout and content classification. This requires classifying a document with the Layout Classifier and activating subtree classification for the class. For the evaluation inside the subtree, only the results from content classification are used. This can help to distinguish between forms that are very similar in layout and therefore need to be distinguished based on textual content.

Minimum confidence

This option is only available if the Enable subtree classification option is selected. Specify the minimum confidence required to run subtree classification. The value for this option is set to 40% by default.

The instructions used for subtree classification should have a lower confidence than the global classification threshold, so that they do not influence the first classification step.

Minimum distance

This option is only available if the Enable subtree classification option is selected. Specify the minimum distance to make it possible to find a result inside the subtree based on the defined instructions. The value for this option is set to 10% by default.

This distance setting should be lower than the global distance.
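
As a rough illustration of how the two subtree values might interact, the sketch below picks a child class only when it passes both checks. Several points are assumptions that the text above does not confirm: that "distance" means the gap between the best and second-best confidence, that both comparisons are inclusive, and the function name itself. The defaults mirror the 40% and 10% default values.

```python
from typing import Dict, Optional


def pick_subtree_result(child_confidences: Dict[str, float],
                        min_confidence: float = 40.0,
                        min_distance: float = 10.0) -> Optional[str]:
    """Return the winning child class, or None if no child qualifies."""
    if not child_confidences:
        return None
    ranked = sorted(child_confidences.items(),
                    key=lambda item: item[1], reverse=True)
    best_name, best_conf = ranked[0]
    # With a single candidate there is no competitor to keep a distance from.
    second_conf = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_conf < min_confidence:
        return None  # best alternative is not confident enough
    if best_conf - second_conf < min_distance:
        return None  # best alternative is not far enough ahead (assumed meaning)
    return best_name
```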

Subtree classification via parent class required

Select a class from this list if you want the child class to have a valid classification only if the parent class is classified using subtree classification. The value for this option is set to <No restriction> by default.

If you use subtree classification along with class-level classification thresholds and a document is assigned to a child class, the configured settings are used to determine whether the classification result is confident or unconfident.

Subtree Classifier

If subtree classification is enabled, the Subtree Classifier is created automatically when the project is trained. If both of the following buttons are disabled, there is no Subtree Classifier for this class.

  • Properties. Click this button to edit the Subtree Classifier AFC properties.

  • Reset. Click this button to reset the Subtree Classifier properties back to their default values.

Trainable Document Separation

This group has the following options:

Minimum page count

Enter a value to indicate the minimum number of pages expected for your documents. The value for this option is set to 1 by default.

Maximum page count

Enter a value to indicate the maximum number of pages expected for your documents. The value for this option is set to 0 by default. This means that the document length is unlimited.
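
The convention that a maximum of 0 means "unlimited" can be expressed as a small range check. This is only a sketch of how the two values can be read together; the function name is hypothetical, and how the separation model actually uses the values is not described here.

```python
def page_count_allowed(pages: int, minimum: int = 1, maximum: int = 0) -> bool:
    """Check a page count against the expected range.

    A maximum of 0 means that the document length is unlimited.
    """
    if pages < minimum:
        return False
    return maximum == 0 or pages <= maximum
```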

Standard Document Separation

The content may contain single-page or multi-page documents, a combination of both, or loose pages. Document separation processes multi-page documents and, if necessary, splits them into separate documents according to the settings.

If document separation is activated, all loose pages are added to one multi-page document that is processed by document separation. Document separation is executed as the first step: all multi-page documents are processed, and each multi-page document is itself processed sequentially, page by page. After document separation is complete, the newly created documents are classified.

For the separation of a multi-page document, each single page is classified and either a new document is created for each page or the page is added to the current document depending on the separation settings for the classification result. Then the next page is classified and added to the current document or added to a new document until the complete multi-page document is processed.

When document separation is not activated, a single-page document is created for each loose page.

You generally activate document separation at the project level.

Note These options are disabled unless document separation is enabled for the project.

This group has the following options:

Ignore for separation

When document separation is enabled for a project, you may disable document separation for single classes by selecting Ignore for separation. If the option is not selected, documents in this class are separated, and several additional options become available. This option is cleared by default.

Class represents

If the "First page" option is selected, a fixed page length can be set. By default, the value for the fixed page length is 0 (zero). This means that the number of pages is unlimited.

For example, when the fixed page length is set to 3 and document separation is processed for a multi-page document, then the following occurs during processing.

Document separation processes the multi-page document page by page. Each page is classified, and if a page is classified to this class, a new document is created and the page is added to it. Because the fixed page length is set to three, the following two pages are added to the document without being classified, regardless of whether they would belong to another class. After the third page is added, the current document is closed; it now contains three pages. Processing then continues with the next page until all pages of the multi-page document are processed.
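
The fixed page length example can be traced with a small sketch. The function and parameter names are hypothetical, and `classify` stands for whatever page classification the project performs. Pages that do not start a document of this class are outside the scope of the example; the sketch simply puts each of them in its own document, which is a simplification.

```python
from typing import Callable, List


def separate_fixed_length(pages: List[str],
                          classify: Callable[[str], str],
                          target_class: str,
                          fixed_length: int) -> List[List[str]]:
    """Split pages into documents using a fixed page length (1 or higher)."""
    documents: List[List[str]] = []
    i = 0
    while i < len(pages):
        if classify(pages[i]) == target_class:
            # New document: this page plus the following pages, which are
            # added without being classified.
            documents.append(pages[i:i + fixed_length])
            i += fixed_length
        else:
            # Simplification: other classes are not modeled in this sketch.
            documents.append([pages[i]])
            i += 1
    return documents
```

With a fixed page length of 3, only the first page of each block of three is passed to `classify`; the two pages after it are never classified.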

If the value is set to zero and a page of a processed multi-page document is classified to this class, then a new document is created and the page is added. The next page of the multi-page document is added to the current document when:

  • The page is unclassified

  • The page is classified as a Middle page or Last page of the same class as its Corresponding first page

When a processed page of the multi-page document is classified to another class that is not a middle or last page of the current document, then the current document is closed and the current page is added to a new document. After all pages of the multi-page document are processed the next multi-page document is processed.

If "Middle page" or "Last page" is selected, then the list for Corresponding first page is enabled, allowing a class for the middle or last page to be specified. If this is done, then a middle page (or last page) is added to the currently processed document, when the first page of the current document is classified to the class that is selected for the Corresponding first page option. Otherwise, the document is closed and the middle (or last) page is added to a separate new document.

Important If you define a middle or last page for a first page, the Fixed page length option for the first page must be set to 0 (unlimited), because this option has priority over other settings. If Fixed page length is set to 1 or higher, the settings for middle or last pages are never taken into account, because with a fixed page length the pages are added without being classified.

If <none> is chosen, then the middle page is always added to the current document. For a last page, it works the same way except that the document is closed after the page is added and a new document is started for the next processed page of the multi-page document.
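
The first, middle, and last page rules for a fixed page length of 0 can be summarized in a sketch. All names are hypothetical. The sketch assumes that any page classified to a first-page class starts a new document, that a page arriving when no document is open starts one, and that a last page closes the document it lands in; the text above does not spell out every one of these cases.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SeparationSettings:
    represents: str = "first"                  # "first", "middle" or "last"
    corresponding_first: Optional[str] = None  # None stands for <none>


def separate(page_classes: List[Optional[str]],
             settings: Dict[str, SeparationSettings]) -> List[List[int]]:
    """Group page indexes into documents; None marks an unclassified page."""
    documents: List[List[int]] = []
    current: Optional[List[int]] = None
    current_first_class: Optional[str] = None

    def start(index: int, cls: Optional[str]) -> None:
        nonlocal current, current_first_class
        current = [index]
        current_first_class = cls
        documents.append(current)

    for index, cls in enumerate(page_classes):
        if current is None:
            start(index, cls)                  # no open document
        elif cls is None:
            current.append(index)              # unclassified page joins the document
        elif settings[cls].represents == "first":
            start(index, cls)                  # a first page opens a new document
        elif settings[cls].corresponding_first in (None, current_first_class):
            current.append(index)              # <none>, or matching first page
        else:
            start(index, cls)                  # middle/last page of another class
        if cls is not None and settings[cls].represents == "last":
            current = None                     # a last page closes the document
    return documents
```

For example, with an `Invoice` first-page class and middle and last classes that name it as their corresponding first page, the page sequence first, unclassified, middle, last ends up in one document, and the page after the last page starts a new one.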

Note If you define a middle or last page on a document where a fixed page length is defined, these settings are not taken into account as the option Fixed page length has priority over the other settings.

Fixed page length

This option is only available if you select "First page" for the "Class represents" option. If all documents of this type have a fixed page length, you can specify that page length here. This ensures that the documents are separated correctly, and makes it easier to determine if a page contains the content for extraction. The value for this option is set to 0 by default. This means that there is no fixed page length, and that documents can have various page lengths.

Corresponding first page

This option is only enabled when "Class represents" is set to Middle page or Last page. Select a class from the list so a document that is classified as the selected class can have the middle or last page added to it rather than creating a new document.

Recognition

This group has the following option:

Recognition profile

Select a recognition profile from the list. By default, this option is set to use the default recognition profile.

Click Configure Recognition Profile to modify the selected profile. Alternatively, click Recognition Profiles to add a recognition profile or edit an existing one.

Classification Thresholds

This group has the following options:

Use project level classification thresholds

Clear this option to define the thresholds for a confident classification result at the class level. By default, this option is selected, and the classification thresholds defined at the project level on the Project Settings - Classification tab are used. If you clear the option, the following options are available if the corresponding classifiers are defined in the project settings.

Layout Classification

A classification result is confident if the confidence of the layout classification’s best alternative is higher than this value. If the confidence is equal to or lower than this value, the classification result is unconfident. This option is available only if layout classification is selected on the Project Settings - Classification tab.

Content Classification

A classification result is confident if the confidence of the content classification’s best alternative is higher than this value. If the confidence is equal to or lower than this value, the classification result is unconfident. This option is available only if content classification is selected on the Project Settings - Classification tab.

Trainable Document Separation (TDS)

A classification result is confident if the confidence of the best alternative is higher than this value. If the confidence is equal to or lower than this value, the classification result is unconfident. This option is available only if Trainable Document Separation (TDS) is selected on the Project Settings - Document Separation tab and a TDS-model file is available.
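
The threshold rule shared by the three options above is a strict comparison: a confidence equal to the threshold is unconfident. The following sketch, with hypothetical function names, shows the comparison together with the choice between project-level and class-level values.

```python
def effective_threshold(use_project_level: bool,
                        project_threshold: float,
                        class_threshold: float) -> float:
    """Pick the threshold that applies to a class."""
    return project_threshold if use_project_level else class_threshold


def is_confident(confidence: float, threshold: float) -> bool:
    """Confident only if the confidence is strictly higher than the threshold."""
    return confidence > threshold
```

For instance, with a class-level threshold of 70 in effect, a best alternative with a confidence of exactly 70 is reported as unconfident, while 70.5 is confident.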

Reporting

This group has the following option:

Other

This group has the following option:

Description

Provide a description for the class. This description appears in the list of classes when the Project Class is selected.