General tab - Properties of Classification Locator window
The Classification Locator uses the classification scheme defined in a secondary external Tungsten Transformation project. This project provides additional classification results for a document in the form of locator alternatives. Only the classification scheme is used from the external project.
Use the Classification Locator to add information to a document that is obtained by using additional classification steps. Since the additional classification steps are normally independent of the main classification in the main project, an externally defined and trained project is used. The Classification Locator gives access to multi-view classification that sees the document from different aspects and multi-topic classification that returns more than one classification result for a document or a line of text.
The General tab enables you to select a classification project, configure the classification mode, and specify the minimum confidence. The following settings are available:
- Referenced project file
-
Browse to the location where the reference project is located and select the desired project. A copy of that project is made and stored within the data files of the current project. No images from the original project are copied to the current project because they are not required for classification purposes.
This project copy is referenced and updated whenever a document is processed by the corresponding Classification Locator.
- Automatic update from project file
-
Select this setting to ensure that the local copy of the reference project stays up-to-date with the original reference project file. For the best results, select this setting only if the referenced project is updated regularly. (Default: Cleared)
If this setting is selected, the Tungsten Transformation Server accesses the original referenced project and, if it is newer than the copy, updates the copy of the project. This means that both the original project and its copy must be accessible during production.
- Classification Mode
-
This group enables you to select one of the following classification modes:
-
Complete document (text only). (Default: Selected)
This value means that the entire document and its text are used for classification.
This mode does not consider any hierarchical classification rules, such as subtree classification or default classification results.
The text used here for classification can be restricted to specific regions or pages.
-
Line by line (text only).
This mode means that each text line is classified individually and returned as an alternative, if the confidence is high enough. The results are then sorted by confidence.
The coordinates of the line are included with the returned alternatives, and each alternative is highlighted on the document. This enables the calling project to access these coordinates as needed, for example, to find the highest line on a page that was classified as a specific value.
-
By Paragraph (text only).
This mode uses paragraphs from a document for classification, honoring chapter or section numbers as well as numbered or bullet lists.
-
Complete document (hierarchical).
This mode means that both layout and text classification can be used in the referenced classification project. For the actual classification process, the various settings in the classification project are used.
The regions definition is used to determine how many pages need OCR.
A final classification result can have a very low confidence if certain classification rules were applied. Its confidence can also be lower than the confidence of other classes that are not the final classification result. In that case, use the Set classification result to 100% setting.
If you choose this value, you cannot define a default result for the locator, and the default result in the Result Mode group is therefore disabled. If no result is found for this locator, the default classification result that is defined within the referenced project file is assigned as the locator result.
The above classification modes do not execute scripts, even classification scripts.
-
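As an illustration of how a calling project might use the coordinates returned in Line by line (text only) mode, the following Python sketch picks the topmost line on a page that was classified as a given value. It is not Tungsten Transformation script code: the `LineAlternative` type, its fields, and the `topmost_line` function are hypothetical names, and the sketch assumes that coordinates are measured from the top of the page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineAlternative:
    """Hypothetical stand-in for one alternative returned in line-by-line mode."""
    class_name: str    # classification result for the line
    confidence: float  # confidence in percent
    page: int          # zero-based page index (assumption)
    top: int           # distance from the top of the page (assumption)


def topmost_line(alternatives: List[LineAlternative],
                 class_name: str, page: int) -> Optional[LineAlternative]:
    """Return the line nearest the top of `page` classified as `class_name`."""
    matches = [a for a in alternatives
               if a.class_name == class_name and a.page == page]
    # A smaller `top` value means the line sits higher on the page.
    return min(matches, key=lambda a: a.top, default=None)


# Illustrative data only; these values are invented for the example.
alternatives = [
    LineAlternative("Address", 91.0, page=0, top=420),
    LineAlternative("Address", 84.0, page=0, top=130),
    LineAlternative("Greeting", 95.0, page=0, top=60),
]
best = topmost_line(alternatives, "Address", page=0)
print(best.top if best else "no match")
```

The selection ignores confidence on purpose: in this sketch, every alternative has already passed the minimum confidence check, so only the position matters.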
- Classification Settings
-
This group has the following settings:
- Min. confidence
-
Only classification results with a confidence higher than or equal to this value are returned as alternatives. (Default: 70)
- Set classification result to 100%
-
This setting is enabled only when the Complete document (hierarchical) value is selected for the Classification Mode setting. When this setting is selected, the confidence of the alternative that is the final classification result is always 100%. This is important because the confidence of the final classification result might be very low as a result of subtree classification or the use of the default classification result. If that were to happen, it would not be possible to distinguish between the final classification result and other possible alternatives. (Default: Selected)
- Min. words in a paragraph
-
When the Classification Mode is set to By Paragraph (text only), this setting is available so that you can configure the minimum size of a paragraph. This means that paragraphs with fewer words than what is specified here are not classified. (Default: 20)
- Result Mode
-
This group enables you to select one of the following result modes as well as configure the following values:
-
Single topic. (Default: Selected)
If this value is selected, only one class from the referenced project is used as a result in the alternative.
-
Multi topic.
If this value is selected, a semicolon-delimited list of the best class results is used as the alternative value.
- Max. number of results (0 = all)
-
This limits the number of returned classification results to the specified number. A value of 0 means that all alternatives that meet the confidence requirements are returned. (Default: 5)
- Default result
-
This setting is disabled unless the Complete document (text only) value is selected for the Classification Mode setting. If no classification result is found, this default result is assigned as the final result. If no default result is defined, the locator returns no value. (Default: None)
The default result can be a text string such as "Nothing", "Unclassified", or something similar.
-
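The interaction of the Min. confidence, Result Mode, Max. number of results, and Default result settings can be sketched in Python as follows. This is a conceptual model of the behavior described above, not the product's implementation or API: the `select_result` function and its parameter names are hypothetical, and applying the result limit only to the multi-topic list is an assumption of this sketch.

```python
from typing import Dict, Optional


def select_result(scores: Dict[str, float],
                  min_confidence: float = 70,
                  multi_topic: bool = False,
                  max_results: int = 5,
                  default_result: Optional[str] = None) -> Optional[str]:
    """Model how the settings shape the value of the returned alternative.

    `scores` maps class names to confidences in percent.
    """
    # Min. confidence: keep results with a confidence >= the threshold.
    kept = [(name, conf) for name, conf in scores.items()
            if conf >= min_confidence]
    # Results are ordered from highest to lowest confidence.
    kept.sort(key=lambda item: item[1], reverse=True)

    if not kept:
        # Default result: used when nothing qualifies; None means no value.
        return default_result
    if not multi_topic:
        # Single topic: only one class is used as the result.
        return kept[0][0]
    if max_results > 0:
        # Max. number of results: 0 means no limit.
        kept = kept[:max_results]
    # Multi topic: semicolon-delimited list of the best classes.
    return ";".join(name for name, _ in kept)


# Illustrative scores only; these values are invented for the example.
scores = {"Invoice": 92, "Order": 75, "Letter": 40}
print(select_result(scores))
print(select_result(scores, multi_topic=True))
print(select_result({"Letter": 40}, default_result="Unclassified"))
```

With the default threshold of 70, "Letter" never qualifies in this example, so the last call falls back to the default result.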
The following buttons are available at the bottom of this window:
Button | Description |
---|---|
Close | Closes the window and saves your changes. |
Test | Tests the locator settings. The results are displayed on the Test Results tab, which opens automatically when you click this button. Depending on the locator method, this button may have additional modes if the locator uses other locators as input. |
Help | Displays the help for the open window. |
Related topics:
-
How to configure a Classification Locator