Image Classifier tab
The Image Classifier tab contains the following options:
- Optimize classifier for
The classifier can be optimized for either invoices (accounts payable) or forms. Depending on the selected option, different features are used to classify an image. Use Invoices for all semi-structured documents that also contain variable parts, such as invoices, bills of lading, and orders. Use Forms for structured documents that have a preprinted form background.
This option is set to "Invoices" by default.
- Auto rotation 180°
Select this option if you want to automatically rotate an image by 180° during classification. The rotation is applied for display only. The image file on disk is left unchanged.
Note: All images are upside down if the first image of a cluster was learned this way.
This option is cleared by default.
- Enable skew tolerance
Select this option to allow skew tolerance during classification. For the best results, deskew your images during scanning.
This option is selected by default.
- Max. samples for internal classes
The Image Classifier supports an unlimited number of samples per class. If the sample images are very different, the layout classifier internally learns a separate pattern for each sample and is therefore able to assign them all to one class. For performance reasons, you might want to limit the number of sample documents that are actually used for feature extraction. Type the limit into the Max. samples for internal classes box. A value of 0 means no limitation.
This option is set to 0 by default.
- Threshold for classification accuracy
To influence the classification accuracy for low-contrast images, use the Threshold for classification accuracy slider or enter a number between 0 and 40. The value determines how low-contrast (gray) regions contribute to the similarity between an image and a learned class. A higher value gives higher precision: blurred images are less likely to be misclassified, which results in fewer errors but also in more unclassified images. A lower value can return high confidence values even for poor-contrast documents. High confidence values might then also be returned for other document classes that have a blurred structure in the same region of the document, which can lead to a higher error rate.
The default value is 10, which has been determined to be an optimal value for normal documents.
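The defaults and value ranges listed above can be summarized in a short sketch. This is an illustration only, not the product's API: the class `ImageClassifierSettings`, its field names, and the helper `limit_samples` are hypothetical, and the choice to keep the first N samples when a limit is set is an assumption, because the text does not say which samples are used.

```python
from dataclasses import dataclass


@dataclass
class ImageClassifierSettings:
    """Hypothetical container mirroring the tab's options and defaults."""
    optimize_for: str = "Invoices"        # "Invoices" or "Forms"
    auto_rotation_180: bool = False       # cleared by default
    skew_tolerance: bool = True           # selected by default
    max_samples_per_class: int = 0        # 0 means no limitation
    accuracy_threshold: int = 10          # allowed range: 0 to 40

    def __post_init__(self):
        # Reject values that the tab itself would not accept.
        if self.optimize_for not in ("Invoices", "Forms"):
            raise ValueError("optimize_for must be 'Invoices' or 'Forms'")
        if self.max_samples_per_class < 0:
            raise ValueError("max_samples_per_class must be 0 or greater")
        if not 0 <= self.accuracy_threshold <= 40:
            raise ValueError("accuracy_threshold must be between 0 and 40")


def limit_samples(samples, max_samples):
    """Return the samples used for feature extraction.

    A limit of 0 keeps every sample. Keeping the first N samples
    otherwise is an assumption made for this sketch.
    """
    samples = list(samples)
    if max_samples == 0:
        return samples
    return samples[:max_samples]
```

For example, `limit_samples(["a.tif", "b.tif", "c.tif"], 2)` keeps two samples, while a limit of 0 keeps all three.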