Manual versus automatic classification techniques
If classification is not configured in the Transformation Designer, a person must manually classify each document before document processing can continue.
Manual classification typically follows a hierarchical scheme. First, the main category of a document is determined. The classification is then refined over several steps until the final document category is determined. Tungsten TotalAgility enables you to replicate your manual classification hierarchy so that automatic classification achieves the same results.
To configure automatic classification, a class hierarchy is needed. This hierarchy is created and maintained in the Project Tree. Each class added to the Project Tree can represent a possible classification result. The Project Tree, in combination with the project classification settings, determines the classification result of documents in production.
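As an illustration only, the following Python sketch models a class hierarchy as a simple tree, where each node is a possible classification result. It is not the Transformation Designer's data model or API. The `ClassNode` type, the `all_classes` helper, and the class names ("Invoices", "Supplier A", and so on) are hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


# eq=False avoids recursive comparisons through the parent/children links.
@dataclass(eq=False)
class ClassNode:
    """Hypothetical node in a class hierarchy (not a product type)."""
    name: str
    children: list["ClassNode"] = field(default_factory=list)
    parent: Optional["ClassNode"] = field(default=None, repr=False)

    def add(self, name: str) -> "ClassNode":
        """Create a child class under this node and return it."""
        child = ClassNode(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Return the names from the root down to this class."""
        parts = []
        node: Optional[ClassNode] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return list(reversed(parts))


def all_classes(root: ClassNode) -> list[ClassNode]:
    """List every class in the tree, parents before their children."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        result.append(node)
        # Reverse so that children are visited in the order they were added.
        stack.extend(reversed(node.children))
    return result


# Example hierarchy: main categories first, refined categories below them.
project = ClassNode("Project")
invoices = project.add("Invoices")
supplier_a = invoices.add("Supplier A")
supplier_b = invoices.add("Supplier B")
contracts = project.add("Contracts")

print(" > ".join(supplier_a.path()))
print([node.name for node in all_classes(project)])
```

The path of a class mirrors the manual scheme described above: the main category is decided first, and each level below it refines the result.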
The Layout Classifier and Adaptive Feature Classifier both support learning by example. The first step is to assign sample documents to each class. The classification engines then run a training process in which they analyze all the sample documents, extract the important features, and use those features to define the classification result for each class. The process consists of the following steps:
- Definition of categories
- Creation of a sample set
- Feature extraction and learning
- Classification of unknown documents
- Automatic improvement and incremental learning
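To make the learning-by-example steps more concrete, here is a minimal Python sketch. It does not reproduce the algorithms of the Layout Classifier or the Adaptive Feature Classifier, which are not described here. As a stand-in, it builds a word-count profile per class and compares unknown text to each profile with cosine similarity. The function names, the sample texts, and the `threshold` value are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import Optional


def extract_features(text: str) -> Counter:
    """Feature extraction: count the lowercase words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """Learning: merge the features of all samples of a class into one profile."""
    model = {}
    for class_name, texts in samples.items():
        profile = Counter()
        for text in texts:
            profile.update(extract_features(text))
        model[class_name] = profile
    return model


def classify(model: dict[str, Counter], text: str,
             threshold: float = 0.3) -> tuple[Optional[str], float]:
    """Return (class name, confidence), or (None, confidence) below the threshold."""
    features = extract_features(text)
    scores = {name: cosine(features, profile) for name, profile in model.items()}
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Categories and sample set (hypothetical text standing in for documents).
samples = {
    "Invoice": ["invoice number total amount due",
                "invoice date amount payable total"],
    "Contract": ["agreement between parties terms signature",
                 "contract terms parties effective date"],
}
model = train(samples)

label, confidence = classify(model, "total amount due on this invoice")
print(label, round(confidence, 2))

unknown_label, _ = classify(model, "weather forecast sunny")
print(unknown_label)
```

In this sketch, a document that scores below the threshold for every class is returned as unclassified, which corresponds to a document that would need manual classification.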
You can also use Classification Online Learning. If it is enabled, any document that fails classification and has to be manually classified during Validation is added to the Classification Set. Documents are accumulated until the maximum number is reached. The next time a document of that class is encountered, its classification confidence should be higher, and the document may not need to be manually classified if the Classification Set contains enough example documents.
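The accumulation behavior can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the `OnlineClassificationSet` class, its method names, and the `max_documents` parameter are hypothetical, and the sketch assumes the maximum applies to the Classification Set as a whole.

```python
from collections import defaultdict


class OnlineClassificationSet:
    """Hypothetical store for documents collected through online learning."""

    def __init__(self, max_documents: int):
        # Assumption: the limit applies to the whole set, not to each class.
        self.max_documents = max_documents
        self.samples: dict[str, list[str]] = defaultdict(list)

    def size(self) -> int:
        """Total number of documents accumulated so far."""
        return sum(len(docs) for docs in self.samples.values())

    def record_validation(self, text: str, class_name: str,
                          manually_classified: bool) -> bool:
        """Add a document only if it was manually classified and there is room.

        Returns True if the document was added to the set.
        """
        if not manually_classified:
            return False
        if self.size() >= self.max_documents:
            return False
        self.samples[class_name].append(text)
        return True


online_set = OnlineClassificationSet(max_documents=2)

results = [
    online_set.record_validation("invoice from a new supplier", "Invoice", True),
    online_set.record_validation("standard invoice", "Invoice", False),
    online_set.record_validation("lease contract", "Contract", True),
    online_set.record_validation("another new invoice layout", "Invoice", True),
]
print(results)
print(online_set.size())
```

Documents that were classified automatically are skipped, and once the limit is reached no further documents are added. The accumulated samples would then be available the next time the classifier is trained.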