Classification Set
One way of ensuring that the documents processed by your project are correctly classified is by providing training documents for your Classification Set. This document set is created automatically when a project is created, ready for you to add documents as needed.
For projects created in an earlier version of the Transformation Designer that did not use document sets, the existing Classification Set is converted to the new document set format. This means that you do not lose any of your old training documents, and you can add more documents to your new Classification Set at any time.
This also means that it is no longer necessary to add the same document for both layout and content classification. Adding it once ensures that both classification methods are trained.
All documents that are added to the training set are added in .xdc format. This means that after a document is trained, you can view its content in the document viewer as an image or as text, and all coordinate and location information is available.
The Classification Set that is created automatically cannot be deleted from your project unless another document set is set as the Classification Set. This means that if you have an existing set of documents organized in a meaningful directory hierarchy, you can add it as a new document set and set it to be the Classification Set. The original Classification Set that was preconfigured when your project was created can then be deleted if desired.
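Before adding an existing directory hierarchy as a document set, it can help to check how many files each folder holds. The following is a minimal sketch in Python, not part of the Transformation Designer. The function name `summarize_document_tree` and the example folder names are hypothetical, and the sketch only counts files by extension without opening them.

```python
from collections import Counter
from pathlib import Path


def summarize_document_tree(root):
    """Map each folder under root (relative path) to a count of file extensions.

    Hypothetical helper for illustration only. Folders that contain no files
    directly are left out of the result.
    """
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Key by the folder path relative to root, using forward slashes.
            folder = path.parent.relative_to(root).as_posix()
            # Lowercase the extension so ".TIF" and ".tif" are counted together.
            summary.setdefault(folder, Counter())[path.suffix.lower()] += 1
    return summary
```

Calling `summarize_document_tree` on a root folder returns one entry per folder, for example a hypothetical `Invoices/VendorA` key with its per-extension counts. Files directly in the root folder are reported under the key `.`.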
For the best results, ensure that the documents used in any document set are located on an NTFS file system. A FAT32 file system restricts the number of files allowed, and exceeding this number could result in the loss of documents and data.
You can manage your Classification Set in the following ways:
- Add documents to a document subset in your Classification Set
- Add a document subset to help organize your Classification Set
- Add documents to the Classification Set for the selected class
- Delete document subsets or documents from your Classification Set
- Sort the documents based on column data
- Filter documents based on their classification results or by adding a custom filter