Extraction Set
One way to improve your extraction results is to provide training documents for extraction.
This document set is where all of the documents used for extraction training are stored. Similar to the Classification Set, the Extraction Set is created automatically when a project is created, ready for you to add documents as necessary. This document set is available in List View and Thumbnail View only.
The Extraction Set contains documents that are ideal examples with good extraction results for the classes defined in your project.
For projects created in an earlier version of the Transformation Designer that did not use document sets, the existing Extraction Set is converted to the new document set format. This means that you do not lose any of your old training documents, and you can add more documents to your new Extraction Set at any time.
The Extraction Set that is created automatically cannot be deleted from your project unless another document set is set as the Extraction Set. This means that if you have an existing set of documents organized in a meaningful directory, you can add it as a new document set and set it to be the Extraction Set. The original training set can then be deleted if desired.
For the best results, ensure that the documents used in any document set are located on an NTFS file system. A FAT32 file system has a restriction on the number of files allowed, and exceeding this number could result in the loss of documents and data.
You can manage your Extraction Set in the following ways:
- Add documents to a document subset in your Extraction Set
- Add a document subset to help organize your Extraction Set
- Add documents to the Extraction Set for the selected class
- Delete document subsets or documents from your Extraction Set
- Sort the documents based on column data
- Filter documents based on their classification results or by adding a custom filter
- Resolve conflicts for your training data