Create a classification and separation benchmark set
Because classification and separation are closely related, you can use the same golden files benchmark set for both classification and separation benchmark testing.
The physical structure of your documents is not as important as it was in earlier versions of Tungsten Transformation Toolkit. This is because the new document set structure can assign a class to a document rather than rely on its physical location. However, if you already have test documents that are separated into folders that match your project class hierarchy, you can use them as they are.
Consider renaming your documents so that each name includes the class name. This makes assigning a class significantly easier because you do not need to open and view each document in the Document Viewer.
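If your test documents already sit in folders named after their classes, the renaming can be scripted outside the toolkit. The following is a minimal sketch, not part of Tungsten Transformation Toolkit: it assumes one file per document, that the immediate parent folder name is the class name, and that a `Class_filename` pattern suits you. The function name `rename_with_class` and the underscore separator are illustrative choices.

```python
from pathlib import Path


def rename_with_class(root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Prefix each file under root with the name of its parent (class) folder.

    Assumption: the immediate parent folder name is the class name.
    Returns the (old path, new path) pairs; renames only if dry_run is False.
    """
    plan = []
    # Collect the files first so that renaming does not disturb the traversal.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        if path.parent == root:
            continue  # no class folder to take a name from
        prefix = f"{path.parent.name}_"
        if path.name.startswith(prefix):
            continue  # already renamed; keeps the script safe to rerun
        target = path.with_name(prefix + path.name)
        plan.append((path, target))
        if not dry_run:
            path.rename(target)
    return plan
```

Run it with `dry_run=True` first and review the returned pairs, then run it again with `dry_run=False` on a copy of your documents before importing them into a document set.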
You can create a classification and separation golden files document set by following these steps:
- Open the Documents window if it is not already open.
- Add a new document set that contains the documents you want to include in your golden files.
Enter a descriptive name for the document subset, such as "Classification/Separation Golden Files."
A new document set is added to the Documents window and expands automatically so you can see its document subsets.
- Convert your new document set to a benchmark set.
The document set type changes so that you can assign a classification for each document in the document set.
- If a different view is in use, switch to the List View.
- Assign a class to all documents in your benchmark set.
Each document is assigned a class for comparison after classification.
- Click your newly added document subset and click Select All on the toolbar or press Ctrl + A.
All documents in the selected document subset are selected in the List View.
- Optionally, if your documents already have recognition data, right-click the selected documents and then click Clear Classification & Extraction Data.
This removes any existing classification or extraction data and provides you with a clean state so you can build your golden files.
- Optionally, if your documents do not have recognition data, press F4 to perform recognition for the selected documents.
If you want to select a different recognition engine, right-click the selected "Classification/Separation Golden Files" documents, click Recognize on the shortcut menu, and then select the desired recognition engine from the submenu.
A progress bar is displayed showing the recognition progress.
- On the document shortcut menu, click Classify.
Each document in the benchmark set is assigned a classification result based on your project settings and, if available, your training documents.
- Once again, click Select All for your "Classification/Separation Golden Files" document subset, and then click Save All.
All changes to the documents are saved.
- Compare the classification result for each document against its assigned class. If any do not match, confirm that the assigned class is correct. If the assigned class is correct, add the mismatched documents to the Classification Set and repeat steps 6 - 10 until the classification result matches the assigned class for all documents in your benchmark set.
- If you added any documents to your Classification Set, train your project. On the Ribbon Process tab, in the Train group, select Classification.
If Separation is enabled, the setting is called Classification & Separation.
A progress bar is displayed showing the training progress.
- If your project is configured for separation, you can now test your separation settings. On the Ribbon Process tab, in the Test group, click Separate.
For separation benchmarks, your document set cannot include any subfolders under the Root Folder. You can see the hierarchy of a document set by switching to the Hierarchy View.
The Document Separation Results window is displayed showing the separation results.
- If the separation results do not match your document structure, review your document separation settings on the Project Settings - Document Separation tab.
If you are using trainable document separation, ensure that the Project Settings - Classification tab is configured to "Classify each page" for each classifier used in your project.
- Repeat steps 13 and 14 until your separation results are as desired.
- On the Ribbon Project tab, in the File group, click Save Project.
Create a backup copy of your processed golden files. This ensures that you can always access the backup copy if something happens to the working copy.
Your project is saved and your golden files are now ready for performing Classification and Separation benchmarks.
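The comparison in step 11 is done in the toolkit's user interface, but its logic can be illustrated with a short sketch. This is not a toolkit API: it assumes you have noted each document's assigned class and classification result yourself, and the names `BenchmarkRow`, `find_mismatches`, and `mismatches_per_class`, along with the sample rows, are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class BenchmarkRow(NamedTuple):
    """One benchmark document; the field names are hypothetical."""
    document: str
    assigned_class: str         # the class you assigned by hand
    classification_result: str  # the class the project returned


def find_mismatches(rows: list[BenchmarkRow]) -> list[BenchmarkRow]:
    """Return the rows whose classification result differs from the assigned class."""
    return [r for r in rows if r.classification_result != r.assigned_class]


def mismatches_per_class(rows: list[BenchmarkRow]) -> Counter:
    """Count mismatches by assigned class to show which classes need attention."""
    return Counter(r.assigned_class for r in find_mismatches(rows))


# Illustrative data only, not output from a real project.
rows = [
    BenchmarkRow("Invoice_scan001.tif", "Invoice", "Invoice"),
    BenchmarkRow("Invoice_scan002.tif", "Invoice", "Order"),
    BenchmarkRow("Order_scan003.tif", "Order", "Order"),
]
for row in find_mismatches(rows):
    print(f"{row.document}: assigned {row.assigned_class}, got {row.classification_result}")
```

Each mismatched row is a candidate for the Classification Set, provided you have first confirmed that its assigned class is correct. The golden files are ready once the list of mismatches is empty.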