Identify Tab - Clustering Window
After you complete the configuration step, use the Identify tab to view documents, create unique cluster names, and train the Clustering tool by assigning documents that are not yet identified to existing clusters. You can also use keyboard shortcuts to identify documents.
- Statistics
  This group displays the overall clustering progress bar and the following counters:
  - Total documents: The total number of processed documents.
  - Confirmed documents: The total number of documents that you have manually confirmed during the clustering iterations. For layout clustering, this value also includes documents that are confirmed automatically.
  - Unconfirmed documents: The number of documents that were automatically identified as belonging to a cluster, based on other documents that were assigned to a labeled cluster during the identification iterations. These documents must be confirmed manually before they are assigned to a cluster.
  - Remaining documents: The number of documents that are unclustered or not yet assigned to a labeled cluster.
  - Labeled clusters: The number of clusters that have already been labeled.
  - Unlabeled clusters: The number of suggested clusters that are not yet labeled.
- Filter
  Use these options to limit the documents displayed in the thumbnail view so that you can concentrate on one area of clustering:
  - No filter: No filter is applied, so unknown (unclustered) documents, unconfirmed documents, and documents assigned to unlabeled clusters are all displayed. This is the default.
  - Unconfirmed documents for labeled clusters: Only unconfirmed documents that were recognized and assigned to a labeled cluster are displayed.
  - Documents in unlabeled clusters: Only documents that were recognized and assigned to a cluster that is still unlabeled are displayed.
  - Unclustered documents: Only documents that were not recognized during clustering and could not be assigned to a cluster are displayed.
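The relationship between the filter options can be summarized with a small conceptual model. The following is a minimal Python sketch, not part of the Transformation Designer and not a product API: the `Doc` record, its fields, and the `matches` and `visible` functions are hypothetical. It treats each option as a predicate over a document's state and assumes that confirmed documents are no longer shown in the thumbnail view. Under those assumptions, No filter displays exactly the union of the three other options.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document shown in the Identify tab."""
    name: str
    cluster: Optional[str]          # None means the document is unclustered
    cluster_labeled: bool = False   # True if the assigned cluster has a label
    confirmed: bool = False         # True once the assignment is confirmed


class Filter(Enum):
    NO_FILTER = "No filter"
    UNCONFIRMED_LABELED = "Unconfirmed documents for labeled clusters"
    UNLABELED_CLUSTERS = "Documents in unlabeled clusters"
    UNCLUSTERED = "Unclustered documents"


def matches(doc: Doc, flt: Filter) -> bool:
    """Return True if the document would appear in the thumbnail view under flt."""
    if doc.confirmed:
        # Assumption: confirmed documents are no longer offered for identification.
        return False
    unclustered = doc.cluster is None
    in_unlabeled = not unclustered and not doc.cluster_labeled
    unconfirmed_labeled = not unclustered and doc.cluster_labeled
    if flt is Filter.UNCLUSTERED:
        return unclustered
    if flt is Filter.UNLABELED_CLUSTERS:
        return in_unlabeled
    if flt is Filter.UNCONFIRMED_LABELED:
        return unconfirmed_labeled
    # NO_FILTER: every document that belongs to one of the three groups above.
    return unclustered or in_unlabeled or unconfirmed_labeled


def visible(docs: Iterable[Doc], flt: Filter) -> List[str]:
    """Names of the documents displayed for the given filter, in input order."""
    return [d.name for d in docs if matches(d, flt)]


if __name__ == "__main__":
    docs = [
        Doc("invoice_01.tif", "Invoice", cluster_labeled=True),
        Doc("letter_07.tif", None),
        Doc("scan_12.tif", "suggested-3"),
        Doc("invoice_02.tif", "Invoice", cluster_labeled=True, confirmed=True),
    ]
    print(visible(docs, Filter.UNCLUSTERED))  # ['letter_07.tif']
```

Modeling No filter as the union, rather than as "show everything", reflects the documented behavior that it lists unknown, unconfirmed, and unlabeled-cluster documents; whether confirmed documents are hidden is an assumption of this sketch.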
By default, the thumbnail view in the center pane of the Identify tab displays up to three documents, one per row, with up to three pages per document. You can adjust the thumbnail view to display more documents. Near the end of the identification process, when only one or two documents remain, fewer rows are displayed. If a document contains more than three pages, a small horizontal scrollbar appears in its row.
Below each candidate document, a list contains the clusters defined so far; the default selection is unclustered. You can also type a name for a new cluster. As you type, auto-completion suggests the names of already defined clusters and existing project classes.
The document viewer on the right displays an enlarged view of a page from the highlighted document. To view another page of the selected document, select that page's thumbnail at the top of the document viewer.
In addition to the common Transformation Designer buttons, the following buttons are available:
- Confirm: Click to assign the selected document to the suggested cluster.
- Reassign: Click to assign the selected document to a cluster other than the suggested one.
- Continue Clustering: Click when you have finished categorizing the currently displayed documents and want to start the next clustering iteration, which uses the training information from these newly identified documents.