Identify Tab - Clustering Window

Once the configuration step is complete, use the Identify tab to view documents, create unique cluster names, and train the Clustering tool by assigning unidentified documents to existing clusters. You can use keyboard shortcuts to identify the documents.

Statistics

This group displays the overall clustering progress bar and the following counters.

Total documents

The total number of processed documents.

Confirmed documents

The total number of documents that you have manually confirmed during the clustering iterations.

Note: For layout clustering, this value includes documents that are confirmed automatically.

Unconfirmed documents

The number of documents that were automatically identified as belonging to a cluster. The identification is based on other documents that were assigned to a labeled cluster during the identification iterations. These documents require manual confirmation before they are assigned to a cluster.

Remaining documents

The number of documents that are either not yet assigned to a labeled cluster or are unclustered.

Labeled clusters

The number of cluster categories that are already labeled.

Unlabeled clusters

The number of clusters that are suggested but unlabeled.
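
The relationship between the document counters can be pictured with a small conceptual model. The following Python sketch is illustrative only and is not part of the product: the names State, Statistics, and summarize are hypothetical, and it assumes that every processed document is in exactly one of four states.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Hypothetical document states, derived from the counter descriptions."""
    CONFIRMED = "confirmed"              # assignment to a cluster has been confirmed
    UNCONFIRMED = "unconfirmed"          # automatically assigned, awaiting confirmation
    IN_UNLABELED_CLUSTER = "unlabeled"   # grouped into a suggested cluster without a name
    UNCLUSTERED = "unclustered"          # not assigned to any cluster


@dataclass(frozen=True)
class Statistics:
    total: int
    confirmed: int
    unconfirmed: int
    remaining: int


def summarize(states):
    """Derive the document counters from a sequence of document states."""
    counts = Counter(states)
    return Statistics(
        total=sum(counts.values()),
        confirmed=counts[State.CONFIRMED],
        unconfirmed=counts[State.UNCONFIRMED],
        # Assumption: "remaining" covers documents in unlabeled clusters
        # plus documents that are unclustered.
        remaining=counts[State.IN_UNLABELED_CLUSTER] + counts[State.UNCLUSTERED],
    )
```

Under these assumptions, the confirmed, unconfirmed, and remaining counters add up to the total number of documents.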

Filter

This group provides options to filter the documents displayed in the thumbnail view. Select an option to limit the displayed documents so that you can concentrate on one area of clustering.

No filter

If selected, no filter is applied to the documents displayed. This means that unknown documents, unconfirmed documents, and those documents assigned to unlabeled clusters are displayed. This is the default value for this option.

Unconfirmed documents for labeled clusters

If selected, only documents that were recognized and assigned to a labeled cluster, but are not yet confirmed, are displayed in the thumbnail view.

Documents in unlabeled clusters

If selected, only documents that were recognized and assigned to a cluster that is still unlabeled are displayed.

Unclustered documents

If selected, only those documents that were not recognized during clustering and could not be assigned to a cluster are displayed.
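
One way to think about the filter options is as predicates over the documents in the thumbnail view. The following Python sketch is a conceptual illustration, not the product's implementation: Doc, FILTERS, and apply_filter are hypothetical names, and the sketch assumes that confirmed documents are no longer shown in the view.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record used only for this illustration."""
    name: str
    cluster: Optional[str] = None   # None means the document is unclustered
    cluster_labeled: bool = False   # True once the cluster has been given a name
    confirmed: bool = False         # True once the assignment is confirmed


# Each filter option maps to a predicate that decides whether a document is shown.
FILTERS = {
    # Assumption: without a filter, every document that is not yet confirmed is shown.
    "No filter": lambda d: not d.confirmed,
    "Unconfirmed documents for labeled clusters":
        lambda d: d.cluster is not None and d.cluster_labeled and not d.confirmed,
    "Documents in unlabeled clusters":
        lambda d: d.cluster is not None and not d.cluster_labeled,
    "Unclustered documents": lambda d: d.cluster is None,
}


def apply_filter(docs, option="No filter"):
    """Return the documents that the selected filter option would display."""
    keep = FILTERS[option]
    return [d for d in docs if keep(d)]
```

For example, given one confirmed document, one unconfirmed document in a labeled cluster, one document in an unlabeled cluster, and one unclustered document, the "No filter" option in this model returns the last three, and each of the other options returns exactly one of them.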

The default thumbnail view in the center pane of the Identify tab displays up to three documents, with up to three pages per document. Each row represents a single document, and a small horizontal scrollbar indicates that a document contains more than three pages. You can adjust the thumbnail viewer to display more documents. Near the end of the identification process, when only one or two documents remain, fewer documents may be displayed.

The list in each row below a candidate document contains the clusters that are defined so far. The default state is unclustered. You can manually enter a name for a new cluster. Auto-completion helps you use names from already defined clusters or existing project classes.

Tip: Press Tab to quickly move between documents.

The document viewer on the right displays an enlarged page of the highlighted document. At the top of the document viewer, you can navigate to any page within the selected document by selecting the thumbnail of that page.

In addition to the common Transformation Designer buttons, the following buttons are available:

Confirm

Click to assign the selected document to the suggested cluster.

Reassign

Click to assign the selected document to a different cluster than the one suggested.

Continue Clustering

Click when you have finalized the categorization of the currently displayed documents and want to start the next clustering iteration, which uses the training information from these newly identified documents.