Project structure
The project structure is displayed in the Project Explorer. You
can create either a Contained or a Partial project. A Contained
project does not need an Alien documents section, because every input
document fits one of the defined document classes; in a Partial
project, some input documents may not fit any defined class.
Contained project structure:
- Training set
- Stopwords
- Metawords
- Phrases
- Test set
- Test results
Partial project structure:
- Training set
  - Alien documents
- Stopwords
- Metawords
- Phrases
- Test set
  - Alien documents
- Test results
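The two layouts above can be summarized as data. The following is a minimal illustrative sketch in Python, not part of the product: the names CONTAINED, make_partial and render are hypothetical, and it assumes that Alien documents appears as a subsection of both Training set and Test set in a Partial project.

```python
# Hypothetical model of the Project Explorer layouts described above.
# Each entry is (section name, list of nested subsections).
CONTAINED = [
    ("Training set", []),
    ("Stopwords", []),
    ("Metawords", []),
    ("Phrases", []),
    ("Test set", []),
    ("Test results", []),
]


def make_partial(contained):
    """Derive the Partial layout by adding 'Alien documents' under both document sets."""
    with_aliens = {"Training set", "Test set"}  # assumption: nested under these two
    return [
        (name, (children + ["Alien documents"]) if name in with_aliens else list(children))
        for name, children in contained
    ]


PARTIAL = make_partial(CONTAINED)


def render(layout):
    """Return the layout as an indented dash list, one section per line."""
    lines = []
    for name, children in layout:
        lines.append(f"- {name}")
        lines.extend(f"  - {child}" for child in children)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(PARTIAL))
```

Seen this way, the only structural difference between the two project types is the presence of the Alien documents subsections.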
If your Training set or Test set contains a large number of documents, you can quickly locate a specific document in the Main panel view of its class to see all relevant information about it. While the document is selected in Project Explorer for viewing its contents, click the Locate in parent button in the header of Project Explorer; the selected document is then highlighted in green in the Main panel view of its container class.
Specify new project details in the New Project dialog:
- Name: name of the new project
- Location: path where the project is saved
- Project file: path of the project file
- Language: OCR language of the project; it cannot be changed after project creation
- Contained project: select this radio button to create a project with no alien documents (see above)
- Partial project: select this radio button to create a project with alien documents (see above)

The project details are shown in the Main panel.
The Settings, Other and Statistics collapsible sections contain the following items:
Settings
- Classifier method: type of document classifier; click in the Classifiers field and choose one of the following from the dropdown
  - Both text and layout based: classifier that uses both text and layout
  - Layout based only: classifier that uses layout only
  - Text based only: classifier that uses text only
- Error weights: expand this section and adjust any of the three options to change the default settings for automatic confidence threshold calculation
  - False negative
  - False positive
  - Misclassified
- Project type: select Partial or Contained from the dropdown
- Test training documents, too: if this checkbox is marked, training documents are also tested; its value is True (marked) or False (unmarked)
Other
- Created: project creation date
- Description: a general comment that makes the project easier to identify
- Language: language set of the project, used by OCR and word processing; it cannot be changed after project creation
- Project file path: local file path of the project file
- Selected confidence threshold: confidence threshold of the project, initially set at project creation
- Suggested confidence threshold based on the training set: optimal confidence threshold calculated from the training set documents
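To illustrate how the Error weights could influence a suggested confidence threshold, here is a minimal sketch in Python. It is not the product's actual calculation, which is not described here. Everything in it is an assumption made for illustration: the Result record, the weighted_cost and suggest_threshold functions, the definitions of the three error types, and the rule of picking the threshold with the lowest weighted error count.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Result:
    """Hypothetical classification outcome for one training document."""
    true_class: Optional[str]  # None marks an alien document
    predicted_class: str
    confidence: float          # classifier confidence in [0, 1]


def weighted_cost(results: Sequence[Result], threshold: float,
                  w_fn: float = 1.0, w_fp: float = 1.0, w_mis: float = 1.0) -> float:
    """Weighted error count at a threshold (assumed error definitions)."""
    cost = 0.0
    for r in results:
        accepted = r.confidence >= threshold
        if r.true_class is None:
            if accepted:
                cost += w_fp   # false positive: alien document accepted
        elif not accepted:
            cost += w_fn       # false negative: class document rejected
        elif r.predicted_class != r.true_class:
            cost += w_mis      # misclassified: accepted into the wrong class
    return cost


def suggest_threshold(results: Sequence[Result], **weights: float) -> float:
    """Return the candidate threshold with the lowest weighted cost.

    Candidates are 0.0 plus every observed confidence; ties go to the lowest.
    """
    candidates = sorted({0.0, *(r.confidence for r in results)})
    return min(candidates, key=lambda t: (weighted_cost(results, t, **weights), t))


# Made-up sample data, for illustration only.
SAMPLE = [
    Result("invoice", "invoice", 0.9),
    Result("invoice", "invoice", 0.6),
    Result("letter", "invoice", 0.5),
    Result(None, "letter", 0.4),
    Result("letter", "letter", 0.8),
]

if __name__ == "__main__":
    print(suggest_threshold(SAMPLE))
    print(suggest_threshold(SAMPLE, w_mis=5.0))
```

Under these assumptions, raising the Misclassified weight pushes the suggested threshold upward, because rejecting a doubtful document becomes cheaper than accepting it into the wrong class.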
Statistics
- Alien test documents: number of alien test documents
- Classes: number of defined classes in the project
- Hidden classes: number of hidden classes in the project; these do not participate in the training phase
- Hidden training documents: number of hidden documents in the project; these do not participate in the training phase
- Test documents: number of test documents in the project
- Training documents: number of training documents in the project
