Add a custom document set for testing

Use the procedure in this topic to create a custom document set for testing.

For testing and optimizing a project, you need to load a document set into the Documents window. You can test and optimize classification and extraction results with these documents.

Transformation Designer can use three different types of source files: image files (*.tif, *.jpg, *.png), text files (*.txt), and PDF files (*.pdf).

The source files are used to create XDocuments. An XDocument (*.xdc) is the document representation that the extraction algorithms use throughout the complete process during production. XDocuments contain the layout information that is needed to extract the field data. Layout information consists of text elements, such as words with their geometries, fonts, and locations, and graphical elements, such as lines, logos, and textures. To access this information, use the document shortcut menu in the Documents window to open the XDocument in the XDoc Browser.

See the Tungsten TotalAgility Scripting Help for more information about XDocuments.

If you have a pre-sorted set of documents, you can also use these to configure your project hierarchy.
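As a rough illustration of what "pre-sorted" can mean here, the following sketch lists the immediate subdirectories of a document directory and the files in each one, which is the kind of layout the subdirectory settings in the procedure below rely on. The helper name `classes_from_subdirectories` is hypothetical and not part of Transformation Designer, and treating only immediate subdirectories as classes is an assumption.

```python
from pathlib import Path


def classes_from_subdirectories(root):
    """Map each immediate subdirectory name of `root` to its file names.

    Hypothetical helper for illustration only. The root directory itself
    is not treated as a class, and files directly in `root` are ignored.
    """
    mapping = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            # The subdirectory name stands in for the class name.
            mapping[sub.name] = sorted(
                f.name for f in sub.iterdir() if f.is_file()
            )
    return mapping
```

Running the helper on a directory before you open it as a document set gives a quick preview of the class names that the subdirectory names would produce.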

Procedure

  1. Open the Documents window if it is not already open.
  2. In the Documents window, on the toolbar, select Open Document Set.

    The Open Document Set window is displayed.

  3. Configure your document set using the following settings:

    Path

    Enter a directory or browse to the location where the documents are located.

    File Type

    Select one of the following file types:

    Automatic:

    Select this option if the directory contains unknown document types or a mixture of document types. (Default: Selected)

    The file type with the most available information is loaded. This means that a Folder.xfd file is loaded first, then XDocuments (*.xdc), and finally image (*.tif, *.jpg, *.png) or text (*.txt) files.

    If you have *.xdc files in the parent directory, and *.tif files in nested directories, only the *.xdc files are visible in the Test Documents pane.

    If you do not want the file type to be selected automatically, you can select it manually.

    Folders are only displayed when folders are enabled for the project within the project settings.

    Batch (Folder.xfd):

    Loads an XFolder that contains a hierarchy of documents.

    XDocuments (*.xdc):

    Loads all available XDocuments from the selected file system location.

    Source files:

    Source files can be used for testing and optimizing your project.

    Select one or more of the following file types:

    Image files (*.tif, *.jpg, *.png):

    Loads all available image files from the selected file system location. (Default: Cleared)

    Text files (*.txt):

    Loads all available text files from the selected file system location. (Default: Selected)

    PDF files (*.pdf):

    Loads all available PDF files from the selected file system location. (Default: Cleared)

    Include subdirectories

    Available only when the selected Path contains one or more subdirectories.

    Includes all documents in all Path subdirectories in the document set. (Default: Cleared)

    When selected, several settings are available that are not normally required when opening a document set for testing a project.

    These settings are available if you want to use a document set to configure a project hierarchy.

    Create subset for each subdirectory:

    Available only when Include subdirectories is selected.

    Adds a new document subset for each directory nested inside the selected Path directory. (Default: Cleared)

    Each document subset is named based on the corresponding directory name. You can rename the document subsets at any time.

    Assign subdirectory as class for each document:

    Available only when the Include subdirectories setting is selected. (Default: Cleared)

    Assigns a class to a document based on the subdirectory name.

    Create classes in Project Tree:

    Available only when both the Include subdirectories and the Assign subdirectory as class for each document settings are selected. (Default: Cleared)

    Adds a class to the Project Tree hierarchy for each subdirectory in the selected Path directory.

    The Path directory itself is not included in the Project Tree, and the names of the subdirectories are used as the class names.

    If a document set already exists for a selected directory, none of the settings are available. If you change the file type for an existing document set, it is overwritten.

  4. Select OK to save your settings.

    If the selected document set does not exist, an error occurs.

    If you press Cancel, a dummy read-only document training set is added to the project. This allows you to remap the training set and correct the path for the project.

    If you press OK, an attempt is made to recreate the training set folder. If this is successful, the project is loaded as normal. If the recreation fails, the project cannot be loaded.

    The Open Document Set window is closed and a new document set is added to the Documents window.

    If you save and close your project at this point, the newly added document set is not saved in the list of document sets.

  5. Optional. Associate the document set with your project by right-clicking it and selecting Attach Document Set to Project on the shortcut menu.

    This permanently associates the document set with your project so that it is always visible in the Documents window.

    If you skip this step but chose to save the document structure, the document set remains available in the "Recent Documents" list.

  6. Optional. Add documents to your document subset as required.
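
The loading order described for the Automatic file type in step 3 can be sketched as a small precedence check. This is only a minimal illustration of the documented order (Folder.xfd, then XDocuments, then image or text files), not how Transformation Designer is implemented. The helper name `pick_automatic_files` is hypothetical, and looking only at the top level of the directory is an assumption.

```python
from pathlib import Path

# Assumed precedence, taken from the description of the Automatic setting:
# Folder.xfd first, then XDocuments, then image or text files.
IMAGE_OR_TEXT = {".tif", ".jpg", ".png", ".txt"}


def pick_automatic_files(directory):
    """Return (kind, files) for the richest file type found in `directory`.

    Hypothetical helper for illustration; it inspects only the top level
    of the directory and is not part of Transformation Designer.
    """
    entries = sorted(p for p in Path(directory).iterdir() if p.is_file())

    # A batch file carries the most information, so it wins if present.
    batch = [p for p in entries if p.name.lower() == "folder.xfd"]
    if batch:
        return "batch", batch

    # XDocuments come next.
    xdocs = [p for p in entries if p.suffix.lower() == ".xdc"]
    if xdocs:
        return "xdocuments", xdocs

    # Otherwise fall back to image or text source files.
    sources = [p for p in entries if p.suffix.lower() in IMAGE_OR_TEXT]
    return "source files", sources
```

For a directory that contains both *.xdc and *.tif files, the sketch returns only the XDocuments, which matches the behavior described for the Automatic setting.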