Exclude training document sets from database

You can exclude training samples from the database. Reasons for doing so include security concerns, the size of the documents in the database, or the effort of downloading the project for editing.

Procedure

  1. Open a project.
  2. Open the temporary location where your project is stored on disk.

    Right-click on any training document and then select Open in Windows Explorer on the shortcut menu.

  3. Copy the required folders (ClassificationTraining or ExtractionTraining) to a safe location.

    A persistent UNC path that is available to all project developers is recommended.

  4. Open the copied folders as new Test Sets from their new location.
  5. Right-click each newly created Test Set and select one of the following settings on the shortcut menu:
    • Use as Classification Training Set

    • Use as Extraction Training Set

    This swaps the roles of the sets: the original training sets become normal test sets, and the new test sets become the training sets.

    The original training sets, which are now test sets, are still stored in and synchronized with the database. To change this, follow these steps:

    1. In one of the old training sets that are now test sets, select the <All Documents> subset.
    2. Delete all documents.

      This removes the documents from disk. When the project is saved, the change is synchronized with the database, so the corresponding set in the database is empty.

    3. Repeat for the other training set if needed.

    The training sets are now stored outside of the database. You can train and maintain your project as needed without storing training documents in the database.
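
The copy in step 3 can also be scripted if you prefer not to copy the folders by hand. The following Python snippet is a minimal sketch, not part of the product: the function name copy_training_folders and both path arguments are hypothetical, and it assumes the training folders sit directly under the temporary project location opened in step 2.

```python
import shutil
from pathlib import Path

# Folder names taken from step 3 of the procedure.
TRAINING_FOLDERS = ("ClassificationTraining", "ExtractionTraining")


def copy_training_folders(project_dir, target_dir, folders=TRAINING_FOLDERS):
    """Copy each training folder found in project_dir to target_dir.

    project_dir and target_dir are placeholders for the temporary project
    location and the shared (for example UNC) location, respectively.
    Returns the list of destination paths that were created.
    """
    project_dir = Path(project_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in folders:
        source = project_dir / name
        if not source.is_dir():
            continue  # a project may need only one of the two folders
        destination = target_dir / name
        # copytree raises FileExistsError if the destination already exists,
        # so an earlier copy is never overwritten silently.
        shutil.copytree(source, destination)
        copied.append(destination)
    return copied
```

Call the function with the temporary project location as the first argument and the shared location as the second. It returns the paths of the copied folders, which you can then open as new Test Sets in step 4.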