Test extraction

With Test Extraction, you can test the extraction results without running an entire batch. By running Test Extraction, you can evaluate the result for a particular input and verify which kind of extraction is best suited for the particular document type.

Steps to run test extraction

To start the Test Extraction, upload an image from the batch class screen either by dropping a file or clicking Upload Test Extraction File(s) located at the bottom of the screen. The uploaded image can be a single page or a multipage TIFF or PDF file.

Test Extraction can be run for different extraction plugins by altering two properties:

  • Classification Types

  • Extraction Plugins

Test Extraction requires the input document to undergo operations configured in the Document Assembler and KV Page Process plugins for the selected classification type. It supports the following classification methods:

  • SearchClassification

  • BarcodeClassification

  • ImageClassification

When you select ImageClassification in the Classification drop-down list, the Document Assembler and KV Page Process plugins perform only the operations required for image classification. If scripts are configured for both modules, they are executed at the same time.

To run test extraction, configure all the relevant plugins in both modules of the batch class.

The extraction runs on the input document regardless of whether the extraction switch is ON. However, the extraction plugins, including the Extraction Script plugin, must be configured in the batch class for the extraction to be performed.

Functionality

On the Test Extraction screen, you can perform the following actions by clicking the corresponding buttons:

  • Extract: Perform extraction on the input document and generate the result. If no extraction result is returned, the message "no values extracted" appears on the screen.

  • Download: Download the extracted results as an XML file. The schema of the downloaded file is similar to that of the batch.xml file produced when a batch runs. However, the batch instance identifier fields are empty.

  • Clear: Clear the extraction results and the XML files. This enables you to perform another extraction on the input document.

Apart from these, you can click the close button to close the Test Extraction screen.
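As a rough illustration of working with the downloaded results file, the following Python sketch reads an XML document and lists its extracted fields in field order. The element names used here (Batch, BatchInstanceIdentifier, DocumentLevelField, Name, Value, Confidence, FieldOrderNumber) and the sample values are assumptions made for the example; check them against an actual downloaded file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The element names are assumptions for
# illustration only and may differ from the real downloaded file.
SAMPLE_XML = """<Batch>
  <BatchInstanceIdentifier></BatchInstanceIdentifier>
  <Documents>
    <Document>
      <Type>Invoice</Type>
      <DocumentLevelFields>
        <DocumentLevelField>
          <Name>InvoiceDate</Name>
          <Value>2020-01-31</Value>
          <Confidence>92.5</Confidence>
          <FieldOrderNumber>2</FieldOrderNumber>
        </DocumentLevelField>
        <DocumentLevelField>
          <Name>InvoiceNumber</Name>
          <Value>INV-001</Value>
          <Confidence>88.0</Confidence>
          <FieldOrderNumber>1</FieldOrderNumber>
        </DocumentLevelField>
      </DocumentLevelFields>
    </Document>
  </Documents>
</Batch>"""


def list_fields(xml_text):
    """Return the extracted fields as dicts, sorted by field order number."""
    root = ET.fromstring(xml_text)
    fields = []
    for dlf in root.iter("DocumentLevelField"):
        fields.append({
            "name": dlf.findtext("Name", default=""),
            "value": dlf.findtext("Value", default=""),
            "confidence": float(dlf.findtext("Confidence", default="0")),
            "order": int(dlf.findtext("FieldOrderNumber", default="0")),
        })
    return sorted(fields, key=lambda f: f["order"])


def batch_identifier(xml_text):
    """Return the batch instance identifier, or an empty string if it is blank."""
    root = ET.fromstring(xml_text)
    return (root.findtext("BatchInstanceIdentifier") or "").strip()


if __name__ == "__main__":
    for field in list_fields(SAMPLE_XML):
        print(field["order"], field["name"], field["value"], field["confidence"])
    print("identifier is empty:", batch_identifier(SAMPLE_XML) == "")
```

To inspect a real download, read the file contents into a string and pass it to list_fields in place of SAMPLE_XML, adjusting the element names as needed.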

Result

After the extraction plugin returns a result, it is displayed on the screen. There are two ways to view test extraction results:

  • Extracted DLF

  • Data Table

Extracted DLF

Extracted Document Level Fields (DLFs), also referred to as Add index field, are the key-value fields extracted for a document type. The DLFs display index field details.

When using test extraction for Extracted DLF, the following information is displayed:

  • Page ID

  • Page Name

  • Value Extracted

  • Field Details

  • Extraction Type

  • Confidence Level

The DLFs are displayed by their Field Order number.

The Format Values switch is for format conversion. Set the switch to ON to apply the Format conversion rule defined in the batch class.

When the Workflow switch is ON, test extraction (and classification) uses the settings of the batch class. In the OFF state, you can use the options in the Classification and Extraction drop-down lists to control the process. No other setting or configuration can be used in this state.
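The Workflow switch rule can be sketched as a small selection function. This is only a model of the behavior described above, not the product's implementation; the Settings type, the resolve_settings function, and the extraction plugin names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values that drive a test run."""
    classification: str
    extraction_plugin: str


def resolve_settings(workflow_on, batch_class, dropdown):
    """Pick the settings a test run would use under the described rule.

    Workflow ON: the batch class settings apply.
    Workflow OFF: only the drop-down selections apply.
    """
    return batch_class if workflow_on else dropdown


if __name__ == "__main__":
    # Plugin names below are placeholders, not real plugin identifiers.
    batch_class = Settings("SearchClassification", "EXAMPLE_PLUGIN_A")
    dropdown = Settings("ImageClassification", "EXAMPLE_PLUGIN_B")
    print(resolve_settings(True, batch_class, dropdown))
    print(resolve_settings(False, batch_class, dropdown))
```

The point of the sketch is that the two sources of settings are never mixed: a run uses either the batch class configuration or the drop-down selections, depending on the switch.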

Data table

For table extraction, the data is populated in the DataTable section under the classified document. This section shows test extraction data for all configured columns. If no columns have been configured for table extraction, no table data is extracted.

If no value is extracted from the input document, no Extracted DLF will appear.