RecoStar Full Page Extraction activity

This activity is used to extract documents with RecoStar Full Page Reader.

Note: Only one instance of this activity can run on a given server. Therefore, when you assign the activity to an XBOUND Activities Service (see XBOUND Help for instructions), set the number of instances of this activity to 0 or 1.

If you need to add this activity to the Process Designer manually, the file to add is xboundActOcfRecoStarFullPageExtract.dll.

These settings are available when configuring a process step of this activity type. (For information about configuring process steps, see XBOUND Help.)

Language

Select the language to be used for recognition.

Ignore empty pages

Select to skip the extraction of documents that are marked as empty.

Process unprocessed or invalid documents only

Select to extract only those documents that have not been successfully validated yet.

Save classify/extraction duration into sub-documents

Select to store the processing duration per document.

Save character data

Select this option only when you need information about individual characters. This is the case, for example, when you want to rubberband parts of words, not just entire words, during Verification.

This option creates a character object in the data model for each character. When this option is not selected, the characters are saved as a string in the corresponding Word object.

Warning: This option significantly increases the memory requirements during extraction and Verification.

Image selection

Select whether to extract TIFF or JPEG images.

Apply to following document types

Select to extract only documents of certain document types. Then select all the document types that are to be extracted. If documents without types are to be extracted as well, select the No document type option.

Export OCR data

Select to export the extracted data and its images to a text file. This file can be used, for example, to train ("learn") classification.

In Export path, specify where to save the text file.

Create OCR data for ReadSoft Capture Components

Select to make the OCR results from the RecoStar engine available for further processing in ReadSoft Capture Components (RCC). This is done by changing the engine name in the read results to "ReadSoft Capture Components". In the RCF application, the Engine (OCR Engine Name) is then listed as ReadSoft Capture Components instead of RecoStarFpr.

These settings are also available:

Import button

Imports settings from an XML file that was previously created using Export.

Export button

Exports the settings to an XML file. Specify the file name and location. You can then import the XML file to get the same settings.

Check regular expression link

Opens a test form in which you can check a regular expression.

ReadSoft Capture Framework activities: Overview

XBOUND activities: Overview (XBOUND Help topic)