RecoStar HOCR for Linux (Beta)

The RECOSTAR_HOCR plugin improves OCR accuracy for non-English languages and provides more flexible pre-OCR image processing for those languages.

This is the first part of a multi-phased approach to provide additional RecoStar-based plugins for ingestion, classification, extraction, and export. Later phases will include, but may not be limited to, the following:

  • Document import

  • Barcode extraction

  • Fixed form extraction

  • Native ICR/OMR configuration

  • Batch export

The RECOSTAR_HOCR plugin for Linux will remain in beta until it is certified for production or Ephesoft determines that it is feature complete based on customer feedback. The plugin is provided out of the box and is optional to use. Do not use the RECOSTAR_HOCR plugin in place of the OMNIPAGE_HOCR plugin unless a specific use case requires it.

Limitations

The RECOSTAR_HOCR plugin for Linux has the following known limitations:

  • No EText support

  • Performance is reduced by up to 30% compared to RECOSTAR_HOCR for Windows

  • The plugin has no barcode support, so the Barcode Switch should remain OFF

Other RECOSTAR_HOCR plugin features that are available in the Windows version of the plugin are not yet supported in the Linux version. These include, but are not limited to, the following:

  • Fixed form extraction

  • Barcode extraction

  • Native key-value snippet ICR extraction

  • WebServices import or export capabilities

Prerequisites

To configure and use the RECOSTAR_HOCR plugin, the following configurations must be in place:

  • You need Transact installed.

  • You need a batch class with a document type configured. For detailed steps, see Add new document type.

  • You need to add the RECOSTAR_HOCR plugin to the Page Process module for the batch class. For more detailed steps, see Modules and plugins.

Remove any other HOCR plugins from the batch class Page Process module.
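
The requirement that RECOSTAR_HOCR is the only HOCR plugin in the Page Process module can be expressed as a simple check. The following is a minimal sketch, not part of the product: the function name `check_page_process_plugins` is hypothetical, the plugin list is typed in by hand rather than read from Transact, `CREATE_THUMBNAILS` is only an illustrative non-HOCR entry, and the rule that HOCR plugin names end in `_HOCR` is an assumption based on the names RECOSTAR_HOCR and OMNIPAGE_HOCR.

```python
def check_page_process_plugins(plugins):
    """Return a list of problems found in a Page Process plugin list.

    `plugins` is a hand-written list of plugin names; this sketch does
    not read anything from a running system.
    """
    problems = []
    # Assumption: every HOCR plugin name ends with "_HOCR".
    hocr_plugins = [name for name in plugins if name.endswith("_HOCR")]
    if "RECOSTAR_HOCR" not in hocr_plugins:
        problems.append("RECOSTAR_HOCR is missing from the Page Process module")
    for name in hocr_plugins:
        if name != "RECOSTAR_HOCR":
            problems.append(f"remove {name} from the Page Process module")
    return problems


# Illustrative plugin list; the entries are examples, not a required set.
example = ["CREATE_THUMBNAILS", "OMNIPAGE_HOCR", "RECOSTAR_HOCR"]
for problem in check_page_process_plugins(example):
    print(problem)
```

An empty result means the list satisfies the prerequisite; any returned message names the plugin to add or remove.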

Configure the RECOSTAR_HOCR plugin

This section provides information on how to configure the RECOSTAR_HOCR plugin. This plugin only needs to be configured once per batch class.

To configure the plugin, do the following:

  1. From the Batch Class Management page, select and open your batch class.
  2. Go to Modules and select the Page Process module folder.

    The Plugin Configuration screen appears.

  3. On the Plugin Configuration screen, locate the RECOSTAR_HOCR plugin in the Associated Plugins pane.
  4. Select the plugin and click the Add Selected icon to move it to the Selected Plugins pane.
  5. Click Deploy.
  6. Expand the Page Process module folder and select the RECOSTAR_HOCR plugin. The Plugin Configuration screen appears.

    The following list gives the options and a description for each configurable property of this plugin.

    Image OCR Recostar Project File Name
      Options: Fpr.rsp, Fpr_MultiLanguage.rsp, Fpr_Barcode.rsp
      Description: Specifies the name of the project file used to perform OCR.

    Recostar Auto Rotate switch
      Options: ON, OFF
      Description: Auto-rotates the input images based on the orientation computed by the RecoStar project.

    Recostar Switch
      Options: ON, OFF
      Description: Enables or disables the plugin.

    Barcode Switch
      Options: ON, OFF
      Description: Reads barcodes from the input images using the barcode-enabled RecoStar project file FPR_Barcode.rsp. Keep this switch set to OFF because barcode reading is not supported in this beta.

    Recostar Deskew Switch
      Options: ON, OFF
      Description: Determines whether input images are deskewed.

    Recostar Font Switch
      Options: ON, OFF
      Description: Detects data that has been manually altered or added to the documents. By default, this switch is set to OFF.

    OCR/Country/Language
      Options: Multiple countries and languages
      Description: Type the countries or languages that need to be supported during OCR operations. When adding multiple values, separate each value with a semicolon (;) and no space. The system populates a drop-down menu when you start typing a value in the field.
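
The property rules above can be sketched as a small check that you might run against your planned settings before entering them in the UI. This is an illustration under stated assumptions, not a product API: the helper names `join_languages` and `check_settings` are hypothetical, the settings are a hand-written dictionary, and `English` and `German` are placeholder values (the actual accepted values come from the drop-down menu in the field).

```python
# Options per property, copied from the property list in this section.
ON_OFF = {"ON", "OFF"}
ALLOWED_OPTIONS = {
    "Image OCR Recostar Project File Name": {
        "Fpr.rsp", "Fpr_MultiLanguage.rsp", "Fpr_Barcode.rsp",
    },
    "Recostar Auto Rotate switch": ON_OFF,
    "Recostar Switch": ON_OFF,
    "Barcode Switch": ON_OFF,
    "Recostar Deskew Switch": ON_OFF,
    "Recostar Font Switch": ON_OFF,
}


def join_languages(values):
    """Join values with a semicolon and no space, as the field expects."""
    cleaned = [value.strip() for value in values if value.strip()]
    return ";".join(cleaned)


def check_settings(settings):
    """Return a list of problems in a hand-written settings dictionary."""
    problems = []
    for name, value in settings.items():
        allowed = ALLOWED_OPTIONS.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: {value!r} is not a listed option")
    # Beta limitation: flag anything other than an explicit OFF.
    if settings.get("Barcode Switch") != "OFF":
        problems.append("Barcode Switch: must be OFF in this beta")
    return problems


planned = {
    "Image OCR Recostar Project File Name": "Fpr_MultiLanguage.rsp",
    "Recostar Switch": "ON",
    "Barcode Switch": "OFF",
    # Placeholder language names; pick real ones from the drop-down menu.
    "OCR/Country/Language": join_languages(["English", " German "]),
}
print(planned["OCR/Country/Language"])
print(check_settings(planned))
```

The check treats a missing Barcode Switch entry as a problem on purpose, so that the OFF setting is always stated explicitly.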