RecoStar HOCR plugin

The RecoStar HOCR plugin is part of the Page Process module by default. It uses the RecoStar OCR (optical character recognition) engine to generate HOCR XML files. The plugin reads the image files listed in the batch.xml file of a batch instance and generates an HOCR XML file for each batch instance.

Transact can also decode bar code values with this plugin when a bar code-enabled .rsp project file is used.

Configure the RECOSTAR_HOCR plugin

The Administrator user account is required for this procedure.

Perform these steps to configure the RECOSTAR_HOCR plugin in the Page Process module:

  1. Launch Transact and select Administrator > Batch Class Management.

    When prompted to log in, provide your login credentials.

    The Batch Class Management screen appears, displaying all the batch classes currently contained in Transact.

  2. Select the batch class to be configured and click Open.
  3. On the navigation pane, expand the Modules section for this batch class. Click Page Process to display the plugins currently configured for the Page Process module.
  4. Select the RECOSTAR_HOCR plugin. The Plugin Configuration screen appears on the right.
  5. Define the settings for the RECOSTAR_HOCR plugin. The following table lists the configurable properties for this plugin.

    | Configurable property | Type of value | Value options | Description |
    |---|---|---|---|
    | RecoStar Project File Name | List of values | Fpr.rsp, Fpr_Multi Language.rsp, Fpr_Barcode.rsp | Specifies the project file used for performing OCR. |
    | RecoStar color switch | List of values | ON, OFF | If set to ON, a PNG file is used for performing OCR. If set to OFF, a TIF file is used for performing OCR. |
    | RecoStar Auto Rotate switch | List of values | ON, OFF | Auto-rotates the input images based on the orientation computed by the RecoStar engine. |
    | RecoStar Switch | List of values | ON, OFF | Enables or disables this plugin. |
    | Barcode Switch | List of values | ON, OFF | Reads bar codes from the input images using a bar code-enabled RecoStar project file, such as Fpr_Barcode.rsp. |
    | RecoStar Valid Extensions | List of values | tif, gif, png | RecoStar supports these three formats when performing OCR. Select the image formats that are allowed for OCR. |
    | RecoStar De-skew Switch | List of values | ON, OFF | Determines whether input images are de-skewed. |
    | OCR Country/Language | List of values | Multiple countries and languages | Type the countries or languages to be supported in OCR operations. To enter more than one value, separate the values with a semicolon (;) and no space. The system populates a drop-down menu when you start typing a value in the field. |
    | RecoStar Font Switch | List of values | ON, OFF | If the Font Switch is set to ON in the RECOSTAR_HOCR or OMNIPAGE_HOCR plugins, the HOCR file reflects the font style (bold, italic, or underline) and font size. Font recognition helps detect potential fraud and tampering by revealing data that has been manually altered or added to a document. By default, the Font Switch is set to OFF. Example: the original amount of a field in a document is 1000 and the font size is 11. If this value is manually changed to 41000 and the 4 is written in a size 12 font, the HOCR file records the differing font size and style, which helps the user identify that the document has been tampered with. For additional information, see Fraud detection using OCR font switch. |
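The tampering example given for the RecoStar Font Switch can be sketched in code. The snippet below is a minimal illustration only and is not how Transact implements the check. It assumes that the per-character font sizes have already been read from the HOCR file into (character, font size) pairs, and the function name `find_font_outliers` is hypothetical.

```python
from collections import Counter


def find_font_outliers(chars):
    """Return (index, character, size) for characters whose font size
    differs from the most common font size in the field.

    `chars` is a list of (character, font_size) pairs. How the pairs are
    read from the HOCR file is an assumption and outside this sketch.
    """
    if not chars:
        return []
    size_counts = Counter(size for _, size in chars)
    dominant_size, _ = size_counts.most_common(1)[0]
    return [
        (index, char, size)
        for index, (char, size) in enumerate(chars)
        if size != dominant_size
    ]


# The field originally read 1000 in size 11; a "4" in size 12 was added in front.
field = [("4", 12), ("1", 11), ("0", 11), ("0", 11), ("0", 11)]
suspicious = find_font_outliers(field)
print(suspicious)  # [(0, '4', 12)]
```

Comparing each character against the dominant size in the same field is one simple heuristic. A real review process might also compare font style or compare against the rest of the page.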

Match RECOSTAR_HOCR and RECOSTAR_EXTRACTION_PLUGIN

If you are using both the RecoStar HOCR plugin and the RecoStar Extraction plugin in your batch class, the configuration of these two plugins must match for color documents. If the color switch is turned on in the RecoStar HOCR plugin, the same switch must be turned on in the RecoStar Extraction plugin.

For more information, see RecoStar Extraction plugin.
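The matching rule above can be expressed as a small consistency check. This is a sketch under stated assumptions: the two plugin configurations are represented as plain dictionaries, and the `color_switch` key and the function name are hypothetical, not Transact property keys.

```python
def color_switch_mismatch(hocr_config, extraction_config):
    """Return a message if the two color switches differ, otherwise None.

    Both arguments are dictionaries with a hypothetical "color_switch"
    key holding "ON" or "OFF".
    """
    hocr_value = hocr_config["color_switch"].strip().upper()
    extraction_value = extraction_config["color_switch"].strip().upper()
    if hocr_value != extraction_value:
        return (
            "Color switch mismatch: "
            f"RECOSTAR_HOCR={hocr_value}, "
            f"RECOSTAR_EXTRACTION_PLUGIN={extraction_value}"
        )
    return None


problem = color_switch_mismatch({"color_switch": "ON"}, {"color_switch": "OFF"})
print(problem)
```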

Dependencies

Input image

This plugin only requires an image as an input. This plugin supports the PNG image format if the color switch is ON, and the TIFF image format if the color switch is OFF.
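As a minimal sketch of this rule, the following snippet selects OCR input files by extension according to the color switch. The function name, the lookup dictionary, and the sample file names are hypothetical and are not part of Transact. The sketch only mirrors the PNG and TIF rule described above.

```python
from pathlib import PurePath

# Image format used for OCR for each color switch value, as described above.
EXTENSION_BY_COLOR_SWITCH = {"ON": "png", "OFF": "tif"}


def select_ocr_inputs(file_names, color_switch):
    """Return the file names whose extension matches the color switch.

    Raises KeyError if color_switch is neither "ON" nor "OFF".
    """
    extension = EXTENSION_BY_COLOR_SWITCH[color_switch.strip().upper()]
    wanted_suffix = f".{extension}"
    return [
        name for name in file_names
        if PurePath(name).suffix.lower() == wanted_suffix
    ]


files = ["page_0.tif", "page_0.png", "notes.txt"]  # hypothetical names
print(select_ocr_inputs(files, "ON"))   # ['page_0.png']
print(select_ocr_inputs(files, "OFF"))  # ['page_0.tif']
```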

Note that one of the following plugins must be placed before the RECOSTAR_HOCR plugin in the Page Process module:

  • Create OCR Input plugin

  • Create Display Image plugin
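The ordering requirement above can be checked with a short sketch. It assumes the plugins of the Page Process module are available as an ordered list of names. The identifiers `CREATE_OCR_INPUT` and `CREATE_DISPLAY_IMAGE` are assumed names for the two plugins listed above, and the function name is hypothetical.

```python
# Assumed identifiers for the Create OCR Input and Create Display Image plugins.
PREREQUISITE_PLUGINS = {"CREATE_OCR_INPUT", "CREATE_DISPLAY_IMAGE"}


def has_prerequisite_before_hocr(plugin_order, hocr_name="RECOSTAR_HOCR"):
    """Return True if a prerequisite plugin appears before the HOCR plugin.

    `plugin_order` is the ordered list of plugin names in the module.
    Returns False if the HOCR plugin is not in the list at all.
    """
    if hocr_name not in plugin_order:
        return False
    hocr_index = plugin_order.index(hocr_name)
    return any(name in PREREQUISITE_PLUGINS for name in plugin_order[:hocr_index])


valid_order = ["CREATE_OCR_INPUT", "RECOSTAR_HOCR"]
invalid_order = ["RECOSTAR_HOCR", "CREATE_DISPLAY_IMAGE"]
print(has_prerequisite_before_hocr(valid_order))    # True
print(has_prerequisite_before_hocr(invalid_order))  # False
```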

Shared folders

The batch class folder inside the main shared folder contains this folder: recostar-extraction.

This folder contains the RecoStar project file specified by the RecoStar Project File Name property.

If the selected file does not exist in this folder, the file with the same name in the native RecoStar folder is used for performing OCR:

<Transact-Installation-Folder>\Application\native\RecostarPlugin\bin
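The lookup order described above (batch class folder first, native RecoStar folder second) can be sketched as follows. This is an illustration under assumptions, not Transact's code: the function name is hypothetical, and temporary folders stand in for the real shared folder and installation folder.

```python
import tempfile
from pathlib import Path


def resolve_project_file(project_file_name, batch_class_folder, native_bin_folder):
    """Return the .rsp path to use for OCR.

    The copy in <batch class folder>/recostar-extraction is preferred.
    If it does not exist, the file with the same name in the native
    RecoStar folder is returned instead.
    """
    candidate = Path(batch_class_folder) / "recostar-extraction" / project_file_name
    if candidate.is_file():
        return candidate
    return Path(native_bin_folder) / project_file_name


# Demonstration with temporary folders standing in for the real locations.
with tempfile.TemporaryDirectory() as root:
    batch_class_folder = Path(root) / "BC1"
    native_bin_folder = Path(root) / "native-bin"
    (batch_class_folder / "recostar-extraction").mkdir(parents=True)
    native_bin_folder.mkdir()

    # No copy in the batch class folder yet, so the native copy is chosen.
    fallback = resolve_project_file("Fpr.rsp", batch_class_folder, native_bin_folder)

    # Once a copy exists in the batch class folder, it takes precedence.
    (batch_class_folder / "recostar-extraction" / "Fpr.rsp").touch()
    preferred = resolve_project_file("Fpr.rsp", batch_class_folder, native_bin_folder)

print(fallback.parent.name)   # native-bin
print(preferred.parent.name)  # recostar-extraction
```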

Use the RECOSTAR_HOCR plugin

This plugin runs in the Page Process module of the Ephesoft Transact workflow after all the import processing on the batch has been completed.

  1. This plugin extracts the contents of the image using the RecoStar OCR engine.
  2. This plugin uses the .rsp project file located in the following folder:

    <SharedFolder-Installation-Folder>\BC*\recostar-extraction

    If no .rsp project file is present in that folder, the plugin uses the file in the following folder instead:

    <Transact-Installation-Folder>\Application\native\RecostarPlugin\bin

  3. If the Barcode switch is ON (enabled), then the .rsp project file being used to perform OCR must be enabled for bar code extraction.

Troubleshooting

If a malfunction occurs with the RecoStar HOCR plugin, the following error messages may appear.

| Error message | Possible root cause |
|---|---|
| Invalid License, so could not be verified. | Network connection failure; RecoStar command is not valid; license is not installed or invalid; Tomcat server is not started |
| Problem in verifying License. | Unable to connect to the Transact license server or some error occurred on the Transact license server side. |
| Unable to load Fpr.rsp file. | RSP file used for processing is invalid. |
| Exception while reading from XML. | Unable to process the batch.xml file or batch.xml is invalid. |
| No valid extensions are specified in resources. | No valid extension has been selected. |
| Image Processing or XML updating failed. | Unable to update batch xml. |
| File has invalid extension. | File processed by RecoStar has invalid extension. |
| Unable to parse Orientation tag in RecoStar xml file. | RecoStar xml file has invalid value for Orientation tag. |
| Unable to rotate the file: according to the values specified in its xml. | RecoStar xml file has invalid value for rotation. |
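For quick triage, the error messages and root causes listed above can be kept in a lookup structure and matched against log lines. The sketch below is one possible approach. The function name and the sample log line format are hypothetical, and only the message texts and root causes come from the table above.

```python
# Error messages and possible root causes, copied from the table above.
ROOT_CAUSES = {
    "Invalid License, so could not be verified.": [
        "Network connection failure",
        "RecoStar command is not valid",
        "License is not installed or invalid",
        "Tomcat server is not started",
    ],
    "Problem in verifying License.": [
        "Unable to connect to the Transact license server or some error "
        "occurred on the Transact license server side."
    ],
    "Unable to load Fpr.rsp file.": ["RSP file used for processing is invalid."],
    "Exception while reading from XML.": [
        "Unable to process the batch.xml file or batch.xml is invalid."
    ],
    "No valid extensions are specified in resources.": [
        "No valid extension has been selected."
    ],
    "Image Processing or XML updating failed.": ["Unable to update batch xml."],
    "File has invalid extension.": [
        "File processed by RecoStar has invalid extension."
    ],
    "Unable to parse Orientation tag in RecoStar xml file.": [
        "RecoStar xml file has invalid value for Orientation tag."
    ],
    "Unable to rotate the file: according to the values specified in its xml.": [
        "RecoStar xml file has invalid value for rotation."
    ],
}


def possible_root_causes(log_line):
    """Return the root causes of every known error message found in log_line."""
    causes = []
    for message, message_causes in ROOT_CAUSES.items():
        if message in log_line:
            causes.extend(message_causes)
    return causes


# Hypothetical log line; the real log format may differ.
sample = "ERROR batch BI1: Unable to load Fpr.rsp file."
print(possible_root_causes(sample))
```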