RecAPI: OCR Engines

Introduction

OmniPage Capture SDK has multiple OCR engines, which can be applied on a per-zone basis. This allows a wide range of different types of textual and non-textual data to be recognized, even if they all appear on the same page. For each zone, the application can assign any module available in the configuration, or the choice of module can be left to the Engine.

Using the recognition services of any of these modules for development purposes requires license activation. The engines require individual licenses, but these licenses can be combined and enabled with a single activation. For more information see the licensing topics of the General Information help system.

Omnifont recognition modules

FRX multi-lingual omnifont recognition module
MOR multi-lingual omnifont recognition module
MTX omnifont recognition module
PLUS2W and PLUS3W omnifont recognition modules

Other recognition modules

BAR barcode recognition module
DOT 9-pin draft dot-matrix recognition module
HNR handprinted numeral recognition module
MAT matrix matching recognition module
OMR optical mark recognition module
RER handprint recognition module
Asian recognition module

NOTE: The recognition modules MTX, DOT, HNR and MAT are supported on: Windows. RER is supported on: Windows, Linux, Mac OS X.

Filling method - Recognition module combinations

When the value RM_AUTO is set, either by default or explicitly, the Engine takes care of recognition module selection for any filling method. When setting specific values for filling methods and recognition modules, it is the programmer’s responsibility to specify a valid recognition module-filling method pair. Any incorrectly set zones will have no recognition results. The following table shows which modules are considered by the automatic recognition module selection, called up by the RM_AUTO value. The order of the recognition modules in the second column shows the priority order for the automatic recognition module selection.

Filling method	Permissible Recognition modules
FM_OMNIFONT	RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR, RM_OMNIFONT_FRX, RM_OMNIFONT_MTX
FM_DRAFTDOT9	RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_DOT, RM_OMNIFONT_MTX
FM_BARCODE	RM_BAR
FM_OMR	RM_OMR
FM_HANDPRINT	RM_HNR, RM_RER
FM_DRAFTDOT24	RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR, RM_OMNIFONT_FRX, RM_OMNIFONT_MTX
FM_OCRA	RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER
FM_OCRB	RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER
FM_MICR	RM_MAT, RM_RER
FM_BARCODE2D	RM_BAR
FM_DOTDIGIT	RM_MAT
FM_DASHDIGIT	RM_MAT
FM_CMC7	RM_RER, RM_MAT
FM_NO_OCR	-

Note: Some important details about filling method detection can be found in the section Detection of filling method of zones.
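The table above can be encoded as a small lookup so that an application checks a filling method/recognition module pair before assigning it to a zone. This is only a sketch: the enums below are local stand-ins with arbitrary values (a real application would use the constants from the SDK headers), and RM_NONE, MAX_PER_ROW, permissible and is_valid_pair are hypothetical names that are not part of the SDK.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for illustration only. A real application would use the
   filling method and recognition module constants from the SDK headers;
   the numeric values here are arbitrary. */
typedef enum {
    FM_OMNIFONT, FM_DRAFTDOT9, FM_BARCODE, FM_OMR, FM_HANDPRINT,
    FM_DRAFTDOT24, FM_OCRA, FM_OCRB, FM_MICR, FM_BARCODE2D,
    FM_DOTDIGIT, FM_DASHDIGIT, FM_CMC7, FM_NO_OCR,
    FM_COUNT
} filling_method;

typedef enum {
    RM_NONE = 0,   /* hypothetical "empty slot" marker, not an SDK value */
    RM_AUTO,
    RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR,
    RM_OMNIFONT_FRX, RM_OMNIFONT_MTX,
    RM_DOT, RM_BAR, RM_OMR, RM_HNR, RM_RER, RM_MAT
} rec_module;

#define MAX_PER_ROW 5

/* One row per filling method, modules in the order given by the table.
   Unlisted trailing slots are zero-initialized, i.e. RM_NONE.
   FM_NO_OCR has no permissible module, so its row stays empty. */
static const rec_module permissible[FM_COUNT][MAX_PER_ROW] = {
    [FM_OMNIFONT]   = { RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR,
                        RM_OMNIFONT_FRX, RM_OMNIFONT_MTX },
    [FM_DRAFTDOT9]  = { RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_DOT,
                        RM_OMNIFONT_MTX },
    [FM_BARCODE]    = { RM_BAR },
    [FM_OMR]        = { RM_OMR },
    [FM_HANDPRINT]  = { RM_HNR, RM_RER },
    [FM_DRAFTDOT24] = { RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR,
                        RM_OMNIFONT_FRX, RM_OMNIFONT_MTX },
    [FM_OCRA]       = { RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER },
    [FM_OCRB]       = { RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER },
    [FM_MICR]       = { RM_MAT, RM_RER },
    [FM_BARCODE2D]  = { RM_BAR },
    [FM_DOTDIGIT]   = { RM_MAT },
    [FM_DASHDIGIT]  = { RM_MAT },
    [FM_CMC7]       = { RM_RER, RM_MAT },
};

/* Returns true if rm may be assigned to a zone that uses fm. */
bool is_valid_pair(filling_method fm, rec_module rm)
{
    if ((int)fm < 0 || (int)fm >= FM_COUNT)
        return false;
    if (rm == RM_AUTO)
        return true;   /* the Engine selects the module itself */
    for (size_t i = 0; i < MAX_PER_ROW && permissible[fm][i] != RM_NONE; ++i)
        if (permissible[fm][i] == rm)
            return true;
    return false;
}
```

Calling is_valid_pair(FM_BARCODE, RM_OMR) returns false, which is the kind of pair that would leave a zone without recognition results. The sketch does not model the character set considerations that the automatic selection also applies.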

Recognition modules and setting of their performance

Character recognition is a rather complex process involving many different modules, such as Pre-processing, Zoning and Recognition. Most of these modules can perform their tasks in several ways, using different algorithms. The choice of algorithm may affect the performance of the whole OCR process. OmniPage CSDK provides several settings for selecting the algorithms of each part, so quite fine balancing can be achieved. However, the huge number of combinations requires a huge amount of testing to find the best one.

There is an accuracy/speed trade-off setting specified by kRecSetRMTradeoff, which has three values: TO_ACCURATE, TO_BALANCED and TO_FAST. Five recognition modules can interpret these. Precise information appears in the sub-heading for each module. Other modules (Pre-processing and Zoning) can also be affected by this setting; see Kernel.OcrMgr.TradeOff.Preprocess and DCM_AUTO. In theory you can use this setting to specify three different speed/accuracy processing modes, but it has a serious limitation: it changes how a given recognition module works, but you still have to select that recognition module yourself (RM_OMNIFONT_FRX, RM_OMNIFONT_PLUS3W, RM_ASIAN, etc.). Recognition modules are language dependent, so you cannot use the same module for both Western and Asian text. In addition, a single recognition module does not give you the broadest speed/accuracy range. You might need different modules for the fast and accurate workflows: for example, for the fastest Western processing you could use the RM_OMNIFONT_FRX module, while for the most accurate you should use RM_OMNIFONT_PLUS3W.

CSDK provides two settings (Kernel.OcrMgr.PreferAccurateEngine and Kernel.OcrMgr.PreferFastEngine) for selecting predefined combinations of the algorithms for all the modules. These settings affect a given part of the workflow only when that part is working in auto mode; thus fine-tuning the functioning of individual modules is still possible.

The following sample code is our suggestion for a 3-mode performance handling:

    // OmniPage has 3 large modules that have different "tradeoff" setting possibilities: Pre-processing, Zoning and Recognition.
    // Let's use the first two in their most accurate mode, and change Recognition only on the user's request:

    kRecSetIntSetting(sid, "Kernel.OcrMgr.TradeOff.Preproc", TO_ACCURATE);     // Use the most accurate deskew, binarization, despeckle, resolution enhancement
    kRecSetDecompMethod(sid, DCM_STANDARD);                                    // Use the most accurate zoning
    // The above two settings are suggested for high performance PC architectures only; we suggest not to use them on embedded systems with slower CPUs
    kRecSetRMTradeoff(sid, TO_ACCURATE);                                       // Use the accurate mode of the selected engine (except when changed below)
    kRecSetIntSetting(sid, "Kernel.OcrMgr.DisableFullPostProc", FALSE);        // Do text post-processing even in fast mode (needed for formatted output)

    if (... Prefer speed) {
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferFastEngine", TRUE);        // Use the FRX engine for Western languages, and use AE_FAST mode for Asian languages
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferAccurateEngine", FALSE);
    } else if (... Balanced) {
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferFastEngine", FALSE);
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferAccurateEngine", FALSE);
        kRecSetRMTradeoff(sid, TO_BALANCED);                                   // Use the balanced mode of the 2way engine, or AE_FULL mode for Asian languages
    } else { // Most accurate
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferFastEngine", FALSE);
        kRecSetIntSetting(sid, "Kernel.OcrMgr.PreferAccurateEngine", TRUE);    // Use 3way voting for Western languages, and use AE_VOTE mode for Asian languages
    }
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Generally, all other settings (such as the default recognition module, the default filling method, etc.) should be left at their default values. The language or languages must of course be specified.
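The three branches of the sample above can also be written as a pure mapping from a mode to the values that are then passed to the SDK calls. The sketch below assumes an application-defined perf_mode enum and perf_plan struct (both hypothetical), and uses local stand-ins for the TO_* constants, whose real values come from the SDK headers.

```c
#include <stdbool.h>

/* Local stand-ins; a real application would take TO_* from the SDK headers. */
typedef enum { TO_ACCURATE, TO_BALANCED, TO_FAST } tradeoff;

/* Hypothetical application-level mode, e.g. chosen by the user. */
typedef enum { MODE_FAST, MODE_BALANCED, MODE_ACCURATE } perf_mode;

/* Values the application would apply for the chosen mode. */
typedef struct {
    bool     prefer_fast_engine;      /* Kernel.OcrMgr.PreferFastEngine     */
    bool     prefer_accurate_engine;  /* Kernel.OcrMgr.PreferAccurateEngine */
    tradeoff rm_tradeoff;             /* argument for kRecSetRMTradeoff     */
} perf_plan;

/* Mirrors the if/else chain of the sample: only the balanced mode changes
   the recognition trade-off away from TO_ACCURATE. */
perf_plan plan_for_mode(perf_mode mode)
{
    switch (mode) {
    case MODE_FAST:
        return (perf_plan){ true,  false, TO_ACCURATE };
    case MODE_BALANCED:
        return (perf_plan){ false, false, TO_BALANCED };
    case MODE_ACCURATE:
    default:
        return (perf_plan){ false, true,  TO_ACCURATE };
    }
}
```

Keeping the decision separate from the SDK calls makes the mode table easy to test without a licensed engine; the application then applies the three fields with the same calls as in the sample.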

Differences on Embedded Systems

The default configuration of Embedded CSDK differs from that of a Windows or desktop Linux configuration, to compensate for the performance difference between the CPUs:

The default value of the setting Kernel.OcrMgr.PreferAccurateEngine is FALSE for Embedded OCR.
Correspondingly, the default value of the setting Kernel.OcrMgr.PreferFastEngine is TRUE for Embedded OCR. This affects several modules:
- Image Pre-processing:
  [only if Kernel.OcrMgr.TradeOff.PreProc is default (-1)]
  - Resolution Enhancement: RE_LEGACY instead of RE_STANDARD
    [only if Kernel.Img.ResolutionEnhancement is default (RE_AUTO)]
  - Binarization: BIN_HIST instead of BIN_MIX
    [only if Kernel.Img.Binarization.SecondaryMode.AutoMethod is default (BIN_AUTO)]
  - Deskew:
    - Performed on the BW image, not the original
    - Color/Gray image deskewed without interpolation
    - Minimal slope is a bit larger
  - Despeckle:
    - A faster algorithm is used
    - Halftone despeckle detection is not done
      [only if Kernel.Img.Despeckle.Halftone is default (-1)]
- Zoning:
  - DCM_LEGACY is used instead of DCM_STANDARD
    [only if Kernel.Decomp.Method is default (DCM_AUTO)]
- Recognition:
  - non-Asian languages: RM_OMNIFONT_FRX is used instead of RM_OMNIFONT_PLUS3W
    [only if Kernel.OcrMgr.DefaultRecognitionModule is default (RM_AUTO)]
  - Asian languages: AE_FAST is used instead of AE_FULL
    [only if Kernel.OcrMgr.Asian.Engine is default (AE_AUTO)]
- Post-processing:
  [only if Kernel.OcrMgr.DisableFullPostProc is default (-1)]
  - Background and Foreground color detection is not done
  - Rule lines and frames are not detected
  - Bullet recognition is limited
  - Font attribute detection is limited
  - DropCap and Superscript/Subscript attributes are not detected
  - Paragraphs are not marked with R_ENDOFPARA
  - Language consolidation is not done (e.g. hyphenated words may not be marked as dictionary words)

If you want more accurate recognition, you can disable the Kernel.OcrMgr.PreferFastEngine setting by changing it to FALSE, or you can fine-tune processing by choosing individual module-specific algorithms, e.g. RM_OMNIFONT_PLUS3W instead of RM_AUTO. See the [only if ... is default] comments above.
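The [only if ... is default] rule can be pictured as a two-step resolution: an explicitly configured algorithm always wins, and PreferFastEngine is consulted only when the setting is still on its automatic value. The sketch below is a simplified model of that rule for the zoning and resolution enhancement entries of the list; the enums are local stand-ins with arbitrary values, the function names are hypothetical, and the real automatic modes may take further inputs into account.

```c
#include <stdbool.h>

/* Local stand-ins; real values come from the SDK headers. */
typedef enum { DCM_AUTO, DCM_LEGACY, DCM_STANDARD } decomp_method;
typedef enum { RE_AUTO, RE_LEGACY, RE_STANDARD } res_enhancement;

/* Zoning: DCM_LEGACY replaces DCM_STANDARD only while the method is DCM_AUTO. */
decomp_method effective_decomp_method(decomp_method configured,
                                      bool prefer_fast_engine)
{
    if (configured != DCM_AUTO)
        return configured;   /* explicit choice is never overridden */
    return prefer_fast_engine ? DCM_LEGACY : DCM_STANDARD;
}

/* Resolution enhancement: RE_LEGACY replaces RE_STANDARD only for RE_AUTO. */
res_enhancement effective_res_enhancement(res_enhancement configured,
                                          bool prefer_fast_engine)
{
    if (configured != RE_AUTO)
        return configured;   /* explicit choice is never overridden */
    return prefer_fast_engine ? RE_LEGACY : RE_STANDARD;
}
```

Under this model, setting the zoning method to DCM_STANDARD explicitly keeps the accurate zoning on an embedded system even though PreferFastEngine remains TRUE there.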

Recognition modules and the checking module

The checking module has two basic services. It can flag unacceptable recognition results without changing them or it can be permitted to modify recognition results using checking module feedback. The available acceptance rules can come from the following:

Language and/or Vertical dictionaries,
User dictionaries containing precise entries.

These two sources may be combined freely. The checking module and each of its two parts can be enabled or disabled on a per-zone basis. The integrator should try to match the particular parts of the checking module to the contents and recognition modules of individual zones. For example, allowing checking changes with a language dictionary enabled is pointless or even harmful for the modules RM_BAR, RM_OMR and RM_HNR, since it could change their numerical results to letters.
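An integrator could capture this advice as a per-zone policy function that is consulted before checking changes are enabled. The following is a minimal sketch: the enum is a local stand-in listing only a few modules with arbitrary values, and allow_dictionary_corrections is a hypothetical helper, not an SDK function.

```c
#include <stdbool.h>

/* Local stand-in listing only a few modules; real values come from the SDK. */
typedef enum {
    RM_BAR, RM_OMR, RM_HNR,
    RM_OMNIFONT_MOR, RM_OMNIFONT_PLUS3W
} rec_module;

/* Returns false for modules whose results should not be altered by a
   language dictionary; true means this rule raises no objection. */
bool allow_dictionary_corrections(rec_module rm)
{
    switch (rm) {
    case RM_BAR:
    case RM_OMR:
    case RM_HNR:
        return false;   /* dictionary changes could turn digits into letters */
    default:
        return true;
    }
}
```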

Recognition modules and the widest available character set

The correct assignment of a recognition module and a filling method to a zone should mean that the recognition module is able to satisfactorily process the contents of that zone. It does not guarantee, however, that the recognition module will be able to process every possible character. The characters supported by the Engine are listed in Characters and Code Pages. Most recognition modules recognize only a subset of these. Even if the Character Set is restricted to a limited Language Environment, e.g. by selecting the German language, the recognition module may not be able to process all the enabled characters. For example, RM_HNR is able to process handprinted numerals but does not recognize letters. Automatic recognition module selection takes the Character Set support of modules into consideration. When selecting a recognition module directly, it is the programmer’s responsibility to select one capable of supporting the widest character set enabled in the zone. Otherwise the zone may have an incomplete recognition result. The precise character and language support for each module is given in the appropriate recognition module specifications.

Recognition modules and filters

Narrowing the Character Set has two effects:

it influences the automatic recognition module selection
it may increase accuracy.

The filtering system allows the Language environment to be narrowed, by enabling only certain character classes, and also by enabling individual characters. A filter is built up from filter elements, as detailed under CHR_FILTER.

Each filter element name tells which character class is enabled, e.g. FILTER_ALL means no filtering. Not all recognition modules interpret all filter elements. Precise information appears in the sub-heading for each module.

Applying a filter may not always enable the same number of characters. E.g. FILTER_MISCELLANEOUS can enable only those miscellaneous characters supported by the recognition module assigned to the zone.
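One way to picture this is as a set intersection: the characters a filter enables are limited to those the assigned recognition module supports. The sketch below models both sets as plain C strings purely for illustration; effective_charset is a hypothetical helper, and the SDK's actual filter and character set representations differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies into out every character of `enabled` that also occurs in
   `supported`, preserving order, and returns how many were copied.
   out must have room for strlen(enabled) + 1 bytes. */
size_t effective_charset(const char *enabled, const char *supported, char *out)
{
    assert(enabled != NULL && supported != NULL && out != NULL);

    size_t n = 0;
    for (const char *p = enabled; *p != '\0'; ++p)
        if (strchr(supported, *p) != NULL)
            out[n++] = *p;
    out[n] = '\0';
    return n;
}
```

With the same filter string "@#$%&", a module that supports "0123456789#$" ends up with two enabled characters, while a digits-only module ends up with none, which is why one filter can enable a different number of characters in different zones.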

Using training data

Training of recognition modules is supported on: Windows, Linux.

The text recognition modules are trainable, allowing the application to achieve greater accuracy, particularly on stylized fonts and whenever certain characters are repeatedly misrecognized in the same way. Training files are created in a training session performed in a Capture SDK based application that incorporates the training feature of the OmniPage application or the TEC Text Editor Control of the CSDK. The resulting training file can be set by calling kRecSetTrainingFileName.

Performance data (engine comparison test)

A simple comparison diagram can be seen here. Any other accuracy and timing information for the different engine and trade-off configurations is available through Technical Support. For more details about Technical Support, please see the General Information help system.