RecAPI
RER handprinted recognition module
Module name: RER
Module identifier: RM_RER
Filling methods supported: FM_HANDPRINT, FM_CMC7, FM_OCRA, FM_OCRB, FM_MICR, FM_OMNIFONT (Thai, Hebrew)
Filters supported: all filter elements
Trade-off supported: TO_ACCURATE, TO_FAST (includes TO_BALANCED)
Knowledge base files: kadmos.uk, hand_s.rec, numplus.rec, and the language-specific knowledge base files listed below.
Knowledge base files for Thai OCR: kadmos.uk, ttf_s_th.rec.
Knowledge base files for Hebrew OCR: kadmos.uk, ttf_s_il.rec.
Training file supported: no

This module is supported on: Windows, Linux and MacOS x64.

This module is included only in the Professional Recognition Kit (not the OCR kit). To make this technology available in your application, it must be covered by your distribution licensing.

Thai and Hebrew OCR can be purchased as an add-on ("Asian Plus") to either the Professional Recognition Kit or the Professional OCR Kit.

See the topic on Licensing in the General Information help system.

This is a third-party recognition module from re Recognition GmbH (www.rerecognition.com). The Engine contains its KADMOS recognition engine.

Application areas

This recognition module can be used to recognize handprinted alphanumeric characters, i.e. upper and lower case letters, digits and some other characters. Although it can be used to read flowing text, its main application area is form-like situations, where the form designer has great control over the content, and possibly the length, of the handprinted information given in each zone.

In addition, this module recognizes Thai and Hebrew text. It can handle short English texts embedded within text in these languages. Thai is available from version 19.0, Hebrew from version 20.1. See details below.

Recognition of handprinted text

Range of characters

When the filling method FM_HANDPRINT is selected, this module can differentiate 159 characters. These are the digits, 28 punctuation and miscellaneous characters (listed below), and the letters of the English alphabet plus all accented characters necessary for 98 languages. Fifteen languages have dictionary support: Catalan, Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Spanish and Swedish. Other supported languages include Croatian (with one limitation), Estonian, Gaelic, Indonesian, Latvian, Lithuanian, Slovak, Slovenian, Swahili, Tagalog, Turkish and Welsh (the last two with minor limitations). Cyrillic languages and Greek are not supported. The supported languages can be freely combined, but dictionary support is not available when they are combined.

The following punctuation characters can be recognized:

! Exclamation Mark
? Question Mark
' Apostrophe-Quote
" Quotation Mark
; Semicolon
, Comma
: Colon
. Period (Full-stop)
- Hyphen-Minus
( Opening Parenthesis
) Closing Parenthesis
[ Opening Square Bracket
] Closing Square Bracket
{ Opening Curly Bracket
} Closing Curly Bracket

The following miscellaneous characters can be recognized:

# Number Sign
% Percent Sign
@ Commercial At
& Ampersand
| Vertical Bar
$ Dollar Sign
* Asterisk
+ Plus Sign
= Equals Sign
_ Spacing Underscore
/ Slash
\ Backslash
< Less-Than Sign
> Greater-Than Sign

The other supported filling methods add further character ranges to the capabilities of the RER engine. These ranges are described in OCR special filling methods and in the summary table of OCR Special Characters.

Knowledge base files

The files with the .rec extension are optional and removable, and they can be selected and combined with each other manually. The general knowledge base file hand_s.rec is installed with the module during installation of OmniPage Capture SDK v22. In addition, two language/country-specific knowledge base files are installed, hand_s_us.rec and hand_s_de.rec, together with numplus.rec, which holds the knowledge about numbers and some miscellaneous characters. Other language/country-specific knowledge base files can be found in the folder RER_KBFiles of the install ZIP. (The installed files are also there.) These files are distributed as listed in the table below. Their names are in the form hand_s_??.rec, where the double question mark within the filename should be replaced by a country code as follows:

Code Language(s) / Country
al Albanian
at Austrian, German
be Belgian, Dutch, French, German
ch Swiss, French, German, Italian
cs Czech, Slovakian
cz Czech
de German
dk Danish
ee Estonian
es Spanish
eu West-European
fi Finnish
fr French
hu Hungarian
ie Irish, English, Gaelic Irish
it Italian
lt Lithuanian
lv Latvian
nl Dutch
no Norwegian
pl Polish
pt Portuguese
ro Romanian
se Swedish
sf Scandinavia
sl Slovenian
sk Slovakian
tr Turkish
uk UK
us USA
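
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper kb_file_name and the constant KB_COUNTRY_CODES are hypothetical names and are not part of the CSDK; the codes themselves are copied from the table.

```python
# Country codes copied from the table above.
KB_COUNTRY_CODES = {
    "al", "at", "be", "ch", "cs", "cz", "de", "dk", "ee", "es",
    "eu", "fi", "fr", "hu", "ie", "it", "lt", "lv", "nl", "no",
    "pl", "pt", "ro", "se", "sf", "sl", "sk", "tr", "uk", "us",
}


def kb_file_name(country_code: str) -> str:
    """Build the hand_s_??.rec file name for a country code (hypothetical helper)."""
    code = country_code.strip().lower()
    if code not in KB_COUNTRY_CODES:
        raise ValueError(f"no knowledge base file is listed for code {country_code!r}")
    return f"hand_s_{code}.rec"
```

For example, kb_file_name("at") yields hand_s_at.rec, while a code that is not in the table raises an error instead of producing a name for a file that is not distributed.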

If more than one language/country-specific knowledge base file is in the Engine Binary directory, the system automatically uses the one that matches the current recognition language.

If the User's product will be used only in a specific region, the installed language/country-specific knowledge base files can be removed and replaced manually by the one for that region. This may improve accuracy somewhat. For example, in Austria it may be better to use hand_s_at.rec instead of hand_s_de.rec.

For a language spoken in more than one region, there is no point in using all the knowledge base files containing that language simultaneously, because the Engine cannot decide well enough between the regions. Since the API does not provide any way to specify the country, the User has to make the decision in advance.

The module requires at least one .rec file in the Engine Binary directory. This file does not have to be hand_s.rec. However, the Distribution Wizard of the CSDK tries to copy only hand_s.rec from the binary folder into the selected file set (and displays a message if this file is not there). Thus, if you want a different subset of optional knowledge base files in your redistributed file set, you should select and copy these files manually after running the Distribution Wizard.
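
A redistributed file set can be checked against these two rules with a small script. The following Python sketch is an illustration only: check_rec_files is a hypothetical helper, not a CSDK function, and the directory to inspect is whatever folder holds your Engine binaries.

```python
from pathlib import Path


def check_rec_files(engine_binary_dir):
    """Report which .rec files are present in a directory (hypothetical helper)."""
    rec_files = sorted(
        p.name
        for p in Path(engine_binary_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".rec"
    )
    return {
        "rec_files": rec_files,
        # The module needs at least one .rec file, not necessarily hand_s.rec.
        "has_required_rec": len(rec_files) > 0,
        # The Distribution Wizard looks only for hand_s.rec.
        "has_hand_s": any(name.lower() == "hand_s.rec" for name in rec_files),
    }
```

If has_required_rec is true but has_hand_s is false, the module has a knowledge base to work with, but the Distribution Wizard will report the missing hand_s.rec and the other files have to be copied by hand.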

Accuracy issues

Handprint is much harder to recognize accurately than machine-generated text, and success depends very heavily on character quality. Using structured forms to limit the possible range of characters, together with zone-level filters (kRecSetFilterPlusEx) and individual character validation, can significantly improve accuracy. This recognition module can apply all the Engine's possible filter elements to the 159-member character set it supports. Handprinted forms are usually filled in by different respondents, which tends to lower accuracy. If respondents are given clear filling instructions (e.g. a print model to follow) and are motivated to print clearly, accuracy will be higher.

If the handprint contains numbers only, using the RM_HNR module is likely to give better results than the RM_RER module filtered for numbers only. The functioning of the RER module can be influenced by the page-level trade-off setting.

Conditions

For successful recognition, the characters should not touch each other. Each character can be zoned individually, or a zone may contain one or more lines of characters. Each character must have a height of 30-180 pixels. Well-formed characters written in pen are recognized best. Pencil and felt-tip pens give poorer results. When reading from pre-printed forms, dropout colored boxes can be useful to encourage respondents to write characters of even size and spacing. In that case, respondents must not write with a pen of the dropout color.

Maximum number of characters in a line: 200.

Number of lines in a zone: No restriction.
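
The numeric limits above (character height of 30-180 pixels, at most 200 characters per line) can be checked before recognition if character heights are known, for instance from the form layout. The Python sketch below is an illustration under that assumption; the function names are hypothetical and the code does not call the CSDK.

```python
MIN_CHAR_HEIGHT_PX = 30
MAX_CHAR_HEIGHT_PX = 180
MAX_CHARS_PER_LINE = 200


def mm_to_pixels(height_mm, dpi):
    """Convert a printed height in millimetres to pixels at a given scan resolution."""
    return height_mm / 25.4 * dpi


def check_line(char_heights_px):
    """Return a list of problems for one line of characters (empty if none)."""
    problems = []
    if len(char_heights_px) > MAX_CHARS_PER_LINE:
        problems.append(
            f"line has {len(char_heights_px)} characters (limit {MAX_CHARS_PER_LINE})"
        )
    for index, height in enumerate(char_heights_px):
        if not MIN_CHAR_HEIGHT_PX <= height <= MAX_CHAR_HEIGHT_PX:
            problems.append(
                f"character {index} is {height} px high "
                f"(allowed {MIN_CHAR_HEIGHT_PX}-{MAX_CHAR_HEIGHT_PX})"
            )
    return problems
```

As a worked example of the conversion, a character that fills a 5 mm box scanned at 300 dpi is about 59 pixels high (5 / 25.4 * 300), which is inside the allowed range.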

Module integration issues

The Engine cannot provide access to all the parameters of reRecognition's KADMOS toolkit. Note, however, that the recognition module can be fine-tuned through parameters located under the section [Parm] of an INI file. A sample INI file, RM_RER.INI, can be found in the above-mentioned folder RER_KBFiles of the install ZIP. The full path of the INI file can be specified by the setting Kernel.Ocr.RER.UseParamFile, which replaces the function RecSetRMSpecParams of previous CSDK versions.
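
Before pointing Kernel.Ocr.RER.UseParamFile at an INI file, it can be useful to confirm that the file actually has a [Parm] section. The Python sketch below does this with the standard configparser module. It is an illustration only: read_parm_section is a hypothetical helper, and it assumes the file follows ordinary INI syntax; the parameter names themselves are defined by the sample RM_RER.INI and are not reproduced here.

```python
import configparser


def read_parm_section(ini_path):
    """Return the key/value pairs under [Parm] as a dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written in the file
    with open(ini_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    if not parser.has_section("Parm"):
        raise ValueError(f"{ini_path} has no [Parm] section")
    return dict(parser.items("Parm"))
```

The function returns the raw strings found in the file; it does not validate them against the parameters the module understands.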

Recognition of Thai and Hebrew text

The RER recognition module can recognize only machine-printed (FM_OMNIFONT) characters of these languages. Handprinted characters are not supported.

To recognize such text, the corresponding language should be set (LANG_THA, LANG_HEB) and Western languages should not be set (except English in one case; see the next paragraph).

The module can recognize short English texts embedded in text in these languages. This works by default, without the English language being set. If the embedded texts are in other Latin-alphabet languages, they can also be recognized, but accented characters may not always be handled correctly.
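
The language rules above can be expressed as a simple pre-check on the set of selected language identifiers. The Python sketch below works on identifier names as strings and does not call the CSDK. It rests on two assumptions: that the English identifier is named LANG_ENG, and that every identifier other than LANG_THA, LANG_HEB and English counts as a language that should not be set. The helper name is hypothetical.

```python
RER_OMNIFONT_LANGUAGES = {"LANG_THA", "LANG_HEB"}
ASSUMED_ENGLISH = "LANG_ENG"  # assumption: identifier name for English


def check_language_selection(selected):
    """Return a list of problems with a language selection (empty if none)."""
    selected = set(selected)
    problems = []
    if not selected & RER_OMNIFONT_LANGUAGES:
        problems.append("neither LANG_THA nor LANG_HEB is set")
    # Assumption: anything besides Thai, Hebrew and English is treated as unwanted.
    for name in sorted(selected - RER_OMNIFONT_LANGUAGES - {ASSUMED_ENGLISH}):
        problems.append(f"{name} should not be set together with Thai or Hebrew")
    return problems
```

A selection of LANG_THA alone passes, which matches the note that embedded English is handled without English being set.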

IMPORTANT NOTE: For Thai and Hebrew recognition to work correctly, the language should be set before the preprocess operation.

Note:
The inversion detection, rotation detection and deskew detection steps of preprocess do not work for Thai and Hebrew language images, but their manual modes can be used. Fax correction does not work at all in these cases. Despeckle, however, does support these images.
Only the DCM_LEGACY auto-zoning algorithm works well for Thai and Hebrew language images, thus the decomp method setting has no effect in this case. In addition, only WT_FLOW and WT_GRAPHIC zones are enabled for Thai and Hebrew manual zoning.
This third-party recognition module is tightly integrated into the Engine. For more information on reRecognition's handprinted recognition technology, visit their homepage (http://www.rerecognition.com).