RecAPI
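<TABLE BORDER="0" cellspacing="0" cellpadding="5">
<TR><TD> Module name: </TD><TD> RER </TD></TR>
<TR><TD> Module identifier: </TD><TD> RM_RER </TD></TR>
<TR><TD> Filling methods supported: </TD><TD> FM_HANDPRINT, FM_CMC7, FM_OCRA, FM_OCRB, FM_MICR, FM_OMNIFONT (Hebrew) </TD></TR>
<TR><TD> Filters supported: </TD><TD> all filter elements </TD></TR>
<TR><TD> Trade-off supported: </TD><TD> TO_ACCURATE, TO_FAST (includes TO_BALANCED) </TD></TR>
<TR><TD> Knowledge base files: </TD><TD> hand_s.rec, numplus.rec and the language-specific knowledge base files listed below </TD></TR>
<TR><TD> Knowledge base file for Hebrew OCR: </TD><TD> ttf_s_il.rec </TD></TR>
<TR><TD> Training file supported: </TD><TD> no </TD></TR>
</TABLE>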
This module is supported on Windows, Linux and MacOS x64.
This module requires the Recognition Add-on for handprint recognition. Hebrew OCR requires the Asian Plus Add-on. See the topic on Licensing in the General Information help system.
This is a third-party recognition module from re Recognition GmbH (www.rerecognition.com). The Engine incorporates the company's KADMOS recognition engine.
This recognition module can be used to recognize handprinted alphanumeric characters, i.e. uppercase and lowercase letters, digits and some other characters. Although it can be used to read flowing text, its main application area is form-like situations, where the form designer has considerable control over the content, and possibly the length, of the handprinted information entered in each zone.
In addition, this module recognizes Hebrew text, including short English passages embedded in it. See details below.
With the filling method FM_HANDPRINT, this module can distinguish 159 characters: the digits, 28 punctuation and miscellaneous characters (listed below), and the letters of the English alphabet plus all accented characters needed for 98 languages. Fifteen languages have dictionary support: Catalan, Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Spanish and Swedish. Other supported languages include Croatian (with one limitation), Estonian, Gaelic, Indonesian, Latvian, Lithuanian, Slovak, Slovenian, Swahili, Tagalog, Turkish and Welsh (the last two with minor limitations). Cyrillic languages and Greek are not supported. These languages can be freely combined, but dictionary support is then not available.
The following punctuation characters can be recognized:
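<TABLE BORDER="0" cellspacing="0" cellpadding="5">
<TR><TD> ! </TD><TD> Exclamation Mark </TD></TR>
<TR><TD> ? </TD><TD> Question Mark </TD></TR>
<TR><TD> ' </TD><TD> Apostrophe-Quote </TD></TR>
<TR><TD> " </TD><TD> Quotation Mark </TD></TR>
<TR><TD> ; </TD><TD> Semicolon </TD></TR>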
<TR><TD> , </TD><TD> Comma </TD></TR>
<TR><TD> : </TD><TD> Colon </TD></TR>
<TR><TD> . </TD><TD> Period (Full-stop) </TD></TR>
<TR><TD> - </TD><TD> Hyphen-Minus </TD></TR>
<TR><TD> ( </TD><TD> Opening Parenthesis </TD></TR>
<TR><TD> ) </TD><TD> Closing Parenthesis </TD></TR>
<TR><TD> [ </TD><TD> Opening Square Bracket </TD></TR>
<TR><TD> ] </TD><TD> Closing Square Bracket </TD></TR>
<TR><TD> { </TD><TD> Opening Curly Bracket </TD></TR>
<TR><TD> } </TD><TD> Closing Curly Bracket </TD></TR>
</TABLE>
The following miscellaneous characters can be recognized:
<TABLE BORDER="0" cellspacing="0" cellpadding="5">
<TR><TD> # </TD><TD> Number Sign </TD></TR>
<TR><TD> % </TD><TD> Percent Sign </TD></TR>
<TR><TD> @ </TD><TD> Commercial At </TD></TR>
<TR><TD> & </TD><TD> Ampersand </TD></TR>
<TR><TD> | </TD><TD> Vertical Bar </TD></TR>
<TR><TD> $ </TD><TD> Dollar Sign </TD></TR>
<TR><TD> * </TD><TD> Asterisk </TD></TR>
<TR><TD> + </TD><TD> Plus Sign </TD></TR>
<TR><TD> = </TD><TD> Equals Sign </TD></TR>
<TR><TD> _ </TD><TD> Spacing Underscore </TD></TR>
<TR><TD> / </TD><TD> Slash </TD></TR>
<TR><TD> \ </TD><TD> Backslash </TD></TR>
<TR><TD> < </TD><TD> Less-Than Sign </TD></TR>
<TR><TD> > </TD><TD> Greater-Than Sign </TD></TR>
</TABLE>
The other supported filling methods extend the range of characters the RER engine can recognize. These ranges are described
in \ref CHR_OCRSPECFM "OCR special filling methods".
@subsection RER_KBFILES Knowledge base files
The knowledge base files (with the <tt>.rec</tt> extension) are optional; they can be removed, selected and combined
with each other manually. The general knowledge base file \c hand_s.rec is installed with the module during the
installation of OmniPage Capture SDK v2025.1. In addition, two language/country-specific knowledge base files,
\c hand_s_us.rec and \c hand_s_de.rec, are installed, together with \c numplus.rec, which contains knowledge about
numbers and some miscellaneous characters.
Other language/country-specific knowledge base files can be found in the folder \c RER_KBFiles of the install ZIP, which also contains the installed files.
Their names have the form <tt>hand_s_??.rec</tt>, where the double question mark is replaced by one of the country codes
listed in the table below:
<TABLE BORDER="0" cellspacing="0" cellpadding="5">
<TR><TD> Code</TD><TD> Language(s) / Country </TD></TR>
<TR><TD> al </TD><TD> Albanian </TD></TR>
<TR><TD> at </TD><TD> Austrian, German </TD></TR>
<TR><TD> be </TD><TD> Belgian, Dutch, French, German </TD></TR>
<TR><TD> ch </TD><TD> Swiss, French, German, Italian </TD></TR>
<TR><TD> cs </TD><TD> Czech, Slovakian </TD></TR>
<TR><TD> cz </TD><TD> Czech </TD></TR>
<TR><TD> de </TD><TD> German </TD></TR>
<TR><TD> dk </TD><TD> Danish </TD></TR>
<TR><TD> ee </TD><TD> Estonian </TD></TR>
<TR><TD> es </TD><TD> Spanish </TD></TR>
<TR><TD> eu </TD><TD> West-European </TD></TR>
<TR><TD> fi </TD><TD> Finnish </TD></TR>
<TR><TD> fr </TD><TD> French </TD></TR>
<TR><TD> hr </TD><TD> Croatian </TD></TR>
<TR><TD> hu </TD><TD> Hungarian </TD></TR>
<TR><TD> ie </TD><TD> Irish, English, Gaelic Irish </TD></TR>
<TR><TD> it </TD><TD> Italian </TD></TR>
<TR><TD> lt </TD><TD> Lithuanian </TD></TR>
<TR><TD> lv </TD><TD> Latvian </TD></TR>
<TR><TD> nl </TD><TD> Dutch </TD></TR>
<TR><TD> no </TD><TD> Norwegian </TD></TR>
<TR><TD> pl </TD><TD> Polish </TD></TR>
<TR><TD> pt </TD><TD> Portuguese </TD></TR>
<TR><TD> ro </TD><TD> Romanian </TD></TR>
<TR><TD> se </TD><TD> Swedish </TD></TR>
<TR><TD> sf </TD><TD> Scandinavia </TD></TR>
<TR><TD> si </TD><TD> Slovenian </TD></TR>
<TR><TD> sk </TD><TD> Slovakian </TD></TR>
<TR><TD> uk </TD><TD> UK </TD></TR>
<TR><TD> us </TD><TD> USA </TD></TR>
</TABLE>
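The naming scheme above can be sketched in C as a small helper that builds a knowledge base file name from a country code. This is a hypothetical helper, not part of RecAPI: the code list simply mirrors the table, and the function names are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Country codes from the table above (file names hand_s_??.rec). */
static const char *const kRerCountryCodes[] = {
    "al", "at", "be", "ch", "cs", "cz", "de", "dk", "ee", "es",
    "eu", "fi", "fr", "hr", "hu", "ie", "it", "lt", "lv", "nl",
    "no", "pl", "pt", "ro", "se", "sf", "si", "sk", "uk", "us"
};

/* Hypothetical helper: returns 1 if code appears in the table, else 0. */
static int rer_is_known_code(const char *code)
{
    size_t n = sizeof kRerCountryCodes / sizeof kRerCountryCodes[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(code, kRerCountryCodes[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: writes "hand_s_<code>.rec" into buf.
   Returns 0 on success, -1 if the code is unknown or buf is too small. */
static int rer_kb_filename(const char *code, char *buf, size_t bufsize)
{
    if (!rer_is_known_code(code))
        return -1;
    int written = snprintf(buf, bufsize, "hand_s_%s.rec", code);
    if (written < 0 || (size_t)written >= bufsize)
        return -1; /* output was truncated */
    return 0;
}
```

For example, the code `at` yields `hand_s_at.rec`, while an unknown code is rejected instead of producing a file name that no knowledge base file matches.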
If more than one language/country-specific knowledge base file is in the Engine Binary directory, the system
automatically uses the one matching the current recognition language.
If the User's product will be used only in a specific region, the installed language/country-specific knowledge base
files can be removed manually and replaced by the one for that region, which may improve accuracy somewhat.
For example, in Austria \c hand_s_at.rec may give better results than \c hand_s_de.rec.
For a language spoken in more than one region, there is no point in using all the knowledge base files containing
that language simultaneously, because the Engine cannot reliably distinguish between the regions. Since the API
provides no way to specify the country, the User has to make this decision in advance.
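As a sketch of the advice above, a deployment script or build check could flag a file set in which two regional knowledge base files cover the same language. The helper below is hypothetical and not part of RecAPI. It encodes only a subset of the country-code table (the `at`, `be`, `ch`, `de`, `fr`, `it` and `nl` rows, with regional names such as "Austrian" omitted), and it ignores codes it does not know.

```c
#include <stddef.h>
#include <string.h>

/* Subset of the country-code table: code -> languages covered. */
struct rer_region {
    const char *code;
    const char *languages[4]; /* unused slots are NULL */
};

static const struct rer_region kRegions[] = {
    { "at", { "German", NULL } },
    { "be", { "Dutch", "French", "German", NULL } },
    { "ch", { "French", "German", "Italian", NULL } },
    { "de", { "German", NULL } },
    { "fr", { "French", NULL } },
    { "it", { "Italian", NULL } },
    { "nl", { "Dutch", NULL } },
};

static const struct rer_region *find_region(const char *code)
{
    for (size_t i = 0; i < sizeof kRegions / sizeof kRegions[0]; ++i)
        if (strcmp(kRegions[i].code, code) == 0)
            return &kRegions[i];
    return NULL;
}

/* Hypothetical check: returns the first language covered by two or more of
   the given regional files, or NULL if each language is covered at most once. */
static const char *rer_find_overlap(const char *const *codes, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        const struct rer_region *a = find_region(codes[i]);
        if (!a) continue;
        for (size_t j = i + 1; j < count; ++j) {
            const struct rer_region *b = find_region(codes[j]);
            if (!b) continue;
            for (size_t x = 0; x < 4 && a->languages[x]; ++x)
                for (size_t y = 0; y < 4 && b->languages[y]; ++y)
                    if (strcmp(a->languages[x], b->languages[y]) == 0)
                        return a->languages[x];
        }
    }
    return NULL;
}
```

With `de` and `at` both present, the check reports German, which is the situation the text advises against; `at` together with `fr` passes.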
The module requires at least one \c .rec file in the Engine Binary directory; it does not have to be \c hand_s.rec.
However, the Distribution Wizard of the CSDK tries to copy only \c hand_s.rec from the binary folder into the
selected file set (and issues a message if this file is missing). Therefore, if you want a different subset of
optional knowledge base files in your redistributed file set, select and copy them manually after running the
Distribution Wizard.
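After running the Distribution Wizard, a release script could verify that the redistributed file set still contains at least one knowledge base file. The following C sketch assumes the file names have already been collected into an array. The function names are hypothetical and not part of the CSDK, and the extension test is ASCII case-insensitive because file systems differ in case handling.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name ends with ".rec"
   (ASCII case-insensitive), else 0. */
static int rer_is_kb_file(const char *name)
{
    const char *ext = ".rec";
    size_t len = strlen(name), elen = strlen(ext);
    if (len < elen)
        return 0;
    const char *tail = name + len - elen;
    for (size_t i = 0; i < elen; ++i)
        if (tolower((unsigned char)tail[i]) != ext[i])
            return 0;
    return 1;
}

/* Hypothetical check: returns 1 if the file set contains at least one
   knowledge base file, which the RER module requires. */
static int rer_file_set_is_usable(const char *const *names, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (rer_is_kb_file(names[i]))
            return 1;
    return 0;
}
```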
@subsection RER_ACCURACY Accuracy issues
Handprint is much harder to recognize accurately than machine-generated text, and success depends very heavily on
character quality. Using structured forms to limit the possible range of characters, together with zone-level
filters and individual character validation, can significantly improve accuracy (\ref kRecSetFilterPlusEx). This recognition module can apply
all the Engine’s possible filter elements to the 159-member character set it supports. Handprinted forms are usually
filled in by many different respondents, which tends to lower accuracy. If respondents are given clear filling
instructions (e.g. a print model to follow) and are motivated to print clearly, recognition rates will be higher.
If the handprint contains numbers only, the \ref RM_HNR module is likely to give better results than the
\ref RM_RER module filtered for numbers only. The behavior of the RER module is also affected by the page-level
trade-off setting.
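The zone-level advice above can be illustrated with a small post-recognition check. In this hypothetical C sketch, the enum values are placeholders standing in for RM_RER and RM_HNR, and the function names are assumptions: one function picks a module for a handprinted zone, the other reports the first recognized character that falls outside the zone's expected character set. It compares bytes, so it is only meaningful for ASCII character sets.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder module identifiers for this sketch only. */
enum demo_module { DEMO_RM_RER, DEMO_RM_HNR };

/* Follows the advice above: numbers-only handprint is usually better
   served by HNR than by RER filtered for digits. */
static enum demo_module choose_handprint_module(int numbers_only)
{
    return numbers_only ? DEMO_RM_HNR : DEMO_RM_RER;
}

/* Character validation: returns the index of the first character of text
   that is not in allowed, or -1 if every character is allowed. */
static long first_invalid_char(const char *text, const char *allowed)
{
    size_t ok = strspn(text, allowed); /* length of the valid prefix */
    return text[ok] == '\0' ? -1 : (long)ok;
}
```

For a digits-only field, a result such as `12O45` would be flagged at index 2, so the application can route that character to manual verification instead of accepting it.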
@subsection RER_CONDI Conditions
For successful recognition, the characters should not touch each other. Each character can be zoned individually,
or a zone may contain one or more lines of characters. Each character must have a height of 30-180 pixels.
Well-formed characters written in pen are recognized best; pencil and felt-tip pens give poorer results. When reading
from pre-printed forms, boxes printed in a dropout color can help respondents write characters of even
size and spacing. In that case, however, respondents must not write with a pen of the dropout color.
Maximum number of characters in a line: 200.
Number of lines in a zone: No restriction.
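The 30-180 pixel height limit depends on the scan resolution, since pixels = millimetres × dpi / 25.4. The hypothetical C helper below applies that conversion and the documented limits (character height and at most 200 characters per line) to a planned form field. The function and macro names are illustrative only.

```c
#include <stddef.h>

#define RER_MIN_CHAR_HEIGHT_PX 30.0
#define RER_MAX_CHAR_HEIGHT_PX 180.0
#define RER_MAX_CHARS_PER_LINE 200

/* Converts a physical character height in millimetres to pixels at the
   given scan resolution in dots per inch (1 inch = 25.4 mm). */
static double char_height_px(double height_mm, double dpi)
{
    return height_mm * dpi / 25.4;
}

/* Returns 1 if a handprinted line meets the documented limits, else 0. */
static int rer_line_within_limits(double height_mm, double dpi, size_t chars)
{
    double px = char_height_px(height_mm, dpi);
    return px >= RER_MIN_CHAR_HEIGHT_PX && px <= RER_MAX_CHAR_HEIGHT_PX
        && chars <= RER_MAX_CHARS_PER_LINE;
}
```

At 300 dpi, for example, the permitted range corresponds to character heights of about 2.5 mm (30 × 25.4 / 300 = 2.54) up to about 15.2 mm (180 × 25.4 / 300 = 15.24).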
@subsection RER_MODINTEG Module integration issues
The Engine does not provide access to all the parameters of reRecognition’s KADMOS toolkit. Note, however, that the
recognition module can be fine-tuned through parameters in the <tt>[Parm]</tt> section of an INI file. A sample INI
file, \c RM_RER.INI, can be found in the \c RER_KBFiles folder of the install ZIP mentioned above. The full path of the INI file can be
specified by the setting \ref ALLSTS_KERNELAPI_RECOG_RER "Kernel.Ocr.RER.UseParamFile", which replaces the function |
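To see which parameters a deployed INI file sets, a tool can read its <tt>[Parm]</tt> section. The C sketch below is a minimal, hypothetical reader that works on the file contents already loaded into memory. It is not how the Engine itself parses the file: it treats the section name case-sensitively, does not handle comments or quoting, and the key names in any usage are placeholders, since the real parameter names are those in the sample \c RM_RER.INI.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the value of key from the [Parm] section of the INI text into out.
   Returns 0 if found, -1 if not found or if out is too small. */
static int ini_get_parm(const char *ini, const char *key, char *out, size_t outsz)
{
    int in_parm = 0;
    size_t keylen = strlen(key);
    const char *line = ini;
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *stop = line + len;
        const char *p = line;
        while (p < stop && isspace((unsigned char)*p)) p++; /* skip indent */
        size_t rest = (size_t)(stop - p);
        if (rest > 0 && *p == '[') {
            /* a new section starts: track whether it is [Parm] */
            in_parm = (rest >= 6 && strncmp(p, "[Parm]", 6) == 0);
        } else if (in_parm && rest > keylen && strncmp(p, key, keylen) == 0) {
            const char *q = p + keylen;
            while (q < stop && (*q == ' ' || *q == '\t')) q++;
            if (q < stop && *q == '=') {
                q++;
                while (q < stop && (*q == ' ' || *q == '\t')) q++;
                const char *vend = stop;
                while (vend > q && isspace((unsigned char)vend[-1])) vend--; /* trims \r too */
                size_t vlen = (size_t)(vend - q);
                if (vlen + 1 > outsz) return -1;
                memcpy(out, q, vlen);
                out[vlen] = '\0';
                return 0;
            }
        }
        if (!end) break;
        line = end + 1;
    }
    return -1;
}
```

A key with the same name in another section is ignored, so only values that apply to the <tt>[Parm]</tt> section are reported.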
The RER recognition module can recognize only machine-printed (FM_OMNIFONT) Hebrew characters; handprinted Hebrew is not supported.
When Hebrew (LANG_HEB) is set for recognition, no other language except English should be set.
The module can recognize short English texts embedded in Hebrew text. This works by default, even when English is not enabled.
IMPORTANT NOTE: For Hebrew recognition to work correctly, the language should be set before the preprocessing operation.
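The two Hebrew rules above can be expressed as configuration checks. In this hypothetical C sketch, the enum values are placeholders standing in for the RecAPI language constants (such as LANG_HEB and LANG_ENG) and for the processing steps an application performs. No RecAPI calls are made, and the function names are assumptions.

```c
#include <stddef.h>

/* Placeholder language identifiers for this sketch only. */
enum demo_lang { DEMO_LANG_ENG, DEMO_LANG_HEB, DEMO_LANG_GER, DEMO_LANG_FRE };

/* Returns 1 if the selection follows the rule above: when Hebrew is
   enabled, the only other language allowed is English. */
static int hebrew_selection_ok(const enum demo_lang *langs, size_t count)
{
    int has_heb = 0;
    for (size_t i = 0; i < count; ++i)
        if (langs[i] == DEMO_LANG_HEB)
            has_heb = 1;
    if (!has_heb)
        return 1; /* rule does not apply */
    for (size_t i = 0; i < count; ++i)
        if (langs[i] != DEMO_LANG_HEB && langs[i] != DEMO_LANG_ENG)
            return 0;
    return 1;
}

/* Placeholder processing steps, in the order an application performs them. */
enum demo_step { STEP_SET_LANGUAGE, STEP_PREPROCESS, STEP_RECOGNIZE };

/* Returns 1 if the language is set before the first preprocessing step,
   0 if preprocessing comes first or the language is never set. */
static int language_set_before_preprocess(const enum demo_step *steps, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (steps[i] == STEP_SET_LANGUAGE) return 1;
        if (steps[i] == STEP_PREPROCESS) return 0;
    }
    return 0;
}
```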