RecAPI
Arabic recognition module
Module name: ARABIC
Module identifier: RM_ARABIC
Filling methods supported: FM_OMNIFONT
Filters supported: should not be used
Trade-off supported: see below
Training file supported: no

This module requires the Recognition Add-on. See the topic on Licensing in the General Information help system.

Application areas

This module provides recognition services for Arabic languages. It can handle short English texts embedded within the Arabic text. Formerly, Arabic OCR was performed by the Asian recognition module; the dedicated Arabic recognition module was introduced in OmniPage CSDK version 22.0.

Digital camera input can be used for Asian-language input, but automatic 3D deskewing is not useful in these cases. Manual 3D deskew is available via the Scanner Enhancement Technology Tools (SET Tools), which is part of RecAPIPlus and the Image View Visual Control (for more information, see the IPRO and Visual Toolbox documentation systems).

Table zones can be inserted onto Arabic pages, but if the OCR cannot detect a table within such a zone, the zone is likely to produce zero recognition results.

Language handling

Language handling for Arabic differs somewhat from that for Western languages. Spell checking, editor display and verification are not available for Arabic languages. In addition, no language other than English should be set alongside Arabic.

The Arabic OCR engine can recognize short English texts embedded in any Arabic text. This works by default, even when the English language is not enabled. (Formerly, Arabic OCR required the English language to be set in order to recognize embedded English text; from version 22.0 this is no longer necessary.)

IMPORTANT NOTE: For the Arabic Recognition Module to work correctly, the language should be set before the preprocess operation.

NOTE: The language selection requirement above does not apply to Single Language Detection.
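The rule that only English may accompany Arabic can be checked before the language selection is handed to the engine. The following is a minimal sketch in self-contained C that does not call RecAPI; the type SketchLang and the function sketch_arabic_selection_ok are hypothetical names introduced for illustration only, and the enum values are local stand-ins rather than RecAPI language identifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for language identifiers (not RecAPI values). */
typedef enum {
    SKETCH_LANG_ARABIC,
    SKETCH_LANG_ENGLISH,
    SKETCH_LANG_OTHER   /* any language that is neither Arabic nor English */
} SketchLang;

/* Returns true if the selection respects the rule stated above:
 * when Arabic is selected, the only other language allowed is English.
 * Selections without Arabic are not restricted by this rule. */
static bool sketch_arabic_selection_ok(const SketchLang *langs, size_t count)
{
    bool has_arabic = false;
    bool has_other = false;

    for (size_t i = 0; i < count; i++) {
        if (langs[i] == SKETCH_LANG_ARABIC) {
            has_arabic = true;
        } else if (langs[i] != SKETCH_LANG_ENGLISH) {
            has_other = true;
        }
    }
    return !(has_arabic && has_other);
}
```

A caller would run this check on the intended selection and only then set the languages and start the preprocess operation, in that order.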

Conditions

When user zones are used, it is recommended to make them as homogeneous as possible (in terms of font size), because homogeneous zones may give better results. This is especially important for Asian languages. WT_AUTO zones can be inhomogeneous.

Deskew and orientation

Deskew and orientation detection work differently for the Arabic language than for other languages. The behavior of both operations can be adjusted through settings (Kernel.Img.Deskew.EnabledForArabic and Kernel.Img.Rotation.EnabledForArabic). If these settings are FALSE (the default), the AUTO methods (DSK_AUTO, ROT_AUTO) of these operations behave for the Arabic language as if the operations were switched off (DSK_NO, ROT_NO). If the settings are TRUE, or if deskew and orientation are not set to AUTO, these methods work the same way for Arabic as for Western languages.
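The interaction between the AUTO methods and the two settings can be summarized as a small decision function. The sketch below is self-contained C and does not call RecAPI; the type SketchMode and the function sketch_effectively_off are hypothetical names, and the enum values are local stand-ins for DSK_AUTO/ROT_AUTO, DSK_NO/ROT_NO and any other explicitly chosen method. The same logic is assumed to apply to deskew and to orientation, each with its own EnabledForArabic setting.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the deskew or rotation method (not RecAPI values). */
typedef enum {
    SKETCH_MODE_AUTO,   /* corresponds to DSK_AUTO or ROT_AUTO */
    SKETCH_MODE_NO,     /* corresponds to DSK_NO or ROT_NO */
    SKETCH_MODE_OTHER   /* any other explicitly chosen method */
} SketchMode;

/* Returns true if the operation is effectively switched off.
 * enabled_for_arabic models Kernel.Img.Deskew.EnabledForArabic or
 * Kernel.Img.Rotation.EnabledForArabic (FALSE by default). */
static bool sketch_effectively_off(SketchMode mode,
                                   bool language_is_arabic,
                                   bool enabled_for_arabic)
{
    if (mode == SKETCH_MODE_NO) {
        return true;                /* switched off for every language */
    }
    if (mode == SKETCH_MODE_AUTO && language_is_arabic && !enabled_for_arabic) {
        return true;                /* AUTO acts like NO for Arabic */
    }
    return false;                   /* same behavior as for Western languages */
}
```

With the default settings, sketch_effectively_off(SKETCH_MODE_AUTO, true, false) is true, which mirrors the statement that AUTO is equivalent to the switched-off case for Arabic.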

Trade-off

If the trade-off is TO_ACCURATE or TO_BALANCED and the decomposition method is DCM_STANDARD, this module runs a more accurate embedded English text recognition. (This is the default state.)
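The condition above can be written as a simple predicate. This is a sketch in self-contained C that does not call RecAPI; SketchTradeoff, SketchDecomp and sketch_accurate_english_enabled are hypothetical names, and the enum values are local stand-ins for TO_ACCURATE, TO_BALANCED, DCM_STANDARD and "any other value".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the trade-off setting (not RecAPI values). */
typedef enum {
    SKETCH_TO_ACCURATE,
    SKETCH_TO_BALANCED,
    SKETCH_TO_OTHER      /* any other trade-off value */
} SketchTradeoff;

/* Hypothetical stand-ins for the decomposition method (not RecAPI values). */
typedef enum {
    SKETCH_DCM_STANDARD,
    SKETCH_DCM_OTHER     /* any other decomposition method */
} SketchDecomp;

/* Returns true when the more accurate embedded English recognition
 * would run according to the rule stated above. */
static bool sketch_accurate_english_enabled(SketchTradeoff tradeoff,
                                            SketchDecomp decomp)
{
    bool tradeoff_ok = (tradeoff == SKETCH_TO_ACCURATE ||
                        tradeoff == SKETCH_TO_BALANCED);
    return tradeoff_ok && decomp == SKETCH_DCM_STANDARD;
}
```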

Character attributes

The character attributes, such as bold and italic styling, cannot be retrieved for Arabic text, nor for embedded English text.

See also
The settings of the Arabic Recognition Engine Module