Language, Character Set and Code page Handling Module related pages

OmniPage Capture SDK can recognize over 500 different characters, enough for 119 selectable languages. The fullest support is provided through the MOR multi-lingual recognition module; other modules provide varying levels of support. The information in this section relates only to the text-handling modules in the Engine, i.e. MOR, MTX, FRX, PLUS2W and PLUS3W, DOT, and RER.

Choose from the following topics:

Code Pages in the Engine
This lists the Code Pages supported by the Engine, available for exporting recognized text. It provides additional information on each Code Page.

Character Set in the Engine
This details how the character set for a given page or zone can be compiled, using language choice, individual character validation and filtering.

Language identifiers
This lists the languages supported by the Engine and their identifiers according to the main standards and use cases.

Non-Asian Languages and characters
This lists the 119 non-Asian languages supported by the Engine in alphabetical order. It shows which accented letters are enabled for each language, giving their UNICODE values.

Languages, geography, Code Pages
This lists the languages alphabetically, stating where they are spoken and by approximately how many people. It shows which of the commonly used Code Pages support each language. Information is provided for Windows Code Pages 1250, 1252, 1257 and DOS Code Pages 437, 850 and 852.

Languages and modules
This lists the languages alphabetically, showing which are supported by each of the text-handling recognition modules.

Characters, languages, modules
This displays and names all accented letters alphabetically, stating which languages use each character, which modules support each of them and also their code values in UNICODE, Code Page 1252 and Code Page 437.

Characters (punctuation/miscellaneous) and modules
This lists the punctuation and miscellaneous characters, showing the module support for each of them.

Supported characters with the CCJK languages
This gives information about the character sets recognized by the Asian Recognition Module for CCJK languages.

Characters and Code Pages
This lists all characters the Engine supports (Latin, Greek and Cyrillic alphabets, digits, punctuation and miscellaneous characters – about 500 in total), giving their values in a range of Code Pages, including UNICODE.

OCR special filling methods
This describes the OCR special filling methods, which contain OCR special characters in addition to some generally used characters.

CCJK code page description file
This page describes the format of the code page description files of CCJK languages.
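
Several of the topics above give the code value of a character in UNICODE and in individual Code Pages. As a rough illustration of what such a value is, the sketch below looks up one character in the six Code Pages named under "Languages, geography, Code Pages". It uses Python's standard codecs as a stand-in and does not call the Engine; the names CODE_PAGES and code_values are hypothetical, and the Engine's own tables remain the authority where they differ.

```python
# Illustration only: Python's built-in codecs stand in for the Engine's tables.
# CODE_PAGES and code_values are hypothetical names, not part of the SDK.
CODE_PAGES = {
    "Windows 1250": "cp1250",
    "Windows 1252": "cp1252",
    "Windows 1257": "cp1257",
    "DOS 437": "cp437",
    "DOS 850": "cp850",
    "DOS 852": "cp852",
}


def code_values(ch):
    """Return the UNICODE value of ch and its byte value in each Code Page.

    A value of None means the Code Page has no position for the character.
    """
    values = {"UNICODE": ord(ch)}
    for label, codec in CODE_PAGES.items():
        try:
            # All six are single-byte Code Pages, so the result is one byte.
            values[label] = ch.encode(codec)[0]
        except UnicodeEncodeError:
            values[label] = None
    return values


if __name__ == "__main__":
    for label, value in code_values("é").items():
        shown = "not available" if value is None else f"0x{value:02X}"
        print(f"{label}: {shown}")
```

A character missing from a Code Page comes back as None, which mirrors the point made in the topic descriptions: not every Code Page supports every language.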