OCR Settings tab - RecoStar Page Recognition profile
Use this window to set the properties for the RecoStar full-text OCR profile.
The Countries and Languages group has the following tabbed options:
- Countries
Select one or more countries from the list to determine which country and language characters are supported by this recognition profile. The selected countries also correspond to the internal trigram mode and any existing dictionaries.
The default country option is determined by the Kofax Transformation Modules user interface language when the project is first created. For example, if the interface is set to "German" when the project is created, the default country option is "Germany." If you change the application language after the project is created, the default country is not affected.
- Languages
Use these additional recognition options in a multilingual setting only when the text in your documents corresponds to the print type "Machine print" and the content type "Alphanumeric".
Note: The optional number in brackets helps you monitor the number of selected countries and languages. This is especially important if part of your selection is hidden and you need to scroll to see it.
The General Settings group can help improve recognition results with the following options:
- Image PreProcessing
Select one of the predefined image processing definitions. The predefined definitions include several combinations of items to be removed before OCR is performed. By default, the "Remove Shading, Dots, Lines, Punch holes" item is selected.
- Word separation characters
Use this field to define the characters that may separate words, for example, a hyphen (-).
- Correct split numbers
Select this option to automatically combine numbers or numeric words that are close together but recognized as separate words. For example, if the engine reads "12" and "00," as two words less than half a space apart, this option produces the single combined word "12,00".
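The following Python sketch illustrates the idea behind the Correct split numbers option. It is not the RecoStar implementation. The Word type, the is_numeric test, the coordinate units, and the placement of the comma in the sample tokens are all assumptions made for illustration. Only the "less than half a space apart" rule comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR word: recognized text plus horizontal extent."""
    text: str
    left: float   # x-coordinate of the left edge
    right: float  # x-coordinate of the right edge


def is_numeric(text: str) -> bool:
    # Assumption: a numeric word consists of digits with optional "," or "." separators.
    return text.replace(",", "").replace(".", "").isdigit()


def merge_split_numbers(words, space_width):
    """Combine neighboring numeric words whose gap is below half a space width."""
    merged = []
    for word in words:
        if merged:
            prev = merged[-1]
            gap = word.left - prev.right
            if is_numeric(prev.text) and is_numeric(word.text) and gap < space_width / 2:
                # Replace the previous word with the combined word.
                merged[-1] = Word(prev.text + word.text, prev.left, word.right)
                continue
        merged.append(word)
    return merged


words = [Word("Total", 0, 50), Word("12,", 60, 80), Word("00", 82, 100)]
result = merge_split_numbers(words, space_width=10)
print([w.text for w in result])
```

In this sketch the gap between "12," and "00" is 2 units, which is below half of the assumed space width of 10, so the two words are combined. Numbers separated by a full space or more stay separate.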
The Print Type group lets you improve recognition results by selecting the expected print type. Choose from Unknown, Fixed, Hand print, and Machine print. This option is set to Unknown by default.
The Trigram Mode group has the following options:
- Trigram Mode
Select one of the trigram modes. Choose from Off, Check, or Repair. This option is set to Off by default.
- Logical context
Select this option to enable the trigram feature to resolve uncertain characters based on their logical context.
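As general background, a trigram is a sequence of three characters, and trigram statistics can indicate which of several candidate characters is most plausible in its context. The Python sketch below illustrates that general technique only. It does not describe how RecoStar implements the Check or Repair modes, and the function names, the tiny sample corpus, and the scoring rule are assumptions chosen for illustration.

```python
from collections import Counter


def trigrams(word):
    """Return the character trigrams of a word, padded with spaces at both ends."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_trigram_counts(corpus_words):
    """Count how often each trigram occurs in a (hypothetical) reference word list."""
    counts = Counter()
    for word in corpus_words:
        counts.update(trigrams(word))
    return counts


def score(word, counts):
    """Sum the corpus frequencies of all trigrams in the word."""
    return sum(counts[t] for t in trigrams(word))


def resolve_uncertain(word, position, candidates, counts):
    """Pick the candidate character that gives the highest trigram score."""
    def with_candidate(c):
        return word[:position] + c + word[position + 1:]
    best = max(candidates, key=lambda c: score(with_candidate(c), counts))
    return with_candidate(best)


counts = build_trigram_counts(["invoice", "voice", "notice", "office"])
# The character at index 3 is uncertain: it could be the digit "0" or the letter "o".
print(resolve_uncertain("inv?ice", 3, ["0", "o"], counts))
```

Here the letter "o" wins because trigrams such as "voi" and "oic" occur in the sample word list, while trigrams containing the digit "0" do not.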
The Dictionary group lets you select a dictionary to help with recognition. Kofax Transformation Modules provides sample dictionaries, and you can also create your own. By default, no dictionary is selected. Click the button to the right to browse to the directory where the dictionaries are stored. Click <None> to remove the dictionary from your recognition profile.