Recognition Profiles Window - High Performance Zonal Engine
Use this window to adjust various settings for the Kofax High Performance ICR Zonal, Kofax High Performance OCR Zonal, and Kofax ICR Zonal profiles.
Although finding the best settings is a trial-and-error process, correct tuning can dramatically increase recognition accuracy and speed.
Name
Use the list to select a recognition profile. The other settings on the window are refreshed with the settings defined for the selected profile.
Engine
High Performance Zonal is the default setting.
Language
Select one of the following target languages:
- Azeri
- Brazilian
- Bulgarian
- Central European
- Croatian
- Czech/Slovakian
- Danish
- Dutch
- Dutch – Belgian
- English – Canadian
- English – Irish
- English – United Kingdom
- English – United States
- Estonian
- Finnish
- French
- French – Belgian
- French – Canadian
- French – Switzerland
- German
- German – Austrian
- German – Switzerland
- Greek
- Hungarian
- Italian
- Italian – Switzerland
- Lithuanian
- Malay
- Norwegian
- Polish
- Portuguese
- Russian
- Scandinavian
- Slovenian
- South American
- Spanish
- Swedish
- Turkish
- Western European
Print Type
Use these settings to specify the type of print that you expect to find in your zone. If Kofax Capture knows the type of print in advance, recognition accuracy is greatly improved.
- Unknown
  Select this option if you do not know, or cannot predict, the type of print to appear in the zone. In this case, Kofax Capture makes its best guess. You should also select this option if your zone contains more than one print type.
- Handprint
  Select this option for well-formed, hand-printed characters. This is the default setting for ICR recognition profiles.
- Farrington 7B
  Select this option if the Farrington 7B font appears in the zone. Farrington 7B is often used for embossing text on credit and identification cards.
  Note: You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.
- Machine print
  Select this option for machine-printed characters other than one of the specific fonts listed here. This is the default value for OCR recognition profiles.
- OCR-A
  Select this option if the OCR-A font appears in the zone. OCR fonts are frequently used when standard character shapes are required to scan numbers and for text without bar codes. Examples of OCR-A usage include bank checks, passports, labels, and postal mail.
- CMC7
  Select this option if the CMC7 font appears in the zone. The CMC-7 font is a special Magnetic Ink Character Recognition (MICR) font used in France and most Spanish-speaking countries (for example, Spain and Mexico) to print characters for magnetic ink and optical character recognition systems. This font contains the characters 0 through 9, and five special symbols.
  Note: You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.
- Fixed
  Select this option for fixed pitch machine fonts.
- OCR-B
  Select this option if the OCR-B font appears in the zone. OCR-B is used for the same purposes as OCR-A, but the character shapes differ slightly.
- E13B
  Select this option if the E13B font appears in the zone. The E13B font is used on bank checks and drafts in the United States, Canada, Puerto Rico, Panama, the United Kingdom, and a few other countries for magnetic ink and optical character recognition systems. This font contains the characters 0 through 9, and four special symbols: transit, amount, on-us, and dash.
  Note: You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.
Content
Use these options to specify the nature of the content expected in your zone. Depending on the target language and the print type you select, some options may be disabled.
- Alphanumeric
  Use this option to allow for the full range of characters in the selected language.
- Numeric
  Use this option for numerals and a few special characters.
- Amount
  Use this option for arithmetic amounts.
  If you select this option, Kofax Capture interprets the amount as a combination of leading fill characters, a major amount, a separator, a minor amount, and trailing fill characters. This interpretation is internal, and only affects how words are checked in your dictionary. The final result is still output to a single index field; however, if any one element fails the dictionary check, the entire field is flagged.
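The five-part interpretation of an amount can be pictured with a short Python sketch. This is only an illustration and not how Kofax Capture implements it: the regular expression, the choice of asterisks and spaces as fill characters, the two-digit minor amount, and the `split_amount`, `field_is_flagged`, and `passes_check` names are all assumptions made for the example.

```python
import re

# Assumed layout: fill characters are '*' or whitespace, the minor
# amount is exactly two digits, and the separator is '.' or ','.
AMOUNT = re.compile(
    r"(?P<lead>[*\s]*)"
    r"(?P<major>\d[\d.,]*)"
    r"(?P<separator>[.,])"
    r"(?P<minor>\d{2})"
    r"(?P<trail>[*\s]*)"
)


def split_amount(text):
    """Split an amount into its five elements, or return None."""
    match = AMOUNT.fullmatch(text)
    return match.groupdict() if match else None


def field_is_flagged(parts, passes_check):
    """Flag the whole field if any single element fails the check.

    passes_check is a hypothetical stand-in for the dictionary check.
    """
    return any(not passes_check(value) for value in parts.values())


parts = split_amount("***1,234.56**")
print(parts)
# The elements are checked separately, but one failure flags the field.
print(field_is_flagged(parts, lambda value: value != "56"))
```

Although the elements are examined one by one, the sketch still treats the amount as a single field, which mirrors the behavior described above.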
Character set
Use this item to specify a character set and mask for the recognition text. The character set and mask assist the recognition engine in finding the correct characters for recognition by specifying the allowable character types and formatting for the index field. For example, select the 0-9 character set for an index field that contains only numbers or the a-z character set for an index field that contains lowercase letters from a-z. You can either select one of the default masks or create your own custom mask for the field.
If you create your own mask, it is not added to the drop-down list. This means that if you want to use the same custom mask in more than one zone, you should copy it to the clipboard (or a text file).
Mask patterns can be used in any sequence and have the following syntax.
| Pattern | Meaning |
|---|---|
| A-Z | Allows any uppercase letter from A-Z |
| a-z | Allows any lowercase letter from a-z |
| 0-9 | Allows any number from 0-9 |
| {n} | Digits inside brackets indicate the number of allowed characters of the preceding type. This structure must appear at the end of the expression. The following example indicates that the result is three digits: 0-9{3} |
| \ | Escape character used to specify that the result may contain characters that are normally part of the mask syntax (for example, - { }). For example, the hyphen (-) is used in the mask syntax (as in A-Z). To allow a hyphen in the zone results, you need to include \- (escape hyphen) in the mask. To allow a backslash, include \\ in the mask. |
| literal | Any other characters in the mask are treated as literal characters in the specified position relative to other elements in the mask. |
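To get a feel for how a mask constrains a result, the following Python sketch translates a mask into a regular expression and tests sample text against it. It is one possible reading of the syntax above, not the engine's actual behavior. It assumes that adjacent ranges combine into one set of allowed characters, that a set without {n} accepts one or more characters, and that escaped and literal characters must appear exactly once in position. The `mask_to_regex` and `matches_mask` names are hypothetical.

```python
import re

RANGES = ("A-Z", "a-z", "0-9")


def mask_to_regex(mask):
    """Translate a mask into a regular expression (assumed semantics)."""
    parts = []
    i = 0
    while i < len(mask):
        # Adjacent ranges are merged into a single character class.
        ranges = ""
        while mask[i:i + 3] in RANGES:
            ranges += mask[i:i + 3]
            i += 3
        if ranges:
            count = re.match(r"\{(\d+)\}", mask[i:])
            if count:
                # {n} fixes the number of characters of the preceding type.
                parts.append("[%s]{%s}" % (ranges, count.group(1)))
                i += count.end()
            else:
                # Assumption: no count means one or more characters.
                parts.append("[%s]+" % ranges)
            continue
        if mask[i] == "\\" and i + 1 < len(mask):
            # Escaped character: taken literally, even if it is mask syntax.
            parts.append(re.escape(mask[i + 1]))
            i += 2
        else:
            # Any other character is a literal in this position.
            parts.append(re.escape(mask[i]))
            i += 1
    return "".join(parts)


def matches_mask(mask, text):
    """Return True if the whole text conforms to the mask."""
    return re.fullmatch(mask_to_regex(mask), text) is not None


print(matches_mask("0-9{3}", "123"))               # three digits
print(matches_mask(r"A-Z{2}\-0-9{4}", "AB-1234"))  # escaped hyphen
print(matches_mask("0-9{3}", "12a"))               # letter not allowed
```

Because custom masks are not saved to the drop-down list, keeping a small script like this next to your saved mask text can help you check sample values before you paste the mask into another zone.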
Trigram Mode
Trigrams are combinations of three letters that are commonly found in many languages. A common English trigram is "ing". Kofax Capture can take advantage of trigrams to enhance recognition accuracy.
Consider these examples:
In the first case, the image file for "Walking" suffers from drop-outs. In particular, the "n" is badly faded. The recognition engine cannot decide if it is an "r" followed by an "i", or if it is a single "n", so the character is marked as rejected in the initial results. Trigram analysis is applied to the initial result and the recognition engine decides that the most likely combination of three letters, in this case, is "ing".
In the second case, the image file contains substantial noise. Because of this noise, the second "i" in "Dictionary" is interpreted as the letter "l". Trigram analysis shows that "ion" is more likely than "lon" and the word is corrected.
It is important to keep in mind that trigram analysis is a statistical process. Kofax Capture ships with trigram tables for most supported languages. Each table contains a list of possible three-letter combinations and their frequency of occurrence in that language. Although there are thousands of such combinations, many of them are almost never used, so their frequency of occurrence is near zero. Trigram analysis attempts to identify (and optionally correct) combinations of letters that have both a low confidence rating and a low frequency of occurrence.
There may be rare occasions where your data contains many uncommon trigrams. For example, a list of Chicago radio stations might include WGN, WLS, WNVR, WKTAF, WZRD, WBEZ, or WXRT. In such cases, if you notice problems, you should consider turning off trigrams for your recognition profile.
- Off
  This option turns off the trigram feature.
- Check
  This option enables the trigram feature, but limits its behavior to marking suspected characters.
- Repair
  This option enables the trigram feature so that Kofax Capture can repair questionable combinations by picking the nearest and most likely trigram. For example, "iog" might be replaced by "ing".
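The difference between the Check and Repair modes can be sketched in Python. This is a simplified illustration under stated assumptions, not the product's algorithm: the frequency values in `TRIGRAM_FREQ` are invented for the example, the thresholds are arbitrary, and "nearest" is taken to mean a known trigram that differs in exactly one position. All function names are hypothetical.

```python
# Invented frequencies for illustration; real tables are per language.
TRIGRAM_FREQ = {
    "ing": 0.9, "ion": 0.8, "the": 0.95,
    "wal": 0.2, "alk": 0.3, "lki": 0.1, "kin": 0.3,
}


def suspect_trigrams(word, confidences, min_freq=0.05, min_conf=0.5):
    """Check mode: return start positions of trigrams that are both
    rare in the table and contain a low-confidence character."""
    word = word.lower()
    hits = []
    for i in range(len(word) - 2):
        rare = TRIGRAM_FREQ.get(word[i:i + 3], 0.0) < min_freq
        shaky = min(confidences[i:i + 3]) < min_conf
        if rare and shaky:
            hits.append(i)
    return hits


def repair_trigram(trigram):
    """Repair mode: pick the most frequent known trigram that differs
    in exactly one position; keep the original if there is none."""
    candidates = [
        known for known in TRIGRAM_FREQ
        if sum(a != b for a, b in zip(known, trigram)) == 1
    ]
    return max(candidates, key=TRIGRAM_FREQ.get, default=trigram)


# The 'o' in "walkiog" was recognized with low confidence.
print(suspect_trigrams("walkiog", [0.9, 0.9, 0.9, 0.9, 0.9, 0.3, 0.9]))
print(repair_trigram("iog"))
print(repair_trigram("lon"))
```

With a table like this, a call sign such as WXRT would contain only trigrams with near-zero frequency, which is why turning trigrams off can help for data of that kind.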
Logical context
Use this setting to select logical context. When selected, the recognition engine attempts to resolve uncertain characters on the basis of their context. This is useful when distinguishing between characters such as 1 (one) and l (the letter l). For example, assume that the recognition engine extracts "te1l" from a zone. Logical context resolves this to "tell" because it is less likely that the 1 (one) would be embedded in letters.
Do not use logical context if the zone has "words" that include a combination of alphabetic and numeric characters, such as "Abc123def".
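A rough Python sketch shows why context helps with "te1l" and why it can hurt with mixed alphanumeric values. The rule used here (replace look-alike digits when letters outnumber digits) and the `LOOKALIKES` table are assumptions chosen for illustration; the engine's real logic is not documented here.

```python
# Hypothetical look-alike pairs; only 1 and l come from the text above.
LOOKALIKES = {"1": "l", "0": "o"}


def resolve_by_context(token):
    """Replace look-alike digits when the token is mostly letters."""
    letters = sum(c.isalpha() for c in token)
    digits = sum(c.isdigit() for c in token)
    if letters <= digits:
        # Mostly numeric: leave the digits alone.
        return token
    return "".join(LOOKALIKES.get(c, c) for c in token)


print(resolve_by_context("te1l"))       # the embedded 1 becomes l
print(resolve_by_context("Abc123def"))  # a valid 1 is wrongly changed
```

The second call changes a digit that was actually correct, which is the kind of damage the warning about mixed alphabetic and numeric "words" is meant to prevent.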
Advanced button
This button displays a window with a number of advanced options.
Image Cleanup
Select an image cleanup profile from the drop-down list.
Edit button
To modify an existing image cleanup profile or create a new one, click the Edit button. The Image Cleanup Profiles window appears, and you can specify the type of image cleanup to use.
Delete button
Click this button to delete the currently selected profile. It is not possible to delete profiles that are built in to Kofax Capture.
Script button
If enabled, use this button to assign a recognition script to the selected profile. The Recognition Script window appears, and you can associate a recognition script with the recognition profile.
Test button
Click this button to test your zone settings. Your recognition profile and cleanup settings are applied to the zone with the results displayed in the Zone Test window.