Recognition Profiles Window - High Performance Zonal Engine

Name

Use the list to select a recognition profile. The other settings on the window are refreshed with the settings defined for the selected profile.

Engine

High Performance Zonal is the default setting.

Language

Select one of the following target languages:

Azeri
Brazilian
Bulgarian
Central European
Croatian
Czech/Slovakian
Danish
Dutch
Dutch – Belgian
English – Canadian
English – Irish
English – United Kingdom
English – United States
Estonian
Finnish
French
French – Belgian
French – Canadian
French – Switzerland
German
German – Austrian
German – Switzerland
Greek
Hungarian
Italian
Italian – Switzerland
Lithuanian
Malay
Norwegian
Polish
Portuguese
Russian
Scandinavian
Slovenian
South American
Spanish
Swedish
Turkish
Western European

Print Type

Use these settings to specify the type of print that you expect to find in your zone. If Kofax Capture knows the type of print in advance, recognition accuracy is greatly improved.

Unknown: Select this option if you do not know, or cannot predict, the type of print that will appear in the zone. In this case, Kofax Capture makes its best guess. You should also select this option if your zone contains more than one print type.
Handprint: Select this option for well-formed, hand-printed characters. This is the default setting for ICR recognition profiles.
Farrington 7B: Select this option if the Farrington 7B font appears in the zone. Farrington 7B is often used for embossing text on credit and identification cards.

Note You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.
Machine print: Select this option for machine-printed characters other than one of the specific fonts listed here. This is the default value for OCR recognition profiles.
OCR-A: Select this option if the OCR-A font appears in the zone. OCR fonts are frequently used where standard character shapes are required, such as for scanning numbers and for text that is not accompanied by bar codes. Examples of OCR-A usage include bank checks, passports, labels, and postal mail.
CMC7: Select this option if the CMC7 font appears in the zone. The CMC7 font is a special Magnetic Ink Character Recognition (MICR) font used in France and most Spanish-speaking countries (for example, Spain and Mexico) to print characters for magnetic ink and optical character recognition systems. This font contains the characters 0 through 9 and five special symbols.

Note You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.
Fixed: Select this option for fixed pitch machine fonts.
OCR-B: Select this option if the OCR-B font appears in the zone. OCR-B is used for the same purposes as OCR-A, but the character shapes differ slightly.
E13B: Select this option if the E13B font appears in the zone. The E13B font is used on bank checks and drafts in the United States, Canada, Puerto Rico, Panama, the United Kingdom, and a few other countries for magnetic ink and optical character recognition systems. This font contains the characters 0 through 9, and four special symbols: transit, amount, on-us, and dash.

Note You may get better results with this font if you try a manual pitch setting instead of fixed or variable pitch.

Content

Use these options to specify the nature of the content expected in your zone. Depending on the target language and the print type you select, some options may be disabled.

Alphanumeric

Use this option to allow for the full range of characters in the selected language.

Numeric

Use this option for numerals and a few special characters.

Amount

Use this option for arithmetic amounts.

If you select this option, Kofax Capture interprets the amount as a combination of leading fill characters, a major amount, a separator, a minor amount, and trailing fill characters. This interpretation is internal, and only affects how words are checked in your dictionary. The final result is still output to a single index field; however, if any one element fails the dictionary check, the entire field is flagged.
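The five-part interpretation described above can be sketched in Python as follows. This is only an illustration, not the engine's implementation: the fill character (an asterisk), the separators (period or comma), the two-digit minor amount, and the names split_amount, amount_is_flagged, and element_ok are all assumptions made for the example.

```python
import re

# Assumed layout: fill characters are asterisks, the separator is "." or ",",
# and the minor amount has exactly two digits. None of this comes from the product.
AMOUNT = re.compile(
    r"(?P<leading_fill>\**)"
    r"(?P<major>\d+)"
    r"(?P<separator>[.,])"
    r"(?P<minor>\d{2})"
    r"(?P<trailing_fill>\**)"
)

def split_amount(text):
    """Split an amount into its five elements, or return None if it does not fit."""
    match = AMOUNT.fullmatch(text)
    return match.groupdict() if match else None

def amount_is_flagged(text, element_ok):
    """Flag the whole field if it cannot be split or any one element fails the check.

    element_ok is a caller-supplied stand-in for the dictionary check.
    """
    parts = split_amount(text)
    if parts is None:
        return True
    return not all(element_ok(name, value) for name, value in parts.items())

print(split_amount("***125.50**"))
```

The elements are examined separately, but the field is still treated as a single value: one failing element flags the entire amount.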

Character set

Use this item to specify a character set and mask for the recognition text. The character set and mask assist the recognition engine in finding the correct characters for recognition by specifying the allowable character types and formatting for the index field. For example, select the 0-9 character set for an index field that contains only numbers or the a-z character set for an index field that contains lowercase letters from a-z. You can either select one of the default masks or create your own custom mask for the field.

Note Your character set must be appropriate for the type of information that you expect to find in the field. For example, if you select the character set 0-9 for an index field that contains only upper or lowercase letters, the results placed in the index field may not be correct.

If you create your own mask, it is not added to the drop-down list. This means that if you want to use the same custom mask in more than one zone, you should copy it to the clipboard (or a text file).

Mask patterns can be used in any sequence and have the following syntax.

Pattern	Meaning
A-Z	Allows any uppercase letter from A-Z
a-z	Allows any lowercase letter from a-z
0-9	Allows any number from 0-9
{n}	Digits inside braces indicate the number of allowed characters of the preceding type. This structure must appear at the end of the expression. The following example indicates that the result is three digits: 0-9{3}
\	Escape character used to specify that the result may contain characters that are normally part of the mask syntax (for example, - { }). For example, the hyphen (-) is used in the mask syntax (as in A-Z). To allow a hyphen in the zone results, you need to include \- (escape hyphen) in the mask. To allow a backslash, include \\ in the mask.
literal	Any other characters in the mask are treated as literal characters in the specified position relative to other elements in the mask.
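
To make the mask syntax more concrete, the following Python sketch translates a mask into a regular expression and tests sample text against it. It is not how Kofax Capture evaluates masks. The function names mask_to_regex and matches_mask are hypothetical, and two behaviors are assumptions: adjacent range patterns (such as A-Za-z) are merged into one character set, and a set without a {n} count allows one or more characters.

```python
import re

RANGES = ("A-Z", "a-z", "0-9")

def mask_to_regex(mask):
    """Translate a mask into a regular expression string (illustrative only)."""
    out = []   # regex fragments, in order
    cls = ""   # range patterns collected for the current character set
    i = 0

    def flush(count=None):
        # Emit the pending character set, with an exact count if one was given.
        nonlocal cls
        if cls:
            suffix = "+" if count is None else "{%d}" % count
            out.append("[" + cls + "]" + suffix)
            cls = ""

    while i < len(mask):
        if mask[i] == "\\" and i + 1 < len(mask):
            flush()
            out.append(re.escape(mask[i + 1]))   # escaped syntax character
            i += 2
        elif mask[i:i + 3] in RANGES:
            cls += mask[i:i + 3]
            i += 3
        elif mask[i] == "{" and cls:
            end = mask.index("}", i)             # raises ValueError if unclosed
            flush(int(mask[i + 1:end]))
            i = end + 1
        else:
            flush()
            out.append(re.escape(mask[i]))       # literal character
            i += 1
    flush()
    return "".join(out)

def matches_mask(mask, text):
    """Return True if the whole text fits the mask."""
    return re.fullmatch(mask_to_regex(mask), text) is not None

print(matches_mask("0-9{3}", "123"))
print(matches_mask(r"A-Z{2}\-0-9{4}", "AB-1234"))
```

Under these assumptions, 0-9{3} accepts 123 but not 12, and A-Z{2}\-0-9{4} accepts AB-1234 because the escaped hyphen is treated as a literal character in that position.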

Trigram Mode

Trigrams are combinations of three letters that are commonly found in many languages. A common English trigram is ing. Kofax Capture can take advantage of trigrams to enhance recognition accuracy.

Consider these examples:

In the first case, the image file for Walking suffers from drop-outs. In particular the n is badly faded. The recognition engine cannot decide if it is an r followed by an i, or if it is a single n, so the character is marked as rejected in the initial results. Trigram analysis is applied to the initial result and the recognition engine decides that the most likely combination of three letters, in this case, is ing.

In the second case, the image file contains substantial noise. Because of this noise, the second i in Dictionary is interpreted as the letter l. Trigram analysis shows that ion is more likely than lon and the word is corrected.

It is important to keep in mind that trigram analysis is a statistical process. Kofax Capture ships with trigram tables for most supported languages. Each table contains a list of possible three-letter combinations and their frequency of occurrence in that language. Although there are thousands of such combinations, many of them are almost never used, so their frequency of occurrence is near zero. Trigram analysis attempts to identify (and optionally correct) combinations of letters that have both a low confidence rating and a low frequency of occurrence.

There may be rare occasions where your data contains many uncommon trigrams. For example, a list of Chicago radio stations might include WGN, WLS, WNVR, WKTAF, WZRD, WBEZ, or WXRT. In such cases, if you notice problems, you should consider turning off trigrams for your recognition profile.

Off: This option turns off the trigram feature.
Check: This option enables the trigram feature, but limits its behavior to marking suspected characters.
Repair: This option enables the trigram feature so that Kofax Capture can repair questionable combinations by picking the nearest and most likely trigram. For example, iog might be replaced by ing.
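
The difference between the three modes can be sketched in Python as follows. This is a toy model, not the engine's algorithm: the frequency table, the thresholds, the scoring rule, and the name apply_trigrams are all assumptions made for illustration, and the sketch lowercases its input for simplicity.

```python
import string

# Toy table: trigram -> relative frequency (illustrative values, not product data).
TOY_TABLE = {"ing": 0.9, "ion": 0.8, "tio": 0.7, "lon": 0.05}

def apply_trigrams(text, confidences, mode="repair",
                   table=TOY_TABLE, min_conf=0.5, min_freq=0.1):
    """Return (text, flagged_positions) for mode "off", "check", or "repair"."""
    chars = list(text.lower())
    flagged = []
    if mode == "off":
        return "".join(chars), flagged

    def score(i, candidate, starts):
        # Sum the table frequencies of every three-letter window that covers
        # position i, with the candidate letter substituted at that position.
        total = 0.0
        for s in starts:
            window = chars[s:s + 3]
            window[i - s] = candidate
            total += table.get("".join(window), 0.0)
        return total

    for i, conf in enumerate(confidences):
        if conf >= min_conf:
            continue                      # only low-confidence characters qualify
        starts = [s for s in range(i - 2, i + 1)
                  if s >= 0 and s + 3 <= len(chars)]
        rare = any(table.get("".join(chars[s:s + 3]), 0.0) < min_freq
                   for s in starts)
        if not rare:
            continue                      # the surrounding trigrams are common
        flagged.append(i)                 # "check" stops here
        if mode == "repair":
            best = max(string.ascii_lowercase, key=lambda c: score(i, c, starts))
            if score(i, best, starts) > score(i, chars[i], starts):
                chars[i] = best
    return "".join(chars), flagged

low = [1, 1, 1, 1, 1, 0.2, 1]             # the sixth character is uncertain
print(apply_trigrams("walkiog", low, mode="check"))
print(apply_trigrams("walkiog", low, mode="repair"))
```

With this table, Check only reports position 5 in walkiog, while Repair also replaces the o with n. Data with many uncommon trigrams, such as call signs, would score near zero everywhere in a table like this, which is why turning the feature off can be the better choice for such zones.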

Logical context

Use this setting to select logical context. When selected, the recognition engine attempts to resolve uncertain characters on the basis of their context. This is useful when distinguishing between characters such as 1 (one) and l (the letter l). For example, assume that the recognition engine extracts te1l from a zone. Logical context resolves this to tell because it is less likely that the 1 (one) would be embedded in letters.

Do not use logical context if the zone has words that include a combination of alphabetic and numeric characters, such as Abc123def.
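
As a rough illustration of the idea, the following Python sketch resolves the 1 and l confusion from its neighbors. The rule used here (a 1 between letters becomes l, an l between digits becomes 1) and the name resolve_logical_context are assumptions for the example. The actual engine's rules are not documented in this topic.

```python
import re

def resolve_logical_context(text):
    """Resolve 1/l confusion from the surrounding characters (illustrative rule)."""
    # A digit 1 with a letter on both sides is more likely the letter l.
    text = re.sub(r"(?<=[A-Za-z])1(?=[A-Za-z])", "l", text)
    # A letter l with a digit on both sides is more likely the digit 1.
    text = re.sub(r"(?<=\d)l(?=\d)", "1", text)
    return text

print(resolve_logical_context("te1l"))
print(resolve_logical_context("Abc1def"))
```

The second call shows why mixed alphabetic and numeric content is risky: a rule of this kind would change a legitimate 1 in Abc1def into the letter l.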

Advanced button

This button displays a window with a number of advanced options.

Image Cleanup

Select an image cleanup profile from the drop-down list.

Edit button

To modify an existing image cleanup profile or create a new one, click the Edit button. The Image Cleanup Profiles window appears, and you can specify the type of image cleanup to use.

Delete button

Click this button to delete the currently selected profile. It is not possible to delete profiles that are built in to Kofax Capture.

Script button

If enabled, use this button to assign a recognition script to the selected profile. The Recognition Script window appears, and you can associate a recognition script with the recognition profile.

Test button

Click this button to test your zone settings. Your recognition profile and cleanup settings are applied to the zone with the results displayed in the Zone Test window.

Note The Test button is disabled if the sample page is displayed from the Document class tree view. The sample must be displayed from the Batch class tree view tab if you want to use QuickZones to verify your recognition and cleanup settings.
