General tab - Properties of Format Locator window

The format locator works with format definitions such as pattern matching (regular expressions and simple expressions) and advanced algorithms (Levenshtein and trigrams). Format definitions, together with dictionaries and keywords, are used to extract data from documents without the need to define zones. The locator runs on a full or partial page read of the document and extracts the data using searches that are specific to the data, not to the document layout. The locator then evaluates the alternatives it finds and the resulting data output.

Use this tab to determine how the format locator is used in your project.

Advanced

This group has the following settings:

Use results from locator

Enable this setting to use the results from another locator. (Default: Cleared)

To use this setting, your project must contain two or more locators, and the input locator used by this format locator must appear above it in the project hierarchy. Locators are processed sequentially, so the results from the input locator must be available before this locator is processed.
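
The ordering requirement can be illustrated with a short sketch. It assumes a hypothetical project represented as a list of locator names in top-to-bottom processing order. The function and the locator names are illustrative only and are not part of the product.

```python
def input_locator_is_valid(locators_in_order, input_locator, this_locator):
    """Return True if input_locator is processed before this_locator.

    locators_in_order is a hypothetical top-to-bottom list of locator
    names that mirrors the project hierarchy.
    """
    if input_locator not in locators_in_order:
        return False
    if this_locator not in locators_in_order:
        return False
    return (locators_in_order.index(input_locator)
            < locators_in_order.index(this_locator))


# Hypothetical locator names, listed in processing order.
project = ["FL_AllNumbers", "FL_InvoiceNumber"]

# The input locator sits above the locator that consumes its results.
print(input_locator_is_valid(project, "FL_AllNumbers", "FL_InvoiceNumber"))
# The reverse arrangement cannot work: the results would not exist yet.
print(input_locator_is_valid(project, "FL_InvoiceNumber", "FL_AllNumbers"))
```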

If this setting is enabled, the Format Definitions tab is disabled.

Settings for Regular Expressions

This group has the following settings:

Use OCR substitution

Enable this setting if you want to correct commonly misread letters or numbers. (Default: Cleared)

This setting does not work when a dictionary is inserted in the same format locator.

Define OCR Substitution

Click this button to open the OCR Substitution window and define OCR substitutions.

Settings for Non-Regular Expressions

This group has the following settings:

Maximum search tolerance

This setting lets a search match comparison strings that contain additional characters before or after the search string. (Default: 30%)

You can use the following equation to calculate the desired search tolerance:

Search Tolerance = 1.00 - Number of Characters (Search String) / Number of Characters (Comparison String)

A value of 0% requires an exact match to the search string and a value of 100% will match everything.

If you are searching a document for a word such as "name", you can adjust the search tolerance so that words containing "name" are also returned. To find words such as "names" or "named", the tolerance must allow for one extra character, which corresponds to a Maximum search tolerance of 20% (1 - 4/5). To locate words such as "surname", "jobname", and "unnamed", a value of 43% is required (1 - 4/7, rounded up). A value of 50% is required to find words such as "forename" and "username" (1 - 4/8).
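
The percentages in this example follow directly from the equation. The sketch below reproduces them. The function name is hypothetical, and rounding up to the next whole percent is an assumption based on the 43% figure quoted for seven-character words.

```python
def min_tolerance_percent(search: str, comparison: str) -> int:
    """Smallest whole-percent tolerance that lets `comparison` match.

    Implements: tolerance = 1 - len(search) / len(comparison),
    rounded up to a whole percent (an assumption, see lead-in).
    """
    s, c = len(search), len(comparison)
    if c == 0 or s > c:
        raise ValueError("comparison must be at least as long as search")
    # Integer ceiling of 100 * (c - s) / c, avoiding float rounding issues.
    return -(-100 * (c - s) // c)


for word in ("name", "names", "surname", "forename"):
    print(word, min_tolerance_percent("name", word))
# name 0
# names 20
# surname 43
# forename 50
```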

Maximum word count

This setting allows you to extract strings that consist of multiple words. (Default: 6)

For example, a phone number that is separated by hyphens (-), such as 123-456-7890, consists of three words. To ensure that matches of this type are found, the value for this setting must be three or greater.
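
As a rough illustration of how the word count applies to this example, the sketch below counts words under the assumption that whitespace and hyphens act as word delimiters. The actual word segmentation comes from the recognition engine and may differ; both function names are hypothetical.

```python
import re


def count_words(text: str) -> int:
    """Count words, assuming whitespace and hyphens delimit words."""
    return len([w for w in re.split(r"[\s-]+", text) if w])


def fits_word_limit(text: str, max_word_count: int = 6) -> bool:
    """True if the string could be extracted with the given word limit."""
    return count_words(text) <= max_word_count


print(count_words("123-456-7890"))                        # 3
print(fits_word_limit("123-456-7890"))                    # True with the default of 6
print(fits_word_limit("123-456-7890", max_word_count=2))  # False
```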

Maximum gap between words

This setting specifies the maximum distance, in millimeters, between two words that can still be concatenated during a search. Words separated by more than this distance are treated as separate words and are not included in the current search. (Default: 5 mm)
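
The effect of the gap limit can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the `Word` type, its millimeter coordinates, and the `group_by_gap` function are hypothetical, and all words are assumed to lie on the same text line.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left_mm: float   # hypothetical left edge of the word, in mm
    right_mm: float  # hypothetical right edge of the word, in mm


def group_by_gap(words, max_gap_mm=5.0):
    """Group neighboring words whose horizontal gap is within the limit."""
    groups = []
    for word in sorted(words, key=lambda w: w.left_mm):
        # Gap between this word and the last word of the current group.
        if groups and word.left_mm - groups[-1][-1].right_mm <= max_gap_mm:
            groups[-1].append(word)
        else:
            groups.append([word])
    return groups


line = [Word("123", 10, 18), Word("456", 20, 28), Word("7890", 40, 50)]
texts = [[w.text for w in g] for g in group_by_gap(line)]
print(texts)  # [['123', '456'], ['7890']]
```

Here the 2 mm gap between the first two words is within the 5 mm default, so they are joined, while the 12 mm gap before the third word starts a new group.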

Maximum alternative length

This setting specifies the maximum length allowed for an alternative. An alternative that exceeds this length is rejected. (Default: 100 mm)

The following buttons are available at the bottom of this window:

Close

Closes the window and saves your changes.

Test

Tests the locator settings. The results appear on the Test Results tab, which opens automatically when you click this button.

If the locator uses other locators as input, this button may offer additional modes, depending on the locator method.

Help

Displays the help for the open window.
