General tab - Properties of Format Locator window
The Format Locator works with format definitions that use pattern matching (regular expressions and simple expressions) and advanced fuzzy-matching algorithms (Levenshtein and trigrams). Together with dictionaries and keywords, these format definitions are used to extract data from documents without the need to define zones. The locator runs on a full or partial page read of the document and extracts data using searches that are specific to the data, not to the document layout. The locator then evaluates the alternatives it finds and produces the data output.
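Levenshtein distance and trigram similarity are general-purpose fuzzy string-matching techniques. The following Python sketch shows textbook versions of both for illustration only. It is not the Format Locator's implementation, and the trigram padding and Jaccard-style scoring are assumptions borrowed from common practice.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        previous = current
    return previous[-1]


def trigrams(s: str) -> set[str]:
    """Overlapping three-character slices; padding is an assumption (pg_trgm style)."""
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(levenshtein("kitten", "sitting"))  # 3
```

A low Levenshtein distance or a high trigram similarity indicates that a word on the page is close to the value being searched for, even when OCR has introduced small errors.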
Use this tab to determine how the Format Locator is used in your project.
- Advanced
This group has the following settings:
- Use results from locator
Enable this setting to use the results from another locator. (Default: Cleared)
To use this setting, your project must contain two or more locators, and the input locator must appear above this Format Locator in the project hierarchy. Locators are processed sequentially, so the results from the input locator must be available before this locator runs.
If this setting is enabled, the Format Definitions tab is disabled.
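The ordering rule for input locators can be expressed as a simple check. In this Python sketch, the project hierarchy is modeled as a list of hypothetical locator names from top to bottom; the function name and data model are assumptions for illustration, not part of the product.

```python
from __future__ import annotations


def can_use_as_input(project_order: list[str], locator: str, input_locator: str) -> bool:
    """Return True if input_locator is processed before locator.

    project_order is a hypothetical list of locator names, top of the hierarchy first.
    A locator cannot use itself or a locator that is missing from the project.
    """
    if locator not in project_order or input_locator not in project_order:
        return False
    return project_order.index(input_locator) < project_order.index(locator)


order = ["AddressLocator", "FormatLocator1"]  # hypothetical locator names
print(can_use_as_input(order, "FormatLocator1", "AddressLocator"))  # True
print(can_use_as_input(order, "AddressLocator", "FormatLocator1"))  # False
```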
- Settings for Regular Expressions
This group has the following settings:
- Use OCR substitution
Enable this setting to correct commonly misread letters or digits. (Default: Cleared)
This setting does not work when a dictionary is added to the same Format Locator.
- Define OCR Substitution
Click this button to open the OCR Substitution window and define OCR substitutions.
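To illustrate what OCR substitution achieves, the following Python sketch corrects common misreads before a regular expression is evaluated. The substitution table and the strategy of trying the unmodified text first are assumptions; in the product, substitutions are defined in the OCR Substitution window.

```python
from __future__ import annotations

import re

# Hypothetical substitution table: misread character -> intended character.
OCR_SUBSTITUTIONS = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}


def apply_substitutions(text: str, table: dict[str, str]) -> str:
    """Replace each misread character with its intended character."""
    return text.translate(str.maketrans(table))


def match_with_ocr_substitution(pattern: str, word: str, table: dict[str, str]) -> str | None:
    """Try the pattern on the raw word first, then on the corrected word."""
    if re.fullmatch(pattern, word):
        return word
    corrected = apply_substitutions(word, table)
    if re.fullmatch(pattern, corrected):
        return corrected
    return None


# A six-digit number where OCR read the digit 0 as the letter O.
print(match_with_ocr_substitution(r"\d{6}", "1O2345", OCR_SUBSTITUTIONS))  # 102345
```

Trying the raw text first avoids altering values that already match, such as genuine letters in an alphanumeric field.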
- Settings for Non-Regular Expressions
This group has the following settings:
- Maximum search tolerance
This setting allows a search to match a word in the document (the comparison string) that contains additional preceding or trailing characters beyond the search string. (Default: 30%)
Use the following equation to calculate the required search tolerance:
Search tolerance = 1.00 - (number of characters in the search string / number of characters in the comparison string)
A value of 0% requires an exact match to the search string, and a value of 100% matches everything.
For example, if you search a document for the word "name", you can adjust the search tolerance so that words containing "name" are also returned. To find words such as "names" or "named", the tolerance must allow one extra character, which corresponds to a Maximum search tolerance of 20% (1.00 - 4/5). To find words such as "surname", "jobname", and "unnamed", a value of 43% (1.00 - 4/7, rounded) is required. To find words such as "forename" and "username", a value of 50% (1.00 - 4/8) is required.
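The equation and the "name" examples can be checked with a short Python sketch. `required_tolerance` applies the equation as stated. `matches` is a hypothetical helper that assumes a word matches when it contains the search string (case-insensitively) and its tolerance does not exceed the maximum; the real locator also applies fuzzy matching, so its results may differ.

```python
def required_tolerance(search: str, comparison: str) -> float:
    """Search tolerance = 1.00 - len(search) / len(comparison)."""
    return 1.00 - len(search) / len(comparison)


def matches(search: str, comparison: str, max_tolerance: float) -> bool:
    """Hypothetical check: containment plus a tolerance limit."""
    if search.lower() not in comparison.lower():
        return False
    return required_tolerance(search, comparison) <= max_tolerance


for word in ["name", "names", "surname", "username"]:
    # Tolerance in percent, and whether the default of 30% accepts the word.
    print(word, round(required_tolerance("name", word) * 100), matches("name", word, 0.30))
# name 0 True
# names 20 True
# surname 43 False
# username 50 False
```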
- Maximum word count
This setting specifies the maximum number of words that an extracted string can contain, which allows strings that consist of multiple words to be found. (Default: 6)
For example, a phone number separated by hyphens, such as 123-456-7890, consists of three words. To ensure that matches like this are found, this setting must be 3 or greater.
- Maximum gap between words
This setting specifies the maximum distance, in millimeters, between two words for them to be concatenated during a search. Words separated by a larger gap are treated as separate words and are not included in the current search. (Default: 5 mm)
- Maximum alternative length
This setting specifies the maximum length of an alternative. An alternative that exceeds this length is rejected. (Default: 100 mm)
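The word count, gap, and length settings all limit how multi-word alternatives are assembled. The following Python sketch shows one plausible way they could combine, using the defaults listed above. The `Word` type, the left-to-right scan over a single text line, and joining words with spaces are assumptions for illustration, not the product's algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left_mm: float   # left edge of the word on the page
    right_mm: float  # right edge of the word on the page


def candidate_alternatives(words: list[Word], max_word_count: int = 6,
                           max_gap_mm: float = 5.0, max_length_mm: float = 100.0) -> list[str]:
    """Build multi-word candidates from words on one line, sorted left to right."""
    candidates = []
    for start in range(len(words)):
        # Never concatenate more than max_word_count words.
        for end in range(start, min(start + max_word_count, len(words))):
            if end > start and words[end].left_mm - words[end - 1].right_mm > max_gap_mm:
                break  # gap too wide: the next word is treated as separate
            if words[end].right_mm - words[start].left_mm > max_length_mm:
                break  # alternative too long: rejected
            candidates.append(" ".join(w.text for w in words[start:end + 1]))
    return candidates


# Hypothetical line: a label followed by the three words of 123-456-7890.
line = [Word("Phone:", 10, 25), Word("123", 40, 48), Word("456", 50, 58), Word("7890", 60, 71)]
print(candidate_alternatives(line))
# ['Phone:', '123', '123 456', '123 456 7890', '456', '456 7890', '7890']
```

With `max_word_count=2`, the full phone number is no longer produced, which matches the requirement that this setting be 3 or greater for such values.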
The following buttons are available at the bottom of this window:

| Button | Description |
|---|---|
| Close | Closes the window and saves your changes. |
| Test (icon) | Tests the locator settings and displays the results on the Test Results tab, which opens automatically. If the locator uses other locators as input, this button may have additional modes, depending on the locator method. |
| Help (icon) | Displays the help for the open window. |
Related topics:
- More information on configuring a Format Locator