Local Fuzzy Database Properties window
Use this window to select a locally stored import file for the database using the following options:
- Referenced import file (text or csv file)
-
Select one of the following reference file locations:
-
File system.
Browse to the desired location of a local fuzzy database. The import process starts automatically when the window is closed, and a message box displays the number of imported database lines. Importing one million lines with three fields takes about 1 minute.
-
Web.
Type the URL for your local fuzzy database file.
Click Test to ensure that the connection to the specified URL is available.
Provide a User Name and Password if authentication is required.
-
- Column Configuration
-
This table has the following columns. You can rename an entry by clicking the row and then clicking the cell in that row.
- Column name
-
The name of the database column.
- Search
-
If selected for a field, that field is included in database searches.
Select this value for each field that might be present on a document.
- Filter
-
You can filter the data before searching to reduce the number of possible records. If this option is selected for a field, that field is included in the filtering index. Filtering is possible using scripting only.
This option is available only when both of the following conditions are met in the Optimization group:
-
Load database in memory.
-
The value for Database processing is set to Advanced.
-
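The effect of the Filter option can be pictured with a minimal Python sketch. It is not the product's implementation or its scripting interface: the record layout, the function names `build_filter_index` and `filtered_search`, and the use of `difflib` for scoring are all assumptions made for illustration. The filtering index narrows the candidate records first, and only those candidates are compared against the search text.

```python
from difflib import SequenceMatcher

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"name": "Acme Corp", "city": "Berlin", "country": "DE"},
    {"name": "Acme Corporation", "city": "Boston", "country": "US"},
    {"name": "Apex Ltd", "city": "Berlin", "country": "DE"},
]

def build_filter_index(records, filter_field):
    """Group record positions by the exact value of the filter field."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[filter_field], []).append(position)
    return index

def filtered_search(records, index, filter_value, search_field, query):
    """Score only the records whose filter field equals filter_value."""
    candidates = [records[i] for i in index.get(filter_value, [])]
    scored = [
        (SequenceMatcher(None, query.lower(), r[search_field].lower()).ratio(), r)
        for r in candidates
    ]
    # Best match first; sort on the score only.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

country_index = build_filter_index(RECORDS, "country")
results = filtered_search(RECORDS, country_index, "DE", "name", "acme corp")
```

In this sketch the filter lookup is an exact, case-sensitive match, so a filter value of "de" returns no candidates at all. That mirrors the behavior described for the Filtering is case sensitive option.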
- Import Options
-
This group has the following options:
- Ignore Case
-
Select this option to convert all search and lookup strings to lower case, effectively ignoring case. This option is selected by default.
- Filtering is case sensitive
-
Select this option to require that the filter text entered in the script matches the case of the database entry exactly. If the cases do not match exactly, no records are returned. This option is selected by default.
This option is available only when at least one field has the Filter option selected in the Column Configuration group and both of the following conditions are met in the Optimization group:
-
Load database in memory.
-
The value for Database processing is set to Advanced.
-
- First line contains caption
-
Select this option if the first record of the input file contains the column headers. This option is selected by default.
- Field delimiter
-
Type values into this field to specify what characters separate the import file content into individual fields. The value for this option is set to ; (semicolon) by default.
- Tab
-
Select this checkbox to use a Tab as a delimiter in addition to the characters specified in the Field delimiter setting.
- Word separation characters
-
If fields in the database contain compound words, common characters can be specified so that each part of the compound word is searched and evaluated separately.
The value for this option is set to " -," (space, hyphen, comma) by default.
For example, using the default settings, the compound word Diagon-Alley is treated as two words, diagon and alley, that are searched and evaluated separately.
Note The separation characters must correspond to the delimiter characters that are defined for OCR.
- Tab
-
Select this checkbox if you want to use a Tab as a word separation character in addition to the characters specified in the Word separation characters setting.
- Space
-
Select this checkbox if you want to use a Space as a word separation character in addition to the characters specified in the Word separation characters setting.
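The word separation behavior can be approximated with the following Python sketch. The function name `split_words` and its parameters are hypothetical, and the code only illustrates the idea of splitting at separator characters. It is not how the product tokenizes internally.

```python
import re

def split_words(value, separators=" -,", ignore_case=True):
    """Split a field value into words at any of the separator characters."""
    if ignore_case:
        value = value.lower()
    if not separators:
        return [value] if value else []
    # Build a character class from the separators; re.escape guards the hyphen.
    pattern = "[" + re.escape(separators) + "]+"
    return [word for word in re.split(pattern, value) if word]

words = split_words("Diagon-Alley,")
```

With the default separators assumed here, `words` holds the two parts of the compound word in lower case. Adding `"\t"` to `separators` corresponds to selecting the Tab checkbox, and removing the space corresponds to clearing the Space checkbox.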
- Characters to ignore
-
Type a list of characters into this field to filter unwanted characters from the input record. When you want to use a field delimiter that may also be a character in the input, such as a comma (,), you must enclose the input strings in quotes ("). However, you probably do not want to retain those quotation marks as part of the final results.
If you define the quotes as characters to ignore, they are removed. To define a tab or space as a character to ignore, select the corresponding checkbox. The value for this option is set to ."'! (period, quotation mark, single quotation mark, and exclamation point) by default.
- Space
-
Select this checkbox if you want to ignore a Space character in addition to the characters specified in the Characters to ignore setting.
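As a rough illustration of how the Import Options interact, the following Python sketch reads semicolon-delimited text, treats the first line as captions, strips the characters to ignore, and lowercases the values. The function `parse_import`, its parameters, and the fallback column names are assumptions for this example only. Python's `csv` module accepts a single delimiter character and consumes the enclosing quotes itself, so the sketch does not reproduce multiple delimiters or the exact order of processing in the product.

```python
import csv
import io

def parse_import(text, delimiter=";", has_caption=True,
                 ignore_chars=".\"'!", ignore_case=True):
    """Parse delimited text into (column_names, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    rows = [row for row in reader if row]
    if not rows:
        return [], []
    if has_caption:
        columns, rows = rows[0], rows[1:]
    else:
        # Hypothetical default names when no caption line is present.
        columns = [f"Field{i + 1}" for i in range(len(rows[0]))]
    drop = str.maketrans("", "", ignore_chars)
    cleaned = []
    for row in rows:
        values = [value.translate(drop) for value in row]
        if ignore_case:
            values = [value.lower() for value in values]
        cleaned.append(values)
    return columns, cleaned

sample = 'Name;City\n"Smith; Jones & Co.";St. Louis\nO\'Neil!;Boston\n'
columns, rows = parse_import(sample)
```

In the sample, the quoted first field keeps its embedded semicolon as data instead of being split, and the period, apostrophe, and exclamation point are removed from the stored values.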
- Optimization
-
This group has the following options:
- Automatic update from import file
-
Select this option to update the Local Fuzzy Database automatically when the source file is updated. This option is cleared by default.
- Load database in memory
-
Select this option to load the database into memory. This option is selected by default.
If you clear this option, the Filter option is disabled in the Column Configuration group.
- Database processing
-
Select one of the options to determine the level of processing required for your database searches. Choose from:
-
Basic.
Select this option if the computer where Kofax Capture and Kofax Transformation Toolkit are installed does not have a lot of memory or processing power. With this value, your results can be less accurate than expected, but they may be generated significantly faster than with the Advanced value.
If you select the Basic value for this option, the Filter option is disabled in the Column Configuration group.
This option replicates the search behavior from Kofax Transformation Toolkit 5.0 that enables users to focus on speed instead of accuracy.
-
Advanced. This is the default value for this option.
Select this option if you want the most accurate search results. With this value, your results are more accurate than with the Basic value. However, the time it takes to generate these results depends on the size and complexity of your database, the available memory, and the number of processing cores available on your server. If you do not have a lot of memory and multiple processing cores, or your database is very large and complex, the Basic value may be more suitable.
If you select this value, you can further optimize a database locator that uses this database for speed or accuracy.
Tip If you are unsure which value is best for your project, the best practice is to finish configuring the database locator that uses this fuzzy database and then run several extraction benchmarks to compare the results between the two Database processing values.
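The comparison suggested in the tip amounts to measuring accuracy and elapsed time for the same test cases under each setting. The following Python sketch shows only that bookkeeping. The two search functions are simple stand-ins invented for this example and do not represent the Basic or Advanced processing. In practice, the measurements come from the product's own extraction benchmarks.

```python
import time

def benchmark(search, cases):
    """Return (accuracy, seconds) for a search callable over (query, expected) pairs."""
    start = time.perf_counter()
    hits = sum(1 for query, expected in cases if search(query) == expected)
    elapsed = time.perf_counter() - start
    return hits / len(cases), elapsed

# Hypothetical lookup data and two stand-in configurations.
KNOWN = {"acme corp": "ACME-001", "apex ltd": "APEX-002"}

def config_a_search(query):
    # Stand-in: exact lookup only.
    return KNOWN.get(query)

def config_b_search(query):
    # Stand-in: normalize case and whitespace before the lookup.
    return KNOWN.get(" ".join(query.lower().split()))

CASES = [("acme corp", "ACME-001"), ("Apex  Ltd", "APEX-002")]
accuracy_a, seconds_a = benchmark(config_a_search, CASES)
accuracy_b, seconds_b = benchmark(config_b_search, CASES)
```

Comparing the accuracy and time pairs side by side makes the trade-off explicit, so the choice of value rests on measured results for your own documents.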
- String Substitution
-
This group has the following options:
The string substitution table substitutes Search Text with Replacement Text in the document and in the database. It is used to normalize the results of the text search.
If you use the same dictionary in more than one project, you can create a list of string substitutions and export them to use in the other project.
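The normalizing effect of string substitution can be sketched in Python as follows. The substitution entries and the function name `apply_substitutions` are hypothetical. The sketch assumes that entries are applied in table order as plain text replacements, which this window does not confirm.

```python
def apply_substitutions(text, substitutions):
    """Replace each search text with its replacement text, in table order."""
    for search_text, replacement_text in substitutions:
        text = text.replace(search_text, replacement_text)
    return text

# Hypothetical substitution table: (Search Text, Replacement Text).
SUBSTITUTIONS = [("strasse", "str"), ("straße", "str"), ("str.", "str")]

document_value = apply_substitutions("hauptstraße 5", SUBSTITUTIONS)
database_value = apply_substitutions("hauptstr. 5", SUBSTITUTIONS)
```

Because the same table is applied to the document text and to the database value, both spellings reduce to the same normalized string and can match each other in a search.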
You can use the following buttons to manage string substitutions:
- Table Records Preview
-
This table provides a preview of the selected database.
Definitions for the buttons at the bottom of this window can be found in Common Project Builder Buttons.