Fuzzy DB Extraction
This topic provides an overview of the FuzzyDB plugin and its configuration in Transact.
The FuzzyDB plugin uses a fuzzy database lookup to link your internal database to Transact. This allows you to match the data that Transact extracts from documents with the data in your internal database.
For example, consider an invoice from the vendor Office Depot.
From this invoice, Transact can extract the vendor name "Office Depot". However, the document does not contain the internal vendor number that you have assigned to this vendor. To match this internal information (the vendor number) with the extracted information (the vendor name), the system can learn a database table that lists your vendors and use it for automated lookups, applying fuzzy matching logic to the extracted values.
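The following Python sketch illustrates the idea of a fuzzy lookup under simplifying assumptions. It scores an extracted vendor name against a small vendor table with the standard library's `difflib.SequenceMatcher` and returns the closest row together with its vendor number. Transact itself searches the Lucene indexes that Learn DB generates, so this scoring is only a stand-in, and the `VENDORS` table and `fuzzy_lookup` function are hypothetical names used for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical internal vendor table: vendor number -> vendor name.
VENDORS = {
    "V-1001": "Office Depot Inc.",
    "V-1002": "Staples",
    "V-1003": "OfficeMax",
}


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def fuzzy_lookup(extracted: str, table: dict[str, str]) -> tuple[str, str, float]:
    """Return (vendor_number, vendor_name, score) for the closest table row."""
    return max(
        ((key, name, similarity(extracted, name)) for key, name in table.items()),
        key=lambda row: row[2],
    )


number, name, score = fuzzy_lookup("Office Depot", VENDORS)
print(number, name, round(score, 1))  # prints: V-1001 Office Depot Inc. 82.8
```

The extracted text "Office Depot" is not identical to any stored name, yet the closest row still yields the internal vendor number, which is the kind of link the plugin provides.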
Prerequisites
FuzzyDB must be added to the Extraction module and set to ON.
Configure Fuzzy DB Extraction
Follow the steps in the following sections to configure Fuzzy DB Extraction:
Connection Manager
Before configuring the FuzzyDB plugin, you must first connect your database to Transact.
- Go to .
- Open Connection Manager and click Add.
- Provide the required information.
The following connection (database) types are available:
  - MYSQL
  - MSSQL
  - MSSQL Windows Authentication
  - Oracle
  - MariaDB
- Click Test Connection to ensure the connection is successful and then click Save.
Fuzzy DB Extraction configuration
Once you set up your connection in Connection Manager, you can select this connection when adding Fuzzy DB Extraction to a document type.
- Select and open a batch class.
- Open a document type.
- Go to .
The Document Fuzzy page appears by default.
The following options are available for mapping database data to a document:
- Document Fuzzy
- Field Fuzzy
Document Fuzzy
In Document Fuzzy, only one database can be configured per document type.
- On the Document Fuzzy page, select the following details from the corresponding drop-down lists:
  - Connection
  - Table Name
  - Primary Key
- Click Add to map index fields to database columns.
Select the Is Searchable check box to limit the search to the specified table column rather than the entire database.
- Review the Additional Parameters Mapping section and perform any additional configurations.
Enabled
Select this check box to enable basic configuration.
Confidence Threshold
A value from 1 to 100 that sets the minimum confidence required for a search result.
Weight
A value from 0 to 1 that acts as a multiplying factor for the computed confidence.
Ignore Word List
Words to exclude from the search, separated by semicolons (;).
Include Pages
The pages of the document in which to search:
  - FIRSTPAGE
  - LASTPAGE
  - ALLPAGES
Max Search Results
The maximum number of results that the search returns.
HOCR Search Switch
Select this check box to enable searching the document's HOCR content after the index fields are searched.
Search Column List
The index field to search with.
- Click Apply to save any changes.
- Click Learn DB to generate the Lucene indexes.
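To make the Additional Parameters Mapping options more concrete, the following Python sketch shows one plausible way they could interact during a search. It is not Transact's implementation: the `FuzzyParams` class and the helper functions are hypothetical, `difflib` stands in for the Lucene-based scoring, and applying the Confidence Threshold to the unweighted score before multiplying by Weight is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FuzzyParams:
    """Hypothetical mirror of the Additional Parameters Mapping options."""
    confidence_threshold: float = 60.0  # Confidence Threshold, 1-100
    weight: float = 1.0                 # Weight, 0-1
    ignore_words: str = ""              # Ignore Word List, e.g. "inc.;corp"
    include_pages: str = "ALLPAGES"     # FIRSTPAGE, LASTPAGE or ALLPAGES
    max_results: int = 5                # Max Search Results


def select_pages(pages: list[str], include_pages: str) -> list[str]:
    """Keep only the pages named by the Include Pages option."""
    if include_pages == "FIRSTPAGE":
        return pages[:1]
    if include_pages == "LASTPAGE":
        return pages[-1:]
    return list(pages)  # ALLPAGES


def strip_ignored(text: str, ignore_words: str) -> str:
    """Drop every whitespace-separated token listed in the semicolon-separated ignore list."""
    ignored = {word.strip().lower() for word in ignore_words.split(";") if word.strip()}
    return " ".join(tok for tok in text.split() if tok.lower() not in ignored)


def search(query: str, rows: dict[str, str], p: FuzzyParams) -> list[tuple[str, float]]:
    """Return (primary_key, weighted_confidence) pairs, best first."""
    cleaned_query = strip_ignored(query, p.ignore_words).lower()
    results = []
    for key, value in rows.items():
        cleaned_value = strip_ignored(value, p.ignore_words).lower()
        confidence = SequenceMatcher(None, cleaned_query, cleaned_value).ratio() * 100
        if confidence >= p.confidence_threshold:  # assumption: threshold on the raw score
            results.append((key, confidence * p.weight))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[: p.max_results]


rows = {"V-1001": "Office Depot Inc.", "V-1002": "Staples", "V-1003": "OfficeMax"}
params = FuzzyParams(ignore_words="inc.;corp", weight=0.9, max_results=2)
print(search("Office Depot", rows, params))
print(select_pages(["page1", "page2", "page3"], "LASTPAGE"))  # ['page3']
```

With `ignore_words="inc.;corp"`, the row "Office Depot Inc." is compared as "Office Depot", so it matches the query exactly and, with a weight of 0.9, receives a weighted confidence of about 90. "OfficeMax" scores below the threshold of 60 and is dropped.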
Field Fuzzy
In Field Fuzzy, you can map a field to multiple databases by using groups.
- Under Fuzzy DB Extraction Configuration, click Field Fuzzy.
- On the Field Fuzzy page, provide the following information:
  - Group Name
  - Connection
  - Table Name
  - Primary/Unique Key
- Click Add to map index fields to database columns.
Select the Is Searchable check box to limit the search to the specified table column rather than the entire database.
- Review the Additional Parameters Mapping section and perform any additional configurations.
Enabled
Select this check box to enable basic configuration.
Confidence Threshold
A value from 1 to 100 that sets the minimum confidence required for a search result.
Weight
A value from 0 to 1 that acts as a multiplying factor for the computed confidence.
Ignore Word List
Words to exclude from the search, separated by semicolons (;).
Include Pages
The pages of the document in which to search:
  - FIRSTPAGE
  - LASTPAGE
  - ALLPAGES
Max Search Results
The maximum number of results that the search returns.
- Click Apply to save any changes.
- Click Learn DB to generate the Lucene indexes.
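Because Field Fuzzy maps a single field to several databases through groups, a search can run against each group using that group's own settings. The following Python sketch models this under stated assumptions: the `Group` class, the `search_field` function, the sample tables, and the `difflib` scoring are hypothetical stand-ins, not Transact's API or its Lucene-based matching.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Group:
    """Hypothetical Field Fuzzy group: one connection and table with its own settings."""
    name: str
    rows: dict[str, str]                # primary/unique key -> mapped column value
    confidence_threshold: float = 60.0  # 1-100
    weight: float = 1.0                 # 0-1
    max_results: int = 5


def search_field(query: str, groups: list[Group]) -> list[tuple[str, str, float]]:
    """Search each group mapped to a field; return (group, key, weighted confidence)."""
    hits = []
    for group in groups:
        scored = []
        for key, value in group.rows.items():
            confidence = SequenceMatcher(None, query.lower(), value.lower()).ratio() * 100
            if confidence >= group.confidence_threshold:  # assumption: raw score vs threshold
                scored.append((group.name, key, confidence * group.weight))
        scored.sort(key=lambda hit: hit[2], reverse=True)
        hits.extend(scored[: group.max_results])  # cap results per group
    return hits


groups = [
    Group("ERP vendors", {"V-1001": "Office Depot Inc.", "V-1002": "Staples"}, weight=0.8),
    Group("Legacy vendors", {"L-77": "Office Depot"}, weight=0.5),
]
for hit in search_field("Office Depot", groups):
    print(hit)
```

Each result keeps the name of the group it came from, which is what allows the Validation screen to restrict selection to one row per group.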
Test Extraction
You can test the extraction results from the Document Types folder. Supported file types are PDF and TIFF.
- Select the document type and click Test Extraction.
- Click Extract.
The Extraction Type column shows how the data was extracted. If a FuzzyDB mapping matches the mapped fields, the column lists FuzzyDB as the extraction type.
Perform Fuzzy Search and Extraction
Once you execute a batch, the document appears on the Validation screen.
- In the Fuzzy Search field, enter a search value for a mapped index field (see Document Fuzzy or Field Fuzzy).
A pop-up window displays the fuzzy search results.
- From the table, select the record that you want to fetch.
You can enter an asterisk (*) in the search bar to fetch all records from the table.
You cannot select two rows from the same group.
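The two selection rules above can be sketched in Python as follows. The asterisk branch mirrors the documented behavior of fetching every record, while the non-wildcard branch uses a simple case-insensitive substring match as a hypothetical stand-in for the real fuzzy search. The `fetch_records` and `is_valid_selection` names are illustrative only.

```python
def fetch_records(query: str, rows: dict[str, str]) -> dict[str, str]:
    """Return every row for "*"; otherwise a simple stand-in for the fuzzy search."""
    if query.strip() == "*":
        return dict(rows)
    needle = query.strip().lower()
    # Hypothetical stand-in: case-insensitive substring match instead of fuzzy scoring.
    return {key: value for key, value in rows.items() if needle in value.lower()}


def is_valid_selection(selected: list[tuple[str, str]]) -> bool:
    """selected holds (group_name, primary_key) pairs; each group may appear only once."""
    groups = [group for group, _ in selected]
    return len(groups) == len(set(groups))


rows = {"V-1001": "Office Depot Inc.", "V-1002": "Staples"}
print(fetch_records("*", rows))
print(is_valid_selection([("ERP vendors", "V-1001"), ("ERP vendors", "V-1002")]))  # False
```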
Multiple Rows:
- If multiple rows are selected, the option with the highest confidence score populates the field.
- If multiple rows with the same confidence are selected, the option with the higher weight populates the field.
- If multiple rows with the same confidence and weight are selected, the first record on the extraction list populates the field.
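The precedence for multiple selected rows amounts to ordering by confidence, then by weight, then by position on the extraction list. This minimal Python sketch expresses that ordering; the `Candidate` class is a hypothetical representation of a result row. It relies on the documented behavior of Python's built-in `max`, which returns the first maximal item when several items tie.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical fuzzy search result row selected on the Validation screen."""
    value: str
    confidence: float  # computed confidence, 0-100
    weight: float      # configured Weight, 0-1


def pick(selected: list[Candidate]) -> Candidate:
    """Highest confidence wins; ties go to the higher weight, then to the earlier row.

    max() returns the first maximal item it encounters, so rows that tie on
    both confidence and weight resolve to whichever comes first in `selected`.
    """
    return max(selected, key=lambda c: (c.confidence, c.weight))


rows = [
    Candidate("Office Depot Inc.", 90, 0.5),
    Candidate("Office Depot", 90, 0.8),
    Candidate("OfficeMax", 85, 1.0),
]
print(pick(rows).value)  # Office Depot
```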
Multiple Fuzzy Extraction Methods:
- If both Document Fuzzy and Field Fuzzy are configured and the same index field exists in both, the option with the higher weighted confidence (confidence × weight) populates the field.
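The comparison between the two methods reduces to computing confidence × weight for each candidate and keeping the larger value. The Python sketch below assumes a hypothetical `Match` record for each method's result. The source does not say how an exact tie is resolved, so the Document Fuzzy preference on ties in this sketch is only an artifact of argument order.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """Hypothetical result produced by one fuzzy extraction method."""
    source: str        # "Document Fuzzy" or "Field Fuzzy"
    value: str         # value that would populate the index field
    confidence: float  # computed confidence, 0-100
    weight: float      # configured Weight, 0-1

    @property
    def weighted(self) -> float:
        return self.confidence * self.weight


def resolve(doc: Optional[Match], fld: Optional[Match]) -> Optional[Match]:
    """Return the match with the higher weighted confidence, or None if neither exists."""
    candidates = [m for m in (doc, fld) if m is not None]
    # On an exact tie, max() keeps the first candidate (Document Fuzzy here);
    # this tie-break is an assumption, not documented behavior.
    return max(candidates, key=lambda m: m.weighted, default=None)


doc = Match("Document Fuzzy", "V-1001", confidence=80, weight=0.9)
fld = Match("Field Fuzzy", "L-77", confidence=95, weight=0.7)
print(resolve(doc, fld).source)  # Document Fuzzy
```

In this example, the Document Fuzzy match wins (80 × 0.9 = 72) even though the Field Fuzzy match has the higher raw confidence (95 × 0.7 = 66.5).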