Fuzzy DB Extraction

This topic provides an overview of the FuzzyDB plugin and its configuration in Transact.

The FuzzyDB plugin uses a fuzzy database lookup to link your internal database to Transact. This allows you to match the data extracted from Transact with data from your internal database.

For example, refer to the following excerpt from a sample invoice.

Fuzzy DB Extraction sample invoice

From this invoice, Transact can extract the vendor name "Office Depot". However, the document does not contain the internal vendor number that you have assigned to this vendor. To match this internal information (vendor number) with the information extracted from the invoice, the system can learn a database table that contains your list of vendors and perform automated lookups against it. The lookup applies fuzzy matching logic to the extracted information.
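
The idea of a fuzzy lookup can be illustrated with a minimal Python sketch. It is not how Transact implements the plugin (Transact builds Lucene indexes, as described later in this topic); it only uses the standard-library difflib module to show how an extracted vendor name could be matched to an internal vendor number. The vendor table, the vendor numbers, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical internal vendor table: vendor number -> vendor name.
VENDORS = {
    "V-1001": "Office Depot, Inc.",
    "V-1002": "Office Max",
    "V-1003": "Home Depot",
}


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup_vendor(extracted_name: str):
    """Return (vendor_number, vendor_name, score) for the closest table entry."""
    number, name = max(
        VENDORS.items(),
        key=lambda item: similarity(extracted_name, item[1]),
    )
    return number, name, similarity(extracted_name, name)


# The extracted name does not have to match the stored name exactly.
number, name, score = lookup_vendor("Office Depot")
print(number, name, round(score, 1))
```

In this sketch the extracted name "Office Depot" resolves to the hypothetical vendor number V-1001, even though the stored name carries an extra suffix.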

FuzzyDB can be used for extraction or validation.

Prerequisites

FuzzyDB must be added to the Extraction module and set to ON.

Configure Fuzzy DB Extraction

Follow the steps in these topics to configure Fuzzy DB Extraction:

Connection Manager

Before configuring the FuzzyDB plugin, connect your database to Transact.

  1. Go to Administrator > System Configuration.
  2. Open Connection Manager and click Add.
  3. Provide the required information.

    Connection Manager details

    The following connection (database) types are available:

    • MYSQL

    • MSSQL

    • MSSQL Windows Authentication

    • Oracle

    • MariaDB

  4. Click Test Connection to ensure the connection is successful and then click Save.

Fuzzy DB Extraction configuration

Once you set up your connection in Connection Manager, you can select this connection when adding Fuzzy DB Extraction to a document type.

This is a Document Level Field (DLF), which means it is configured at the document type level in a batch class.

  1. Select and open a batch class.
  2. Open a document type.
  3. Go to Index Fields > Fuzzy DB Extraction Configuration.

    The Document Fuzzy page appears by default.

The following options are available for mapping database data to a document:

  • Document Fuzzy

  • Field Fuzzy

Document Fuzzy

In Document Fuzzy, only one database can be configured per document type.

  1. On the Document Fuzzy page, provide the following details from the corresponding drop-down lists:
    • Connection

    • Table Name

    • Primary Key

  2. Click Add to map index fields with database columns.

    Select the Is Searchable check box if you want to limit the search to the specified table column rather than the entire database.

  3. Review the Additional Parameters Mapping section and perform any additional configurations.

    Enabled

    Select this check box to enable basic configuration.

    Confidence Threshold

    A value from 1 to 100. Sets the minimum confidence for the search.

    Weight

    A value from 0 to 1. Acts as a multiplying factor for the computed confidence.

    Ignore Word List

    Enter the values that you want to exclude from the search, separated by semicolons ( ; ).

    Include Pages

    Define which pages of the document type you want to search.

    • FIRSTPAGE
    • LASTPAGE
    • ALLPAGES

    Max Search Results

    Define the maximum number of results that will be returned for the search.

    HOCR Search Switch

    Select this check box to enable searching based on HOCR content after index fields.

    Search Column List

    Select the index field to search with.

  4. Click Apply to save any changes.
  5. Click Learn DB to generate the Lucene indexes.
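
How the additional parameters could interact is easier to see in code. The following Python sketch is an illustration only and does not reproduce Transact's internal logic. The class and function names are hypothetical, and it assumes that the Confidence Threshold is compared against the raw confidence before the Weight is applied, which this topic does not specify.

```python
from dataclasses import dataclass


@dataclass
class FuzzyParams:
    """Hypothetical container for the Additional Parameters Mapping values."""
    confidence_threshold: float = 50.0  # 1 to 100
    weight: float = 1.0                 # 0 to 1
    ignore_word_list: str = ""          # semicolon-separated values
    max_search_results: int = 5


def clean_query(text: str, params: FuzzyParams) -> str:
    """Drop every word that appears in the Ignore Word List (case-insensitive)."""
    ignored = {
        word.strip().lower()
        for word in params.ignore_word_list.split(";")
        if word.strip()
    }
    return " ".join(w for w in text.split() if w.lower() not in ignored)


def filter_results(candidates, params: FuzzyParams):
    """Apply threshold, weight, and result limit to (record, confidence) pairs.

    Assumption: the threshold is checked on the raw confidence, and the
    weight is applied afterwards as a multiplying factor.
    """
    kept = [
        (record, confidence, confidence * params.weight)
        for record, confidence in candidates
        if confidence >= params.confidence_threshold
    ]
    # Highest weighted confidence first; sort is stable for equal scores.
    kept.sort(key=lambda row: row[2], reverse=True)
    return kept[: params.max_search_results]


params = FuzzyParams(
    confidence_threshold=60,
    weight=0.5,
    ignore_word_list="Inc;LLC",
    max_search_results=2,
)
query = clean_query("Office Depot Inc", params)
results = filter_results([("a", 90), ("b", 55), ("c", 70), ("d", 65)], params)
```

Here record "b" falls below the threshold and record "d" is cut off by the result limit, so only "a" and "c" remain, each with its confidence halved by the weight.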

Field Fuzzy

In Field Fuzzy, a field can be mapped with multiple databases. This is done through groups.

While one field can be mapped to multiple databases, you cannot have more than one field mapped to the same table and index.

  1. Under Fuzzy DB Extraction Configuration, click Field Fuzzy.
  2. On the Field Fuzzy page, provide the following information:
    • Group Name

    • Connection

    • Table Name

    • Primary/Unique Key

  3. Click Add to map index fields with database columns.

    Select the Is Searchable check box if you want to limit the search to the specified table column rather than the entire database.

  4. Review the Additional Parameters Mapping section and perform any additional configurations.

    Enabled

    Select this check box to enable basic configuration.

    Confidence Threshold

    A value from 1 to 100. Sets the minimum confidence for the search.

    Weight

    A value from 0 to 1. Acts as a multiplying factor for the computed confidence.

    Ignore Word List

    Enter the values that you want to exclude from the search, separated by semicolons ( ; ).

    Include Pages

    Define which pages of the document type you want to search.

    • FIRSTPAGE
    • LASTPAGE
    • ALLPAGES

    Max Search Results

    Define the maximum number of results that will be returned for the search.

  5. Click Apply to save any changes.
  6. Click Learn DB to generate the Lucene indexes.
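
The three Include Pages values listed in the parameters above can be read as a simple page filter. The following Python sketch shows one plausible interpretation. The function name and the representation of pages as a list are assumptions made for illustration, not part of Transact.

```python
def select_pages(pages: list, include_pages: str) -> list:
    """Return the pages to search for a given Include Pages setting."""
    if include_pages == "FIRSTPAGE":
        return pages[:1]      # only the first page (empty if no pages)
    if include_pages == "LASTPAGE":
        return pages[-1:]     # only the last page (empty if no pages)
    if include_pages == "ALLPAGES":
        return list(pages)    # every page, as a copy
    raise ValueError(f"Unknown Include Pages value: {include_pages}")


# Hypothetical three-page document.
document = ["page 1 text", "page 2 text", "page 3 text"]
first = select_pages(document, "FIRSTPAGE")
last = select_pages(document, "LASTPAGE")
everything = select_pages(document, "ALLPAGES")
```

Restricting the search to the first or last page can be useful when the information to match, such as a vendor name, is known to appear only there.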

Test Extraction

You can test the extraction results from the Document Types folder. The supported file formats are PDF and TIFF.

  1. Select the document type and click Test Extraction.
  2. Click Extract.

    The Extraction Type column shows how the data was extracted. If a FuzzyDB mapping matches the mapped fields, the column lists FuzzyDB as the extraction type.

Perform Fuzzy Search and Extraction

Once you execute a batch, the document appears on the Validation screen.

  1. In the Fuzzy Search field, enter a mapped index field (see Document Fuzzy or Field Fuzzy).

    A pop-up window displays the fuzzy search results.

  2. From the table, select the record that you want to fetch.

    You can enter an asterisk ( * ) into the search bar to fetch all records from the table.

    You cannot select two rows from the same group.

Multiple Rows:

  • If multiple rows are selected, the option with the highest confidence score is populated in the field.

  • If multiple rows are selected that have the same confidence, the option with the higher weight is populated.

  • If multiple rows are selected that have the same confidence and weight, the first record on the extraction list is populated.
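
The three rules above amount to an ordered comparison: confidence first, then weight, then position in the extraction list. A minimal Python sketch of that ordering follows. The Row type and the sample values are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical selected row with its confidence and weight."""
    value: str
    confidence: float
    weight: float


def pick_row(selected: list) -> Row:
    """Pick the row that populates the field.

    Rows are compared by confidence, then by weight. Python's max()
    returns the first maximal item it encounters, so a full tie is
    resolved in favor of the earliest row in the list.
    """
    return max(selected, key=lambda row: (row.confidence, row.weight))


rows = [
    Row("V-1001", 80.0, 0.5),
    Row("V-1002", 80.0, 0.9),
    Row("V-1003", 80.0, 0.9),
    Row("V-1004", 60.0, 1.0),
]
chosen = pick_row(rows)
```

In this example three rows share the highest confidence, two of those share the higher weight, and the earlier of the two, V-1002, is chosen.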

Multiple Fuzzy Extraction Methods:

  • If both Document Fuzzy and Field Fuzzy are configured and the same index field exists in both, the option with the higher weighted confidence (confidence × weight) is populated in the field.
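
As a worked example of the weighted confidence comparison, the following Python sketch multiplies each candidate's confidence by its weight and keeps the larger product. The dictionary layout and the sample numbers are hypothetical. The behavior on an exact tie is not described in this topic, so the sketch arbitrarily keeps the Document Fuzzy candidate in that case.

```python
def weighted_confidence(candidate: dict) -> float:
    """Return confidence multiplied by weight."""
    return candidate["confidence"] * candidate["weight"]


def choose_candidate(document_fuzzy: dict, field_fuzzy: dict) -> dict:
    """Return the candidate with the higher weighted confidence.

    Assumption: on an exact tie the Document Fuzzy candidate is kept.
    """
    if weighted_confidence(field_fuzzy) > weighted_confidence(document_fuzzy):
        return field_fuzzy
    return document_fuzzy


# Hypothetical candidates for the same index field.
doc_candidate = {"source": "Document Fuzzy", "value": "V-1001",
                 "confidence": 90, "weight": 0.5}
field_candidate = {"source": "Field Fuzzy", "value": "V-2002",
                   "confidence": 70, "weight": 0.8}

winner = choose_candidate(doc_candidate, field_candidate)
```

Although the Document Fuzzy candidate has the higher raw confidence (90 versus 70), its weighted confidence is about 45, while the Field Fuzzy candidate reaches about 56, so the Field Fuzzy value is populated.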