Key Value Page process

The Key Value (KV) Page Process plugin classifies documents based on user-defined static and dynamic keywords found in the document, an approach also known as keyword-based classification. You can tailor the KV Page Process plugin to the specific needs of your organization.

Prerequisites

To configure and use the KV_PAGE_PROCESS plugin, the following conditions must be met:

  • You need a batch class with a document type configured. For detailed steps, see Add new document type.

  • You need to add the KV_PAGE_PROCESS plugin to the Page Process module for the batch class. For more detailed steps, see Configure plugins.

Configure the plugin

To implement keyword-based classification within Transact, you must first configure the plugin. To do so, perform the following steps:

  1. From the Batch Class Management screen, open your batch class.

    The batch class opens with a list of document types.

  2. From the left navigation pane, select Modules > Page Process.

    The Plugin Configuration screen appears.

  3. Select the KV_PAGE_PROCESS plugin from the Associated Plugins column.
  4. Move the KV_PAGE_PROCESS plugin to the Selected Plugins column. Make sure the plugin is towards the bottom of the list, after the CREATE_OCR_INPUT and RECOSTAR_OCR plugins.

    Some plugins have dependencies on other plugins. In this case, you may get the following message:

    Selected Plug-in has dependencies on the highlighted Plugin(s). Would you like to add them as well?

    Click YES to add the selected plugin along with the dependencies. Click NO to add the selected plugin without the dependencies. Click CANCEL to cancel the operation.

    There is currently a known issue with CHE being incorrectly marked as dependent. See CHE Plugin Incorrectly Marked as Dependent for more information.

  5. Determine what method you want Transact to use for classification.
    Choose one of the following options.

    • Only use the KV_PAGE_PROCESS plugin

      1. Remove the SEARCH_CLASSIFICATION plugin from the Page Process module.

      2. Configure the DOCUMENT_ASSEMBLER plugin to use KeywordClassification as the DA Classification Type. See Document Assembler plugin for help.

      With this configuration, Transact uses only keyword classification.

    • Automatic classification based on all configured classification methods

      1. Keep the SEARCH_CLASSIFICATION plugin in the Page Process module.

      2. Configure the DOCUMENT_ASSEMBLER plugin to use AutomaticClassification as the DA Classification Type. See Document Assembler plugin for help.

      With this configuration, Transact uses all configured classification methods and then chooses the result with the highest confidence score.

  6. Click Deploy.
  7. Go to Modules > Page Process > KV_PAGE_PROCESS.
  8. Set the KV Page Process Switch to ON.
  9. If required, configure the KV Page Process Max Results.
  10. Click Deploy.
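
The automatic option in step 5 comes down to comparing results from several classifiers and keeping the one with the highest confidence score. The following Python sketch illustrates that selection only. The `ClassificationResult` record, the `pick_document_type` helper, and the sample scores are hypothetical and are not part of Transact.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClassificationResult:
    method: str         # hypothetical label, for example "keyword" or "search"
    document_type: str  # document type proposed by that method
    confidence: float   # hypothetical score; higher means more confident


def pick_document_type(
    results: Iterable[ClassificationResult],
) -> Optional[ClassificationResult]:
    """Return the result with the highest confidence, or None if there are none."""
    return max(results, key=lambda r: r.confidence, default=None)


candidates = [
    ClassificationResult("keyword", "Invoice", 0.92),
    ClassificationResult("search", "Purchase Order", 0.71),
]
best = pick_document_type(candidates)
print(best.document_type if best else "unclassified")
```

With only the KV_PAGE_PROCESS plugin enabled, the list of candidates would contain keyword results alone, so the comparison across methods does not apply.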

Create page-level fields

The Key-Value Page Process plugin can be used for document classification and separation by executing named key-value extraction rules during the Page Process step of the workflow. Transact extracts these values and stores them as page-level fields. Page-level fields help you build classification rules. See Create classification rules.

The next stage of the process involves defining which values to pull from the document. To do so, specify values in the page-level fields.

  1. Go to Modules > Page Process > KV_PAGE_PROCESS > Page Level Fields.
  2. Click Add.
  3. Click Select Files or drag and drop files into the KV Page Process panel.

    Supported file types are PDF and TIF.

    The KV Page Process Rule Builder appears in the Image View pane.

    [Figure: Image with key and value field]
  4. Enter a unique value in the Field Name box.

    This name will be used to identify this field in the subsequent Create classification rules section, so use something descriptive. If the KV field will only be used to identify a single form, such as DA 71, you can name this field something like "DA Form 71." If the KV field will be used to identify a variety of different forms, you can name this field something like "DA Form Number."

  5. Drag and resize the key overlay (green) to the text or label that will serve as the key.
    [Figure: Resizing the key and value fields]
  6. Click the key overlay to open the Suggest Regex window.

    This will display the text captured inside the overlay.

  7. Select your preferred regex and click OK.

    This regex is added to the Key field in the left column.

  8. Repeat steps 5 through 7 for the value overlay (red).
  9. Enter the relevant configuration details as described in the following table.

    Configurable property

    Options

    Description

    Fuzzy %

    • None
    • 10%
    • 20%
    • 30%

    This parameter specifies whether a key is still identified when the text on the page does not exactly match the key pattern.

    • None: The key must be an exact match to the key regex pattern.

    • 10%: Allows for 10% of the characters to be inaccurate.

    • 20%: Allows for 20% of the characters to be inaccurate.

    • 30%: Allows for 30% of the characters to be inaccurate.

    If you select a fuzzy percentage, you cannot use a regex for the Key pattern. If you select both a fuzzy percentage and key regex, you will not receive any results during extraction.

    Fetch

    • ALL
    • FIRST
    • LAST

    This parameter specifies how many instances of the value should be extracted from the value zone.

    For example, if ALL is selected, all characters from the value zone will be extracted. If FIRST is selected, only the first matching pattern from the value zone will be extracted.

    Zone*

    • ALL
    • TOP
    • RIGHT
    • LEFT
    • MIDDLE
    • BOTTOM

    This parameter specifies on which section of the document to perform key-value extraction.

    For example, if ALL is selected, key-value extraction will be performed on the entire page. If TOP is selected, it will only be performed on the top section of the page.

    Weight

    0-1

    This parameter is a configurable value (0-1) that is multiplied by the confidence score to calculate a new confidence score. This is to allow an extraction rule to be prioritized over another within the same index field.

    X Offset

    N/A

    The application will set this value based on the placement and size of the key and value overlays.

    Y Offset

    N/A

    The application will set this value based on the placement and size of the key and value overlays.

    *Extraction is not limited to the exact positions of the key and value defined in the extraction rule. Transact will search the specified fetch, page, and zone for the key pattern and return results if it finds the matching value pattern. This allows Transact to extract data from structured, semi-structured, and unstructured documents.

  10. Click Test KV.

    The extraction results are displayed in the KV Page Process grid.

  11. Click Apply KV to save the rule.

    The updated Page Level Fields screen appears.

  12. Click Apply.

    Follow the process described above to add multiple page-level fields.
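
The Fuzzy %, Fetch, and Weight properties in the table above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not Transact's implementation: fuzzy matching is modeled here as an edit-distance tolerance, Fetch as a choice among regex matches, and Weight as a plain multiplication. All function names are hypothetical.

```python
import re
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def key_matches(key: str, text: str, fuzzy_percent: int = 0) -> bool:
    """Assumed reading of Fuzzy %: tolerate that share of inaccurate characters."""
    allowed = len(key) * fuzzy_percent // 100
    return edit_distance(key, text) <= allowed


def fetch_values(value_pattern: str, zone_text: str, fetch: str = "ALL") -> List[str]:
    """Return ALL matches, only the FIRST, or only the LAST match in the zone text."""
    matches = re.findall(value_pattern, zone_text)
    if fetch == "FIRST":
        return matches[:1]
    if fetch == "LAST":
        return matches[-1:]
    return matches


def weighted_confidence(confidence: float, weight: float) -> float:
    """Weight (0-1) is multiplied by the confidence score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return confidence * weight


# "Invoice Numbcr" has one OCR error in a 14-character key.
print(key_matches("Invoice Number", "Invoice Numbcr", fuzzy_percent=0))   # exact match required
print(key_matches("Invoice Number", "Invoice Numbcr", fuzzy_percent=10))  # one error tolerated
print(fetch_values(r"\d{4}", "1001 1002 1003", fetch="LAST"))
```

In this sketch, a lower weight reduces the final score of a rule, which is one way a less reliable rule could rank below a more reliable one for the same field.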

Create classification rules

This stage of the process involves creating classification rules, which use page-level fields to determine classification and separation. The rules essentially give Transact if/then instructions for how to treat documents based on their page-level fields. Now that the page-level fields are established, you can add classification rules.

  1. Go to Modules > Page Process > KV_PAGE_PROCESS > Classification Rules.
  2. Click Add.
  3. Select an existing document type from the Document Type Name field.

    When the rules are executed, if this rule is successful, the document is set to this document type.

    The Multipage Document option is selected by default. Clear it for single-page document types.

  4. Configure the remaining fields according to your workflow needs.

    The following fields are available in the grid on the Classification Rules page of the KV_PAGE_PROCESS plugin.

    Column name

    Description

    Document Name

    Document type for which the rule expression is configured.

    First Page Rules

    Rules set to identify the first page of the document. Multiple rules for the first page are combined using an OR operation.

    Last Page Rules

    Rules set to identify the last page of the document. Multiple rules for the last page are combined using an OR operation.

    MultiPage Document

    Select this option if documents of this document type have multiple pages. When working with single-page documents, Transact only needs to identify the document type. When document types can have multiple pages, Transact needs to understand how to determine where a document ends.

  5. Click Edit Rules.
  6. Click Add.

    The Add Rule Expression dialog box appears.

  7. Select a page type from the Page Type field. The available options are First Page and Last Page.
  8. Enter the desired rule expression using the Auto-Suggestions menu.

See Operations for rule expressions for a list of rule expression options.
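
To picture how first page and last page rules could drive separation, the following Python sketch groups pages into documents. It is a simplified model, not the Document Assembler's actual logic. Pages are represented as hypothetical dictionaries of page-level fields, rules as plain functions, and multiple rules for a page type are combined with OR as described in the table above.

```python
from typing import Callable, Dict, List

Page = Dict[str, str]          # page-level field name -> extracted value
Rule = Callable[[Page], bool]  # a rule resolves to True or False for a page


def any_rule(rules: List[Rule], page: Page) -> bool:
    """Multiple rules for the same page type are combined with OR."""
    return any(rule(page) for rule in rules)


def split_documents(pages: List[Page],
                    first_rules: List[Rule],
                    last_rules: List[Rule]) -> List[List[int]]:
    """Group page indexes into documents using first and last page rules."""
    documents: List[List[int]] = []
    current: List[int] = []
    for index, page in enumerate(pages):
        # A first page closes any document that is still open.
        if any_rule(first_rules, page) and current:
            documents.append(current)
            current = []
        current.append(index)
        # A last page closes the current document.
        if any_rule(last_rules, page):
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents


pages = [
    {"Form1234": "Form 1234", "Form1234Page1": "Page 1"},
    {"Form1234": "Form 1234"},
    {"Form1234": "Form 1234", "Form1234Page2": "Page 2"},
    {"Form1234": "Form 1234", "Form1234Page1": "Page 1"},
    {"Form1234": "Form 1234", "Form1234Page2": "Page 2"},
]
first_rules = [lambda p: "Form1234" in p and "Form1234Page1" in p]
last_rules = [lambda p: "Form1234" in p and "Form1234Page2" in p]
print(split_documents(pages, first_rules, last_rules))  # [[0, 1, 2], [3, 4]]
```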

Rule expressions

Rule expressions can be any valid logical expression that can be resolved to a true or false value. This helps to determine the type of document being processed.

A rule expression can be in the form of <Page Level Field> <Operator> <Value>

  • Page Level Field is the Field Name, defined in Page Level Fields. It is auto-suggested to users.

  • Operator can be any valid operation from the suggestions field.

  • Value can be any combination of characters enclosed within single quotes.

  • When multiple rule expressions are combined with AND or OR, individual expressions must be enclosed in parentheses.

Follow these steps:

  1. Click inside the Rule Expression field and select your desired page-level field from the list. Page-level fields appear at the top of the list.
  2. Add a space after the page-level field. The drop-down menu reappears.
  3. Select any valid operation from the Rule Expression field.
  4. Add more operations as needed to complete your expression.

    Many list items change after you select them. For example, page-level fields are automatically enclosed in single quotes, and the operators "and" and "or" are replaced with "&&" and "||" respectively.

    See the following examples of valid, complete rule expressions and their usage.

    Rule expression

    Usage

    'DA Form 71' is exists

    When used as a first page rule for classification, if the DA Form 71 field is found, then the document type is set to DA Form 71.

    ('Form1234' is exists) && ('Form1234Page1' is exists)

    When used as a first page rule for classification of multipage documents, if the "Form1234" page-level field is found, and the "Form1234Page1" page-level field is found, this page is determined to be the first page of the document.

    ('Form1234' is exists) && ('Form1234Page2' is exists)

    When used as a last page rule for classification of multipage documents, if the "Form1234" page-level field is found, and the "Form1234Page2" page-level field is found, this page is determined to be the last page of the document.

    ('Invoice No' is unique) && ('Company' is exists)

    When used as a first page rule for classification, if the invoice number is unique and the company name is on the document, then this page is determined to be the first page of a new invoice.

  5. After you complete your rule expression, click OK.
  6. Click Apply.
  7. Click Back.
  8. Click Apply.

    Page-level fields and classification rules defined for keyword classification are associated with individual document types. Therefore, if the user deletes a document type, all page-level fields and classification rules associated with that document type are automatically deleted as well.
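
The example expressions above can be read as simple true/false checks against the page-level fields found on a page. The following Python sketch evaluates a small subset of the syntax (`is exists`, `is missing`, `==`, `!=`, `=^`, `=$`, and parenthesized expressions joined with `&&` or `||`). It is an illustrative evaluator written for this guide, not Transact's parser. The function names are hypothetical, nesting is not supported, and treating a missing field as a failed comparison is an assumption.

```python
import re
from typing import Dict, Optional

SIMPLE = re.compile(
    r"^'(?P<field>[^']+)'\s+"
    r"(?P<op>is exists|is missing|==|!=|=\^|=\$)\s*(?P<value>.*)$"
)


def eval_simple(expression: str, fields: Dict[str, str]) -> bool:
    """Evaluate one <Page Level Field> <Operator> <Value> expression."""
    match = SIMPLE.match(expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    actual = fields.get(match["field"])
    op = match["op"]
    expected = match["value"].strip().strip("'")
    if op == "is exists":
        return actual is not None
    if op == "is missing":
        return actual is None
    if actual is None:
        return False  # assumption: comparisons fail when the field was not found
    if op == "==":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == "=^":
        return actual.startswith(expected)
    return actual.endswith(expected)  # remaining operator is "=$"


def eval_rule(expression: str, fields: Dict[str, str]) -> bool:
    """Evaluate parenthesized expressions joined by && or ||, left to right."""
    parts = re.findall(r"\(([^()]*)\)|(&&|\|\|)", expression)
    if not parts:
        return eval_simple(expression, fields)
    result: Optional[bool] = None
    joiner = None
    for sub, op in parts:
        if op:
            joiner = op
            continue
        value = eval_simple(sub, fields)
        if result is None:
            result = value
        elif joiner == "&&":
            result = result and value
        else:
            result = result or value
    return bool(result)


fields = {"Form1234": "Form 1234", "Form1234Page1": "Page 1 of 2",
          "Invoice No": "INV-0042", "Company": "Acme Ltd."}
print(eval_rule("('Form1234' is exists) && ('Form1234Page1' is exists)", fields))
print(eval_rule("('Form1234' is exists) && ('Form1234Page2' is exists)", fields))
```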

Document assembly using keyword classification

For information about the keyword classification workflow, see Document Assembler plugin.

Operations for rule expressions

The following is a list of operations that can be combined to create rule expressions.

The Operation Name column is the user-friendly version of the Operator column. When you build a rule expression in the Rule Expression field, you pick from a list of operation names. After you select an operation name, it changes to the format shown in the Operator column.

Operation name

Operator

Example

Value type

Description

equals

==

'Invoice No' == 1234

  • Integer
  • String

Returns true if the page-level field exactly matches the value.

not equals

!=

'Invoice No' != 'abc'

  • Integer
  • String

Returns true if the page-level field doesn't match the value.

greater than

>

'Invoice Date' > '2024-12-31'

  • Integer
  • Date (yyyy-MM-dd)

Returns true if the page-level field is greater than the value.

greater than or equals

>=

'Invoice Date' >= '2024-01-01'

  • Integer
  • Date (yyyy-MM-dd)

Returns true if the page-level field is greater than or equal to the value.

lesser than

<

'Invoice No' < 1234

  • Integer
  • Date (yyyy-MM-dd)

Returns true if the page-level field is less than the value.

lesser than or equals

<=

'Invoice No' <= 1234

  • Integer
  • Date (yyyy-MM-dd)

Returns true if the page-level field is less than or equal to the value.

and

&&

('Form1234' is exists) && ('Form1234Page1' is exists)

  • Any

Used to combine expressions. Any time 'and' is used, each expression must be in parentheses. Both operations must return true values for the overall operation to be considered true.

or

||

('Invoice Num' is unique) || ('Page1' is exists)

  • Any

Used for logical combinations of expressions. Any time 'or' is used, each expression must be in parentheses. If either operation results in a true value, the overall operation will be considered true.

starts with

=^

'Invoice No' =^ 'INV'

  • String

Returns true if the page-level field is found to begin with this string.

ends with

=$

'Company' =$ 'Ltd.'

  • String

Returns true if the page-level field is found to end with this string.

is missing

is missing

'Invoice No' is missing

  • Integer
  • Date (yyyy-MM-dd)
  • String

True if the page-level field is not found.

is exists

is exists

'Invoice No' is exists

  • Integer
  • Date (yyyy-MM-dd)
  • String

True if the page-level field is found.

is unique

is unique

'Invoice No' is unique

  • Integer
  • Date (yyyy-MM-dd)
  • String

True for the first occurrence of a given page-level field value in the uploaded batch, then false for all later occurrences of that value.

For example, if multiple bank statements are submitted to Transact at the same time, but each monthly statement has a different Statement Date field, the 'is unique' operator returns true for each page where a new statement date is found.
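
The bank statement example can be sketched in Python by remembering which values have already been seen in the batch. This is a minimal illustration of the described behavior, assuming values are compared as plain strings. The helper names are hypothetical. The second helper shows one way the yyyy-MM-dd date comparisons from the table could be evaluated.

```python
from datetime import date
from typing import Iterable, List, Set


def is_unique_flags(values: Iterable[str]) -> List[bool]:
    """Return True for the first occurrence of each value, False for repeats."""
    seen: Set[str] = set()
    flags: List[bool] = []
    for value in values:
        flags.append(value not in seen)
        seen.add(value)
    return flags


def date_at_least(field_value: str, literal: str) -> bool:
    """Compare two yyyy-MM-dd strings as dates, as in 'Invoice Date' >= '2024-01-01'."""
    return date.fromisoformat(field_value) >= date.fromisoformat(literal)


# One Statement Date value per page, in batch order.
statement_dates = ["2024-01-31", "2024-01-31", "2024-02-29",
                   "2024-02-29", "2024-03-31"]
print(is_unique_flags(statement_dates))  # [True, False, True, False, True]
print(date_at_least("2024-02-29", "2024-01-01"))  # True
```

Used as a first page rule, the True flags mark the pages where a new statement would begin.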