Auto KV plugin (formerly Key Value Learning plugin)
The Auto KV plugin, which replaces the Key Value Learning plugin, is available for use in an on-premises or Cloud environment. The Auto KV plugin is available in Ephesoft Transact Labs.
This plugin generates Advanced KV pairs so that data extraction improves based on data the user has previously extracted manually. It tracks the data the user extracts manually by populating DLFs (document level fields) directly from the third panel image. Based on this data, it generates advanced KV pairs using the regular expressions defined in the property files and saves them for the corresponding document types. To configure the plugin, use the ON/OFF switch in the admin UI and the following property files defined in META-INF: dcma-key-regex.properties, dcma-key-value-location.properties, and dcma-value-regex.properties.
This plugin iterates over each document level field of each document. First, it matches the value of the document level field against the regex patterns defined in the properties file. The best-matching regular expression from the properties file becomes the value pattern for that field. The document level field value is then searched for in the OCR data (the HOCR file) for that page of the document.
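The pattern selection step can be sketched as follows. This is only an illustration: the patterns are hypothetical stand-ins for the contents of dcma-value-regex.properties, and "best-matching" is interpreted here as the pattern whose match covers the most characters of the value, which is an assumption rather than the plugin's documented rule.

```python
import re

# Hypothetical stand-ins for the patterns in dcma-value-regex.properties.
VALUE_PATTERNS = [
    r"\d{2}/\d{2}/\d{4}",  # a date such as 12/31/2020
    r"\d+\.\d{2}",         # an amount such as 1045.50
    r"[A-Z]{2,}-\d+",      # an identifier such as INV-10023
]


def pick_pattern(text, patterns):
    """Return the pattern whose match covers the most characters of text.

    Returns None when no pattern matches at all. "Most characters covered"
    is an assumed scoring rule, not the plugin's documented behavior.
    """
    best_pattern, best_length = None, 0
    for pattern in patterns:
        match = re.search(pattern, text)
        if match and len(match.group(0)) > best_length:
            best_pattern, best_length = pattern, len(match.group(0))
    return best_pattern


if __name__ == "__main__":
    print(pick_pattern("INV-10023", VALUE_PATTERNS))
    print(pick_pattern("12/31/2020", VALUE_PATTERNS))
```

The same helper could be reused for key patterns by passing a different pattern list.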
If the value is found, the plugin searches for a key around it in all eight directions and tries to match what it finds against the regex patterns defined in the properties file. The best-matching regular expression becomes the key pattern. For example, if the key is found to the left of the value (meaning that the value lies to the right of the key), the location is set to RIGHT. If nothing is found to the left, the plugin searches the top, right, bottom, and remaining locations in turn, matches what it finds against the regex patterns in the properties file to get the key pattern, and sets the location accordingly.
Location is set here for processing purposes only. This location has no link with the Location field displayed in Advanced KV pairs. The Location field value is always empty for generated advanced KV pairs.
- If a value does not match any of the regex patterns, the value itself is set as the key pattern of this field.
- The application searches the key locations in the following order, which can be configured as a semicolon-separated list in the property files. As soon as it finds the first value, it takes that location:
- LEFT
- RIGHT
- TOP
- BOTTOM
- TOP_RIGHT
- TOP_LEFT
- BOTTOM_RIGHT
- BOTTOM_LEFT
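The ordered search described above can be sketched like this. The `Word` class, the geometry rules, and the "nearest candidate" tie-break are all assumptions made for illustration. Locations are expressed as the position of the key relative to the value, and image coordinates are assumed to grow downward.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A hypothetical OCR word with its bounding box in pixels."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


DEFAULT_ORDER = ["LEFT", "RIGHT", "TOP", "BOTTOM",
                 "TOP_RIGHT", "TOP_LEFT", "BOTTOM_RIGHT", "BOTTOM_LEFT"]


def relative_location(key, value):
    """Classify where key lies relative to value (y grows downward)."""
    horizontal = "LEFT" if key.x1 <= value.x0 else "RIGHT" if key.x0 >= value.x1 else ""
    vertical = "TOP" if key.y1 <= value.y0 else "BOTTOM" if key.y0 >= value.y1 else ""
    if vertical and horizontal:
        return f"{vertical}_{horizontal}"
    return vertical or horizontal or None


def find_key(value, words, order=DEFAULT_ORDER):
    """Return (location, word) for the first location in order that has a candidate."""
    for location in order:
        candidates = [w for w in words
                      if w is not value and relative_location(w, value) == location]
        if candidates:
            # Assumed tie-break: take the candidate closest to the value box.
            nearest = min(candidates,
                          key=lambda w: abs(w.x0 - value.x0) + abs(w.y0 - value.y0))
            return location, nearest
    return None


if __name__ == "__main__":
    value = Word("1045.50", 200, 100, 260, 120)
    words = [Word("Total", 120, 100, 180, 120), Word("Invoice", 200, 60, 260, 80), value]
    print(find_key(value, words))
    print(find_key(value, words, order=["TOP", "LEFT"]))
```

Because the search stops at the first location that yields a candidate, reordering the list changes which key is learned when several candidates surround the value.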
Multi-word support for KV Learning
The Key Value Learning plugin in the Export module automatically creates a Key Value field corresponding to a document level field.
This enhancement allows multiple words to be used when generating the key pattern in the Key Value Learning plugin in the Export module. If any word is found close to the key, it is appended to the key and is used for key pattern generation.
Keys are appended on the left for the locations LEFT, BOTTOM, TOP, BOTTOM_LEFT, and TOP_LEFT, and on the right for the locations BOTTOM_RIGHT and TOP_RIGHT.
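A rough sketch of the multi-word behavior is shown below. The `Word` class, the pixel unit for the gap, and the choice to keep appending words as long as each successive gap stays within the limit are assumptions. The default of 50 mirrors the documented default of key_value.gap_between_keys. The text above does not list RIGHT in either group, so the sketch leaves the key unchanged for it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A hypothetical OCR word with its horizontal extent in pixels."""
    text: str
    x0: int  # left edge
    x1: int  # right edge


APPEND_LEFT = {"LEFT", "BOTTOM", "TOP", "BOTTOM_LEFT", "TOP_LEFT"}
APPEND_RIGHT = {"BOTTOM_RIGHT", "TOP_RIGHT"}


def extend_key(key, line_words, location, max_gap=50):
    """Append neighbouring words on the key's line while each gap is within max_gap."""
    ordered = sorted(line_words, key=lambda w: w.x0)
    index = ordered.index(key)
    parts = [key.text]
    current = key
    if location in APPEND_LEFT:
        # Walk leftwards from the key, prepending words that are close enough.
        for word in reversed(ordered[:index]):
            if current.x0 - word.x1 > max_gap:
                break
            parts.insert(0, word.text)
            current = word
    elif location in APPEND_RIGHT:
        # Walk rightwards from the key, appending words that are close enough.
        for word in ordered[index + 1:]:
            if word.x0 - current.x1 > max_gap:
                break
            parts.append(word.text)
            current = word
    return " ".join(parts)


if __name__ == "__main__":
    line = [Word("Invoice", 10, 80), Word("Number", 100, 170), Word("Date", 400, 450)]
    print(extend_key(line[1], line, "LEFT"))
    print(extend_key(line[1], line, "TOP_RIGHT"))
```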
Property file configuration
Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-value-location.properties
Configurable property | Type of value | Value options | Description |
---|---|---|---|
key_value.location_order | String | LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT; | A semicolon-separated list of locations. It represents the order in which locations are searched for the key in the image. Locations are those of the key with respect to the value. |
key_value.max_number_record | Integer | NA | The maximum number of key value pairs that can be present for any DLF. If a DLF already has this maximum number of key value fields defined, the plugin does not add any more key value pairs to it. Default value is 50. |
key_value.tolerance_threshold | Integer | | The length and width of the value rectangle created by the plugin are increased by this tolerance value (width + (width*tolerance)/100). For example, if the calculated width is 100 pixels and the tolerance is 10, the resulting width is 110 pixels. |
key_value.multiplier | Integer | Integer value | This property holds an integer value which decides on <some logic>. (Also mention range if applicable.) |
key_value.fetch_value | String | | Fetch value for the key value field that is being created by the plugin. Default value is FIRST. |
key_value.min_key_char_count | Integer | NA | Minimum number of characters that must be present in the extracted key. Default value is 4. |
key_value.gap_between_keys | Integer | NA | Any word found to the left or right of the key (depending on the location of the key with respect to the value) is considered part of the key depending on its distance from the key. Default value is 50. |
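As an illustration of how these properties might be read and applied, the sketch below parses a simplified properties text, splits the location order, and applies the tolerance formula from the table. The helper names are hypothetical, the parser handles only plain `key=value` lines rather than the full Java properties format, and the tolerance value of 10 is the example value from the table, not a documented default.

```python
SAMPLE_PROPERTIES = """\
# Values taken from the defaults and examples in the table above.
key_value.location_order=LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT;
key_value.max_number_record=50
key_value.tolerance_threshold=10
key_value.fetch_value=FIRST
key_value.min_key_char_count=4
key_value.gap_between_keys=50
"""


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_location_order(raw):
    """Split a semicolon-separated location list, ignoring empty entries."""
    return [item.strip() for item in raw.split(";") if item.strip()]


def apply_tolerance(size, tolerance):
    """Grow a dimension by a percentage: size + (size * tolerance) / 100."""
    return size + (size * tolerance) / 100


if __name__ == "__main__":
    props = load_properties(SAMPLE_PROPERTIES)
    print(parse_location_order(props["key_value.location_order"]))
    print(apply_tolerance(100, int(props["key_value.tolerance_threshold"])))
```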
UI configuration
Key Value Learning can be turned ON/OFF.
Configurable property | Type of value | Value options | Description |
---|---|---|---|
Key Value Learning switch | List of values | | Set it to ON/OFF depending on whether plugin needs to be executed or not. |
Numeric Key Learning switch | List of values | | Set it to ON/OFF depending on whether plugin needs to be executed or not. |
Dependencies
The Key Value Learning plugin depends on the following two plugins:
- RECOSTAR_HOCR
- TESSERACT_HOCR
At least one of the above plugins must be ON for Key Value Learning, because these plugins extract data from the image and create the HOCR file that Key Value Learning requires.
Frequently asked questions
Question | Solution |
---|---|
Key value field is not added to the document level field after plugin execution. | There can be several reasons why a key value field is not created after plugin execution. Reason 1: The maximum allowed number of key value fields has already been added to the document level field. Solution: Check the value of key_value.max_number_record. Default value is 50. Reason 2: The key found during extraction has fewer characters than the minimum required for a key. Solution: Check the key_value.min_key_char_count property. Default value is 4. Reason 3: The required location is not defined in the key_value.location_order property. Solution: Check the value of key_value.location_order. It should include the required location. |
Key value field is added but is not accurate. | Reason: The specified location order may not match the requirement. Solution: Check the key_value.location_order property. The most likely location of the key with respect to the value should be specified first in the list. |
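The three checks for a missing key value field can be expressed as a small troubleshooting helper. This is a sketch under assumptions: the function name and its inputs are hypothetical, the properties are assumed to be available as a dictionary of strings, and the fallback values are the documented defaults.

```python
def diagnose_missing_kv(existing_count, key_text, key_location, props):
    """Return the property names worth checking when no key value field was added."""
    to_check = []
    max_records = int(props.get("key_value.max_number_record", 50))
    min_chars = int(props.get("key_value.min_key_char_count", 4))
    raw_order = props.get("key_value.location_order", "")
    order = [loc.strip() for loc in raw_order.split(";") if loc.strip()]

    # Reason 1: the DLF already holds the maximum number of key value fields.
    if existing_count >= max_records:
        to_check.append("key_value.max_number_record")
    # Reason 2: the extracted key is shorter than the required minimum.
    if len(key_text) < min_chars:
        to_check.append("key_value.min_key_char_count")
    # Reason 3: the location where the key sits is not in the search order.
    if key_location not in order:
        to_check.append("key_value.location_order")
    return to_check


if __name__ == "__main__":
    props = {"key_value.location_order": "LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT;"}
    print(diagnose_missing_kv(50, "No", "BOTTOM", props))
    print(diagnose_missing_kv(3, "Total", "LEFT", props))
```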