Machine Learning Custom Dictionary support
Dictionaries are part of the machine learning mechanism. They are created and used to extract types of values for which no specific regex can be defined. Dictionaries contain sets of predefined values, such as US states, US cities, and personal names. One of these values is selected at the time of data extraction according to the system settings.
Previously, the dictionaries were provided at the application level and were stored in the META-INF folder (Ephesoft\Application\META-INF).
In Transact, the default dictionaries are on the Batch Class level. The current path for the machine-learning-dictionaries folder is Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries (the folder structure is explained in detail below). You can also add your own custom dictionaries:
- You can create the dictionaries at the time of DLF learning on the Validation screen. Whenever you create and click on the overlay on the screen, the Suggestion View window pops up with all the predefined and custom regex types as well as dictionaries. Here, you can create new custom types of dictionaries. The custom dictionary can contain any number of values. Once the dictionary values are added and saved, they will be used during extraction.
- You can import the dictionaries from the Batch Class Management screen. The main menu of each Batch Class now includes a new machine-learning dictionaries tab. Here, you can use the Import Machine Learning Dictionary(s) section to upload your dictionaries into the system. The Export button allows you to export selected folders or files.
Machine-learning-dictionaries folder
All the dictionaries are provided at the Batch Class level. The path for the machine-learning-dictionaries folder is:
Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries
This folder has the following subfolders:
- language-packs: This folder contains language-specific text files with stop words (words such as "and" or "the" that machine learning filters out because they are not to be extracted). The user can add, modify, or delete any file in this folder. By default, language-pack dictionaries are provided for English, German, French, Turkish, Spanish, and Dutch:
en_stopWords.txt contains English stop words.
de_stopWords.txt contains German stop words, etc.
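As a rough illustration of what stop-word filtering means, the sketch below removes stop words from a list of tokens. It is not Transact's implementation: the one-word-per-line file layout, the case-insensitive comparison, and the function names are all assumptions made for the example.

```python
def load_stop_words(text: str) -> set[str]:
    """Parse stop-word file content, assuming one word per line (an assumption)."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Keep only the tokens that are not stop words (case-insensitive match)."""
    return [token for token in tokens if token.lower() not in stop_words]


# Inline sample standing in for the content of a file such as en_stopWords.txt.
sample_file_content = "and\nthe\nof\n"
stop_words = load_stop_words(sample_file_content)
filtered = remove_stop_words(["The", "Bank", "of", "Irvine"], stop_words)
print(filtered)
```

Here "The" and "of" are dropped, while "Bank" and "Irvine" remain as candidates for extraction.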
- knowledge-base: This folder contains the regex and dictionaries subfolders.
- The regex folder
- The regex folder contains regex-specific text files.
- The regex.txt file contains simple predefined regex, such as Number, Date, SSN, Amount, and Email, as well as custom regex created by a user.
- The composite.txt file contains information about the Composite types created by the user via the Suggestion View window on the Validation screen. The Composite type name (or custom block name) is mapped against the Composite type values (either created or predefined). Data is stored in the following format:
Custom_Block_Name=Custom_Regex_Name/Predefined_Regex_Name|Custom_Regex_Name/Pre-defined_Regex_Name
Here, the custom block name is followed by an equals sign and then by a series of custom regex names or predefined regex/dictionary names, separated by the pipe operator "|".
Example: CustomBlock1=CustomId|SSN
This creates a custom block named "CustomBlock1" that contains the "CustomId" regex (a custom regex type) followed by the "SSN" regex (a predefined regex type).
The composite block type cannot have composite types as part of its definition.
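The composite.txt format described above can be read with a few lines of code. The following is a minimal sketch, not Transact's own parser: the function names are hypothetical, and the check that rejects nested composite types is simply one way to express the rule stated above.

```python
def parse_composite_line(line: str) -> tuple[str, list[str]]:
    """Split 'Block=TypeA|TypeB' into the block name and its ordered member types."""
    name, separator, value = line.partition("=")
    if not separator or not name.strip():
        raise ValueError(f"Not a composite definition: {line!r}")
    members = [part.strip() for part in value.split("|") if part.strip()]
    return name.strip(), members


def parse_composites(text: str) -> dict[str, list[str]]:
    """Parse every non-empty line and reject composites that contain composites."""
    blocks = dict(parse_composite_line(line) for line in text.splitlines() if line.strip())
    for name, members in blocks.items():
        nested = [member for member in members if member in blocks]
        if nested:
            raise ValueError(f"{name} refers to composite type(s): {nested}")
    return blocks


blocks = parse_composites("CustomBlock1=CustomId|SSN\n")
print(blocks)
```

The member order is preserved because a composite block matches its types in sequence ("CustomId" followed by "SSN" in the example).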
- The regex_mappings.properties file contains parent-child relation mappings for regex, in the form Child = Parent:
Number = ALL
Date = ALL
Amount = Number
USA_Amount = Amount
NON_USA_Amount = Amount
DD_MM_YYYY = DATE
MM_DD_YYYY = DATE
etc.
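To make the parent-child relation concrete, the sketch below walks from a regex type up through its parents. It only illustrates the Child = Parent idea using entries listed above; the function names, the "#" comment handling, and the cycle check are assumptions for the example, not a description of how Transact reads the file.

```python
def parse_mappings(text: str) -> dict[str, str]:
    """Read 'Child = Parent' lines into a child -> parent dictionary."""
    mappings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        child, separator, parent = line.partition("=")
        if separator:
            mappings[child.strip()] = parent.strip()
    return mappings


def ancestors(name: str, mappings: dict[str, str]) -> list[str]:
    """Return the chain of parents for a type, nearest parent first."""
    chain: list[str] = []
    seen = {name}
    while name in mappings:
        name = mappings[name]
        if name in seen:
            raise ValueError(f"Cycle detected at {name!r}")
        seen.add(name)
        chain.append(name)
    return chain


sample = "Number = ALL\nAmount = Number\nUSA_Amount = Amount\n"
mappings = parse_mappings(sample)
print(ancestors("USA_Amount", mappings))
```

For USA_Amount the chain is Amount, then Number, then ALL. Lookups in this sketch are case-sensitive, so "Date" and "DATE" would be treated as different keys.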
- The dictionaries folder
- This folder contains the dictionaries and the dictionary_mappings.properties file.
- The dictionaries folder includes both default dictionaries and custom dictionaries (created or imported by the user) in .txt format.
- The dictionary_mappings.properties file contains dictionary types mapped against the corresponding .txt files. Here, you can also specify whether the dictionary should be displayed in the list of Predefined Types in the Suggestion View window on the Validation screen: Dictionary Type=Dictionary File=Display, where Display is -1, 0, or 1.
The following dictionaries are provided by default:
NAME=name.txt
PERSON_NAME_PREFIX=personNamePrefix.txt
PERSON_NAME_SUFFIX=personNameSuffix.txt
USA_CITY=usCity.txt
PARTIAL_CITY=partialUSCity.txt
USA_STATE=usState.txt
PARTIAL_STATE=partialUSState.txt
COMPANY_SUFFIX=companySuffix.txt
ORGANIZATION_NAME=organizationName.txt
Display options:
-1 = hidden and not loaded into memory (if the dictionary is part of a composite block type, neither the dictionary nor the composite type will be displayed in the Suggestion View window)
0 = hidden and loaded into memory (if the dictionary is a part of the composite block type, the dictionary will not be displayed; however, the composite type containing it will be shown in the Suggestion View window)
1 = displayed and loaded (both the dictionary as well as all composite types containing the dictionary will be displayed in the Suggestion View window)
By default, the English language dictionary is used if the required dictionary file is not present.
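The three Display values can be summarized in code. The sketch below is illustrative only: it assumes an entry is written literally as Dictionary Type=Dictionary File=Display, which may not match the exact separator used in your dictionary_mappings.properties file, and the function names are hypothetical. When no Display value is present, the sketch returns None rather than guessing a default.

```python
from typing import Optional


def parse_entry(line: str) -> tuple[str, str, Optional[int]]:
    """Parse 'TYPE=file.txt' or 'TYPE=file.txt=display' (assumed layout)."""
    parts = [part.strip() for part in line.split("=")]
    if len(parts) == 2:
        type_name, file_name = parts
        display = None  # no Display value given; the default is not assumed here
    elif len(parts) == 3:
        type_name, file_name = parts[0], parts[1]
        display = int(parts[2])
        if display not in (-1, 0, 1):
            raise ValueError(f"Display must be -1, 0, or 1, got {display}")
    else:
        raise ValueError(f"Unrecognized mapping entry: {line!r}")
    return type_name, file_name, display


def is_loaded(display: int) -> bool:
    """0 and 1 load the dictionary into memory; -1 does not."""
    return display in (0, 1)


def is_shown(display: int) -> bool:
    """Only 1 lists the dictionary in the Suggestion View window."""
    return display == 1


entry = parse_entry("USA_STATE=usState.txt=0")
print(entry, is_loaded(entry[2]), is_shown(entry[2]))
```

With Display set to 0, the dictionary is loaded and can still be used inside a composite type, but it is not listed on its own.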
Create custom dictionary
There are two ways to add a custom dictionary. You can create it on the Validation screen during DLF training or you can import it from the Batch Class Management screen.
To create a custom dictionary on the Validation screen:
- Place your cursor in the text box of the index field to be learned in the middle pane of the Validation screen.
- On the image view pane of the Validation screen, click the area of the image where the index field is located (right-click to draw an overlay on multiple values).
An overlay appears on the image and the text box is populated with the index field value.
- Click on the overlay to open the Suggestion View window.
- Select the Create Type option and from the Type drop-down list, select Dictionary.
- Define the Type Name and add as many values for the dictionary as required by using the plus button.
Also, use the corresponding button to delete any value.
- Click OK to save the custom dictionary.
Next time, the newly created dictionary will be included in the Predefined Type list on the Validation screen and will be used to extract a value on the basis of the predefined value set.
Modify custom dictionary
If required, you can modify the custom dictionary that you create. This can be done in several ways:
- As an operator, you can add values to your dictionary on the Validation screen.
- As an admin, you can modify your dictionary in the Folder Management section of Transact.
- You can make changes directly in the dictionary .txt file on the Transact server.
Add values to your dictionary on the Validation screen
- Click on the overlay to open the Suggestion View window.
- Select the Create Type option and from the Type drop-down list, select Dictionary.
- In the Type Name drop-down, find and select your dictionary name.
All values contained in the dictionary will be displayed in the Suggestion View window.
- Use the plus button to add values to the dictionary.
- Click OK to save the changes.
Modify your custom dictionary in Folder Management
- On the left menu panel, select Folder Management and double-click on the selected Batch Class.
- Navigate to the dictionaries folder (SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries) and find your dictionary.
- Select the dictionary and click Edit.
- Make the required changes in your dictionary; the field is editable.
- Click Save to save the changes.
Make changes in the dictionary .txt file
- Navigate to the dictionaries folder (Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries) and open the text file containing your dictionary.
- Add, remove, or change values as in an ordinary text editor.
- Save the changes.
Export a dictionary
Dictionaries can be exported so you can use the same dictionaries in other Batch Classes. An exported dictionary is downloaded as a .zip file containing the .txt file with associated dictionary values.
You can export the dictionaries from the Batch Class Management section as well as from the Folder Management section.
Export a dictionary from Batch Class Management
- Navigate to the Batch Class Management screen and select your Batch Class.
- Navigate to the folder.
- Select your dictionary and click Export.
- Specify the destination folder and click Save.
The .zip file saved on your local machine contains your dictionary in .txt format along with all associated values.
Export a dictionary from Folder Management
- Select your Batch Class.
- Navigate to , select your dictionary and right-click.
- Select the Download option.
- In the dialog window, specify the destination folder and click Save.
Import a dictionary
To import the dictionary in the Batch Class Management section:
- On the Batch Class Management screen, select your Batch Class.
- Navigate to .
- In the Upload Machine Learning Dictionary(s) section, click Select Files or drag and drop the file containing the dictionary into the specified area.
The dictionary is imported successfully. Since you are importing the dictionary manually, the following message is displayed: "Please make corresponding changes in the mapping files manually".
To make changes in the mappings file:
- In the Folder Management section, select your Batch Class.
- Go to the dictionaries folder (machine-learning-dictionaries\knowledge-base\dictionaries) and select the dictionary_mappings.properties file.
- Click Edit.
- Provide the following information to perform the dictionary mapping.
Field | Description
Key | Define the dictionary name (such as Irvine_streets). This name will appear in the Predefined Types list in the Suggestion View window on the Validation screen.
Value | Define the dictionary text file (such as Irvine_streets.txt) and provide the Display value: 0 = do not display the Dictionary Type in the Suggestion View window on the Validation screen; 1 = display the Dictionary Type in the Suggestion View window on the Validation screen.
- Click Save to save your changes.
Dictionary mapping can also be done directly in the dictionary_mappings.properties file on the Transact server. To do so, navigate to the dictionaries folder (Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries), open the dictionary_mappings.properties file, and perform the mapping as described above.
If you import a dictionary that already exists in the Batch Class, a pop-up window is displayed containing the list of dictionaries that are already present. You can select either to override or to merge the dictionary files.
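The difference between the two choices can be pictured with a small sketch. The exact merge behavior in Transact is not described here, so the code assumes that "merge" means the union of existing and imported values with duplicates removed, and that "override" means the imported values replace the existing ones. The function name and sample values are hypothetical.

```python
def combine(existing: list[str], imported: list[str], mode: str) -> list[str]:
    """Return the resulting dictionary values for an 'override' or 'merge' import."""
    if mode == "override":
        # The imported file replaces the existing values entirely.
        return list(imported)
    if mode == "merge":
        # dict.fromkeys keeps first-seen order and drops duplicate values.
        return list(dict.fromkeys(existing + imported))
    raise ValueError(f"Unknown mode: {mode!r}")


existing_values = ["Main St", "Oak Ave"]
imported_values = ["Oak Ave", "Elm St"]
print(combine(existing_values, imported_values, "merge"))
print(combine(existing_values, imported_values, "override"))
```

Under these assumptions, merging keeps "Main St" and adds "Elm St", while overriding discards "Main St".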