Document Transformation
The Document Transformation step helps you extract and use information from images and text documents. The Kofax RPA Document Transformation Service can process .png, .jpeg, .jpg, .tif, .tiff, .pdf, and .txt files. You can submit multiple documents either as a .zip archive or as a path to a folder containing the files. If you use the document separation feature in Kofax Transformation, Kofax RPA receives several documents that you can navigate in the DT browser. See Document Transformation workflow for details.
The Kofax RPA Document Transformation Service can also process Natural Language Processing (NLP) requests using the Sentiment project to help you detect the mood of the text, such as positive or negative, and to extract entities, such as company names, person names, and so on. You can use the Sentiment project to process customer reviews to understand whether customers are satisfied with the service or not. Moreover, you can use it to find all mentions of your company in an article. The Sentiment project can be used with KTT version 6.3.1 or later. See the Sentiment project in Predefined projects for details.
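As a minimal sketch of preparing several documents for one submission, the following Python snippet packs the files in a folder into a .zip archive, keeping only the file types listed above. The function name pack_documents and any paths you pass to it are hypothetical; only the extension list comes from this topic.

```python
import zipfile
from pathlib import Path

# Extensions listed in this topic as supported by the service.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".tif", ".tiff", ".pdf", ".txt"}


def pack_documents(folder, archive_path):
    """Zip every supported file found directly in `folder`; return the packed names."""
    folder = Path(folder)
    packed = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                # Store files flat, without the parent folder in the archive.
                archive.write(path, arcname=path.name)
                packed.append(path.name)
    return packed
```

The resulting archive path can then be entered as the document to process. Files with other extensions are skipped rather than reported, which is a design choice of this sketch.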
Properties
- Action
- Select an action to perform using Kofax RPA Document Transformation Service.
- Service URL
- Specify a URL and, if necessary, a port for the computer running Document Transformation Service. If the service is installed locally, use localhost as the host name. The URL must include the http:// or https:// prefix. If you use https, the web hosting service should have a certificate issued by a well-known certificate authority.
- Project Type
  - Default Project: This option provides a set of predefined projects. See Predefined projects below.
  - Custom Project: When you select this option, specify the path to the project to process your documents in Custom Project Path.
- Document Source
- Select how the robot locates a document to process.
  - Local file: Enter the path to one or more documents to process in File name. Use a full path to an image file, a .zip archive, a folder with files, or another file of a supported format that is accessible from the computer running the robot.
  - Robot File System: Enter the path to the configured file system and the file name, such as myshare/doctotransform.pdf. The file system name must correspond to that specified on the Robot File System tab in the Management Console.
  - Binary Variable: Specify a binary variable that contains a document.
- Validation URL
- Select this option to specify a URL for the thin client service. This property is required to send processed documents for validation. The URL is specified in the ValidationService property of the Document Transformation Service. The URL may look similar to the following:
http://localhost:8082
- Callback URL
- Select this option to specify a REST Robot URL for the thin client service to call after a document is validated. When validation is complete, this URL is used to start a robot by the Management Console. The URL must include the Management Console address along with a path to the robot to run. The URL may look similar to the following:
http://localhost:8080/ManagementConsole/rest/run/Default project/binaryInputAndWait.robot
You can find a valid callback URL in the Management Console by clicking the REST button for the given robot. Such a robot should have an input variable called document with an attribute doc of type binary. When the robot is called, the doc attribute of the document contains the transformed and validated document. If the Management Console needs credentials for login, they can be provided in the URL as follows:
http://user:password@localhost:8080/ManagementConsole/rest/run/Default project/binaryInputAndWait.robot
See REST under Execute Robots for more information.
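The URL rules described for the Service URL and Callback URL properties can be sketched in Python as follows. This is an illustration under assumptions, not part of the product: the function names are hypothetical, and percent-encoding the project name, robot name, and credentials is an assumption, so compare the result with the URL shown by the REST button in the Management Console.

```python
from urllib.parse import quote, urlsplit


def has_required_prefix(service_url):
    """Return True if the URL uses the http:// or https:// prefix and names a host."""
    parts = urlsplit(service_url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_callback_url(mc_base, project, robot, user=None, password=None):
    """Assemble <mc_base>/rest/run/<project>/<robot>, optionally with credentials."""
    parts = urlsplit(mc_base)
    host = parts.netloc
    if user is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
        host = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    path = "/".join(
        [parts.path.rstrip("/"), "rest", "run", quote(project, safe=""), quote(robot, safe="")]
    )
    return f"{parts.scheme}://{host}{path}"
```

For example, build_callback_url("http://localhost:8080/ManagementConsole", "Default project", "binaryInputAndWait.robot") encodes the space in the project name as %20 and leaves the rest of the example URL above unchanged.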
Predefined projects
You can edit transformation projects supplied with Kofax RPA and your custom projects in the KTT Project Builder installed by Kofax RPA. Once you open the Project Builder you can access its documentation.
- Barcode project
- The purpose of this project is to extract all barcodes from the document.
To change the Barcode project settings, perform the following steps.
- Locate the Kapow_Barcodes.fpr project file.
- Open it in Project Builder.
- Select the class Default in the Project tree on the left side.
- Click the eye symbol to open the details.
- Under Locators, double-click the BL barcode locator. By default, the locator is configured to auto-detect the barcode type.
- Clear the Auto detect option under Type and select a specific type.
- By default, the locator is configured to automatically detect the orientation. Clear the Auto detect option under Orientation and select a specific orientation.
- By default, the locator is configured to look for barcodes on all pages of a document. To limit the barcode detection to a specific set of pages, select the Regions tab and modify the Enable locator for settings accordingly.
- When you finish editing the project, close all dialog boxes and click Save Project on the Project tab.
- Invoice projects (Invoice Sales TAX and Invoice VAT)
- These projects are designed to extract data from invoices from the USA, and they also support sales tax. The projects require ERP master data to be set up so that the vendor is extracted properly. The master data for vendors and internal companies must be provided as .csv files. The projects contain a vendors.csv file and an internal_venders.csv file that can be adapted to the company-specific vendors.
To provide and configure master data, perform the following steps.
- Prerequisites
  - The vendors file is a semicolon-separated document called Vendors.csv. The file must have the following columns:
    - VendorID (required)
    - CompanyCode (optional)
    - Name (required)
    - Street (required)
    - City (required)
    - ZIP (required)
    - PostBox (optional)
    - Country (required, 2-character country code)
    - FIDNumber (optional)
    - Phone (optional)
    - Fax (optional)
    - URL (optional)
    - Email (optional)
  - The internal vendors file must be called Vendors_Internal.csv. It is a semicolon-separated file with the same columns as Vendors.csv. Internal vendors are companies that belong to the customer's own enterprise. This file is used to exclude them from the vendor results, because they are easy to confuse with the bill-to address on a normal external invoice.
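Before importing the master data, it can help to check a vendors file against the column list above. The following Python sketch does that under several assumptions that this topic does not confirm: the file has a header row with exactly these column names, "optional" means the column is present but its values may be empty, and the function name check_vendor_rows is hypothetical.

```python
import csv
import io

REQUIRED = ["VendorID", "Name", "Street", "City", "ZIP", "Country"]
OPTIONAL = ["CompanyCode", "PostBox", "FIDNumber", "Phone", "Fax", "URL", "Email"]


def check_vendor_rows(csv_file):
    """Return a list of problems found in a semicolon-separated vendors file."""
    reader = csv.DictReader(csv_file, delimiter=";")
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in REQUIRED + OPTIONAL if name not in header]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the assumed header row
        for name in REQUIRED:
            if name in header and not (row.get(name) or "").strip():
                problems.append(f"line {line_no}: empty required field {name}")
        country = (row.get("Country") or "").strip()
        if country and len(country) != 2:
            problems.append(f"line {line_no}: Country must be a 2-character code")
    return problems
```

To check a file on disk, open it with open(path, newline="") and pass the file object to the function. An empty list means that no problems were found by these particular checks.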
- Locate the Kapow_Invoices_SalesTax.fpr project file or the Kapow_Invoices_VAT.fpr project file and open it in the Project Builder.
- Open Project Settings.
- Select the Databases tab. Note that the two Fuzzy Database items are flagged in red because the path is incorrect.
- Double-click Vendors.
- Select the path to the Vendors.csv file on the network share.
- Click OK to close the dialog box. The file is imported.
- Double-click Vendors_Internal.
- Select the path to the Vendors_Internal.csv file on the network share.
- Click OK to close the dialog box. The file is imported.
- When you finish editing the files, close all dialog boxes and click Save Project on the Project tab.
- Online Learning
- By default, the Online Learning feature, which helps increase the rate of field recognition on similar documents, can only be enabled for the Invoices projects. When specifying the path to the folder where your training documents are stored, ensure that the folder already exists. If it does not exist, you receive a notification prompting you to create it. To proceed, click Yes.
- Language project
- The purpose of this project is to identify the language a document is written in.
This project is not configurable.
- OCR project
- The purpose of this project is to return full text OCR results for the document. Note that this project does not include the validation process by default.
To change the OCR recognition language from the default (English), perform the following steps.
- Locate the Kapow_OCR.fpr project file.
- Open it in Project Builder.
- Click Project Settings.
- In the Project Settings dialog box, select the Recognition tab.
- Select the FineReader page profile.
- Check the desired language.
- Close all dialog boxes and click Save Project on the Project tab.
- US Address Extraction Project
- The purpose of this project is to extract all US addresses from a document.
This project is not configurable.
- Sentiment project
- The purpose of this project is to deduce the mood of the text, such as positive or negative, and to identify entities, such as company names, person names, and so forth. In the transformed document, the mood is displayed in the Sentiment field as a number from -1 to 1, where -1 is completely negative and 1 is completely positive. For example, 0.257545 represents a slightly positive text.
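To illustrate how a value from the Sentiment field might be interpreted downstream, here is a minimal Python sketch that maps the number to a coarse label. The function name and both thresholds (neutral_band and strong_band) are hypothetical choices for this example. The project itself defines only the range from -1 to 1.

```python
def describe_sentiment(score, neutral_band=0.1, strong_band=0.6):
    """Map a Sentiment value in [-1, 1] to a coarse, human-readable label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("Sentiment values are expected to lie between -1 and 1")
    if abs(score) < neutral_band:
        return "neutral"
    polarity = "positive" if score > 0 else "negative"
    strength = "strongly" if abs(score) >= strong_band else "slightly"
    return f"{strength} {polarity}"
```

With these thresholds, the example value 0.257545 is labeled "slightly positive". Adjust the bands to match how strictly you want to treat customer reviews.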
By default, the project processes English language texts. Language bundles for the Sentiment project are distributed separately in three .msi installers:
- Kofax NLP Western Default Language Bundle: English, French, German, Portuguese, Spanish
- Kofax NLP Western Extended Language Bundle: Dutch, Italian, Romanian
- Kofax NLP Additional Language Bundle: Japanese, Korean, Mandarin
Language bundles are not installed by default. To use any available language, install the appropriate language bundle. For example, to use the English language, install the Kofax NLP Western Default Language Bundle.
The bundles are installed as Windows programs and have no options. To remove a language bundle, open Programs and Features or Apps and features from the Control Panel, select the bundle, and click Uninstall.
To change the recognition language from the default (English), perform the following steps.
- Open the Sentiment project in the Project Builder.
- Click Project Settings.
- In the Project Settings dialog box, click the Properties button.
- Clear the default English language option.
- Select the language.
- Close all dialog boxes.
- In the Project Tree, select the Default Project Class definition.
- Scroll down and select the desired language from the language list.
- Click Save Project on the Project tab.
- Customized project
- When you select this option, specify the path to the project to process your documents in the Project Name property, such as c:\rpa\ocr. The project link must be a locally accessible folder on the Document Transformation host and not on a computer running Design Studio.
DT browser
The DT (Document Transformation) browser shows transformation results and helps you work with extracted data in the document. The following table explains the DT browser toolbar elements.

Button | Description |
---|---|
The Page section helps you navigate in a multi-page document. | |
| Navigates one page back in a multi-page document. |
| Navigates one page forward in a multi-page document. |
| Navigates to the specified page in a multi-page document. |
The Document section helps you navigate in multiple and separated documents. | |
| Navigates to the previous document. |
| Navigates to the next document. |
| Navigates to the document by number. |
| Navigates to the document by name. |
The Validation section helps you validate a document. | |
| Sends a document for manual validation to the specified Document Transformation Thin Client server. |
| Status of the transformed document with error description if any. |
Document Transformation workflow
The Document Transformation action processes your graphical or PDF documents using a selected project. A project is a module that processes and transforms your documents using OCR and other specified operations.
The processing result is returned to the Desktop Automation robot and opened in the Document Transformation Browser in the Recorder view. The service forms an element tree with all extracted information. Note that in a multi-page document, you can browse through the pages using the Previous and Next buttons on the DT browser toolbar. See DT browser for details.
Elements in the tree contain confidence levels for the OCR results and other extraction results defined by the project. The confidence attribute can contain values from zero to one where the most confident is one.
Derived attributes such as der_x help you find the element and can be used in finders.
Once the transformed document is in the editor, you can determine whether you want to perform the validation of the transformation results. If you are satisfied with the transformation results without any validation, you can extract and use the data in the document.
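One possible way to make that decision is to compare the confidence values with a threshold. The Python sketch below is only an illustration: the list-of-dictionaries representation, the field names, and the 0.8 threshold are all hypothetical, and the only fact taken from this topic is that confidence ranges from zero to one.

```python
def fields_needing_review(fields, threshold=0.8):
    """Return the names of fields whose confidence is below the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [field["name"] for field in fields if float(field["confidence"]) < threshold]


# Hypothetical values, as a robot might collect them from the element tree.
extracted = [
    {"name": "InvoiceNumber", "confidence": "0.97"},
    {"name": "VendorName", "confidence": "0.42"},
    {"name": "Total", "confidence": "0.80"},
]
to_review = fields_needing_review(extracted)
send_for_validation = bool(to_review)
```

Here only VendorName falls below the threshold, so the document would be sent for validation. A stricter or looser threshold shifts the balance between manual effort and the risk of using a wrong value.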
Validation is performed by the Document Transformation Thin Client. Click the send-for-validation button in the Validation section of the Document Transformation Browser toolbar to send the document to the specified Thin Client. A unique URL is generated and returned to the robot. The robot extracts the URL and uses it to send the document to a validation user, for example by email. The validation user clicks the URL and enters credentials, and then the document with the extracted data opens. The validation user inspects the transformed document and, if needed, modifies the extracted information in the document.
When validating documents, the user can enable the Online Learning feature to increase the rate of field recognition on similar documents. This feature is based on remembering the layout of a sample document, such as an invoice. By using automatic field completion, manually typing or selecting the correct value in the document, the user contributes to the knowledge base, which improves extraction results when the user works on a similar document next time.
When validation is finished, the validation user marks the document as valid. When the document is marked valid, it is used as an argument for a robot specified in the Callback URL in the Document Transformation action.