Populate a document using the response from Google Cloud Document AI
Document AI is a service offered as part of Google Cloud.
Document AI offers several pre-built models and processors that are specifically designed to extract different types of data from various document types.
It also supports custom models and uptraining of existing models.
The OCR response contains a collection of entities with the fields that the processor was able to detect. Each entity represents a single field or table row.
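The entity collection can be read along these lines. This is a minimal sketch, not the logic used by the sample process: the key names `entities`, `type`, `mentionText`, and `properties` are assumed to follow the Document AI JSON response, the sample values are invented for illustration, and `split_entities` is a hypothetical helper.

```python
import json

# Hypothetical, heavily trimmed response; a real response carries many more keys.
SAMPLE_RESPONSE = """
{
  "entities": [
    {"type": "supplier_name", "mentionText": "Coffee Chain", "confidence": 0.98},
    {"type": "total_amount", "mentionText": "7.50", "confidence": 0.95},
    {"type": "line_item", "mentionText": "Latte 3.50", "confidence": 0.91,
     "properties": [
       {"type": "line_item/description", "mentionText": "Latte"},
       {"type": "line_item/amount", "mentionText": "3.50"}
     ]}
  ]
}
"""


def split_entities(document):
    """Separate single-value fields from table rows (hypothetical helper)."""
    fields, rows = {}, []
    for entity in document.get("entities", []):
        children = entity.get("properties") or []
        if children:
            # Assumption: an entity with nested properties represents one table row.
            rows.append({c["type"]: c.get("mentionText", "") for c in children})
        else:
            fields[entity["type"]] = entity.get("mentionText", "")
    return fields, rows


fields, rows = split_entities(json.loads(SAMPLE_RESPONSE))
```

Here `fields` holds the single-value entities keyed by entity type, and `rows` holds one dictionary per table row.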
The JSON structure is generic regardless of the processor used. Therefore, the sample process must hold a mapping between each entity name and the corresponding document field name. For example:
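A minimal sketch of such a mapping, under stated assumptions: the entity names and the document field names (`VendorName`, `ReceiptDate`, `Total`) are hypothetical and are not taken from the sample package, and `map_entities` is an illustrative helper rather than part of TotalAgility.

```python
# Hypothetical mapping: Document AI entity name -> TotalAgility document field name.
ENTITY_TO_FIELD = {
    "supplier_name": "VendorName",
    "receipt_date": "ReceiptDate",
    "total_amount": "Total",
}


def map_entities(entities, mapping):
    """Return the populated document fields and the entity names with no mapping."""
    populated = {}
    unmapped = []
    for entity in entities:
        field_name = mapping.get(entity["type"])
        if field_name is None:
            # Entities without a mapping are reported, not silently dropped.
            unmapped.append(entity["type"])
            continue
        populated[field_name] = entity.get("mentionText", "")
    return populated, unmapped


# Invented entities for illustration only.
entities = [
    {"type": "supplier_name", "mentionText": "Coffee Chain"},
    {"type": "total_amount", "mentionText": "7.50"},
    {"type": "currency", "mentionText": "USD"},
]
populated, unmapped = map_entities(entities, ENTITY_TO_FIELD)
```

Keeping the mapping as data means that supporting a different processor would, in this sketch, only require a different dictionary.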
Import the sample package, Google Cloud Document AI Package.zip, available at \\TotalAgilityInstall\Sample Processes\Third Party Data Extraction into your TotalAgility Designer.
The sample package includes the following items.
Item | Description |
---|---|
Category | Data Extraction > Google Document AI. Everything in the package is added to this category. |
Processes | |
Extraction group | ThirdPartyExtraction: This extraction group contains several example document types. |
Classification group | ThirdPartyExtractionCG |
Scan/VRS profile | Third Party Data Extraction Profile |
Global data models | GoogleDocumentAI_Extraction_Global: This model is generated using a sample response from the service. |
Forms | Scan create new job and Validation forms. |
Sample process: GoogleDocumentAI Receipt
This sample process shows how to populate fields that are returned using the expense (receipt) processor.
This sample process includes the following activities.
Activity | Description |
---|---|
Google AI Extraction | Sets a data object using JSON in the process. The JSON corresponds to the sample response from Google AI for the coffee-chain-receipt-sample.jpg image. In a real-world scenario, this activity would call Google AI instead. |
Process Receipt Extraction Results | Gets the receipt data from the JSON and populates the fields in the TotalAgility document. |
Clear down raw data | Clears down the raw JSON and data object, as they are no longer required. |
Validate | Validates the document. |
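The order of the first three activities can be sketched as plain functions. This is only an illustration of the flow described in the table, not how TotalAgility runs activities: the `job` dictionary, its keys, and the canned JSON are all hypothetical.

```python
import json

# Hypothetical canned response standing in for the sample JSON held by the process.
CANNED_RESPONSE = '{"entities": [{"type": "total_amount", "mentionText": "7.50"}]}'


def google_ai_extraction(job):
    # The sample sets canned JSON; a real process would call the service here.
    job["raw_json"] = CANNED_RESPONSE
    job["data_object"] = json.loads(job["raw_json"])


def process_receipt_extraction_results(job):
    # Copy each detected entity into the document fields.
    for entity in job["data_object"].get("entities", []):
        job["document_fields"][entity["type"]] = entity.get("mentionText", "")


def clear_down_raw_data(job):
    # The raw JSON and the data object are no longer required once fields are set.
    job["raw_json"] = None
    job["data_object"] = None


def run(job):
    steps = (
        google_ai_extraction,
        process_receipt_extraction_results,
        clear_down_raw_data,
    )
    for step in steps:
        step(job)
    return job


job = run({"document_fields": {}})
```

After `run` returns, only the populated document fields remain on the job, which is the state the Validate activity would then work with.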
To try this at run time, do the following:
- Browse to GoogleAI Receipt_Scan.form.
- Scan the coffee-chain-receipt-sample.jpg image.
  This sample process is configured to work with this image only.
- Create a job for the process.
- Select the work queue and click the Validation activity.
  The data for the document fields is displayed. Clicking a field highlights the corresponding area on the document image.
- Validate each field of the document as required.
- Click Yes to complete the validation.
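Highlighting a field on the image requires the field's location in pixels. The following sketch shows one way that could be derived, assuming the response describes each entity's location with normalized vertices (`x` and `y` between 0 and 1, as in the Document AI bounding polygon). The function name, the example vertices, and the image size are hypothetical, and this is not a description of how TotalAgility performs the highlighting.

```python
def highlight_rectangle(normalized_vertices, image_width, image_height):
    """Convert normalized vertices to a pixel rectangle (left, top, width, height)."""
    # A missing coordinate is treated as 0.0 in this sketch.
    xs = [v.get("x", 0.0) for v in normalized_vertices]
    ys = [v.get("y", 0.0) for v in normalized_vertices]
    left = round(min(xs) * image_width)
    top = round(min(ys) * image_height)
    right = round(max(xs) * image_width)
    bottom = round(max(ys) * image_height)
    return left, top, right - left, bottom - top


# Invented vertices for an 800 x 1600 pixel image.
vertices = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.75, "y": 0.5},
    {"x": 0.75, "y": 0.625},
    {"x": 0.25, "y": 0.625},
]
rect = highlight_rectangle(vertices, image_width=800, image_height=1600)
```

Taking the minimum and maximum of the vertices yields an axis-aligned rectangle even when the detected polygon is slightly rotated.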
Sample process: GoogleDocumentAI Payslip process
This process is similar to the Receipt process; however, the fields are for the payslip document type.
The Scan form is GoogleDocumentAI PaySlip_Scan.form and the image to scan is googlePaySlip.png.