Semi-automated table extraction
You can use the Table Extraction plugin to perform semi-automated table extraction with little upfront configuration. This is a one-time extraction, which is quicker to configure than automated table extraction, and easier than hand-keying data into the system.
This approach works best for tables where all the data is contained in single columns. Data that spans columns may not be captured accurately.
- You need a batch class with a document type configured. For detailed steps, see Add new document type.
- Your batch class should be configured to stop batches in the Validation interface. To do so, you can set the Force Validation flag on an index field. For help, see additional configurations for Index fields.
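To illustrate why data that spans columns may not be captured accurately, here is a minimal sketch of one plausible model: each word on a row is assigned to a column only if it fits entirely inside that column's horizontal range. This is an assumption made for illustration, not the plugin's documented algorithm. The names `Word`, `COLUMNS`, `assign_column`, and `extract_row`, and the pixel coordinates, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: int  # left edge of the word, in pixels (hypothetical units)
    x1: int  # right edge of the word, in pixels


# Hypothetical column ranges (left, right), as might be derived from the
# overlays drawn on the first row of the table.
COLUMNS = {"Item": (0, 100), "Qty": (100, 150), "Price": (150, 220)}


def assign_column(word, columns=COLUMNS):
    """Return the column that fully contains the word, or None if it spans columns."""
    for name, (left, right) in columns.items():
        if left <= word.x0 and word.x1 <= right:
            return name
    return None


def extract_row(words, columns=COLUMNS):
    """Group words by column; words that fit no single column are returned separately."""
    cells = {name: [] for name in columns}
    unassigned = []
    for word in words:
        column = assign_column(word, columns)
        if column is None:
            unassigned.append(word.text)
        else:
            cells[column].append(word.text)
    return {name: " ".join(texts) for name, texts in cells.items()}, unassigned
```

Under this model, a row such as `Widget | 3 | 9.99` maps cleanly to three cells, while a word that straddles the boundary between two columns ends up in the unassigned list and would need manual correction during validation.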
Enable table extraction
- From the Batch Class Management screen, select your batch class and click Open.
- Go to .
- Move the TABLE_EXTRACTION plugin to your Selected Plugins list.
- Go to .
- Set the Table Extraction switch to ON.
- Click Apply.
- To add table columns, go to .
- Click Add to create a new table. Give the table an intuitive name.
- Select your table and click Open. This will open the Table Columns page.
- Create entries for each column in your table:
- Click Add.
- Enter the Column Name and a Description.
- Enter the Column Number. This is the order of the column within the table, from left to right.
If your batch class does not contain an index field with the Force Validation flag enabled, you can also create a rule in the Validation Pattern to ensure the batch stops in Validation.
- Click Apply.
- Create a batch instance. For help, see Upload batch.
- To edit and validate results, from the Batch Instance Management screen, select your batch instance and click Open.
- In the top menu bar, click Table. This will open the table view in place of the field view.
- Click Row Extraction. You must click Row Extraction before filling in the first row, or you will receive an error message when you try to start extraction.
- In the left panel, select the leftmost cell of the first row.
- In the right panel, click to select the field that corresponds to the first row of the first column. This automatically highlights and extracts single-word text.
To extract a multi-word or multi-line value, you can customize the size of the extracted field:
- Right-click the corner of the cell and release.
- Drag your mouse to change the overlay to the desired size.
- Right-click again to release the overlay.
- Repeat the previous two steps (selecting a cell in the left panel, then the corresponding field in the right panel) for the remaining columns in the row.
- Click Start Extraction. The extracted data will be added to the table in the left panel.
- Delete any extra rows, and review the extracted content.
- If required, you can use Column Extraction to change the content of a column.
- Click Column Extraction.
- In the left panel, select the first field of the appropriate column.
- In the right panel, draw the overlay on the corresponding group of fields in the column.
- Click Start Extraction.
To review any other extracted fields, you can click Field View in the top menu to return to the standard field validation screen.
- Once you are satisfied with the results, click Validate.
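The column entries created during setup (Column Name, Description, and Column Number) can be planned on paper before you enter them in the Table Columns page. The sketch below is only an illustration of that planning step: it assumes column numbers start at 1 and run left to right without gaps, and that column names are unique. Neither assumption is confirmed by the product documentation, and `TableColumn` and `check_columns` are hypothetical names, not part of the plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableColumn:
    name: str         # Column Name
    description: str  # Description
    number: int       # Column Number: position from left to right (assumed 1-based)


def check_columns(columns):
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    numbers = sorted(column.number for column in columns)
    expected = list(range(1, len(columns) + 1))
    if numbers != expected:
        problems.append(f"column numbers {numbers} should be {expected}")
    names = [column.name for column in columns]
    if len(set(names)) != len(names):
        problems.append("column names should be unique")
    return problems
```

For example, a plan with columns numbered 1, 2, and 4 would be flagged, which is a cue to recheck the left-to-right order before entering the values.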