Conflict Management
Document training inside the Transformation Designer usually includes additional testing and benchmarks. Document training during validation cannot use these additional steps, so additional precautions are implemented to minimize the risk of extraction errors caused by incorrectly trained documents.
A document for a specific customer should be trained twice during online learning to allow a confident extraction. After the first trained document, the fields of documents with the same layout are extracted with a lower confidence. These fields are invalid, and an error description is displayed in the Current Error area. The best practice is to confirm the field and mark the document for Extraction Online Learning a second time.
After a document is trained two or more times in the same way, other documents of the same layout are extracted with a high confidence, and the field status in validation is therefore no longer invalid.
Number of correctly trained documents | Result | Confidence |
---|---|---|
1 | Correct | 50 % (not confident) |
2 | Correct | 85 % (confident) |
3 | Correct | 90 % (confident) |
>3 | Correct | 100 % (confident) |
A document with incorrect training data can cause problems in subsequent batches, because documents with the same layout are then extracted incorrectly based on the faulty training data. Because the confidence is lower after the first training, the field is invalid and the Validation user can correct the error and train the correct values for this document by marking it for Extraction Online Learning. The training algorithm then recognizes a so-called conflict.
The algorithm counts the number of documents trained for each version of the field position. For the next extraction, the field position is chosen from the trained samples. The final field confidence depends on the number of sample documents for the correctly trained version and for the incorrectly trained version.
Number of correctly trained documents | Number of incorrectly trained documents | Result | Confidence |
---|---|---|---|
0 | 1 | Incorrect | 50 % (not confident) |
1 | 1 | Incorrect | 40 % (not confident) |
2 | 1 | Correct | 60 % (not confident) |
3 | 1 | Correct | 80 % (confident) |
4 | 1 | Correct | 85 % (confident) |
>4 | 1 | Correct | 90 % (confident) |
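As a reading aid, the conflict table above can be written as a simple lookup. This is only a sketch that copies the documented rows. It is not the product's training algorithm, and the names `Extraction`, `lookup_conflict`, and `correct_docs_needed` are hypothetical. It assumes exactly one incorrectly trained document, as in every row of the table.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    result: str       # "Correct" or "Incorrect"
    confidence: int   # percent, as listed in the table
    confident: bool   # False means the field stays invalid in validation


# Rows copied from the table; each assumes ONE incorrectly trained document.
_ONE_INCORRECT = {
    0: Extraction("Incorrect", 50, False),
    1: Extraction("Incorrect", 40, False),
    2: Extraction("Correct", 60, False),
    3: Extraction("Correct", 80, True),
    4: Extraction("Correct", 85, True),
}
_MORE_THAN_FOUR = Extraction("Correct", 90, True)


def lookup_conflict(correct_docs: int) -> Extraction:
    """Return the documented outcome for the given number of correct samples."""
    if correct_docs < 0:
        raise ValueError("correct_docs must be non-negative")
    return _ONE_INCORRECT.get(correct_docs, _MORE_THAN_FOUR)


def correct_docs_needed() -> int:
    """Smallest number of correct samples giving a confident, correct field."""
    n = 0
    while True:
        row = lookup_conflict(n)
        if row.result == "Correct" and row.confident:
            return n
        n += 1
```

Read this way, one faulty sample must be outweighed by three correctly trained documents before the field is extracted both correctly and confidently, which is why deleting or correcting the faulty sample is usually the faster remedy.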
As long as a field is invalid, a modified icon and an error description are displayed for the document in the training subset.
You can resolve a conflict in the Resolve Conflicts window. You do this by deleting an incorrectly trained field or document, or by correcting the field position. To improve the extraction results for those documents, you can also confirm the field value by using the Edit Document window.
Eliminating a conflict between two documents can indirectly resolve other conflicts. The Resolve Conflicts window skips the documents that are no longer in conflict, and the current status is displayed in the status bar.
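One way to picture indirect resolution is to treat each conflict as a pair of documents. The sketch below is an illustrative model only, not how the product stores conflicts. The document names and the `remaining_conflicts` helper are hypothetical.

```python
def remaining_conflicts(conflicts, removed_doc):
    """Drop every conflict pair that involves the removed document."""
    return {pair for pair in conflicts if removed_doc not in pair}


# Hypothetical training subset: "inv_017" was trained incorrectly and
# therefore conflicts with two correctly trained documents.
conflicts = {
    frozenset({"inv_017", "inv_021"}),
    frozenset({"inv_017", "inv_030"}),
    frozenset({"inv_044", "inv_052"}),
}

# Deleting "inv_017" settles both of its conflicts in one step.
left = remaining_conflicts(conflicts, "inv_017")
```

In this model, removing the single faulty document clears two conflicts at once, and only the unrelated pair remains to be reviewed.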
If a conflict is caused by contradicting table fields, you must delete the conflicting document, because only one document can be used for training the table layout. If, however, you want to use this document for training other extraction fields, you must open the Edit Document window and skip training for the table fields by clearing the check box beside the Table Definition button.