Extraction New Samples and the Project Life Cycle

When a project is first put into production, its extraction results are not ideal. More training documents are accumulated during this early period, and the number collected decreases over time. As a result, certain tasks need to be performed more often for new projects.

As they accumulate, extraction training documents sit in the Dynamic Specific Knowledge Base and are accessed only if other methods of extraction fail. This has two consequences.

  1. Documents in this knowledge base that contain extraction information are not used until other options fail. This means that extraction performance may suffer.

  2. As the Dynamic Specific Knowledge Base increases in size, the performance of the project may suffer.

To keep your Dynamic Specific Knowledge Base from growing too large, and to move your training documents into your Extraction Set on a regular basis, import your Extraction New Samples, resolve conflicts, and train your project after the following time intervals:

  • After one week

  • After two weeks

  • After three weeks

  • After four weeks

  • After two months

  • After three months

  • After six months

  • After one year
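
If it helps with planning, the intervals above can be turned into calendar dates. The following is a minimal sketch only, assuming that every interval is measured from the date the project went into production. The names INTERVALS, add_months, and maintenance_schedule are hypothetical and are not part of the product; the script only computes dates and does not import samples or train anything.

```python
import calendar
from datetime import date, timedelta

# (label, weeks, months) offsets, mirroring the list of intervals above.
# Assumption: each offset is counted from the production start date.
INTERVALS = [
    ("one week", 1, 0),
    ("two weeks", 2, 0),
    ("three weeks", 3, 0),
    ("four weeks", 4, 0),
    ("two months", 0, 2),
    ("three months", 0, 3),
    ("six months", 0, 6),
    ("one year", 0, 12),
]


def add_months(start, months):
    """Return start shifted by whole months, clamping to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def maintenance_schedule(go_live):
    """List (label, due date) pairs for each import, resolve, and train cycle."""
    return [
        (label, add_months(go_live, months) + timedelta(weeks=weeks))
        for label, weeks, months in INTERVALS
    ]


if __name__ == "__main__":
    # Hypothetical production start date, used for illustration only.
    for label, due in maintenance_schedule(date(2024, 1, 31)):
        print(f"After {label}: {due.isoformat()}")
```

Month offsets are clamped to the last day of the target month, so a project that starts on the 31st still gets a valid date in shorter months.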

After one year, your project should be processing documents successfully, with few problems. By then, training documents are collected only when a new vendor or form is encountered. Continue to monitor your project, and import the documents and retrain your project every six months or so.