Process documents

Document processing is performed in two steps. First, an attempt is made to classify the document. Second, the settings of that specific class are applied to the document to retrieve the document data.

Each time a document is processed, processing begins with the Document_BeforeProcessXDoc event in the project script. The event is at the project level because the incoming document is still completely unknown at this point. Use this event to control recognition, as shown in the example below.

When the Tungsten Clarity recognition engine is used, you cannot call page recognition from script.

If project fields are defined, their extraction events are executed first. The classification events, which are also part of the project script, are executed next. For example, use project field extraction when a bar code result determines the class that a document is assigned to.

Extraction is performed for a classified document. All locator methods are executed, and the document gets its field results from the locators that are assigned to its fields. The extraction events follow the defined class hierarchy and take field and locator inheritance into account. When the extracted document belongs to a child class, the extraction events for the inherited locators and fields are also fired for all parent classes.

If foldering is enabled and folder fields are defined, you cannot change any folder fields in any of the document processing events, such as DocumentValidated. This is because multiple documents are processed in parallel and changes cannot be saved to the root Xfolder object.

Suppress OCR After Third Page

The following example shows how to skip recognition for all pages after the third page. Documents with three or fewer pages have full recognition results. Documents with more than three pages have recognition results for the first three pages only.

However, a side effect of suppressing recognition for one or more pages is that bar code recognition does not work on the suppressed pages. This means that the Bar Code Locator cannot recognize any bar codes on those pages.

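' Project Script
Private Sub Document_BeforeProcessXDoc(pXDoc As CASCADELib.CscXDocument)
    Dim i As Long
    Dim Count As Long
    Count = pXDoc.CDoc.Pages.Count

    ' Suppress OCR for all pages after the third page.
    ' The page index starts at 0, so index 3 is the fourth page.
    ' The loop does not run for documents with three or fewer pages.
    For i = 3 To Count - 1
        pXDoc.CDoc.Pages(i).SuppressOCR = True
    Next i
End Sub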
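
If the page limit is likely to change, the same logic can be moved into a small helper. The following is a minimal sketch of a variant of the script above, not a documented method. The helper name SuppressOCRAfterPage and its MaxPages parameter are hypothetical and introduced only for illustration. The sketch uses only the object members that appear in the example above, and assumes the same zero-based page index.

```vb
' Project Script
' Hypothetical helper (not part of the product API): suppresses OCR for
' every page after the first MaxPages pages of the document.
Private Sub SuppressOCRAfterPage(pXDoc As CASCADELib.CscXDocument, ByVal MaxPages As Long)
    Dim i As Long

    ' Treat a negative limit as 0 so that the loop never starts below index 0.
    If MaxPages < 0 Then MaxPages = 0

    ' Assumes a zero-based page index: index MaxPages is the first page to skip.
    ' The loop does not run when the document has MaxPages pages or fewer.
    For i = MaxPages To pXDoc.CDoc.Pages.Count - 1
        pXDoc.CDoc.Pages(i).SuppressOCR = True
    Next i
End Sub

Private Sub Document_BeforeProcessXDoc(pXDoc As CASCADELib.CscXDocument)
    ' Keep recognition results for the first three pages only.
    SuppressOCRAfterPage pXDoc, 3
End Sub
```

This version replaces the Document_BeforeProcessXDoc handler shown above; a project script should contain only one handler for the event. As with the original example, bar codes on the suppressed pages are not recognized.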