Workflow job items and related objects

Workflows (jobs) are defined in XML files. Each XML file can contain one or more job items.

IWR Workflows (jobs) are not the same as the workflows known from the OmniPage OCR application. IWR Workflows are far more complex structures with a more sophisticated set of job items. An OmniPage workflow is just one of these job item types.

Job items can run in sequence (one depending on another) or in parallel. The IWR API implements a set of predefined job items, and you can create custom job items in C++ or .NET. You can create Workflow XMLs in the Workflow XML Designer as predefined workflows, in which your application changes only the input and output file names, or your application can construct them programmatically. The Workflow XML Designer is a good starting point. Workflow XMLs are executed by Intelligent Workflow Runner (OCRService).

A workflow is a structured group of job items. A job item starts to run when its parent job item has finished. Job items without a parent start to run when the workflow is started.
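The parent/child rule above can be illustrated with a small sketch. It is not IWR code: the job item names are hypothetical, the dictionary stands in for the structure a Workflow XML would describe, and it assumes each job item has at most one parent. The function groups job items by how many ancestors must finish before they become eligible to run.

```python
from collections import defaultdict


def execution_waves(parents):
    """Group job items by depth in the parent/child tree.

    `parents` maps a job item name to its parent's name, or to None
    for a job item without a parent (started with the workflow).
    """
    children = defaultdict(list)
    for item, parent in parents.items():
        children[parent].append(item)

    waves = []
    current = sorted(children[None])  # items without a parent start first
    while current:
        waves.append(current)
        # The next wave holds the children of everything in this wave.
        current = sorted(c for item in current for c in children[item])
    return waves


# Hypothetical job item names, for illustration only.
workflow = {
    "load": None,
    "recognize": "load",
    "export_pdf": "recognize",
    "export_txt": "recognize",
}

print(execution_waves(workflow))
# [['load'], ['recognize'], ['export_pdf', 'export_txt']]
```

Items in the same group are only eligible to run in parallel. According to the rule above, each one starts as soon as its own parent has finished, so the groups are a simplification rather than a synchronization barrier.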

Job items use the native IproPlus interface. When starting, each job item receives the following:

  • The IproPlus Engine object, which is a singleton object of IproPlus.

  • The Document object passed by the parent job item. Usually the root job item creates the document, but each job item can create a new document or modify the one received from its parent. When a job item completes, IWR passes the current Document object to its child job items.
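The hand-off described in the list above might be modeled as follows. This is a sketch only: Engine, Document, and the two job item functions are hypothetical stand-ins, not the IproPlus types, and the chain is strictly linear for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the document passed between job items."""
    pages: list = field(default_factory=list)


class Engine:
    """Hypothetical stand-in for the single shared engine object."""


def load_item(engine, document):
    # A root job item has no parent, so it receives no document
    # and creates one itself.
    return Document(pages=["page-1", "page-2"])


def uppercase_item(engine, document):
    # A child job item modifies the document received from its parent.
    document.pages = [page.upper() for page in document.pages]
    return document


def run_chain(engine, items):
    """Run job items in order, passing each result on to the next item."""
    document = None
    for item in items:
        document = item(engine, document)
    return document


engine = Engine()  # one instance, shared by every job item
result = run_chain(engine, [load_item, uppercase_item])
print(result.pages)  # ['PAGE-1', 'PAGE-2']
```

Returning the document from each job item keeps both cases from the list visible in one place: an item may return the object it was given, or a new one.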

The .NET interface of the API is available to developers through the OmniPageCSDK.IproPlus.JobService.dll assembly. The native C++ interface of the API is available through a type library embedded in the OCRService.exe file.

OCRService object

The OCRService object handles the execution and management of Workflow XMLs. It has a very simple programming interface. There is no dependency between CSDK and OCRService. The Run() method of the interface works asynchronously: IWR applications receive the execution results through the Done event. The job ID of the Done event matches the job ID of the corresponding Run() call. When a Done event carries a logged start or runtime error, OCRService terminates the related OCRServer.exe instances. For the parameters that provide information on the process, see Parameters of the Done event.
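The asynchronous pattern described above (start a job, then match the completion event to it by job ID) can be sketched in a few lines. Everything here is a toy model: FakeService, its run() method, and the result dictionary are hypothetical and do not reproduce the OCRService interface. In particular, the sketch assumes that run() returns the job ID; the real Run() method may obtain it differently.

```python
import threading
import uuid


class FakeService:
    """Toy asynchronous service; a stand-in, not the OCRService API."""

    def __init__(self):
        self.done_handlers = []

    def run(self, workflow_xml):
        job_id = str(uuid.uuid4())
        # Do the work on another thread and return immediately.
        worker = threading.Thread(target=self._work, args=(job_id, workflow_xml))
        worker.start()
        return job_id

    def _work(self, job_id, workflow_xml):
        result = {"job_id": job_id, "ok": workflow_xml.startswith("<")}
        for handler in list(self.done_handlers):
            handler(result)  # the "Done" notification


class Client:
    """Stores completion events by job ID so callers can wait for their own."""

    def __init__(self, service):
        self._service = service
        self._results = {}
        self._arrived = threading.Condition()
        service.done_handlers.append(self._on_done)

    def _on_done(self, result):
        with self._arrived:
            self._results[result["job_id"]] = result
            self._arrived.notify_all()

    def run_and_wait(self, workflow_xml, timeout=5.0):
        job_id = self._service.run(workflow_xml)
        with self._arrived:
            # Safe even if the event fired before we got here,
            # because results are kept until they are collected.
            if not self._arrived.wait_for(lambda: job_id in self._results, timeout):
                raise TimeoutError(f"no completion event for job {job_id}")
            return self._results.pop(job_id)


service = FakeService()
client = Client(service)
result = client.run_and_wait("<Workflow/>")
print(result["ok"])
```

Keeping results in a dictionary until they are collected avoids a race in which the completion event arrives before the caller has recorded the job ID it is waiting for.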

If logging is enabled for OCRService, start errors may appear in the log. The log entries include each line of the OCRService and OCRServer source code related to an issue.

Always attach the related log files when you contact support about an API-related issue. This makes it much easier to locate possible errors.

The OCRService object is also responsible for the OCRServer process life cycle management to achieve 24/7 operation. OCRService can be hosted as a Windows Service. You can operate multiple nodes with OCRService to implement an application with scale-out capabilities. In this model, you need to distribute WorkflowXMLs to the OCRService nodes using queues, for instance, ServiceBus.

OCRService.exe is a compact but robust component, free of memory leaks, so an application that uses the IWR API is insulated from failures that occur in CSDK. OCRService detects and eliminates crashes, deadlocks and infinite loops, so the IWR application receives an error report and can continue to run without interruption.

OCRServer objects and scaling

OCRServer hosts the CSDK binaries; each OCRServer instance is a CSDK process. OCRService is responsible for scaling up, that is, for managing multiple OCRServers. By default, it caps the number of running instances at the number of CPU cores on the computer. The developer can set a limit lower than the number of CPU cores. OCRService.exe prioritizes the workflow requests and serves the next one in order if there is an idle OCRServer instance. If there is no idle instance, it starts a new one, provided the limit has not been reached. When OCRServer.exe completes a workflow, OCRService notifies the IWR application.
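The dispatch rule in the paragraph above (reuse an idle instance, otherwise start a new one while under the limit, otherwise wait) can be expressed as a short model. This is an illustration, not the OCRService implementation: ServerPool and its members are hypothetical, integer IDs stand in for OCRServer.exe processes, and a plain first-in, first-out queue replaces whatever prioritization OCRService actually applies.

```python
import os


class ServerPool:
    """Toy model of the dispatch rule; integer IDs stand in for processes."""

    def __init__(self, limit=None, cores=None):
        cores = cores or os.cpu_count() or 1
        # A developer-set limit may lower the cap but not raise it.
        self.limit = min(limit, cores) if limit else cores
        self.idle = []      # instances that are running but have no work
        self.busy = {}      # job ID -> instance working on it
        self.queue = []     # jobs waiting for an instance (FIFO here)
        self.started = 0    # how many instances have been started so far

    def submit(self, job_id):
        self.queue.append(job_id)
        self._dispatch()

    def complete(self, job_id):
        self.idle.append(self.busy.pop(job_id))
        self._dispatch()

    def _dispatch(self):
        while self.queue:
            if self.idle:
                server = self.idle.pop()          # reuse an idle instance
            elif self.started < self.limit:
                server = self.started             # start a new instance
                self.started += 1
            else:
                return                            # limit reached: keep waiting
            self.busy[self.queue.pop(0)] = server


pool = ServerPool(limit=2, cores=4)
for job in ("a", "b", "c"):
    pool.submit(job)
print(pool.busy, pool.queue)   # {'a': 0, 'b': 1} ['c']

pool.complete("a")
print(pool.busy, pool.queue)   # {'b': 1, 'c': 0} []
```

The third job waits because the limit of two instances is reached, and it is then served by the instance that becomes idle rather than by a newly started one.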

OCRService does not know which IWR application requested the workflow, so it notifies all clients. The application should identify the related workflow notifications by the unique identifier.

OCRService.ini

The OCRService object exports the Run() method and the properties used for organizing the processing of a workflow. These properties can also be set through the OCRService.ini file. The OCRService object creates the OCRService.ini file at the end of the first run, and saves it in the %programdata%\OmniPage\OCR Service\ folder. If the OCRService.ini file exists, the OCRService object reads its content at the beginning of the process and overwrites it at the end, so any change made to the file while the process is running is lost. Modify the OCRService.ini file only when no OCRService process is running. For the parameters of OCRService.ini, see Parameters of OCRService.ini.
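The read-at-start, overwrite-at-end behavior explains why the file should not be edited during a run. The following sketch reproduces that life cycle with Python's configparser in a temporary folder. It rests on several assumptions: the section name Settings and the key MaxServers are hypothetical (the real parameter names are listed in Parameters of OCRService.ini), and the real file is not guaranteed to be readable by configparser.

```python
import configparser
import tempfile
from pathlib import Path


def load_settings(path):
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    return parser


def save_settings(parser, path):
    with open(path, "w") as handle:
        parser.write(handle)


with tempfile.TemporaryDirectory() as folder:
    ini = Path(folder) / "OCRService.ini"

    # First run: the file does not exist yet and is created at the end.
    running = load_settings(ini)
    running["Settings"] = {"MaxServers": "4"}  # hypothetical section and key
    save_settings(running, ini)

    # Second run: the process starts and reads the file.
    running = load_settings(ini)

    # The file is edited while that process is still running.
    edited = load_settings(ini)
    edited["Settings"]["MaxServers"] = "2"
    save_settings(edited, ini)

    # At the end, the process overwrites the file with the values
    # it holds in memory.
    save_settings(running, ini)

    final_value = load_settings(ini)["Settings"]["MaxServers"]

print(final_value)  # 4
```

The edit to 2 does not survive, because the running process never re-reads the file before writing it back.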