RecAPI
OmniPage Capture SDK 21

The OmniPage Capture SDK 21 gives developers access to a broad range of algorithms and workflows that they can integrate into their own applications. Version 21 adds a series of new features, and with them a small number of new RecAPI functions, but the underlying architecture is unchanged since version 15. Therefore, in this chapter the term CSDK (without a version number) refers to versions 15 to 21, CSDK 21 refers to version 21 only, and CSDK 12 refers to version 12 including sub-releases up to 12.7.

Throughout the documentation we frequently refer to CSDK as the Engine, meaning all the modules of RecAPI.

At this time, the full functionality of the OmniPage Capture SDK is available only on Windows 32-bit operating systems. The other platforms (Windows 64-bit, Linux, Embedded Linux, Android, Mac OS X) have different limitations. The documentation of some features lists the supported platforms. If the description of a feature gives no such list, the feature is supported on all platforms. Windows without further qualification means both 32-bit and 64-bit. Linux means desktop Linux on x64 processors. For more information, see the page Platforms.

Documentation

The OmniPage Capture SDK online documentation comprises four help systems:

All of these help systems are accessible in any CSDK package.

How to find it...

We provide two indexes. One is an integral part of the HTML system; access it with the Index tab on the left of the Help window. It lists function names and other technical items, as in previous CSDK versions. A second Index now makes this help system easier to search: its entries are descriptive and feature-oriented rather than technical.

In addition, the links below point you to useful topics in the documentation.

Introducing RecAPI

RecAPI is a traditional, standard native "C" API separated into two layers: KernelAPI and RecAPIPlus. This simplifies the architecture and also makes the toolkit's offering more flexible: you can choose higher-speed processing with simple text output, or more thorough processing with formatted output. RecAPI also provides the base for the RecAPI P/Invoke assembly for .NET managed applications.

A higher-level set of services, with an updated object model, is provided by IPRO and RecIPRO. IPRO serves as the base for a range of visual components. To access the IPRO and Visual Components documentation, use their own Help systems.

On Windows, RecAPI supports the _UNICODE macro. Depending on this macro, you can use RecAPI either as an ANSI (current code page) API or as a Unicode API. On Linux and Macintosh, RecAPI is a UTF-8 API.
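
The following is a minimal sketch of how a compile-time macro such as _UNICODE can switch a C interface between narrow and wide characters. It is not taken from the RecAPI headers: the names demo_char, DEMO_T, demo_strlen, demo_path_length and demo_api_flavor are hypothetical and exist only for this illustration. It assumes the usual convention that defining _UNICODE selects the wide-character (Unicode) variant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names for illustration only; none of them belong to RecAPI. */
#ifdef _UNICODE
typedef wchar_t demo_char;          /* wide-character build */
#define DEMO_T(s) L##s              /* turns "abc" into L"abc" */
#define demo_strlen wcslen
#else
typedef char demo_char;             /* narrow build: current code page, or UTF-8 */
#define DEMO_T(s) s
#define demo_strlen strlen
#endif

/* Returns the number of character units in a file name.
   The same source line compiles against either character type. */
static size_t demo_path_length(const demo_char *path)
{
    assert(path != NULL);
    return demo_strlen(path);
}

/* Reports which variant this translation unit was compiled as. */
static const char *demo_api_flavor(void)
{
#ifdef _UNICODE
    return "Unicode (wchar_t)";
#else
    return "ANSI or UTF-8 (char)";
#endif
}
```

Calling code wraps its string literals in DEMO_T, for example demo_path_length(DEMO_T("page.tif")), so that one source file builds in both modes. Whether RecAPI itself uses a comparable typedef and literal macro is not stated here; consult the RecAPI headers for the actual declarations.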

Here are the main RecAPI components:

The KernelAPI part of RecAPI is useful when processing speed is a critical factor and/or there is no need for layout retention. This is the basic API with page-level functions, so it is very similar to the APIs of earlier CSDK versions. We recommend it for applications that do not require total format retention and rich output formats. KernelAPI generates its simple outputs directly from HPAGE. Here you can choose TXT, CSV, two different XML types, or PDF (image on text). Since no page formatting is performed here, page-oriented processing on the KernelAPI level is faster.

The RecAPIPlus part of RecAPI contains an interface based on a document-oriented approach. You can implement functionality similar to that of OmniPage 21 with just a handful of functions. When exporting multi-page documents, you can use a series of output converters to provide layout retention quality equal to that in OmniPage 21. Use the RecProcessPagesEx function to process many multi-page image files in one step. In this case, CSDK runs processing on several image preprocessing and OCR threads (multi-threading is supported on Windows), enabling you to make the most of HyperThread and MultiCore processors and multi-processor computers. High-quality page layout retention and document-level formatting add somewhat to total processing time.

In both cases you can choose between three page parsing algorithms: Fast, Legacy or Standard. Standard is the default; it requires more time but delivers the best possible result, improving both recognition accuracy and format retention. Since the Standard page parsing package occupies some 20 MB, the Distribution Wizard offers it as a separate component.

Detailed descriptions

History of Main Features
OCR Engines
Platforms
Tutorial