Tungsten Automation

Kofax OmniPage Capture SDK 22.0.0.8

Build Date: August 5, 2024

© 2024 Tungsten Automation. All rights reserved.
Use is subject to license terms.

Introduction

Fix packs issued for OmniPage Capture SDK (CSDK) releases are cumulative updates.

Improvements

PDF/A-4 format support

Starting with FIXPACK1, Capture SDK now supports the following PDF/A-4 formats:

For PDF/A-4 compatibility output, use the following IMF_PDF_COMPATIBILITY values:

Improved Arabic OCR accuracy

Starting with FIXPACK1, Capture SDK recognizes Arabic text 60% more accurately than the previous version. Recognition is now better prepared for mixed-language text, such as English pages with frequent Arabic notes and quotations, or the reverse.
Note: FTE and DCA support Western languages only.

Issues resolved in this fix pack

22.0.0.8 - FIXPACK8

ID - Support Case ID - Issue
2126240 - (26890479) - Accuracy: Miscellaneous misrecognitions (I-1 and Q-O) occurred.
2126074 - (26966206) - Intelligent Mail (USPS) and Postnet barcodes were misrecognized.
2107837 - N/A - The CSDK setting Kernel.Img.Binarization.UseGBRbin added noise pixels to the JPEG images.
2107104 - N/A - A timeout occurred in page parse when applying the setting Kernel.Img.Binarization.UseGBRbin = 1 (file-specific).
2105922 - 26869135 - Accuracy: a character was missing from the recognized FTE field.
2103566 - N/A - MEM_NOTENOUGH_ERR occurred during recognition in the MOR engine.
2101538 - 26861980 - The page range selection option for the input document was disabled and the range was not selectable in the DCA dialog.
2101024 - N/A - Accuracy: OmniPage returned the invoice number twice.
2094482 - N/A - The leading uppercase I ('I') character in an invoice number was missing from the OCR results (file-specific).
2094429 - N/A - API_HARDTIMEOUT_ERR occurred during recognition (file-specific).
2092851 - (26867720) - Accuracy: 0-O misrecognitions occurred.
2090299 - N/A - MOR_GPFAULT_ERR occurred with specific settings (file-specific).
2090112 - N/A - Some text in a table header with grey background was not recognized.
2086415 - 26853623 - Loading and rendering a specific PDF page required 3.2 GB memory and took 100 times longer than for other pages in the PDF (file-specific).
2086411 - (26849877) - Accuracy: OmniPage recognized '0' as letter 'o' with the French OCR language.
2085819 - (26830433) - Accuracy: OmniPage misrecognition problems.
2072242 - N/A - Accuracy: Slash '/' characters were misrecognized as character 'I' or '1'.
2068706 - N/A - The letter 'O' and zero '0' in strings containing 'PO' could not be discriminated.
1959879 - 26700082 - General Protection Fault occurred while processing large PDFs with Arabic text.
1959169 - 26691269 - Crash occurred on Arabic PDF with non-gridded table.
1865277 - 26608344 - Processing certain PDF files with the setting "DataStream.TryToLoad=true" resulted in an unreadable image.

NOTES for fix pack 22.0.0.8 - FIXPACK8

1.) For 2085819: Processing the PDFs attached to this bug item must be done using the following CSDK setting:
"Kernel.Img.Binarization.UseGBRbin"=1
2.) For 2126074: If the recognition result should also contain the check-digit of the Postnet barcodes, the application must set the following CSDK setting to true:
"Kernel.Ocr.Bar.bar1D.Postnet.cdt"=1

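For illustration of note 2: the POSTNET check digit is the digit that brings the sum of all digits in the barcode to a multiple of 10. The following minimal C sketch computes it for a string of ZIP digits, which may help when validating a result returned with the check-digit setting enabled. The function name postnet_check_digit is hypothetical and is not part of the CSDK API; how CSDK formats the returned check digit is not covered here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not a CSDK function.
 * Returns the POSTNET check digit (0-9) for a string of ZIP digits,
 * or -1 if the string is empty or contains a character other than
 * a digit or a hyphen. Hyphens (as in ZIP+4 notation) are skipped. */
int postnet_check_digit(const char *zip)
{
    int sum = 0;
    int count = 0;
    const char *p;

    assert(zip != NULL);

    for (p = zip; *p != '\0'; ++p) {
        if (*p == '-')
            continue;               /* ignore the ZIP+4 separator */
        if (!isdigit((unsigned char)*p))
            return -1;              /* not a valid ZIP digit string */
        sum += *p - '0';
        ++count;
    }
    if (count == 0)
        return -1;

    /* The check digit completes the digit sum to a multiple of 10. */
    return (10 - sum % 10) % 10;
}
```

For example, postnet_check_digit("80122-1905") yields 2, because the nine ZIP digits sum to 28.
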
Issues resolved in previous fix packs

22.0.0.7 - FIXPACK7

ID - Support Case ID - Issue
2081518 - N/A - Numbers in a table with two lines were misrecognized in Japanese (file-specific).
2081376 - 26801651 - Pages of a PDF were loaded into CSDK improperly (file-specific).
2066752 - 26788711 - Recognition confidence of characters on repetition was not deterministic.
2058449 - N/A - Uppercase O ('O') instead of zero ('0') misrecognition occurred when Greek was selected as the OCR language.
1751003 - N/A - When converting from PDF to Microsoft Word or Excel, the logo was missing in the output.

NOTE for fix pack 22.0.0.7 - FIXPACK7

This fix pack also includes improved line removal for small comb dividers.

22.0.0.6 - FIXPACK6

2055871 - 26778035 - Security vulnerability fixes in OPLA (OmniPage Licensing Agent).

2038358 - N/A - ENH.REQ.: AtalaSoft speed: new CSDK settings were introduced to control the speed/accuracy trade-off of 2D barcode recognition.

2038037 - N/A - RER_INTERNAL_ERR occurred while recognizing a Thai document (file-specific).

2030520 - N/A - ENH.REQ.: The timeout value for the AtalaSoft engine is now configurable ("Kernel.Ocr.Bar_Atl.Timeout").

2002252 - N/A - Regression: incorrect result returned when recognizing BAR_C39_NSS (Code39 without Start/Stop character) barcode.

2001427 - N/A - Digit '1' characters standing alone in table cells were not recognized when LZ_FREEFORM was specified.

2001130 - N/A - Code128 barcodes were not recognized (file-specific).

1998062 - N/A - Accuracy: zero digit ('0') was recognized as Uppercase O ('O') in invoice documents.

1997513 - N/A - The BAR engine failed to recognize noisy (slightly injured) Datamatrices.

1988525 - N/A - Different results were returned when recognizing a Code39 barcode, depending on the setting Kernel.Ocr.BAR.Bar1D.MinLength.

1988316 - N/A - Only one barcode of the 2 on the page was recognized (file-specific).

1986424 - N/A - Intermittent issues occurred with 2D barcode recognition.

1986055 - N/A - The CSDK setting Kernel.Imf.PDF.Linearize had no effect when the output PDF should be of PDF/A.

1985550 - N/A - Code128 barcodes were not recognized (file-specific).

1982323 - N/A - FormTyper converted fields to a Comb of 6 characters instead of 3.

NOTE for fix pack 22.0.0.6 - FIXPACK6

For 2038358 and 2030520: Starting with this fix pack, new CSDK settings have been introduced for controlling the secondary 2D bar code module, called the Atala engine:
"Kernel.Ocr.Bar_Atl.Timeout"=30 // Measured in seconds; timeout allowed for the Atala 2D bar code engine
"Kernel.Ocr.Bar_Atl.TradeOff"=2 // Atala's tradeoff settings. Possible values: 1 (MostAccurate); 2 (Fastest); 0 (Balanced)
"Kernel.Ocr.Bar_Atl.ScanInterval"=5 // Atala's ScanInterval settings.
The values provided above are the default settings of CSDK.

22.0.0.5 - FIXPACK5

1979154 - N/A - GPF occurred in the BAR module (PDF417) with a PDF (file-specific).

1978552 - N/A - Rechnung number was misrecognized (Digit '4').

1977750 - N/A - CSDK was unable to extract Patch Code.

1977320 - N/A - Specific table types were typically incorrectly recognized by the FormTyper.

1976203 - N/A - Accuracy: Colon (':') character was recognized as bullet.

1976058 - 26720348 - RecPDF API could not set the User Password (aka Open Password) for PDF files.

1975987 - N/A - Accuracy: Slash ('/') characters in date were misrecognized.

1968095 - N/A - PDF to DOCX conversion replaced some apostrophe characters with space (file-specific).

1968093 - N/A - PDF to PPTX conversion merged all values in the rows of a table (file-specific).

1966792 - N/A - 2D Datamatrices on the page were not recognized when the barcode type BAR_UCC128 was enabled for recognition.

1965706 - 26708767 - The formatted text converter ("Converters.Text.FormattedTxt") omitted the text content from higher resolution pages of a TIFF file from the OCR output.

1962499 - N/A - Invalid characters were captured from the barcode.

1956829 - N/A - Running RecConvert2Doc on a file in the root folder failed and corrupted the engine state.

1956391 - N/A - C128 barcode was not recognized (file-specific).

1956277 - N/A - Table cell data was missing in the recognized text.

1956142 - N/A - Underscore characters in invoice numbers were not recognized.

1956137 - N/A - The RM_BAR recognition module misrecognized certain PDF417 barcodes.

1956136 - N/A - CSDK handled the codepage improperly in the Atalasoft PDF417 output.

1956134 - N/A - CSDK misrecognized the order number in PO PDF documents (character 6 was recognized as 'C').

1956118 - N/A - Accuracy: uppercase O ('O') instead of zero ('0') misrecognition occurred in invoice numbers.

1956113 - N/A - CSDK did not recognize the Patch Code in images with a black frame from the scan.

1956097 - N/A - API_GPFAULT_ERR occurred during recognition if the PDF417 barcode type (BAR_PDF417) was enabled.

1956093 - N/A - API_HARDTIMEOUT_ERR occurred during recognition if DataMatrix barcode type (BAR_DMATRIX) was enabled.

1945281 - N/A - OPLA - incorrect usage data displayed in case of month-by-month aggregation.

1940759 - N/A - Negative numbers became positive when converting a specific PDF to Excel.

1938079 - N/A - Images with vertical Code39 barcodes were not recognized.

1938074 - N/A - CSDK did not recognize Code39 barcode from 200 dpi TIFF images while successfully detecting the same in the original PDF.

1933889 - N/A - Image compression error (IMF_COMP_ERR) occurred while loading a PDF file (file-specific).

1933879 - N/A - The line containing the only string "C/O" was missing in the recognized output (file-specific).

1933876 - N/A - A PDF caused GPF during image load (file-specific).

1933874 - N/A - A PDF file did not load (file-specific).

1933835 - N/A - PDF files were corrupted after kRecMakePagesSearchable (file-specific).

1931359 - 26656032 - [LCP-26736] All text was converted as pictures in the document (file-specific).

1928710 - N/A - ENH.REQ.: Support for loading 48-bit PNG images added.

1927762 - N/A - Image load time increased with each subsequent page of multipage JPEG-compressed TIFF image files.

1859330 - 26589157 - Access violation occurred in RECDIAG.dll with LZ_FREEFORM.

1857559 - 26598271 - The PDF/A-2b output that CSDK produces failed on Adobe's Preflight check (file-specific).

1849672 - N/A - Converting PDF via the Power PDF menu "Making PDF Searchable" with the "Keep original images" option disabled resulted in a lot of underlined text.

1804140 - N/A - PLUS3W returned the string "INVOICE NUMBER" in the wrong word order.

NOTES for fix pack 22.0.0.5 - FIXPACK5

1.) The following file has been added to the CSDK 22.0 distribution file set with the fix pack FIXPACK5:

2.) For 1976058: To make a PDF protected with a User Password (aka Open Password) via the RecPDF API, use the command-string "OpenPassword". See sample code below:

22.0.0.3 - FIXPACK3

1893441 - N/A - Starting from the second page the table elements are placed into "text boxes".

1888435 - N/A - Most of the text from the input PDF is missing from the output.

1883865 - 26526415 - Accuracy: Uppercase letter "D" of a specific font is recognized as "0".

1870862 - N/A - kRecPreprocessImg() runs into an infinite loop while processing particular Arabic documents.

1867089 - N/A - Loading a certain PDF file results in a partially blank page (regression from CSDK 21.1.0.7).

1866746 - N/A - The generated image has missing characters and black blocks.

1864701 - N/A - Loading a particular PDF file results in partially empty image content.

1864665 - N/A - Areas with inverse header on the input PDFs are not extracted (page parse).

1863592 - N/A - Loading a particular PDF file results in an IMF_COMP_ERR error.

1859193 - N/A - Issues with DataMatrix recognition.

1855827 - N/A - Recognizing particular Arabic documents returns with API_ERROR_ERR.

1855595 - N/A - Particular image causes GPF in the kRecLocateZones function.

1848944 - N/A - Particular PDF is loaded as an almost empty page.

1847782 - N/A - Accuracy: Misrecognition of the word "Tyrk" on a Danish invoice.

1846603 - 26592864 - Arabic Decimal Separator character (U+066B) misrecognized as Arabic-Indic Digit One character (U+06F1).

1846110 - N/A - Handwritten signature is missing when loading PDFs (regression from CSDK 22.0.0.1).

1844060 - N/A - Number in a particular Japanese document is split into 3 parts in DTXT-XML output.

1843411 - 26534218 - Empty PDF is causing deadlock.

1843214 - N/A - Diagonal text is compacted to a single spot (regression from CSDK 22.0.0).

1840047 - N/A - CSDK adds multiple empty boxes to the PPT output that contains "Click to add text" (regression from CSDK 22.0.0.2).

1839227 - N/A - Loading a particular PDF page takes several minutes.

1839224 - N/A - An IMF_COMP_ERR error occurs while loading a particular PDF file.

1826829 - N/A - An API_TIMEOUT_ERR occurred during barcode recognition ("Kernel.Decomp.FindBarZones" = TRUE).

22.0.0.2 - FIXPACK2

1836485 - N/A - GPF occurred while recognizing a particular Arabic document.

1836221 - N/A - Content missing while loading a particular PDF file.

1827213 - N/A - Invoices generated an almost blank page when processing a certain PDF.

1825618 - N/A - CSDK could not recognize certain CHT HKSCS (Hong Kong) characters.

1825095 - N/A - Issue with a PDF where all characters were converted to squares.

1823801 - N/A - Image part of a particular PDF was ignored.

1823796 - N/A - CSDK did not recognize Thai month names.

1823782 - N/A - Data in columns merged and did not align properly after converting a PDF file to Excel.

1822900 - N/A - Java API call kRecSetLicense() did not work.

1821246 - N/A - Experienced 100% CPU usage until server restart when processing a particular document with Datamatrix.

1820938 - N/A - Characters were missing from the OCR result.

1819876 - N/A - Barcode Datamatrix was not recognized.

1819627 - N/A - CSDK stopped responding while loading a particular PDF file (file1).

1819626 - N/A - CSDK stopped responding while loading a particular PDF file (file2).

1819624 - N/A - Particular PDF pages loaded partially.

1819623 - N/A - Text was missing after loading a particular PDF file.

1816581 - N/A - IMF_COMP_ERR error occurred with a particular PDF file.

1810672 - N/A - When a specific PDF was converted into PPTX, some images were missing and other conversion issues /1.

1805328 - N/A - FTE could not match the template with the same image.

1793445 - 26511677 - The CSDK function kRecCreateFormTemplate() failed to properly create form template from AcroForm PDF with the 'NoExport' field flag.

1785784 - N/A - FormTyper did not recognize all form fields within a PDF file.

1779163 - N/A - CSDK did not support images (PNG files) with high bits per pixel settings.

NOTE for fix pack 22.0.0.2 - FIXPACK2

For 1793445: Use the following code in your application to decide whether the type of a given PDF file is AcroForm (Acrobat Form) or not:
HSETTING metaData;
RECERR rc = kRecGetImgMetaData(hPage, &metaData);
if (NULL != metaData)
{
    HSETTING sh;
    INTBOOL hs;
    rc = kRecSettingGetHandle(metaData, "PDF.HasAcroFormField", &sh, &hs);
    if (REC_OK == rc && NULL != sh && hs)
    {
        INTBOOL hasAcroFormField = FALSE;
        rc = kRecSettingGetInt(mySID, sh, &hasAcroFormField);
        if (REC_OK == rc)
        {
            // Value of hasAcroFormField:
            //   FALSE (0): there is no AcroForm-field defined in the PDF
            //   TRUE (1): there is an AcroForm-field defined in the PDF
        }
    }
}

22.0.0.1 - FIXPACK1

1813100 - N/A - PDF that triggers IMF_COMP_ERR caused other errors.

1811463 - N/A - Text in a filled out PDF form was not converted.

1808344 - N/A - The RECAPI functions RecInsertPage, RecCreateDoc, and RecCloseDoc did not seem to be thread-safe.

1808273 - N/A - German PDF OCR resulted in mixed uppercase and lowercase letters.

1807805 - N/A - 1-char-long C128 barcode was not recognized.

1806740 - N/A - LETTER::info did not appear in the DTXT XML output.

1806059 - N/A - Certain information in Polish PDF files did not display.

1805399 - N/A - Text disappeared when PDF was loaded.

1804584 - N/A - Arabic and some other languages were missing from the list of recognition languages in the Workflow Assistant (AssistantApp.exe).

1803737 - N/A - iTest 64-bit application crashed when trying to view the list of modules.

1803599 - N/A - Hard timeout error (API_HARDTIMEOUT_ERR) occurred while searching for QR-code on the image.

1803174 - N/A - kRecPreprocessImg() returned IMG_SIZE_ERR while processing a certain PDF.

1801461 - 26516292 - CSDK crashed on a particular Arabic document (API_GPFAULT_ERR).

1796714 - N/A - Recognition with multi-language OCR failed with CHR_ILLEGALSETTING_ERR - Bronze2 or Bronze3.

1796321 - N/A - PDF's invisible text appears in the recognition result (white color text on white background).

1794931 - N/A - Certain Matrix2of5 barcodes were not recognized.

1792599 - N/A - First page was rotated by 180 degrees during recognition.

1791665 - N/A - A particular PDF file did not load.

1790453 - N/A - System.AccessViolationException occurred when calling kRecDetectImgSkew().

1787991 - 26495548 - CSDK hung intermittently (API_HARDTIMEOUT_ERR).

1787700 - N/A - kRecLocateZones() returned with IMG_RECT_ERR when adding FM_AUTO_HAND zone near the edge of the image.

1776217 - N/A - PDF conversion to Word and Excel degraded in quality at specific pages.

1716843 - 26347098 - [LCP-20866] Strikethrough text converted as text with hyphens.

1715598 - 26436573 - kRecLoadImg() crashed on a specific PDF.

1626341 - 26309448 - Processing failed on a specific 146-page Arabic PDF.

1610362 - 26296054 - PDF form entries in certain PDF documents were ignored.

NOTE for fix pack 22.0.0.1 - FIXPACK1

For 1807805: In addition to applying the fix, the application should set the following CSDK setting:
"Kernel.Ocr.BAR.bar1D.MinLength"=1

Applies to

You can apply this fix pack to update the following Kofax OmniPage CSDK for Windows version:

Install this fix pack

Use the following procedure to install this fix pack.

  1. Verify that the following applications and services are not running:
  2. Back up the contents of the Bin folder of your Capture SDK 22.0 installation (to the backup-copy folder).
  3. Depending on your existing Capture SDK installation, select either the file KofaxOmniPageCaptureSDK-22.0.0.8_forWindows_32-bit.zip or KofaxOmniPageCaptureSDK-22.0.0.8_forWindows_64-bit.zip.
    Unzip the fix pack file to a temporary location.
  4. Copy the files and subfolders located in the fix pack's folder (CSDK_BIN32 or CSDK_BIN64) into the Bin folder of your current OmniPage Capture SDK installation.
    IMPORTANT:
    • Remember to copy all files with the subfolders! (recursive copy)
    • This fix pack contains 4 new files (SxS*.manifest), which were not part of the fileset of the original 22.0 release. Note that these are required in case of IWR-based OCR engine integration only.
    • The following files became unnecessary and can be omitted from the distribution fileset supposing that the integrating application is not an IWR-based one:
      • Kofax.OmniPageCSDK.ArgTypes.dll
      • Kofax.OmniPageCSDK.CAPI.dll
      • Kofax.OmniPageCSDK.Objects.dll
      • Kofax.OmniPageCSDK.Objects.xml
      • Kofax.OmniPageCSDK.RecPDF.dll
      • Kofax.OmniPageCSDK.RecPDF.xml
  5. If you use the standalone version of Document Classifier and there is an Engine subfolder under the folder DocumentClassifier, refresh the files in the Engine subfolder from the CSDK_BIN32 or CSDK_BIN64 folder of this fix pack.
  6. Similarly, if you use the standalone version of Form Template Editor and there is an Engine subfolder under the folder FormTemplateEditor20, refresh the files in the Engine subfolder from the folder CSDK_BIN32 or CSDK_BIN64 of this fix pack.
  7. If you have runtime deployment, remember to update that as well:
    In the Distribution File Set (generated earlier), replace the original files with the new ones from the fix pack.
    Remember to add new CSDK files introduced since the version 22.0 release, if there are any (that is, in any of the fix packs).
    Note: As a best practice, after applying the updates to the developer installation, generate the Distribution File Set again, using the Distribution Wizard.
  8. If you are using OmniPage Licensing Agent (OPLA), use its updated version from the OPLA subfolder of this fix pack.
    The OPLA subfolder contains all the files necessary for OPLA.
  9. Restart any applications and services you stopped before installing the fix pack.

Remove this fix pack

Use the following procedure to remove this fix pack.

  1. Verify that the following applications and services are not running:
  2. Copy the files from the backup-copy folder to the Bin folder of your OmniPage Capture SDK 22.0 installation.
  3. Restart any applications and services that were stopped prior to removing the fix pack.

Files included

This fix pack includes a vast number of files. This document does not detail file names and versions.