Image file formats and image quality
Sample images used in a project to create classification and extraction definitions and patterns serve as reference documents for the documents that are scanned later. The resolution of these sample images should therefore match that of the scanned (or later imported) documents. Faxes are usually scanned by the sender at 200 DPI, whereas for other kinds of documents 300 DPI provides the best image quality.
Good OCR results also require the best possible quality in the compressed image, because automatic data capture is very sensitive to image quality. To process scanned documents in a Tungsten Capture environment with Tungsten Transformation, you must use sample image files in the same compression format that the Tungsten Capture Scan module produces. The Tungsten Capture Scan module usually creates image files in TIFF format (file extension ".tif").
The Tungsten Capture Scan module supports most bitonal, 256-level grayscale, and RGB full-color TIFF formats and subformats, including JPEG, Fax G3, and Fax G4 compression. G3 compression is the universal standard for fax and multi-page line art documents. Black-and-white (bitonal) mode is commonly called line art mode. G3 and G4 are also called CCITT Group 3 and CCITT Group 4 compression, respectively.
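To check that a sample image matches a scanned document in resolution and compression, the standard TIFF header tags can be read directly. The following is a minimal sketch in Python using only the standard library. It is not part of any Tungsten product, and the function names (read_tiff_info, resolution_dpi, same_format) are hypothetical. It assumes a classic TIFF file (not BigTIFF) and inspects only the first page.

```python
import struct

# Values of the standard TIFF Compression tag (259).
COMPRESSION_NAMES = {
    1: "uncompressed",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 (fax G3)",
    4: "CCITT Group 4 (fax G4)",
    5: "LZW",
    6: "JPEG (old style)",
    7: "JPEG",
    32773: "PackBits",
}


def read_tiff_info(data: bytes) -> dict:
    """Read compression and resolution tags from the first page of a TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    # TIFF defaults: uncompressed, resolution unit = inch.
    info = {"compression": 1, "unit": 2, "x_res": None, "y_res": None}
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _type, _count = struct.unpack_from(endian + "HHI", data, entry)
        value_pos = entry + 8
        if tag in (259, 296):  # Compression, ResolutionUnit (SHORT, inline)
            (value,) = struct.unpack_from(endian + "H", data, value_pos)
            info["compression" if tag == 259 else "unit"] = value
        elif tag in (282, 283):  # XResolution, YResolution (RATIONAL at offset)
            (offset,) = struct.unpack_from(endian + "I", data, value_pos)
            num, den = struct.unpack_from(endian + "II", data, offset)
            info["x_res" if tag == 282 else "y_res"] = num / den if den else None
    return info


def resolution_dpi(info: dict):
    """Return (x, y) resolution in DPI, or None if it cannot be determined."""
    if info["x_res"] is None or info["y_res"] is None:
        return None
    # Unit 2 = inch, unit 3 = centimeter; unit 1 has no absolute meaning.
    factor = {2: 1.0, 3: 2.54}.get(info["unit"])
    if factor is None:
        return None
    return (info["x_res"] * factor, info["y_res"] * factor)


def same_format(sample: bytes, scanned: bytes, tolerance: float = 1.0) -> bool:
    """True if both files share compression and DPI (within the tolerance)."""
    a, b = read_tiff_info(sample), read_tiff_info(scanned)
    dpi_a, dpi_b = resolution_dpi(a), resolution_dpi(b)
    if a["compression"] != b["compression"] or dpi_a is None or dpi_b is None:
        return False
    return all(abs(x - y) <= tolerance for x, y in zip(dpi_a, dpi_b))
```

For example, same_format(open("sample.tif", "rb").read(), open("scan.tif", "rb").read()) would compare a sample image against a scanned one; both file names are placeholders. A result of False suggests that the sample was saved with a different resolution or compression than the Scan module output.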
Multi-page TIFF files must be split into multiple single-page files before they are used in Tungsten Transformation. The Tungsten Capture Scan module does this automatically.
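For sample images that do not come from the Scan module, it can be useful to verify that a file really contains a single page before adding it to a project. The sketch below counts the image file directories (IFDs) in the main chain of a classic TIFF file using only the Python standard library. The function name count_tiff_pages is hypothetical, and treating each IFD as one page is an approximation, because some files store thumbnails or other reduced images as additional IFDs.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the IFDs in the main chain of a classic (non-BigTIFF) TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    pages = 0
    seen = set()  # guards against a corrupt file whose IFD chain loops
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain contains a loop")
        seen.add(offset)
        (entry_count,) = struct.unpack_from(endian + "H", data, offset)
        pages += 1
        # The offset of the next IFD follows the 12-byte directory entries.
        (offset,) = struct.unpack_from(
            endian + "I", data, offset + 2 + 12 * entry_count
        )
    return pages
```

A result greater than 1 indicates a multi-page file that would need to be split into single-page files first.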
For more information about using the Tungsten Capture Scan module, refer to your Tungsten Capture documentation.