PDF Image + Text Output Format Window for Text Under Image
When the recognition profile is set to Kofax PDF Text Under Image with the PDF Image + Text engine, use this window to configure PDF output preferences.
Output format
Select the output format that is generated by the PDF Image + Text recognition engine and saved to an external file:
- Kofax PDF: File format that lets you view a document on any computer system while preserving the layout.
- Kofax PDF/A: File format based on PDF that supports the long-term preservation of digital documents. PDF/A files often have larger file sizes than PDF. This option also includes tags in the PDF document.
The availability of other settings on this window may vary, based on your output format selection.
Page Content
Select the structure of the PDF pages generated by the PDF Image + Text recognition engine:
- Text Over Image: The recognized text is saved over the entire page image. The entire image is saved as a bitmap; however, text areas are saved as text (with full text search capability) over the bitmap. With this option, you can select and copy any of the text.
- Text Under Image: The recognized text is saved under the entire page image. This is the default selection. The entire image is saved as a bitmap; however, text is placed beneath it. This option is useful if you export your text to document archives: the full page layout is retained and full text search is available. You can select and copy from the underlying text. This option tends to produce the largest output file among these three choices.
- Text and Image: The recognized text is saved as text (with full text search capability) and images are saved as bitmaps. The original document design (font, background and layout marking) is not retained. This option tends to produce the smallest output file among these three choices.
Text Settings
Select the text attributes you want to be retained when the recognized data is saved to the output file. For example, if you want to retain characters that are bold in the original document, select the Bold option.
Replace uncertain words with images
Use this option to replace words the engine cannot recognize with small graphic snippets of the word that have been clipped from the original image file.
Note that the text attribute settings behave differently according to the Page Content setting.
- With Text Over Image, the text attributes are output as selected, and the output text retains its original color.
- With Text Under Image, text attribute selections are ignored, and the output text is always black text on a white background.
- With Text and Image, the text attributes are output as selected, and the output text is always black text on a white background.
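The three rules above can be summarized as a small lookup. The following Python sketch is illustrative only: the names `EffectiveTextOutput` and `effective_text_output` are hypothetical and are not part of any Kofax API. It simply encodes the behavior described in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectiveTextOutput:
    """Hypothetical summary of how text is written for one Page Content mode."""
    attributes_applied: bool   # are the selected text attributes honored?
    original_color_kept: bool  # does the output text keep its original color?


# Rules transcribed from the list above; the keys are the option labels.
_RULES = {
    "Text Over Image": EffectiveTextOutput(True, True),
    "Text Under Image": EffectiveTextOutput(False, False),
    "Text and Image": EffectiveTextOutput(True, False),
}


def effective_text_output(page_content: str) -> EffectiveTextOutput:
    """Return how text attribute selections are treated for a Page Content mode."""
    try:
        return _RULES[page_content]
    except KeyError:
        raise ValueError(f"Unknown Page Content mode: {page_content!r}") from None


if __name__ == "__main__":
    for mode in _RULES:
        print(mode, effective_text_output(mode))
```

When color is not kept, the output text is black on a white background, as stated in the list.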
Retain text color
Select this option if you want the color of the text in the original document to be retained when the recognized data is saved. If not, the original color is ignored.
Embed recognized text fonts
The Embed recognized text fonts option is not available.
Resolution
Set the resolution of the images for the PDF document being saved as an output file. You can select from the following output resolutions in dots per inch:
- 72
- 96
- 120
- 200
- 240
- 300
- 360
- 400
- 600
Compression format
Use the list to select a compression format (CCITT4, JPEG, or JPEG 2000) for PDF output.
By default, the selection is JPEG, which is supported for color and grayscale images. If you are processing bitonal images while JPEG is selected, CCITT4 is used instead.
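The fallback described above can be sketched as a simple rule. This Python example is a minimal illustration, not the engine's implementation: the function name `effective_compression` is hypothetical, and the assumption that every other combination passes through unchanged is mine, since this page documents only the JPEG-on-bitonal case.

```python
# Compression formats offered in the list, as named on this page.
SUPPORTED_FORMATS = ("CCITT4", "JPEG", "JPEG 2000")


def effective_compression(selected: str, image_is_bitonal: bool) -> str:
    """Return the compression format that would be applied to one image.

    Only the JPEG-on-bitonal fallback is documented; all other
    combinations are assumed to use the selected format as-is.
    """
    if selected not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported compression format: {selected!r}")
    if selected == "JPEG" and image_is_bitonal:
        # JPEG is supported for color and grayscale images only,
        # so bitonal images fall back to CCITT4.
        return "CCITT4"
    return selected


if __name__ == "__main__":
    print(effective_compression("JPEG", image_is_bitonal=True))
    print(effective_compression("JPEG", image_is_bitonal=False))
```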
JPEG Quality
Use this setting to specify the JPEG quality for color pictures saved in the output file. The quality range is from 1% to 100%, with 100% being the best quality.
PDF Version
Select one of the following PDF versions to use for the output:
- Auto
- 1.3
- 1.4
- 1.5
- 1.6
- 1.7
When you select Auto, the application automatically determines the PDF version number.
PDF/A Compliance
If Kofax PDF/A is selected as the output format, select the level of PDF/A compliance:
- PDF/A-1a (default setting): Supports long-term storage of digital documents and fully satisfies the requirements in the ISO 19005-1 specification. Readable by any PDF reader that conforms to PDF 1.4 or later.
- PDF/A-1b: Offers the same level of compliance as PDF/A-1a and satisfies the minimal requirements in the ISO 19005-1 specification.
- PDF/A-2a: Offers the same level of compliance as PDF/A-1a, and adds support for JPEG 2000 compression to generate reduced file sizes. Satisfies the requirements in the ISO 19005-2 specification. Readable by any PDF reader that conforms to PDF 1.7.
- PDF/A-2u: Offers the same level of compliance as PDF/A-2a, and adds the ability to extract text in Unicode.
- PDF/A-3a: Permits the embedding of another PDF/A file or a binary file format (such as XML or Microsoft Office) within the PDF/A file. Satisfies the requirements in the ISO 19005-3 specification. Otherwise, the same as PDF/A-2a.
  Note: With PDF/A-3a compliance, only the PDF/A file, rather than any embedded file, should be considered for archiving purposes.
- PDF/A-3u: Offers the same level of compliance as PDF/A-3a, and adds the ability to extract text in Unicode.
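Each compliance level name encodes the part of ISO 19005 it belongs to. As a rough illustration, the following Python sketch maps the levels listed above to their ISO part. The function name `iso_part_for` is hypothetical and not part of any Kofax API; the mapping for the "u" levels follows from the part number in the level name.

```python
# Compliance levels offered in this window, as listed above.
PDFA_LEVELS = (
    "PDF/A-1a", "PDF/A-1b",
    "PDF/A-2a", "PDF/A-2u",
    "PDF/A-3a", "PDF/A-3u",
)


def iso_part_for(level: str) -> str:
    """Return the ISO 19005 part that a listed PDF/A level belongs to."""
    if level not in PDFA_LEVELS:
        raise ValueError(f"Level not offered in this window: {level!r}")
    # The digit after "PDF/A-" is the part number of ISO 19005.
    part = level[len("PDF/A-")]
    return f"ISO 19005-{part}"


if __name__ == "__main__":
    for level in PDFA_LEVELS:
        print(level, "->", iso_part_for(level))
```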
Add tags to document
Adds PDF tags to the PDF document to specify document structure and allow the extraction of page content. Tags are useful for reflowing text and graphics, conversion to HTML and XML file formats, and interpretation by assistive software for the visually impaired.
Image Compression
Select an image compression profile from the list.
Edit button
Modify an existing image compression profile or create a new one. The Image Compression Profiles window appears so you can specify the type of image compression to use.