Advanced OCR Output Format Window - HTML
Use this window to control the output format for the HTML file generated by the Kofax Advanced OCR Full Text recognition engine.
Output format
Changing the output format may make other options available. The settings of disabled options are retained, so that if you return to that format, the most recent settings are still used.
You can select an output format from this list:
- Plain Text (.txt)
- Rich Text Format (.rtf)
- HTML (.mht)
- Microsoft Word (*.doc)
- Comma-Separated Values (*.csv)
- Microsoft Excel (*.xls)
- Microsoft Word 2007 and later (*.docx)
- Microsoft Excel 2007 and later (*.xlsx)
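The way settings are retained when you switch formats can be pictured with a small model. This is only an illustrative sketch and not the product's implementation: the class, the option names, and above all the map of which options each format enables are invented for the example, since this topic does not list them.

```python
class OutputFormatSettings:
    """Keeps option values even while the current format disables them."""

    def __init__(self, availability):
        # availability: format name -> set of option names enabled for it
        self._availability = availability
        self._values = {}
        self.current_format = None

    def select_format(self, fmt):
        if fmt not in self._availability:
            raise KeyError(f"unknown format: {fmt}")
        # Only the active format changes; stored values are left untouched.
        self.current_format = fmt

    def is_enabled(self, option):
        return option in self._availability.get(self.current_format, set())

    def set_option(self, option, value):
        if not self.is_enabled(option):
            raise ValueError(f"{option} is disabled for {self.current_format}")
        self._values[option] = value

    def get_option(self, option, default=None):
        return self._values.get(option, default)


# Hypothetical availability map, invented for this example only.
AVAILABILITY = {
    "HTML": {"retain_text_color", "jpeg_quality"},
    "Plain Text": set(),
}

settings = OutputFormatSettings(AVAILABILITY)
settings.select_format("HTML")
settings.set_option("jpeg_quality", 80)
settings.select_format("Plain Text")  # jpeg_quality is disabled here
settings.select_format("HTML")        # and available again here
print(settings.get_option("jpeg_quality"))  # prints 80
```

The key point of the sketch is that switching formats never clears the stored values; it only changes which options can be edited.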
Page layout
Select the page layout characteristics for exporting to the output format. You can select from the following:
- Full-Page Layout: Document layout is retained in full.
- Retain Paragraphs and Fonts: Recognized text is formatted into a single column. Paragraphs, fonts, and font sizes are retained.
- Retain Paragraphs Only: Recognized text is formatted into a single column. Frames are not used. Paragraphs are retained; however, font types and sizes are not retained.
Text Settings
Select the text attributes (Bold, Italic, Underline) to retain when the recognized data is saved to the output file. For example, to retain characters that are bold in the original document, select the Bold setting. Unselected text attributes are ignored.
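As a rough illustration of this rule, the attributes kept for a run of text are those that are both present in the original and selected in this window. The function name and the lowercase attribute strings below are assumptions made for the sketch, not part of the product.

```python
def retained_attributes(original, selected):
    """Return the text attributes that survive in the output file.

    original: attributes detected on the text in the source document
    selected: attributes checked under Text Settings
    """
    # An attribute is kept only if it is present AND selected.
    return set(original) & set(selected)


# Bold italic text with only Bold selected keeps just the bold attribute.
print(retained_attributes({"bold", "italic"}, {"bold"}))
```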
Suppress line breaks
Select this check box to suppress (discard) line breaks from the original document when the recognized data is saved. Otherwise, the line breaks are retained.
Use page break as page separator
Select this check box when you want page breaks in the original document to be used as page separators when the recognized data is saved. If not, the page breaks are ignored.
Retain text color
Select this check box if you want the color of the text in the original document to be retained when the recognized data is saved. If not, the original color is ignored.
Picture Settings
Use these settings to manage pictures in the output file.
Remove pictures
Select this check box to remove all pictures that belong to a page from the output file.
Resolution
Specify the resolution of the pictures saved in the output file. You can choose from the following output resolutions, in dots per inch (dpi):
- 72
- 96
- 120
- 200
- 240
- 300
- 360
- 400
- 600
Resolution can only be reduced, not increased. For example, if the original image resolution of the scanned page is 200 dpi and the Resolution list is set to 300, the image resolution in the output file is 200 dpi, not 300 dpi.
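The reduce-only rule amounts to taking the smaller of the two values. The sketch below is an illustration under that reading; the function and constant names are hypothetical and do not come from the product.

```python
# Resolutions offered in the Resolution list, in dots per inch.
ALLOWED_DPI = (72, 96, 120, 200, 240, 300, 360, 400, 600)


def effective_output_dpi(original_dpi, selected_dpi):
    """Return the picture resolution that ends up in the output file."""
    if selected_dpi not in ALLOWED_DPI:
        raise ValueError(f"{selected_dpi} is not an available resolution")
    # Resolution can be reduced but never increased.
    return min(original_dpi, selected_dpi)


print(effective_output_dpi(200, 300))  # prints 200 (no upscaling)
print(effective_output_dpi(300, 200))  # prints 200 (reduced as requested)
```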
JPEG Quality
Set the JPEG quality for color pictures saved in the output file. The quality range is from 1% to 100%, with 100% being the best quality.