RecAPI
Layout Retention Output Module

Layout Retention Output. RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS.

Classes

struct  _OUTPUTCONVERTERINFO
 Output document converter information (ANSI)
 
struct  OUTPUTCONVERTERINFOW
 Output document converter information (Unicode)
 

Macros

#define OUTPUTCONVERTERINFO   WORA(OUTPUTCONVERTERINFO)
 Output document converter information.
 

Typedefs

typedef struct _OUTPUTCONVERTERINFO OUTPUTCONVERTERINFOA
 Output document converter information (ANSI)
 

Enumerations

enum  DocFormatter_Mode {
  DFM_Lite = 0x00 ,
  DFM_CharacterStyleConsolidation = 0x01 ,
  DFM_ParagraphStyleConsolidation = 0x02 ,
  DFM_StyleConsolidation = 0x03 ,
  DFM_HeaderFooter = 0x04 ,
  DFM_CrossrefFind = 0x08 ,
  DFM_MarginConsolidation = 0x10 ,
  DFM_HeadingConsolidation = 0x20 ,
  DFM_Essay = 0x2F ,
  DFM_Full = 0xFF
}
 Document formatter methods.
 
enum  TColorQValues
 Color quality.
 
enum  TPDFCompatibTypeValues
 Compatibility.
 
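enum  TPDFOpenDisplayModeValues {
  R2ID_PDF_ODM_NONE ,
  R2ID_PDF_ODM_OUTLINE ,
  R2ID_PDF_ODM_THUMBNAIL ,
  R2ID_PDF_ODM_FULL_SCREEN ,
  R2ID_PDF_ODM_VISIBLE_OC ,
  R2ID_PDF_ODM_ATTACHMENTS
}
 Display mode.
 
enum  TPDFOpenPGLayoutValues {
  R2ID_PDF_OPGL_SINGLEPAGE ,
  R2ID_PDF_OPGL_ONECOLUMN ,
  R2ID_PDF_OPGL_TWOCOLUMNLEFT ,
  R2ID_PDF_OPGL_TWOCOLUMNRIGHT ,
  R2ID_PDF_OPGL_TWOPAGELEFT ,
  R2ID_PDF_OPGL_TWOPAGERIGHT
}
 Page layout.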
 
enum  R2_HEADERS_RETENTION
 HeadersFooters.
 
enum  TWriteIndex
 Index Page.
 
enum  TPDFSecurityValues
 PDFSecurity type.
 
enum  R2_PAGEBREAKS
 PageBreaks.
 
enum  R2_PICTURES_BPP
 Picture color.
 
enum  R2_PICTURES_DPI
 Pictures.
 
enum  TSignatureTypevalues
 Signature type.
 
enum  R2_TABLES_RETENTION
 Tables.
 
enum  TMRCTypeValues
 MRC use.
 
enum  OUTPUTLEVEL {
  OL_AUTO ,
  OL_NOFORMAT ,
  OL_RFP ,
  OL_TRUEPAGE ,
  OL_FLOWINGPAGE ,
  OL_SPREADSHEET
}
 Output level of the exported document.
 

Functions

RECERR RECAPIPLS RecSetOutputFormat (int sid, LPCTSTR pFormatname)
 Set the output format.
 
RECERR RECAPIPLS RecGetOutputFormat (int sid, LPTSTR pFormatname, int len)
 Get the output format.
 
RECERR RECAPIPLS RecGetFirstOutputFormat (LPTSTR pFormatname, int len)
 Start the enumeration of the output formats.
 
RECERR RECAPIPLS RecGetNextOutputFormat (LPTSTR pFormatname, int len)
 Continue the enumeration of the output formats.
 
RECERR RECAPIPLS RecGetOutputFormatInfo (LPCTSTR pFormatName, OUTPUTCONVERTERINFO *pInfo)
 Get information about the specified output document format converter.
 
RECERR RECAPIPLS RecGetOutputSettingsHandle (int sid, HSETTING *hSetting)
 Gets the settings handle for the currently set output format.
 
RECERR RECAPIPLS RecSetOutputLevel (int sid, OUTPUTLEVEL outLevel)
 Set the level of format retention for the final output document.
 
RECERR RECAPIPLS RecGetOutputLevel (int sid, OUTPUTLEVEL *poutLevel)
 Get the current level of format retention for the final output document.
 

Detailed Description

Layout Retention Output. RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS.

RecAPIPlus provides accurate, layout-retaining output in several file formats, such as RTF, DOC, WordML, XLS, PDF, WP and WAV. The RecConvert2Doc and RecProcessPagesEx functions export the given document into these output file formats. See the details about support of different output formats on different platforms.

In many cases the goal is to retain the original layout in the output document as far as possible. Converters differ in how much of the layout they can retain. There are 5 output levels (OUTPUTLEVEL) for the different degrees of layout retention, plus OL_AUTO, which selects the converter default. Not every converter supports every output level. For example, a Word document or a PDF document has Flowing Page and True Page modes, which closely resemble the original document, whereas simple text converters can retain only the plain text in Plain Text mode (formerly No Format mode) or the text with its attributes in Formatted Text mode (formerly Retain Font and Paragraphs mode).

Besides the output levels, converters have many settings that can influence the layout (see the list of converter settings). You can access these settings through the Settings Manager Module.

Page consolidation

When pages are scanned from a document with uniform margins, the page images typically do not place the body text in precisely the same position on each page, due to scanning variations. Previously, users had to restore uniform margins manually after the recognition result was exported. This toolkit examines incoming pages and, if it determines that they have a similar text area and layout, performs page consolidation automatically. The program calculates ideal margins, then identifies a vector for each page describing the difference between the actual and ideal margins. These vectors are then applied during the output process to the following file types: RTF, WordML, PDF, DOCX and XPS. The calculation is fully automatic and cannot be influenced. However, you can decide whether a converter applies these vectors by using the ConsolidatePages setting of that converter.
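
The margin calculation described above can be illustrated with a small Python sketch. It is not the toolkit's algorithm, which is not specified on this page: the use of the per-axis median as the "ideal" margin, the `(left, top)` margin representation and the helper name `consolidation_vectors` are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple

# (left, top) margin of one page in inches; an assumed representation.
Margin = Tuple[float, float]


def consolidation_vectors(margins: List[Margin]) -> Tuple[Margin, List[Margin]]:
    """Return an ideal margin and, for each page, the vector from actual to ideal.

    Assumption: the ideal margin is the per-axis median of the measured margins.
    """
    ideal_left = median(m[0] for m in margins)
    ideal_top = median(m[1] for m in margins)
    vectors = [(ideal_left - left, ideal_top - top) for left, top in margins]
    return (ideal_left, ideal_top), vectors


# Three scans of pages that share a nominal 1.0 x 1.0 inch margin.
scanned = [(1.0, 1.0), (1.25, 0.75), (0.5, 1.0)]
ideal, vectors = consolidation_vectors(scanned)
print(ideal)    # (1.0, 1.0)
print(vectors)  # [(0.0, 0.0), (-0.25, 0.25), (0.5, 0.0)]
```

Applying each vector to its page shifts the content back to the common margin, which corresponds to what a converter does when its ConsolidatePages setting is enabled.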

Macro Definition Documentation

◆ OUTPUTCONVERTERINFO

#define OUTPUTCONVERTERINFO   WORA(OUTPUTCONVERTERINFO)

Output document converter information.

On Windows this type can be used as OUTPUTCONVERTERINFOA or OUTPUTCONVERTERINFOW, depending on the _UNICODE macro. On Linux and MacOS it is equivalent to OUTPUTCONVERTERINFOA.

Typedef Documentation

◆ OUTPUTCONVERTERINFOA

Output document converter information (ANSI)

This structure describes the converter module and its target format. This is used by the RecGetOutputFormatInfo function.

Enumeration Type Documentation

◆ DocFormatter_Mode

Document formatter methods.

These are the possible values of the setting Formatter.df.mode.

Enumerator
DFM_Lite 

Add page to document.

DFM_CharacterStyleConsolidation 

Find/Consolidate Character styles.

DFM_ParagraphStyleConsolidation 

Find/Consolidate Paragraph styles.

DFM_StyleConsolidation 

Find/Consolidate both Paragraph and Character styles.

DFM_HeaderFooter 

Find headers/footers. This requires at least 2 pages in the document.

DFM_CrossrefFind 

Find footnotes.

DFM_MarginConsolidation 

Margin consolidation.

DFM_HeadingConsolidation 

Find/Consolidate Headings (using special styles)

DFM_Essay 

Style and headings consolidation, find header/footers, footnotes/hrefs.

DFM_Full 

All.
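
The numeric values above read as bit flags, with the composite modes being unions of the single-bit ones. The following Python sketch checks that reading using only the values listed in this enumeration. The helper `enabled_steps` is hypothetical, and whether Formatter.df.mode accepts arbitrary OR-ed combinations beyond the listed values is not stated here.

```python
from typing import Dict, List

# Values copied from the DocFormatter_Mode enumeration.
DFM_Lite = 0x00
DFM_CharacterStyleConsolidation = 0x01
DFM_ParagraphStyleConsolidation = 0x02
DFM_StyleConsolidation = 0x03
DFM_HeaderFooter = 0x04
DFM_CrossrefFind = 0x08
DFM_MarginConsolidation = 0x10
DFM_HeadingConsolidation = 0x20
DFM_Essay = 0x2F
DFM_Full = 0xFF

# The modes that occupy exactly one bit.
SINGLE_BIT_MODES: Dict[str, int] = {
    "DFM_CharacterStyleConsolidation": DFM_CharacterStyleConsolidation,
    "DFM_ParagraphStyleConsolidation": DFM_ParagraphStyleConsolidation,
    "DFM_HeaderFooter": DFM_HeaderFooter,
    "DFM_CrossrefFind": DFM_CrossrefFind,
    "DFM_MarginConsolidation": DFM_MarginConsolidation,
    "DFM_HeadingConsolidation": DFM_HeadingConsolidation,
}


def enabled_steps(mode: int) -> List[str]:
    """List the single-bit modes contained in a mode value (hypothetical helper)."""
    return [name for name, bit in SINGLE_BIT_MODES.items() if mode & bit]


for step in enabled_steps(DFM_Essay):
    print(step)
```

Under this reading DFM_Essay contains every single-bit mode except DFM_MarginConsolidation, which matches its description, and DFM_Lite (0x00) enables none of them.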

◆ OUTPUTLEVEL

Output level of the exported document.

Pre-defined levels of format retention for the final output document. The property values belonging to these levels are documented in the RecSetOutputLevel function. See also the table of the output levels supported by each converter.

Enumerator
OL_AUTO 

Converter default

OL_NOFORMAT 

Plain text (formerly No formatting mode).

OL_RFP 

Formatted Text (formerly Retain Font and Paragraphs mode).

OL_TRUEPAGE 

True Page.

OL_FLOWINGPAGE 

Flowing Page.

OL_SPREADSHEET 

Spreadsheet.
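
As a rough illustration of the OL_AUTO entry: the RecSetOutputLevel notes state that each converter has an OutputMode setting whose value is used when the output level is OL_AUTO. The Python sketch below models that fallback only. It assumes the C enumerators are numbered consecutively from 0 in the order listed, and that OutputMode holds one of the same levels; `effective_level` is a hypothetical helper, not part of the API.

```python
from enum import IntEnum


class OutputLevel(IntEnum):
    """Mirror of OUTPUTLEVEL; numeric values are assumed (consecutive from 0)."""
    OL_AUTO = 0
    OL_NOFORMAT = 1
    OL_RFP = 2
    OL_TRUEPAGE = 3
    OL_FLOWINGPAGE = 4
    OL_SPREADSHEET = 5


def effective_level(requested: OutputLevel, converter_output_mode: OutputLevel) -> OutputLevel:
    """Resolve OL_AUTO to the converter's own OutputMode (hypothetical helper)."""
    if requested is OutputLevel.OL_AUTO:
        return converter_output_mode
    return requested


# With OL_AUTO the converter default wins; an explicit level is used as given.
print(effective_level(OutputLevel.OL_AUTO, OutputLevel.OL_TRUEPAGE).name)
print(effective_level(OutputLevel.OL_RFP, OutputLevel.OL_TRUEPAGE).name)
```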

◆ R2_HEADERS_RETENTION

HeadersFooters.

You can set how headers and footers should be handled. You can set this for every converter, but the default value differs between converters. For more information, see the setting HeadersFooters in the summary table of converter settings.

◆ R2_PAGEBREAKS

PageBreaks.

For several converters you can set how you want page breaks to be handled. For more information, see the setting PageBreaks in the summary table of converter settings.

◆ R2_PICTURES_BPP

Picture color.

For several converters, you can set the color of the image. For more information, see the setting PictureColor in the summary table of converter settings.

◆ R2_PICTURES_DPI

Pictures.

For every converter you can set how you would like to handle images. The default values are different for the different converters. For more information, see the setting Pictures in the summary table of converter settings.

◆ R2_TABLES_RETENTION

Tables.

For every converter, except the Excel and Html converters, you can set how you would like to handle tables. For more information, see the setting Tables in the summary table of converter settings.

◆ TColorQValues

Color quality.

For the PDF converters you can set the color quality. The default is R2ID_PDFCOLORQUALITY_MIN for every PDF converter. For more information, see the setting ColorQuality in the summary table of converter settings.

◆ TMRCTypeValues

MRC use.

For PDF converters you can set the MRC type. The default is R2ID_PDFMRC_NO for every PDF converter. For more information, see the setting UseMRC in the summary table of converter settings. The newer MRC types can be used with 5 different predefined levels (1-5) by calling kRecSetCompressionLevel. For more details, see the MRC image compression level comparison subsection of the Imaging Module in the Tungsten Omnipage Capture SDK User's Guide, and the description of the MRC level in Saving MRC PDF files in KernelAPI.

◆ TPDFCompatibTypeValues

Compatibility.

For the PDF converters you can set this compatibility value. For more information, see the setting Compatibility in the summary table of converter settings.

◆ TPDFOpenDisplayModeValues

Display mode.

For any PDF converter you can set this value to specify how the PDF file should be displayed when opened.

Enumerator
R2ID_PDF_ODM_NONE 

Neither document outline nor thumbnail images visible (the default in PDF).

R2ID_PDF_ODM_OUTLINE 

Document outline visible.

R2ID_PDF_ODM_THUMBNAIL 

Thumbnail images visible.

R2ID_PDF_ODM_FULL_SCREEN 

The PDF file opens in full screen mode.

R2ID_PDF_ODM_VISIBLE_OC 

Optional content group panel visible (PDF 1.5).

R2ID_PDF_ODM_ATTACHMENTS 

Attachments panel visible (PDF 1.6).

◆ TPDFOpenPGLayoutValues

Page layout.

For any PDF converter you can set this value to specify the PDF page layout when opened.

Enumerator
R2ID_PDF_OPGL_SINGLEPAGE 

Display one page at a time (the default in PDF).

R2ID_PDF_OPGL_ONECOLUMN 

Display the pages in one column.

R2ID_PDF_OPGL_TWOCOLUMNLEFT 

Display the pages in two columns, with odd-numbered pages on the left.

R2ID_PDF_OPGL_TWOCOLUMNRIGHT 

Display the pages in two columns, with odd-numbered pages on the right.

R2ID_PDF_OPGL_TWOPAGELEFT 

Display the pages two at a time, with odd-numbered pages on the left (PDF 1.5).

R2ID_PDF_OPGL_TWOPAGERIGHT 

Display the pages two at a time, with odd-numbered pages on the right (PDF 1.5).

◆ TPDFSecurityValues

PDFSecurity type.

For PDF converters you can set the security type. For more information, see the setting PDFSecurity.Type in the summary table of converter settings.

◆ TSignatureTypevalues

Signature type.

For the PDF converters you can set the signature type. The default is: R2ID_SIGTYPENONE for every PDF converter. For more information, see the setting Signature.SignatureType in the summary table of converter settings.

◆ TWriteIndex

Index Page.

You can switch on the Index Page generation in simple or 'InFrame' mode using HTML output converters. If it is switched on, an index page is generated with links to the recognized and converted pages. In this case, you can change the text of the navigation links by changing NavNextText, NavPrevText or NavTOCText. For more information, see the setting IndexPage in the summary table of converter settings.

Function Documentation

◆ RecGetFirstOutputFormat()

RECERR RECAPIPLS RecGetFirstOutputFormat ( LPTSTR pFormatname,
int len )

Start the enumeration of the output formats.

This starts the enumeration of the document output formats in the current thread.

Parameters
[out]  pFormatname  Buffer containing the converter name.
[in]   len          Length of the buffer.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
RECERR RecGetFirstOutputFormat(StringBuilder formatName);
// or
RECERR RecGetFirstOutputFormat(out string formatName);
There is a non-enumerating function replacing RecGetFirstOutputFormat and RecGetNextOutputFormat in C#:
RECERR RecGetAllOutputFormats(out string[] formatnames);
The specification of this function in Java is:
int RecGetFirstOutputFormat(String[] pFormatname)
The specification of this function in Python is:
def RecGetFirstOutputFormat() -> Tuple[int, str]

◆ RecGetNextOutputFormat()

RECERR RECAPIPLS RecGetNextOutputFormat ( LPTSTR pFormatname,
int len )

Continue the enumeration of the output formats.

This continues the enumeration of the document output formats in the current thread.

Parameters
[out]  pFormatname  Buffer containing the converter name.
[in]   len          Length of the buffer.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
RECERR RecGetNextOutputFormat(StringBuilder formatName);
// or
RECERR RecGetNextOutputFormat(out string formatName);
There is a non-enumerating function replacing RecGetFirstOutputFormat and RecGetNextOutputFormat in C#:
RECERR RecGetAllOutputFormats(out string[] formatnames);
The specification of this function in Java is:
int RecGetNextOutputFormat(String[] pFormatname)
The specification of this function in Python is:
def RecGetNextOutputFormat() -> Tuple[int, str]
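
The First/Next pair is typically driven by a loop. The Python sketch below follows the tuple-returning signatures given above, but it runs against stand-in functions rather than the SDK so that it is self-contained. Several points are assumptions, not taken from this page: that the success code is 0 (named `REC_OK` here), that the enumeration ends when a call returns a non-zero code, and the placeholder format names. `make_fake_enumerator` and `list_output_formats` are hypothetical helpers.

```python
from typing import Callable, List, Tuple

REC_OK = 0  # assumption: success code; consult the RECERR documentation

Enumerate = Callable[[], Tuple[int, str]]


def list_output_formats(first: Enumerate, next_: Enumerate) -> List[str]:
    """Collect names until a call reports a non-success code (assumed end marker)."""
    names: List[str] = []
    rc, name = first()
    while rc == REC_OK:
        names.append(name)
        rc, name = next_()
    return names


def make_fake_enumerator(formats: List[str]) -> Tuple[Enumerate, Enumerate]:
    """Build stand-ins for RecGetFirstOutputFormat / RecGetNextOutputFormat."""
    position = 0

    def first() -> Tuple[int, str]:
        nonlocal position
        position = 0  # restarting the enumeration rewinds to the first entry
        return next_()

    def next_() -> Tuple[int, str]:
        nonlocal position
        if position >= len(formats):
            return 1, ""  # any non-zero code ends the loop in this sketch
        name = formats[position]
        position += 1
        return REC_OK, name

    return first, next_


# Placeholder names only; the real list depends on the installed converters.
fake_first, fake_next = make_fake_enumerator(["Converters.Text.DocX", "Converters.Text.RTF2000"])
formats = list_output_formats(fake_first, fake_next)
print(formats)
```

With the real bindings, the two stand-ins would be replaced by the SDK's own functions. Because the enumeration state belongs to the current thread, the whole loop should run on one thread.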

◆ RecGetOutputFormat()

RECERR RECAPIPLS RecGetOutputFormat ( int sid,
LPTSTR pFormatname,
int len )

Get the output format.

This retrieves the output document format used by the RecConvert2Doc and RecProcessPagesEx functions.

Parameters
[in]   sid          Settings Collection ID.
[out]  pFormatname  Buffer containing the converter name.
[in]   len          Length of the buffer.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
This function gets the value of the setting APIPlus.Output.TextFormat. This setting can be changed by RecSetOutputFormat.
The specification of this function in C# is:
RECERR RecGetOutputFormat(int sid, StringBuilder formatName);
// or
RECERR RecGetOutputFormat(int sid, out string formatName);
The specification of this function in Java is:
int RecGetOutputFormat(int sid, String[] pFormatname)
The specification of this function in Python is:
def RecGetOutputFormat(sid: int) -> Tuple[int, str]

◆ RecGetOutputFormatInfo()

RECERR RECAPIPLS RecGetOutputFormatInfo ( LPCTSTR pFormatName,
OUTPUTCONVERTERINFO * pInfo )

Get information about the specified output document format converter.

Parameters
[in]   pFormatName  The name of the output conversion format.
[out]  pInfo        Pointer to an OUTPUTCONVERTERINFO variable.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
RECERR RecGetOutputFormatInfo(string formatName, out OUTPUTCONVERTERINFO pInfo);
The specification of this function in Java is:
int RecGetOutputFormatInfo(String pFormatName, OUTPUTCONVERTERINFO pInfo)
The specification of this function in Python is:
def RecGetOutputFormatInfo(pFormatName: str, pInfo: "OUTPUTCONVERTERINFO") -> int

◆ RecGetOutputLevel()

RECERR RECAPIPLS RecGetOutputLevel ( int sid,
OUTPUTLEVEL * poutLevel )

Get the current level of format retention for the final output document.

Parameters
[in]   sid        Settings Collection ID.
[out]  poutLevel  Pointer to the output level variable.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
This function gets the value of the setting APIPlus.Output.OutputLevel. This setting can be modified by RecSetOutputLevel.
The specification of this function in C# is:
RECERR RecGetOutputLevel(int sid, out OUTPUTLEVEL outLevel);
The specification of this function in Java is:
int RecGetOutputLevel(int sid, OUTPUTLEVEL[] poutLevel)
The specification of this function in Python is:
def RecGetOutputLevel(sid: int) -> Tuple[int, int]

◆ RecGetOutputSettingsHandle()

RECERR RECAPIPLS RecGetOutputSettingsHandle ( int sid,
HSETTING * hSetting )

Gets the settings handle for the currently set output format.

Parameters
[in]   sid       Settings Collection ID.
[out]  hSetting  Pointer to the setting handle.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
For example, for the "RTF2000" converter, this function returns the handle for the "Converters.Text.RTF2000" setting.
The specification of this function in C# is:
RECERR RecGetOutputSettingsHandle(int sid, out IntPtr hSetting);

◆ RecSetOutputFormat()

RECERR RECAPIPLS RecSetOutputFormat ( int sid,
LPCTSTR pFormatname )

Set the output format.

This sets the output document format for the RecConvert2Doc and RecProcessPagesEx functions.

Parameters
[in]  sid          Settings Collection ID.
[in]  pFormatname  Converter name.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
This function sets the value of the setting APIPlus.Output.TextFormat. This setting can be retrieved by RecGetOutputFormat.
See the list of the selectable output formats for more information. In addition see the connection between output formats and output levels, and the information about vertical text support of each converter.
The converter name must be the root of the given converter in the setting tree (e.g. Converters.Text.DocX). For more information see the list of converter settings.
The settings of a given converter are created when the converter is selected for the first time with RecSetOutputFormat, so they cannot be accessed before that call.
The specification of this function in C# is:
RECERR RecSetOutputFormat(int sid, string formatName);
The specification of this function in Java is:
int RecSetOutputFormat(int sid, String pFormatname)
The specification of this function in Python is:
def RecSetOutputFormat(sid: int, pFormatname: str) -> int

◆ RecSetOutputLevel()

RECERR RECAPIPLS RecSetOutputLevel ( int sid,
OUTPUTLEVEL outLevel )

Set the level of format retention for the final output document.

This function simplifies specifying the formatting details of the output document.

Parameters
[in]  sid       Settings Collection ID.
[in]  outLevel  The output level.
Return values
RECERR
Note
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, MacOS. However not all the output formats are supported on all these platforms. See details.
This function sets the value of the setting APIPlus.Output.OutputLevel. This setting can be retrieved by RecGetOutputLevel. In addition, each converter has an OutputMode setting, whose value is used when the OutputLevel is OL_AUTO.
Output levels:
  • Plain Text (formerly No formatting mode) - All formatting information is ignored and replaced by default values (one column, left-aligned paragraphs, no font attributes, a default font, etc.). Tables and graphics are retained and placed within the text. Highlights, strikeouts and markings for redaction are not transmitted, but redacted text is blacked out.
  • Formatted Text (formerly Retain Font and Paragraphs mode) - The formatting information on fonts and paragraphs is retained, but the layout-related information is ignored. Highlights, strikeouts and redactions are preserved. (This level has a special purpose when saving to Excel: each detected table or spreadsheet in a document is saved to a separate worksheet. Other content is placed on the last worksheet and functions as an index. The tables are replaced by hyperlinks to their own sheets.)
  • True Page - Keeps the look of the original page layout. This is done by absolute positioning of the texts, pictures and tables on the page with boxes, frames or other methods specific to the target application. This level is only available for target applications capable of handling these. True Page is the only choice for the XML converter and for all PDF converters except ‘PDF Edited’. Limitation: with the docx and rtf converters, drop caps are written out as normal text.
  • Flowing Page - Preserves the original layout of the pages, including columns. Boxes and frames are only used when necessary. This level is only available with target applications that can handle columns.
  • Spreadsheet - Exports the results in tabular form, suitable for use in spreadsheet applications. Each page is placed in a separate worksheet. This level is only available for the Excel and the HTML 3.2 formats.
In Microsoft Word and PowerPoint the size of the page is limited: the width and height of the page must be between 0.1 and 22 inches. The *.rtf, *.docx and *.pptx formats try to retain the original page size and layout, so if you scanned or loaded a page that is larger than this limit, you cannot save it at the Flowing Page or True Page level to these formats. If you try, you get an error message and the file is not saved.
See also the connection between output formats and output levels, and the information about vertical text support of each converter.
The specification of this function in C# is:
RECERR RecSetOutputLevel(int sid, OUTPUTLEVEL outLevel);
The specification of this function in Java is:
int RecSetOutputLevel(int sid, OUTPUTLEVEL outLevel)
The specification of this function in Python is:
def RecSetOutputLevel(sid: int, outLevel: int) -> int
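
The page-size limit noted above for Microsoft Word and PowerPoint can be checked before exporting, so that an oversized page is caught before the save fails. The Python sketch below is a plain arithmetic check and does not call the SDK. The helper name `fits_word_page_limits` is hypothetical, and treating both bounds as inclusive is an assumption based on the wording "between 0.1 and 22 inches".

```python
MIN_PAGE_INCHES = 0.1   # lower bound quoted in the note above
MAX_PAGE_INCHES = 22.0  # upper bound quoted in the note above


def fits_word_page_limits(width_in: float, height_in: float) -> bool:
    """Return True if both page dimensions fall within the quoted limits.

    Assumption: the bounds are inclusive.
    """
    return all(MIN_PAGE_INCHES <= side <= MAX_PAGE_INCHES for side in (width_in, height_in))


# A US Letter page fits; a 24 x 36 inch poster scan does not.
print(fits_word_page_limits(8.5, 11.0))
print(fits_word_page_limits(24.0, 36.0))
```

If the check fails, one option is to choose an output level that does not try to retain the page size, such as Plain Text or Formatted Text, before saving to *.rtf, *.docx or *.pptx.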