RecAPI

The LETTER structure.

#include <KernelApi.h>

Public Attributes

WORD left
 
WORD top
 
WORD width
 
WORD height
 
float pointSize
 
WORD capHeight
 
WORD baseLine
 
WORD zone
 
WCHAR code
 
BYTE err
 
BYTE reserved_b
 
BYTE cntChoices
 
BYTE cntSuggestions
 
DWORD ndxChoices
 
WORD fontAttrib
 
WORD ndxFontFace
 
DWORD info
 
WORD makeup
 
BYTE widthULdot
 
BYTE widthULgap
 
WORD cellNum
 
BYTE ndxFGColor
 
BYTE ndxBGColor
 
short lang
 
short lang2
 
DWORD ndxExt
 

Detailed Description

This structure holds the recognition result for a single character. The output of the recognition process consists of one LETTER structure for each recognized character. This is the most detailed information available about the recognized characters.

See the usage of alternatives and handling of spaces.

Note
The pointSize field is not a replacement for the fontSize field of the CSDK 12.x versions (hence the name change):
  • pointSize is filled only for textual PDF inputs. For other inputs, it is used only internally during page formatting (only at the RecApiPlus level).
  • capHeight is always available and can be used in place of fontSize. Rough approximation: fontSize = capHeight * 100 / dpi (fontSize in CSDK 12.7 is calculated this way).
The position and size information (left, top, width, height, capHeight, baseLine, widthULdot, widthULgap) is expressed in pixel coordinates mapped to the image specified when getting the letters.
The bounding box (left, top, width, height) of a character usually contains that single character only, but sometimes more characters are recognized together in one step, in which case all those characters have the same bounding box.
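
The rough approximation given in this note can be sketched in C as follows. This is only an illustration: approx_font_size is a hypothetical helper, not a RecAPI function; the dpi argument is assumed to be the resolution of the image the letters were mapped to; and the use of floating-point arithmetic is an assumption, since the note does not say how CSDK 12.7 rounds the result.

```c
#include <assert.h>

/* Stand-in for the WORD type; real code would include KernelApi.h instead. */
typedef unsigned short WORD;

/* Hypothetical helper, not part of RecAPI: approximates the CSDK 12.x
   fontSize from capHeight (pixels) and the image resolution (dpi),
   using the formula fontSize = capHeight * 100 / dpi from the note. */
static double approx_font_size(WORD capHeight, unsigned dpi)
{
    assert(dpi > 0);   /* the resolution must be known and non-zero */
    return (double)capHeight * 100.0 / (double)dpi;
}
```

With this sketch, a capHeight of 30 pixels on a 300 dpi image gives an approximate fontSize of 10.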

Member Data Documentation

◆ baseLine

WORD LETTER::baseLine

Y coordinate of the baseline in pixels. For vertical text this is an X coordinate. In CCJK vertical text the baseline is, by definition, at the middle of the characters. For barcodes this coordinate is independent of the direction of the barcode: it is always the Y coordinate of the bottom of the barcode.

◆ capHeight

WORD LETTER::capHeight

A measure of the capital letter height in pixels. See the note in the Detailed Description for more information.

◆ cellNum

WORD LETTER::cellNum

For WT_TABLE zones: index of the cell in the cell list that contains the character. For WT_FORM zones: index of the text-line form-element object that contains the character.

◆ cntChoices

BYTE LETTER::cntChoices

Number of related choices placed contiguously in the external choice string.

◆ cntSuggestions

BYTE LETTER::cntSuggestions

Number of related suggestions placed contiguously in the external suggestion array.

◆ code

WCHAR LETTER::code

Character code in 16-bit UNICODE. This is the first choice of the recognition or UNICODE_REJECTED for rejected characters.

◆ err

BYTE LETTER::err

Confidence value expressing both the recognition certainty of the first guess (the code member) and the certainty of the word. For more information, see the section on confidence reporting.

◆ fontAttrib

WORD LETTER::fontAttrib

Font information about the recognized character. Used by the OCR engines. See its possible bits.

◆ height

WORD LETTER::height

Height of the character rectangle in pixels.

◆ info

DWORD LETTER::info

Additional information about the character. See its possible bits and the macros that make this information easier to handle.

◆ lang

short LETTER::lang

Identifies the language to which the recognized word belongs. See Language of a word.

◆ lang2

short LETTER::lang2

See field lang.

◆ left

WORD LETTER::left

Left boundary of the rectangle containing the character in pixels.

◆ makeup

WORD LETTER::makeup

Since the recognition data does not contain extra characters for marking the ends of lines, paragraphs, pages, etc., this information is stored in this field of the particular characters. The value can be any bitwise OR-ed combination of the possible formatting attributes.
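
A minimal sketch of how such a bit field might be consumed is shown below. Everything named Demo or DEMO_ is hypothetical: the struct is a stand-in holding only the two LETTER fields the sketch reads, and the bit values are invented for illustration. The real formatting attribute constants are defined by the API and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only; the real formatting
   attribute constants come from KernelApi.h. */
enum { DEMO_END_OF_LINE = 0x0001, DEMO_END_OF_PARA = 0x0002 };

/* Minimal stand-in with just the two LETTER fields this sketch reads. */
typedef struct {
    unsigned short code;    /* 16-bit character code */
    unsigned short makeup;  /* bitwise OR of formatting attributes */
} DemoLetter;

/* Copies the character codes to out and appends '\n' after every letter
   whose makeup has the (hypothetical) end-of-line bit set.
   out must have room for 2 * count elements.
   Returns the number of elements written. */
static size_t demo_flatten(const DemoLetter *letters, size_t count,
                           unsigned short *out)
{
    size_t n = 0;
    assert(out != NULL && (letters != NULL || count == 0));
    for (size_t i = 0; i < count; ++i) {
        out[n++] = letters[i].code;
        if (letters[i].makeup & DEMO_END_OF_LINE)
            out[n++] = '\n';
    }
    return n;
}
```

Because the attributes are combined with bitwise OR, each one is tested with a bitwise AND against a single bit rather than by comparing the whole field for equality.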

◆ ndxBGColor

BYTE LETTER::ndxBGColor

Index of the background color within the palette of the recognition data. See kRecGetLetterPalette.

◆ ndxChoices

DWORD LETTER::ndxChoices

Index of the second choice in the external choice string.

◆ ndxExt

DWORD LETTER::ndxExt

Not documented.

◆ ndxFGColor

BYTE LETTER::ndxFGColor

Index of the foreground color within the palette of the recognition data. See kRecGetLetterPalette.

◆ ndxFontFace

WORD LETTER::ndxFontFace

Index of the logical font definition placed in an external font array.

◆ ndxSuggestions

DWORD LETTER::ndxSuggestions

If this LETTER is not a space, this member is the index of the first suggestion in the external suggestion string. (This member forms a union with spcInfo.)

◆ pointSize

float LETTER::pointSize

Font size in points. See the note in the Detailed Description for more information.

◆ reserved_b

BYTE LETTER::reserved_b

Internal use only.

◆ spcInfo

LSPC LETTER::spcInfo

If this LETTER is a space, additional information is available here. (This member forms a union with ndxSuggestions.)

◆ top

WORD LETTER::top

Top boundary of the rectangle containing the character in pixels.

◆ width

WORD LETTER::width

Width of the character rectangle in pixels. See also the section Handling of spaces about 0-width spaces.
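
Since characters recognized together in one step share the same bounding box (left, top, width, height), comparing rectangles is one way to spot them. The sketch below counts runs of identical boxes. DemoBox and both functions are hypothetical stand-ins, not RecAPI names, and the sketch assumes that characters recognized together are adjacent in the letter array, which the text does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in with just the four rectangle fields of LETTER. */
typedef struct {
    unsigned short left, top, width, height;
} DemoBox;

/* True if both boxes describe exactly the same rectangle. */
static bool demo_same_box(const DemoBox *a, const DemoBox *b)
{
    assert(a != NULL && b != NULL);
    return a->left == b->left && a->top == b->top &&
           a->width == b->width && a->height == b->height;
}

/* Counts runs of consecutive identical boxes. Under the adjacency
   assumption above, each run corresponds to one recognition step. */
static size_t demo_count_boxes(const DemoBox *boxes, size_t count)
{
    size_t runs = 0;
    assert(boxes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || !demo_same_box(&boxes[i - 1], &boxes[i]))
            ++runs;
    }
    return runs;
}
```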

◆ widthULdot

BYTE LETTER::widthULdot

Width of a dot in pixels if the "underline" actually consists of dots. 0 for a simple underline or for no underline. The same information is also provided for dot leaders. (See LSPC.)

◆ widthULgap

BYTE LETTER::widthULgap

Width of a gap in pixels if the "underline" actually consists of dots. 0 for a simple underline or for no underline. The same information is also provided for dot leaders. (See LSPC.)

◆ zone

WORD LETTER::zone

Index of the zone in the zone list which contains the character.