RecAPI
LETTER Struct Reference

The LETTER structure.


Public Attributes

WORD  left
WORD  top
WORD  width
WORD  height
float  pointSize
WORD  capHeight
WORD  baseLine
WORD  zone
WCHAR  code
BYTE  err
BYTE  reserved_b
BYTE  cntChoices
BYTE  cntSuggestions
DWORD  ndxChoices
WORD  fontAttrib
WORD  ndxFontFace
DWORD  info
WORD  makeup
BYTE  widthULdot
BYTE  widthULgap
WORD  cellNum
BYTE  ndxFGColor
BYTE  ndxBGColor
short  lang
short  lang2
DWORD  ndxExt
DWORD  ndxSuggestions
LSPC  spcInfo

Detailed Description

The LETTER structure.

This structure holds recognition data. The recognition process produces one structure of this type for each recognized character. This is the most detailed information available about the recognized characters.

See the usage of alternatives and handling of spaces.

Note:
The pointSize field is not a replacement for the fontSize field of CSDK versions 12.x (which is why the name was changed):
  • pointSize is filled only for textual PDF inputs. For other inputs it is used only internally during page formatting (at RecApiPlus level only).
  • capHeight is always available and can be used in place of fontSize. Rough approximation: fontSize = capHeight * 100 / dpi (this is how fontSize is calculated in CSDK 12.7).
The position and size information (left, top, width, height, capHeight, baseLine, widthULdot, widthULgap) is expressed in pixel coordinates mapped to the image specified when getting the letters.
The bounding box (left, top, width, height) of a character usually contains that single character only, but sometimes several characters are recognized together in one step, in which case all of those characters have the same bounding box.
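
The rough approximation given in the note above can be sketched as a small helper. This is only an illustration, not part of the SDK: the function name approx_font_size is hypothetical, WORD is a stand-in typedef (the real SDK headers define their own), and the zero-dpi guard is an added assumption.

```c
#include <stdint.h>

/* Stand-in for the SDK typedef; the real headers define WORD themselves. */
typedef uint16_t WORD;

/* Hypothetical helper, not an SDK function.
 * Applies the approximation from the note:
 *     fontSize = capHeight * 100 / dpi
 * capHeight is in pixels and dpi is the resolution of the image the
 * letters were mapped to. Integer division truncates the result.
 * Returns 0 when dpi is 0 to avoid a division by zero (an assumption). */
static unsigned approx_font_size(WORD capHeight, unsigned dpi)
{
    if (dpi == 0)
        return 0;
    return ((unsigned)capHeight * 100u) / dpi;
}
```

For example, a capHeight of 30 pixels on a 300 dpi image gives an approximate font size of 10.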

Member Data Documentation

WORD LETTER::baseLine

Y coordinate of the baseline in pixels. For vertical text this is the X coordinate. In CCJK vertical text the baseline is, by definition, in the middle of the characters.

WORD LETTER::capHeight

A measure of the capital letter height in pixels. See the notes for more information.

WORD LETTER::cellNum

For WT_TABLE zones: index of the cell in the cell list that contains the character. For WT_FORM zones: index of the text-line form-element object that contains the character.

BYTE LETTER::cntChoices

Number of related choices placed contiguously in the external choice string.

BYTE LETTER::cntSuggestions

Number of related suggestions placed contiguously in the external suggestion array.

WCHAR LETTER::code

Character code in UNICODE. This is the first choice of the recognition or UNICODE_REJECTED for rejected characters.
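
As an illustration of testing the code member for rejected characters, the following sketch counts them in an array. Everything in it is a stand-in: LETTER_SKETCH is a reduced, hypothetical struct carrying only the code field, WCHAR_SKETCH replaces the SDK's WCHAR, and SKETCH_UNICODE_REJECTED is a placeholder whose value is not taken from the SDK. Real code would use the LETTER type and the UNICODE_REJECTED constant from the SDK headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only; use the SDK's own definitions instead. */
typedef uint16_t WCHAR_SKETCH;
#define SKETCH_UNICODE_REJECTED 0xFFFDu  /* placeholder value, NOT from the SDK */

/* Reduced, hypothetical version of LETTER with only the field used here. */
typedef struct {
    WCHAR_SKETCH code;
} LETTER_SKETCH;

/* Counts the letters whose first choice was rejected by recognition. */
static size_t count_rejected(const LETTER_SKETCH *letters, size_t count)
{
    size_t rejected = 0;
    for (size_t i = 0; i < count; ++i) {
        if (letters[i].code == SKETCH_UNICODE_REJECTED)
            ++rejected;
    }
    return rejected;
}
```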

BYTE LETTER::err

Confidence value expressing both the recognition certainty of the first guess (the code member) and the word certainty. For more information, see the section on confidence reporting.

WORD LETTER::fontAttrib

Font information about the recognized character. Used by the OCR engines. See its possible bits.

WORD LETTER::height

Height of the character rectangle in pixels.

DWORD LETTER::info

Additional information about the character. See its possible bits and the macros that make this information easier to handle.

short LETTER::lang

This is used to declare which language the recognized word belongs to. See Language of a word.

short LETTER::lang2

See field lang.

WORD LETTER::left

Left boundary of the rectangle containing the character in pixels.

WORD LETTER::makeup

The recognition data contains no extra characters to mark the ends of lines, paragraphs, pages, etc., so this information is stored in this field of the affected characters. The value can be any bitwise OR combination of the possible formatting attributes.
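
Because the field is a bitwise OR combination of attributes, individual attributes are tested with a bit mask. The sketch below shows the pattern only: the flag names SKETCH_END_OF_LINE and SKETCH_END_OF_PARA and their values are hypothetical placeholders, has_makeup is not an SDK function, and WORD is a stand-in typedef. The real attribute constants come from the SDK headers.

```c
#include <stdint.h>

/* Stand-in for the SDK typedef; the real headers define WORD themselves. */
typedef uint16_t WORD;

/* Hypothetical flag names and values, for illustration only. */
#define SKETCH_END_OF_LINE 0x0001u
#define SKETCH_END_OF_PARA 0x0002u

/* Returns nonzero if any bit of 'flag' is set in 'makeup'. */
static int has_makeup(WORD makeup, WORD flag)
{
    return (makeup & flag) != 0;
}
```

A character that ends both a line and a paragraph would carry both bits, so has_makeup returns nonzero for either flag.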

BYTE LETTER::ndxBGColor

Index of the background color within the palette of the recognition data. See kRecGetLetterPalette.

DWORD LETTER::ndxChoices

Index of the second choice in the external choice string.

DWORD LETTER::ndxExt

Not documented.

BYTE LETTER::ndxFGColor

Index of the foreground color within the palette of the recognition data. See kRecGetLetterPalette.

WORD LETTER::ndxFontFace

Index of the logical font definition in an external font array.

DWORD LETTER::ndxSuggestions

If this LETTER is not a space, this member is the index of the first suggestion in the external suggestion string. (This member forms a union with spcInfo.)

float LETTER::pointSize

Font size in points. See the notes for more information.

BYTE LETTER::reserved_b

Internal use only.

LSPC LETTER::spcInfo

If this LETTER is a space, additional information is available here. (This member forms a union with ndxSuggestions.)

WORD LETTER::top

Top boundary of the rectangle containing the character in pixels.

WORD LETTER::width

Width of the character rectangle in pixels.

BYTE LETTER::widthULdot

Width of a dot in pixels if the "underline" actually consists of dots. 0 for a simple underline or for no underline. The same information is given for dot leaders. (See LSPC.)

BYTE LETTER::widthULgap

Width of a gap in pixels if the "underline" actually consists of dots. 0 for a simple underline or for no underline. The same information is given for dot leaders. (See LSPC.)

WORD LETTER::zone

Index of the zone in the zone list which contains the character.