RecAPI
Processing handwritten text

Separating handwritten text from machine-printed text improves OCR accuracy. Machine-printed OCR gives more accurate results when handwriting is identified and set apart first: the height and width of handwritten characters vary widely, which can confuse zone detection designed for machine-printed text. Once the handwritten text has been separated, and either zoned or copied into a new HPAGE, you can run a recognizer suited to handwriting on it if necessary. You can use this to remove handwritten scribbles from invoices before processing, or to check whether a document contains a signature or other handwritten information. Handwriting detection applies to signatures as well as to handwritten numbers and letters in the image. The program does not distinguish signatures from other handwritten text.

Neural networks learn characteristics from a training data set and then classify an unknown image using the learned weights. The neural network used in CSDK consists of convolutional layers, which use kernels to extract features from the image. During training, the network learns which features distinguish handwritten characters from typed ones. Using a neural network instead of classical methods means that these features do not have to be hand-coded. Because it has been trained on many samples, the network adapts better to variations in handwriting style than hand-coded rules do.
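
To make the role of a kernel concrete, the following is a minimal sketch of what a single convolutional kernel computes. It is a generic illustration only: the function name `convolve2d`, the sample image, and the kernel values are assumptions made for this example and say nothing about the network actually shipped with CSDK.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the weighted sum at every position.

    Like convolutional layers in neural networks, this applies the
    kernel without flipping it (strictly speaking, cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only where the window straddles the edge. A trained network holds many such kernels, with values learned from the training data rather than written by hand.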

You can use two functions to detect handwritten areas:

  • kRecLocateHandwriting: changes the current image in the HPAGE. There are two modes for handling the original image:
    • LH_REMOVE_HANDWRITING: If you do not need handwritten text or signatures, use this mode to delete the handwritten characters from the original image. This increases the accuracy of machine-printed OCR. This mode does not return zones.
    • LH_RETAIN_HANDWRITING: If you are only interested in handwritten text, for example to check whether there is a signature in the image, use this mode.
  • kRecCreateHandwritingPage: returns the handwritten image in a separate HPAGE. There are two modes for handling image parts that contain different kinds of text:
    • CHP_FIND_HANDWRITING: If you need the original document and only want to get the handwritten zones separately, use this mode. The original HPAGE remains unchanged.
    • CHP_SEPARATE_HANDWRITING: If you need to separate the handwritten areas and the machine-printed text from each other, use this mode. The handwritten text is deleted from the original HPAGE, and a new HPAGE is created that contains only the handwritten text and zones. The two pages can then be processed differently according to their content.
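
The choice among the two functions and four modes above can be summarized as a small decision helper. This is a sketch of one reading of the descriptions, not CSDK code: `choose_handwriting_call`, `HandwritingCall`, and the three parameters are hypothetical names introduced for illustration, and the helper only returns the function and mode names as strings without calling the API.

```python
from typing import NamedTuple


class HandwritingCall(NamedTuple):
    """Hypothetical container: which function to call, in which mode."""
    function: str
    mode: str


def choose_handwriting_call(need_printed: bool,
                            need_handwriting: bool,
                            keep_original: bool = False) -> HandwritingCall:
    """Pick a function and mode from what the caller wants to keep.

    need_printed     -- the machine-printed content will be processed
    need_handwriting -- the handwritten content will be processed
    keep_original    -- the original HPAGE must stay unchanged
                        (only matters when both kinds are needed)
    """
    if not (need_printed or need_handwriting):
        raise ValueError("nothing to process: request printed text, "
                         "handwriting, or both")
    if need_printed and not need_handwriting:
        # Handwriting is unwanted: delete it from the original image.
        return HandwritingCall("kRecLocateHandwriting",
                               "LH_REMOVE_HANDWRITING")
    if need_handwriting and not need_printed:
        # Only the handwriting matters, e.g. a signature check.
        return HandwritingCall("kRecLocateHandwriting",
                               "LH_RETAIN_HANDWRITING")
    if keep_original:
        # Both are needed and the original page must not be modified.
        return HandwritingCall("kRecCreateHandwritingPage",
                               "CHP_FIND_HANDWRITING")
    # Both are needed, each on its own page.
    return HandwritingCall("kRecCreateHandwritingPage",
                           "CHP_SEPARATE_HANDWRITING")


# Example: clean an invoice of scribbles before machine-printed OCR.
call = choose_handwriting_call(need_printed=True, need_handwriting=False)
print(call.function, call.mode)
```

Keeping the decision in one place makes the intent explicit at the call site; the actual calls and their parameters should be taken from the reference pages of the two functions.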

IMPORTANT NOTES:

  • CSDK can automatically set the filling method of the handwritten zones to handprint (FM_HANDPRINT). These zones can be recognized by RM_RER, RM_HNR, or another handwriting recognizer.
  • If the program does not find a handwritten area in the image, the number of zones is zero. Check this with the kRecGetZoneCount function.
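
The zone-count check combines with the earlier remark that LH_REMOVE_HANDWRITING returns no zones: in that mode a count of zero says nothing about whether handwriting was present. A minimal sketch of that interpretation follows. The function `handwriting_found` and the constant `MODES_WITHOUT_ZONES` are hypothetical, and `zone_count` stands for a value the caller has already obtained from kRecGetZoneCount.

```python
from typing import Optional

# Per the mode descriptions, this mode does not return zones.
MODES_WITHOUT_ZONES = {"LH_REMOVE_HANDWRITING"}


def handwriting_found(mode: str, zone_count: int) -> Optional[bool]:
    """Interpret a zone count obtained after handwriting detection.

    Returns True or False when the count is meaningful, and None when
    the mode returns no zones, so the count cannot answer the question.
    """
    if zone_count < 0:
        raise ValueError("zone_count must not be negative")
    if mode in MODES_WITHOUT_ZONES:
        return None
    return zone_count > 0


# Example: signature check in a mode that does return zones.
print(handwriting_found("LH_RETAIN_HANDWRITING", 0))
```

Returning None instead of False keeps "no handwriting found" distinct from "this mode cannot tell".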