RecAPI
Module name: | OMR |
Module identifier: | RM_OMR |
Filling methods supported: | FM_OMR |
Filters supported: | ignores all filter settings |
Trade-off supported: | none |
Knowledge base files: | none |
Training file supported: | no |
This module requires the Recognition Add-on. See the topic on Licensing in the General Information help system.
This recognition module is used for recognizing optical marks (checkmarks). Typical application areas are in questionnaires, ballot papers, educational tests and reporting or ordering sheets, where the documents to be processed are form-like and filled by respondents, usually by hand.
IMPORTANT NOTE: Autozoning cannot find OMR zones. Use manually (programmatically) created user zones (see FM_OMR and RM_OMR) or pre-defined form templates (see how to use form templates) instead.
Checkmark zones must contain the whole checkbox including the bounding printed frame.
Usually the zones are specified to be a bit larger than the checkbox (frame); it is not a problem if a zone contains some of the text next to the checkbox, but make sure that the covered text is not larger than the checkbox itself.
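The padding advice above can be sketched as a small helper that grows a known checkbox frame by a fixed margin before the zone is created. This is only an illustration: the `Rect` type, the function names and the margin value are hypothetical and are not part of the SDK; the resulting rectangle would still have to be copied into a real zone structure.

```c
#include <assert.h>

/* Hypothetical rectangle type for this sketch (pixel coordinates).
   It is NOT the SDK's zone structure. */
typedef struct {
    int left, top, right, bottom;
} Rect;

/* Limit v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Grow a checkbox frame by `margin` pixels on every side so that the
   zone contains the whole printed frame, clamped to the page bounds. */
Rect pad_checkbox_zone(Rect frame, int margin, int page_width, int page_height) {
    assert(margin >= 0);
    Rect zone;
    zone.left   = clamp_int(frame.left   - margin, 0, page_width);
    zone.top    = clamp_int(frame.top    - margin, 0, page_height);
    zone.right  = clamp_int(frame.right  + margin, 0, page_width);
    zone.bottom = clamp_int(frame.bottom + margin, 0, page_height);
    return zone;
}
```

A margin that is small relative to the checkbox keeps neighboring text from dominating the zone, in line with the guidance above.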
In some cases the frame is not visible on the scanned image, e.g. when a dropout color is used during scanning. Such checkboxes are much harder to recognize correctly, so do not use dropout color for OMR. If you must process such checkboxes (e.g. when you do not have control over the scanning process), there are some settings that make recognition as accurate as possible; see kRecSetOmrParams.
While it is strongly discouraged to specify zones that cut into the frames, this is sometimes hard to avoid. In such cases setting Kernel.Ocr.OMR.ZoneCorrection to true may help: the correction algorithm processes a slightly larger area around the border of the zone. Note however that even this area does not necessarily include the whole frame (because correction must be careful not to misdetect nearby content), so the best way to get accurate results is to specify the most accurate zones possible.
This recognition module is not influenced by the recognition trade-off setting. By default the visibility of a frame is detected automatically, but specifying whether frames are visible may improve accuracy (see kRecSetOmrParams).
See also the topic Instructions to respondents below.
The frame can be a rectangle, a circle, an ellipse, etc.; it can be shaded. It may be visible or invisible in the image sent for recognition. The frame should measure at least 45-50 pixels in each direction, which is roughly 3.8 to 4.2 mm (about 0.15 to 0.17 inch) at 300 dpi resolution.
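The pixel requirement can be translated into a physical size for any scan resolution. The following is a minimal sketch with hypothetical helper names (nothing here is an SDK call); it uses only the conversion 1 inch = 25.4 mm. At 300 dpi, 45 and 50 pixels work out to about 3.81 mm and 4.23 mm, so the pixel count is best treated as the primary requirement.

```c
#include <assert.h>

/* Convert a length in pixels to millimeters at the given resolution. */
double pixels_to_mm(int pixels, int dpi) {
    assert(dpi > 0);
    return (double)pixels * 25.4 / (double)dpi;
}

/* Return 1 if both frame dimensions reach the minimum pixel size,
   0 otherwise. `min_px` is chosen by the caller (e.g. 45 or 50). */
int frame_is_large_enough(int width_px, int height_px, int min_px) {
    return width_px >= min_px && height_px >= min_px;
}
```

The same check can be run at design time: a checkbox printed 4 mm wide and scanned at 200 dpi yields only about 31 pixels, which falls short of the recommended minimum.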
This module has been tested on an image with more than 1300 OMR zones.
An OMR (optical mark) zone is unique in that its output always consists of precisely one digit. It can be defined to take one of two or one of three values. When there are two possible values, these are zero (0) for unfilled and one (1) for filled. When three values are possible, the additional value is two (2) for "filled-in-error" (see below).
The safest way to link the output values with the checkboxes that generated them is through the LETTER structure output, which contains the zone number and the coordinates (zone, left, top, width, height). This can also help prevent checkmark data from being confused with barcode values or other non-checkmark data coming from the same page.
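As an illustration of keying results by zone number rather than by output order, the sketch below maps each output digit to a state and stores it at the index of its zone. It assumes the digit arrives as a character ('0', '1' or '2'); the `MarkResult` type is a simplified, hypothetical stand-in and does not reproduce the real LETTER structure, and all names are invented for this example.

```c
/* Possible states of a checkmark zone, following the 0/1/2 output. */
typedef enum {
    MARK_INVALID = -1,
    MARK_UNFILLED = 0,
    MARK_FILLED = 1,
    MARK_FILLED_IN_ERROR = 2
} MarkState;

/* Hypothetical, simplified result record: zone number plus output digit. */
typedef struct {
    int zone;
    char digit;
} MarkResult;

/* Translate one output digit into a state. */
MarkState mark_state_from_digit(char digit) {
    switch (digit) {
        case '0': return MARK_UNFILLED;
        case '1': return MARK_FILLED;
        case '2': return MARK_FILLED_IN_ERROR;
        default:  return MARK_INVALID;
    }
}

/* Store each result at states[zone]. Zones without a result stay
   MARK_INVALID; results with an out-of-range zone are skipped.
   Returns the number of results stored. */
int collect_states(const MarkResult *results, int n_results,
                   MarkState *states, int n_zones) {
    int stored = 0;
    for (int i = 0; i < n_zones; i++) {
        states[i] = MARK_INVALID;
    }
    for (int i = 0; i < n_results; i++) {
        int z = results[i].zone;
        if (z >= 0 && z < n_zones) {
            states[z] = mark_state_from_digit(results[i].digit);
            stored++;
        }
    }
    return stored;
}
```

Indexing by zone means that characters from barcode or text zones on the same page cannot shift the checkmark values, provided only OMR zone results are passed in.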
If a page contains mainly or only checkmark data, the output converters "Text - Tab Delimited", "Text - Comma Delimited" or "Excel 97, 2000" can be used to load the data into a spreadsheet program for further analysis and presentation.
The filled-in-error feature allows the application to handle checkboxes that were filled by mistake. This feature is available only with the KernelAPI.
The respondent in this case should completely blacken the frame or checkmark area before marking a new choice. It is not essential that the area be completely blackened, but it must be significantly darker and denser than a "filled" (checked) zone. A "filled-in-error" zone generates an output value 2. This feature functions only if two conditions are met:
The filled-in-error feature functions only on grouped zones. Recognition results for each zone in a group will be 0, 1 or 2. There should be only one filled zone per group plus optionally one filled-in-error. When designing a checkmark document, all zones in a group should have the same checkbox style and size. Up to 32 OMR zones can be grouped.
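The group rule above (at most one filled zone, optionally one filled-in-error, no more than 32 zones) can be checked on the application side once the values are available. The following is a sketch under the assumption that the group's results have already been collected into an array of 0/1/2 values; the function and constant names are hypothetical, and treating more than one filled-in-error zone as invalid is a reading of "optionally one" rather than documented module behavior.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUP_ZONES 32   /* group size limit stated above */

enum {
    GROUP_NO_ANSWER = -1,    /* no zone in the group is filled */
    GROUP_INVALID   = -2     /* the group breaks the rules above */
};

/* Return the index of the single filled zone in a group, or one of
   the GROUP_* codes. `values` holds one 0/1/2 result per zone. */
int selected_zone_in_group(const int *values, int count) {
    assert(values != NULL);
    if (count < 1 || count > MAX_GROUP_ZONES) {
        return GROUP_INVALID;
    }
    int filled = 0, errors = 0;
    int filled_index = GROUP_NO_ANSWER;
    for (int i = 0; i < count; i++) {
        if (values[i] == 1) {
            filled++;
            filled_index = i;
        } else if (values[i] == 2) {
            errors++;
        } else if (values[i] != 0) {
            return GROUP_INVALID;   /* not a valid OMR output value */
        }
    }
    if (filled > 1 || errors > 1) {
        return GROUP_INVALID;
    }
    return filled_index;
}
```

Groups reported as invalid are natural candidates for manual review, since they usually indicate that the respondent did not follow the instructions.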
In CSDK versions earlier than v15, OMR zones could be grouped by modifying the seq field in the zone structure. From v15 this is no longer necessary, thanks to the notion of pizzabox zones. OMR zones can be grouped by collecting them in one pizzabox zone even if they are not touching (i.e. even if one criterion of the pizzabox shape is not fulfilled).
OMR processing requires a high degree of accuracy. The two-value detection is inherently accurate; three-value detection is more difficult. Good document design and clear instructions to respondents are very important for achieving high accuracy. Printing model samples of ideally filled and filled-in-error checkboxes in the instructions is recommended. Respondents should be urged to fill in the document with a dark blue or black pen. Pencils are to be avoided, as are pens with an ink color close to a dropout color on the scanner to be used.