Module name:	OMR
Module identifier:	RM_OMR
Filling methods supported:	FM_OMR
Filters supported:	ignores all filter settings
Trade-off supported:	none
Knowledge base files:	none
Training file supported:	no

This module requires the Recognition Add-on. See the topic on Licensing in the General Information help system.

Application areas

This recognition module is used for recognizing optical marks (checkmarks). Typical application areas are in questionnaires, ballot papers, educational tests and reporting or ordering sheets, where the documents to be processed are form-like and filled by respondents, usually by hand.

IMPORTANT NOTE: Autozoning cannot find OMR zones. Use either manually (programmatically) created user zones (see FM_OMR and RM_OMR) or pre-defined form templates (see how to use form templates).

Accuracy issues

Checkmark zones must contain the whole checkbox including the bounding printed frame.

Usually the zones are specified to be a bit larger than the checkbox (frame); it is not a problem if a zone contains some of the text next to the checkbox, but make sure that the covered text is not larger than the checkbox itself.
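The zone sizing advice above can be sketched as a small helper that grows a checkbox frame rectangle by a margin and clips it to the page. This is only an illustration: the Rect type, the padded_zone function and the 6-pixel margin are hypothetical and are not part of the CSDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Pixel rectangle; right and bottom are exclusive (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies completely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def padded_zone(frame: Rect, margin: int, page_width: int, page_height: int) -> Rect:
    """Grow the printed frame by `margin` pixels on every side, clipped to the page."""
    return Rect(
        left=max(0, frame.left - margin),
        top=max(0, frame.top - margin),
        right=min(page_width, frame.right + margin),
        bottom=min(page_height, frame.bottom + margin),
    )


# A 50 x 50 pixel checkbox frame on an A4 page scanned at 300 dpi (2480 x 3508 pixels).
frame = Rect(left=200, top=300, right=250, bottom=350)
zone = padded_zone(frame, margin=6, page_width=2480, page_height=3508)  # margin is an assumed value
print(zone, zone.contains(frame))
```

A small margin keeps the whole frame inside the zone while limiting how much neighboring text is covered.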

In some cases the frame is not visible on the scanned image, e.g. when a dropout color is used during scanning. Such checkboxes are much harder to recognize correctly, so avoid using a dropout color for OMR. If you must process such checkboxes (e.g. when you have no control over the scanning process), some settings can help make recognition as accurate as possible; see kRecSetOmrParams:

Specify in advance how the module will handle frame detection (whether the frames are visible or invisible).
Specify the "marking sensitivity" of the module, i.e. how strongly an OMR zone must be marked to count as filled.

While it is strongly discouraged to specify zones that cut into the frames, this is sometimes hard to avoid. In such cases setting Kernel.Ocr.OMR.ZoneCorrection to true may help: the correction algorithm examines a slightly larger area around the border of the zone. Note, however, that even this area does not necessarily include the whole frame (because correction must be careful not to misdetect nearby content), so the best way to get accurate results is to specify the most accurate zones possible.

This recognition module is not influenced by the recognition trade-off setting. By default the visibility of frames is detected automatically, but specifying whether they are visible or not may improve accuracy (see kRecSetOmrParams).

See also the topic Instructions to respondents below.

Conditions

The frame can be a rectangle, a circle, an ellipse, etc.; it can be shaded. It may be visible or invisible in the image sent for recognition. The frame should measure at least 45-50 pixels in each direction, which is roughly 3.8 to 4.2 mm (about 0.15 to 0.17 inch) at 300 dpi resolution.
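The pixel and physical sizes are tied together by the scan resolution, so the minimum frame size can be checked for resolutions other than 300 dpi with a simple conversion. The following is a minimal sketch; the function names and the use of 45 pixels as the default threshold are assumptions made for illustration.

```python
MM_PER_INCH = 25.4


def pixels_to_mm(pixels: float, dpi: float) -> float:
    """Physical length in millimetres covered by `pixels` at the given resolution."""
    return pixels / dpi * MM_PER_INCH


def mm_to_pixels(mm: float, dpi: float) -> float:
    """Number of pixels covering `mm` millimetres at the given resolution."""
    return mm / MM_PER_INCH * dpi


def meets_minimum(width_mm: float, height_mm: float, dpi: float, min_pixels: float = 45) -> bool:
    """True if a printed checkbox is at least `min_pixels` wide and high once scanned."""
    return (mm_to_pixels(width_mm, dpi) >= min_pixels
            and mm_to_pixels(height_mm, dpi) >= min_pixels)


for dpi in (200, 300, 600):
    low, high = pixels_to_mm(45, dpi), pixels_to_mm(50, dpi)
    print(f"{dpi} dpi: 45-50 pixels span about {low:.1f} to {high:.1f} mm")

# A 4 mm checkbox is large enough at 300 dpi but not at 200 dpi.
print(meets_minimum(4.0, 4.0, 300), meets_minimum(4.0, 4.0, 200))
```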

If there is a frame visible in the image and the filled-in-error feature is disabled:
it can be filled in with any shape, such as an X, a tick, non-solid hatching, or horizontal or vertical lines. The recommended filling shape is an X or a tick. Even a small number of contiguous black pixels falling within the checkmark area will lead to the value "filled". The recommended scanner brightness setting is slightly darker than 50%.
If there is no frame visible in the image or the filled-in-error feature is enabled:
it should be filled so that the checkmark shape cannot be confused with a frame or half-frame, i.e. it should contain no lines parallel to the invisible zone borders. The filling shape should be a clear X or a tick. The OMR mark should measure at least 45-50 pixels in both directions, which is roughly 3.8 to 4.2 mm (about 0.15 to 0.17 inch) at 300 dpi resolution.

This module has been tested on an image with more than 1300 OMR zones.

Output

An OMR (optical mark) zone is unique in that its output always consists of precisely one digit. It can be defined to be one of two or one of three values. When there are two possible values, these are zero (0) for unfilled, one (1) for filled. When three values are possible, the additional value is two (2) for "filled-in-error" (see below).
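The one-digit output described above can be decoded into a named state before further processing. This is a minimal sketch that operates on the digit as a plain string; MarkState and decode_mark are hypothetical names, not CSDK identifiers.

```python
from enum import Enum


class MarkState(Enum):
    """The three possible OMR output digits."""
    UNFILLED = "0"
    FILLED = "1"
    FILLED_IN_ERROR = "2"


def decode_mark(output: str, three_valued: bool = False) -> MarkState:
    """Convert the single output digit of an OMR zone into a MarkState.

    With three_valued=False only 0 and 1 are accepted; the value 2 is
    accepted only when three-value (filled-in-error) detection is in use.
    """
    text = output.strip()
    allowed = "012" if three_valued else "01"
    if len(text) != 1 or text not in allowed:
        raise ValueError(f"unexpected OMR output: {output!r}")
    return MarkState(text)


print(decode_mark("1"))
print(decode_mark("2", three_valued=True))
```

Rejecting anything other than a single expected digit makes it easier to notice when a non-OMR zone result has been passed in by mistake.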

The safest way to link the output values with the checkboxes that generated them is through the LETTER structure output, which contains the zone number and the coordinates (zone, left, top, width, height). This can also help prevent checkmark data from being confused with barcode values or other non-checkmark data coming from the same page.
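Linking results by zone number can be sketched as follows. The LetterRecord class below is a simplified, hypothetical stand-in that carries only the fields named above plus an assumed code field for the recognized character; it does not reproduce the actual LETTER structure layout, and the field names in the mapping are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LetterRecord:
    """Hypothetical, simplified stand-in for one recognized character."""
    zone: int
    left: int
    top: int
    width: int
    height: int
    code: str  # recognized character; this field name is an assumption


def answers_by_field(letters, field_for_zone):
    """Map form field names to OMR digits, skipping zones that are not checkboxes."""
    answers = {}
    for letter in letters:
        name = field_for_zone.get(letter.zone)
        if name is not None:
            answers[name] = letter.code
    return answers


letters = [
    LetterRecord(zone=0, left=194, top=294, width=62, height=62, code="1"),
    LetterRecord(zone=1, left=194, top=394, width=62, height=62, code="0"),
    LetterRecord(zone=2, left=900, top=100, width=400, height=80, code="7"),  # non-OMR zone
]
field_for_zone = {0: "newsletter_yes", 1: "newsletter_no"}  # only OMR zones are listed

print(answers_by_field(letters, field_for_zone))
# prints {'newsletter_yes': '1', 'newsletter_no': '0'}
```

Because only zone numbers known to be checkboxes appear in the mapping, digits coming from other zones on the same page are ignored.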

If a page contains mainly or only checkmark data, the output converters "Text - Tab Delimited", "Text - Comma Delimited" or "Excel 97, 2000" can be used to load the data into a spreadsheet program for further analysis and presentation.

Filled-in-error

The filled-in-error feature allows the application to handle checkboxes that were filled by mistake. This feature is available only with the KernelAPI.

The respondent in this case should completely blacken the frame or checkmark area before marking a new choice. It is not essential that the area be completely blackened, but it must be significantly darker and denser than a "filled" (checked) zone. A "filled-in-error" zone generates an output value 2. This feature functions only if two conditions are met:

kRecSetOmrParams must be called with the value pFill set to TRUE.
Sets of OMR zones relating to one question must be grouped.

The filled-in-error feature functions only on grouped zones. Recognition results for each zone in a group will be 0, 1 or 2. There should be only one filled zone per group plus optionally one filled-in-error. When designing a checkmark document, all zones in a group should have the same checkbox style and size. Up to 32 OMR zones can be grouped.
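The expectations for a group can be checked after recognition with a short consistency test. This is a sketch under stated assumptions: results are given as a list of integers (0, 1 or 2) per group, the function names are hypothetical, and a group with no filled zone is treated as an unanswered question rather than an error.

```python
MAX_GROUP_SIZE = 32  # upper limit on grouped OMR zones stated above


def check_group(values):
    """Return a list of problems found in one group's results (empty if none)."""
    problems = []
    if len(values) > MAX_GROUP_SIZE:
        problems.append("more than 32 zones in group")
    if any(v not in (0, 1, 2) for v in values):
        problems.append("values must be 0, 1 or 2")
    if values.count(1) > 1:
        problems.append("more than one filled zone")
    if values.count(2) > 1:
        problems.append("more than one filled-in-error zone")
    return problems


def selected_choice(values):
    """Index of the single filled zone, or None if the group is unanswered or ambiguous."""
    if check_group(values) or 1 not in values:
        return None
    return values.index(1)


print(check_group([0, 2, 1, 0]), selected_choice([0, 2, 1, 0]))
print(check_group([1, 1, 0]), selected_choice([1, 1, 0]))
```

Groups that fail the check are candidates for manual review.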

Grouping OMR zones

In CSDK versions earlier than v15, OMR zones could be grouped by modifying the seq field in the zone structure. From v15 this is no longer necessary, thanks to the notion of pizzabox zones. OMR zones can be grouped by collecting them in one pizzabox zone, even if they are not touching (i.e. even if one criterion of the pizzabox shape is not fulfilled).

Instructions to respondents

OMR processing requires a high degree of accuracy. Two-value detection is inherently accurate; three-value detection is more difficult. Good document design and clear instructions to respondents are very important for achieving high accuracy. Printing model samples of ideally filled and filled-in-error checkboxes in the instructions is recommended. Respondents should be urged to fill in the document with a dark blue or black pen. Pencils are to be avoided, as are pens with an ink color close to a dropout color on the scanner to be used.

Note: See OMR Recognition Engine Module.