Form processor

This section describes the Form processor feature. The examples use the FormProc API by running FormProcApp.exe from the command line.

In general, the API function OPFP_Process (or simply running FormProcApp.exe) iterates through all recognized text fragments for each field defined in the OPFD .yaml file. Each match is written to the output, which is then ordered as described in the output format document.
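The iteration over fields and fragments can be pictured as a nested loop. The Python sketch below is a simplified, hypothetical model only: the `Fragment` and `Field` types, the single regex per field, and the field names are assumptions for illustration. They are not the OPFD .yaml schema or the internals of OPFP_Process, and the ordering of the output is not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified model; not the OPFD schema or FormProc internals.
@dataclass
class Fragment:
    text: str
    line: int

@dataclass
class Field:
    name: str
    pattern: str  # a single regex stands in for the rules of a field

def process(fields, fragments):
    """For each field, try every recognized fragment and collect the matches."""
    output = []
    for field in fields:
        rule = re.compile(field.pattern)
        for fragment in fragments:
            match = rule.search(fragment.text)
            if match:  # a match is written to the output
                output.append((field.name, match.group(0), fragment.line))
    return output

fields = [Field("date", r"\d{4}-\d{2}-\d{2}"), Field("total", r"\d+\.\d{2}")]
fragments = [Fragment("Date: 2024-03-01", 1), Fragment("Total 125.50", 2)]
print(process(fields, fragments))
```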

A text fragment is a piece of text within one line whose parts appear to belong together. By default, one match per text fragment is written to the output. The regex rule has a multi feature that overrides this, for example to parse several numbers in one line when that is needed.
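The difference between the default behavior and the multi feature can be illustrated with plain Python regular expressions. In this sketch, the `multi` parameter and the sample fragment are hypothetical stand-ins; the actual OPFD syntax for the multi feature is not shown here.

```python
import re

# Hypothetical fragment from a line that contains several amounts.
fragment = "Amounts: 12.00 7.50 3.25"
amount = re.compile(r"\d+\.\d{2}")

def match_fragment(pattern, text, multi=False):
    """Default: at most one match per fragment; multi=True: every match."""
    if multi:
        return pattern.findall(text)
    first = pattern.search(text)
    return [first.group(0)] if first else []

print(match_fragment(amount, fragment))              # ['12.00']
print(match_fragment(amount, fragment, multi=True))  # ['12.00', '7.50', '3.25']
```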

In larger documents, such as bank statements or other official documents with multiple columns, a single line can contain several text fragments.

The FormProc engine, with the OPFD descriptor .yaml file as a flexible interface, is optimized for searching key-value pairs. Key-value pairs can be captured in the following ways:

  • Anchor as a separate field.

    The anchor is defined as a field of its own, like the referring field; the referring field contains a positional rule that refers to this so-called anchor field. For more information, see Anchor.

  • Regular expression capture groups.

    The pair is captured through the capture groups of a regular expression and returned as compound output. For more information, see Compound output.
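As an illustration of the capture-group approach, the following sketch uses a single regular expression with named groups for the key and the value and returns both together, similar in spirit to a compound output. The pattern, the group names, and the returned dictionary are assumptions for illustration; the actual compound output format is described in Compound output.

```python
import re

# Hypothetical rule: one regex whose named groups capture key and value.
rule = re.compile(r"(?P<key>Invoice\s+No\.?)\s*:?\s*(?P<value>[A-Z]{2}-\d{4,})")

def compound_match(text):
    """Return the captured key and value together, or None if no match."""
    m = rule.search(text)
    return m.groupdict() if m else None

print(compound_match("Invoice No.: AB-20391"))
```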

Most often, the key is a known word from a set of words (a vocabulary), though a regular expression can also be set as the key. The value can be either an element of a vocabulary or an unknown string that is expected to follow a certain pattern, which makes it suited to a regular expression.
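A key taken from a vocabulary combined with a regex-shaped value can be sketched as follows. It uses a simplified positional rule, similar in spirit to the anchor approach: the value is the nearest matching fragment to the right of the key on the same line. The vocabulary, the IBAN-like pattern, the `Fragment` type, and its `x` coordinate are all hypothetical; in FormProc, positional rules are configured in the OPFD .yaml file, not in code.

```python
import re
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    line: int
    x: int  # hypothetical left edge of the fragment on the page

# Hypothetical key vocabulary and value pattern (IBAN-like string).
KEY_VOCABULARY = {"IBAN", "Account number"}
VALUE_PATTERN = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9 ]{10,30}$")

def find_key_value(fragments):
    """Find a key from the vocabulary, then the nearest fragment to its
    right on the same line whose text matches the value pattern."""
    for anchor in fragments:
        if anchor.text not in KEY_VOCABULARY:
            continue
        candidates = [
            f for f in fragments
            if f.line == anchor.line and f.x > anchor.x and VALUE_PATTERN.match(f.text)
        ]
        if candidates:
            nearest = min(candidates, key=lambda f: f.x)
            return anchor.text, nearest.text
    return None

fragments = [
    Fragment("IBAN", line=3, x=10),
    Fragment("DE89 3704 0044 0532 0130 00", line=3, x=120),
    Fragment("Date", line=4, x=10),
]
print(find_key_value(fragments))
```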

The rules describe how to capture matches belonging to a given field that is either an anchor or a referring field.

It is assumed that the user works with multiple files while editing a .yaml file. Multiple .yaml descriptors can be used together with a large number of input image files that have already been converted to .fp format (see Sample86), so that processing is fast and reproducible. The examples here are demonstrated on a single input file.

The following examples use the publicly available tool jq (https://github.com/jqlang/jq), so that the user can get started without writing any code in a high-level language. jq can also process many .json files in a directory if *.json is given instead of a single .json file name. It is recommended to switch to multiple input/output files once the user is familiar with working on a single file. The examples assume that jq is on the path, or that the following command has been run:

guide>  Set-Alias -Name jq C:\install\run\jq\jq-windows-amd64.exe

Replace this path with your local installation path.

It is recommended to apply user-specific filtering logic after parsing the .json output of FormProc, for example a third-party address parser or validator, or any other special user logic.

The .json output can also contain a list of further possible matches, ordered by decreasing accuracy. The higher-level application has to choose among these candidates based on its own knowledge of the context.
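The following sketch shows one way an application could apply its own validation to the list of possible matches. The JSON layout (`value`, `candidates`) is a hypothetical simplification, not the actual FormProc output format, and `is_valid_postal_code` stands in for a third-party validator or other user logic.

```python
import json

# Hypothetical shape of one field in the .json output; the real layout
# is described in the output format document.
sample = """
{"field": "postal_code",
 "value": "1O115",
 "candidates": ["10115", "1011S"]}
"""

def is_valid_postal_code(text):
    """User-specific check; stands in for a third-party validator."""
    return len(text) == 5 and text.isdigit()

def choose(entry, validator):
    """Return the first value, in decreasing-accuracy order, that passes."""
    for value in [entry["value"], *entry["candidates"]]:
        if validator(value):
            return value
    return None  # leave the decision to the application

entry = json.loads(sample)
print(choose(entry, is_valid_postal_code))  # 10115
```

Returning None when no candidate passes leaves the final decision to the application instead of forcing a guess.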

FormProc does not replicate logic that a high-level language such as C# already provides; it is assumed that the application refines the .json output further as its use cases require.