Anchor

Input:

Receipt.fp, total.yml

Command:

FormProcApp.exe Receipt.fp Example1 Example\total.yml -l en-US

Output:

Receipt_total_0.json

This example extracts the amounts preceded by any of the keywords "subtotal", "total" or "payment".

Invoking FormProcApp.exe with the above parameters essentially calls this API function:

	int FORMPROCAPI OPFP_Process(const char* input_ocr, const char** opfd_list, const char* output_dir, const char* locale);
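Callers who want to skip FormProcApp.exe could reach the same function from a script. The following is a minimal ctypes sketch built on several assumptions that this guide does not state. The DLL file name is a placeholder. opfd_list is assumed to be a NULL-terminated array of .yml paths. Strings are assumed to be UTF-8 bytes. FORMPROCAPI is assumed to use the C (cdecl) calling convention. The meaning of the int return value is not documented here, so the sketch simply returns it.

```python
import ctypes
from typing import Sequence


def make_opfd_list(paths: Sequence[str]) -> ctypes.Array:
    """Build a char** array for opfd_list.

    ASSUMPTION: the API expects a NULL-terminated list of rule-file paths.
    """
    arr = (ctypes.c_char_p * (len(paths) + 1))()
    for i, p in enumerate(paths):
        # ASSUMPTION: UTF-8 is the byte encoding the DLL expects.
        arr[i] = p.encode("utf-8")
    arr[len(paths)] = None  # explicit NULL terminator
    return arr


def opfp_process(lib_path: str, input_ocr: str, opfd_paths: Sequence[str],
                 output_dir: str, locale: str) -> int:
    """Call OPFP_Process in the library at lib_path (hypothetical file name)."""
    # ASSUMPTION: cdecl; if FORMPROCAPI means __stdcall, use ctypes.WinDLL instead.
    lib = ctypes.CDLL(lib_path)
    fn = lib.OPFP_Process
    fn.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p),
                   ctypes.c_char_p, ctypes.c_char_p]
    fn.restype = ctypes.c_int
    opfd = make_opfd_list(opfd_paths)  # keep a reference alive for the call
    return fn(input_ocr.encode("utf-8"), opfd,
              output_dir.encode("utf-8"), locale.encode("utf-8"))
```

For instance, `opfp_process("FormProc.dll", "Receipt.fp", [r"Example\total.yml"], "Example1", "en-US")` would mirror the command above, where "FormProc.dll" is a placeholder and Example1 is taken to be the output directory.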

The input_ocr parameter is the path to a single .fp file, which can contain OCR data for multiple pages. You can also pass a directory path instead of an .fp file path. FormProcApp.exe then recognizes it as a directory and calls OPFP_Process sequentially for each .fp file found in it.
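The directory behavior can be approximated with a small helper that turns the input argument into a list of .fp files. This is only a sketch. The guide does not say whether subdirectories are searched, whether the extension is matched case-insensitively, or in which order files are processed. Non-recursive, case-insensitive and name-sorted are assumptions here.

```python
from pathlib import Path
from typing import List


def fp_inputs(path: str) -> List[Path]:
    """Return the .fp files to process for one input argument."""
    p = Path(path)
    if p.is_dir():
        # ASSUMPTION: top-level files only, ".fp" compared case-insensitively,
        # processed in name order.
        return sorted(f for f in p.iterdir()
                      if f.is_file() and f.suffix.lower() == ".fp")
    # A plain file path is used as-is.
    return [p]
```

Each returned path would then be passed as input_ocr to its own OPFP_Process call.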

The first basic example:

- id: amount-anchor
  rules:
  - {voc: ['SUBTOTAL', 'TOTAL', 'PAYMENT']} 

This field defines the anchor used by the amount field. Write the vocabulary in uppercase when you want a simple case-insensitive match. The exact flag is false by default, so the text fragment is first compared unchanged. If there is no match, it is compared again after being converted to uppercase. The vocabulary entries themselves are not converted, so an entry written in lowercase in the .yml file can never match in that second attempt.
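The two-pass comparison can be sketched as follows. This assumes a whole-fragment equality test. The real matcher may be fuzzy (the output reports a score and an error per match), so treat this as an illustration of the uppercase retry, not of FormProcApp's scoring.

```python
from typing import Optional, Sequence


def voc_match(text: str, vocabulary: Sequence[str],
              exact: bool = False) -> Optional[str]:
    """Return the vocabulary entry matching text, or None."""
    # Pass 1: compare the fragment unchanged.
    for word in vocabulary:
        if text == word:
            return word
    if exact:
        return None
    # Pass 2: retry with the fragment upper-cased.
    # The vocabulary entries are NOT converted.
    upper = text.upper()
    for word in vocabulary:
        if upper == word:
            return word
    return None
```

With the vocabulary ['SUBTOTAL', 'TOTAL', 'PAYMENT'], the fragment "Subtotal" matches on the second pass. With ['subtotal'], it never matches, because "SUBTOTAL" is compared against the unconverted lowercase entry.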

- id: amount
  rules:
  - [re-decimal, {right-from: amount-anchor} ]

Here we specify which field serves as the anchor (amount-anchor) and the spatial relationship to it (right-from), meaning the amount is looked for to the right of the anchor.
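One possible reading of right-from is sketched below. It is purely illustrative: the word boxes, the "same line means vertical overlap" test and the "nearest to the right wins" choice are assumptions. This guide does not describe the .fp coordinates or how FormProcApp ranks candidates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical OCR word with a bounding box in page coordinates."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def same_line(a: Word, b: Word) -> bool:
    # ASSUMPTION: two words share a line if their boxes overlap vertically.
    return a.top < b.bottom and b.top < a.bottom


def right_from(anchor: Word, candidates: List[Word]) -> Optional[Word]:
    """Nearest candidate on the anchor's line that starts at or after its right edge."""
    to_the_right = [c for c in candidates
                    if same_line(anchor, c) and c.left >= anchor.right]
    return min(to_the_right, key=lambda c: c.left - anchor.right, default=None)
```

In this model, a "Subtotal" box followed on the same line by "55.91" yields "55.91". An anchor with no candidate to its right yields None.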

Since parsing decimal values is quite common, the pattern is defined once as a named rule and referenced by name:

Named rules:

- id: re-decimal
  rule: [ {re: '\d(\d|\.|,|-)*'} ]
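To see what the re-decimal pattern accepts, it can be tried with Python's re module. This assumes that FormProcApp searches for the pattern anywhere inside a text fragment and that its regex dialect treats this simple pattern the same way Python does. Neither is stated here. Because the YAML value is single-quoted, the backslash is kept literally, so the engine receives \d.

```python
import re
from typing import Optional

# Same pattern as the re-decimal named rule: a digit followed by any run of
# digits, periods, commas or hyphens (greedy).
RE_DECIMAL = re.compile(r"\d(\d|\.|,|-)*")


def first_decimal(fragment: str) -> Optional[str]:
    """Return the first substring matching re-decimal, or None."""
    m = RE_DECIMAL.search(fragment)
    return m.group(0) if m else None
```

The pattern accepts both "55.91" and "55,11", and extracts "55.91" from "$55.91". It is also permissive: first_decimal("2023-01-05") returns the whole date, and first_decimal("55.91.") keeps the trailing period.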

To quickly inspect one or more outputs without opening the .json file in a text editor, use jq:

guide> jq -c '{(.input): [.fields[] | {(.id): [.matches[] | [.text, .score, .error]]}]}' Example1\Receipt_total_0.json
{"Receipt.fp":[{"amount-anchor":[["Subtotal",0,0],["Total",0,0]]},{"amount":[["55.91",0,0],["55,11",0,0]]}]} 
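Where jq is not installed, the same summary can be produced in Python. The JSON layout used here (input, fields, id, matches, text, score, error) is inferred from the jq filter only. Any other keys in the output file are ignored. The file is assumed to be UTF-8.

```python
import glob
import json
import sys
from typing import Any, Dict, List


def summarize(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of: {(.input): [.fields[] | {(.id): [.matches[] | [.text, .score, .error]]}]}"""
    return {
        doc["input"]: [
            # .get() yields None (JSON null) for a missing key, as jq does.
            {field["id"]: [[m.get("text"), m.get("score"), m.get("error")]
                           for m in field["matches"]]}
            for field in doc["fields"]
        ]
    }


def main(patterns: List[str]) -> None:
    for pattern in patterns:
        # cmd.exe does not expand wildcards, so expand them here.
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as f:
                doc = json.load(f)
            # Compact separators reproduce the one-line style of jq -c.
            print(json.dumps(summarize(doc), separators=(",", ":"),
                             ensure_ascii=False))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice (summarize_fp.py is just an example), it could be run as `python summarize_fp.py Example1\*.json` to print one line per output file.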

The anchor matched twice: "Subtotal", with the amount "55.91" next to it, and "Total", with "55,11". How each amount relates to its anchor is detailed under referredFields in the output file Receipt_total_0.json.