Variable, Score, Filter

Input:

Receipt.fp, total.yml

Command:

FormProcApp.exe Receipt.fp Example5 Example5\total.yml -l en-US

Output:

Receipt_total_0.json

This example continues from the Regular expressions example by introducing variables. A variable stores either the value of a regex capture group (for a re rule) or the entire match of a vocabulary rule (voc).
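To illustrate what storing a capture group means, the Python sketch below mimics `{re: '(SUB)?TOTAL:*', var: [$sub, 1]}`. It is not FormProcApp's implementation. The function name `capture_vars`, the case-insensitive matching, and storing an unmatched optional group as an empty string are all assumptions (the last one is suggested by the `[$sub, eq, '']` test in the rule below).

```python
import re
from typing import Dict, Optional

# Hypothetical model of {re: '(SUB)?TOTAL:*', var: [$sub, 1]}.
# Assumption: matching is case-insensitive (the output reports "Total" and
# "Subtotal" for an upper-case pattern).
PATTERN = re.compile(r"(SUB)?TOTAL:*", re.IGNORECASE)


def capture_vars(text: str) -> Optional[Dict[str, str]]:
    """Return the variables bound by the rule, or None if the regex fails."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Capture group 1 goes into $sub; an unmatched optional group becomes ''
    # (assumption, consistent with the expr [$sub, eq, ''] test).
    return {"$sub": m.group(1) or ""}


print(capture_vars("SUBTOTAL: 12.34"))  # {'$sub': 'SUB'}
print(capture_vars("TOTAL: 12.34"))     # {'$sub': ''}
print(capture_vars("PAYMENT"))          # None
```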

- id: amount-anchor
  rules:
  - [{re: '(SUB)?TOTAL:*', var: [$sub, 1]}, {expr: [$sub, eq, ''], score: 50}]
  - {re: 'PAYMENT:*'}

Here, a "Total" match is promoted over "Subtotal" by a score of 50:

guide> jq -c '{(.input): [.fields[] | {(.id): [.matches[] | [.text, .score, .error]]}]}' Example5\Receipt_total_0.json
{"Receipt.fp":[{"amount-anchor":[["Total",50,0],["Subtotal",0,0]]},{"amount":[["55,11",0,0],["55.91",0,0]]}]}
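If jq is not available, a similar summary can be produced with a short Python script. The sketch relies only on the JSON layout implied by the jq filter above (an `input` string and a `fields` list whose items carry `id` and `matches`, each match having `text`, `score`, and `error`); the function name `summarize` and the script name are hypothetical.

```python
import json
import sys


def summarize(doc: dict) -> dict:
    """Python equivalent of the jq filter
    {(.input): [.fields[] | {(.id): [.matches[] | [.text, .score, .error]]}]}
    """
    return {
        doc["input"]: [
            {field["id"]: [[m["text"], m["score"], m["error"]] for m in field["matches"]]}
            for field in doc["fields"]
        ]
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py Example5\Receipt_total_0.json
    with open(sys.argv[1], encoding="utf-8") as f:
        # Compact separators and raw non-ASCII text mimic jq -c
        print(json.dumps(summarize(json.load(f)), separators=(",", ":"), ensure_ascii=False))
```

Python dictionaries keep insertion order, so fields and matches are printed in the same order as in the input file, as with jq.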

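If you remove the ", score: 50" part, the Subtotal anchor is filtered out: only Total remains in the output, together with its amount "55,11". With the score at 0, the complementary expr rule acts as an additional AND condition, so a match is kept only if both the regex and the expression succeed. With score: 50, by contrast, the expression does not reject Subtotal; it only raises the score of the matches for which $sub is empty.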
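The two behaviours can be modelled in a few lines of Python. This is a hypothetical sketch inferred from the output above, not the tool's actual algorithm: it treats a positive score as a bonus for matches that satisfy the expression, treats score 0 as an extra AND condition, and sorts the results by score. The name `evaluate` and the sorting step are assumptions.

```python
import re
from typing import List, Tuple

# Same pattern as the rule; case-insensitive matching is an assumption.
PATTERN = re.compile(r"(SUB)?TOTAL:*", re.IGNORECASE)


def evaluate(candidates: List[str], score: int) -> List[Tuple[str, int]]:
    """Hypothetical model of
    [{re: '(SUB)?TOTAL:*', var: [$sub, 1]}, {expr: [$sub, eq, ''], score: SCORE}]
    score > 0: the expr only adds a bonus; score == 0: the expr is an extra AND.
    """
    results = []
    for text in candidates:
        m = PATTERN.fullmatch(text)
        if m is None:
            continue  # the regex part of the rule fails
        sub = m.group(1) or ""  # var: [$sub, 1]
        expr_ok = sub == ""     # expr: [$sub, eq, '']
        if score > 0:
            # Bonus mode: keep every regex match, reward those passing the expr.
            results.append((text, score if expr_ok else 0))
        elif expr_ok:
            # Filter mode: keep the match only if the expr also holds.
            results.append((text, 0))
    # Assumption: higher-scoring matches are reported first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


print(evaluate(["Subtotal", "Total"], score=50))  # [('Total', 50), ('Subtotal', 0)]
print(evaluate(["Subtotal", "Total"], score=0))   # [('Total', 0)]
```

The first call reproduces the ordering shown in the jq output; the second reproduces the case where the score is removed and Subtotal is dropped.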

Spaces can be a problem for OCR, especially with narrow image formats or irregular tabulation, where words may be split apart or run together. For larger test sets, you may want to add a filters key that removes spaces. Regex (re) rules still work after this filter as long as they do not use word boundaries, but vocabulary (voc) rules do not:

- id: amount-anchor
  filters:
  - [[' '], ['']]
  rules:
  - [{re: '(SUB)?TOTAL:*', var: [$sub, 1]}, {expr: [$sub, eq, ''], score: 50}]
  - {re: 'PAYMENT:*'}

In this example, the result remains the same. To experiment with a broader test set, you may want to modify Example5\total.yml.
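The following Python sketch shows why word boundaries and vocabulary rules suffer once spaces are filtered out. It assumes that `filters: [[' '], ['']]` replaces each string in the first list with the string at the same position in the second list, and that a voc rule matches whole space-separated words; both are simplifications, and `apply_filters`, `voc_match`, and `VOCAB` are hypothetical names.

```python
import re
from typing import List, Set, Tuple

# Hypothetical model of `filters: [[' '], ['']]`: each string in the first
# list is replaced by the string at the same position in the second list.
FILTERS: Tuple[List[str], List[str]] = ([" "], [""])
VOCAB: Set[str] = {"SUBTOTAL", "TOTAL"}


def apply_filters(text: str) -> str:
    for old, new in zip(*FILTERS):
        text = text.replace(old, new)
    return text


def voc_match(text: str, vocabulary: Set[str]) -> List[str]:
    # Assumption: a vocabulary rule matches whole space-separated words.
    return [tok for tok in text.split() if tok in vocabulary]


# OCR split the word with spurious spaces; the filter joins it back.
filtered = apply_filters("SUB TO TAL: 12.34")          # 'SUBTOTAL:12.34'
print(re.search(r"(SUB)?TOTAL:*", filtered) is not None)  # True
print(re.search(r"\bTOTAL\b", filtered) is not None)      # False: no boundary left

# A vocabulary word found before filtering disappears after it.
print(voc_match("TOTAL 12,34", VOCAB))                 # ['TOTAL']
print(voc_match(apply_filters("TOTAL 12,34"), VOCAB))  # []
```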