Regular Regex Extraction plugin

This plugin extracts index field values based on the pattern defined for each field. A pattern is a semicolon-separated list of one or more words followed by a regular expression. The system searches each page for the regular expression. When it finds a match, it looks to the left of the match for the words in the pattern. If all of the words are found, in order, the value is extracted. If only some of the words are found, or none of them, the value is not extracted.

Examples

Consider the following text defined for the pattern field of the InvoiceDate index field: Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}

Example 1

Text string in document: Invoice Date 21/03/2012

Result: "21/03/2012" is extracted for the InvoiceDate index field. This happens because "21/03/2012" matches the regular expression, "Date" is found to its left, and "Invoice" is found to the left of "Date".

Example 2

Text string in document: Date 21/03/2012

Result: Nothing is extracted for this index field. Even though "21/03/2012" matches the regular expression, and "Date" is found to its left, the word "Invoice" is not found to the left of "Date".
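The matching rule described above can be sketched in Python. This is an illustration only, not the plugin's implementation: the function names are hypothetical, the pattern is written with the digit class as \d, and the sketch assumes that words are compared as whole, case-sensitive, whitespace-separated tokens and that the whole page to the left of the match is searched. The plugin may behave differently on each of those points.

```python
import re

# Pattern from the examples: two words, then a regular expression.
PATTERN = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"


def words_precede(keywords, words):
    """Return True if every keyword occurs in `words`, in the given order.

    Walks the keywords from last to first, searching leftwards each time,
    so each keyword must appear to the left of the one after it.
    """
    end = len(words)
    for keyword in reversed(keywords):
        for i in range(end - 1, -1, -1):
            if words[i] == keyword:
                end = i  # the next keyword must be found left of this one
                break
        else:
            return False  # keyword not found in the remaining text
    return True


def extract(pattern_field, page_text):
    """Return the first regex match preceded by all keywords, else None."""
    # Assumption: the last semicolon-separated part is the regular
    # expression and the regex itself contains no semicolon.
    *keywords, regex = pattern_field.split(";")
    for match in re.finditer(regex, page_text):
        left_words = page_text[:match.start()].split()
        if words_precede(keywords, left_words):
            return match.group(0)
    return None


print(extract(PATTERN, "Invoice Date 21/03/2012"))
print(extract(PATTERN, "Date 21/03/2012"))
```

The first call corresponds to Example 1 and the second to Example 2, where the missing word "Invoice" prevents extraction.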

Plugin configuration

Configurable property	Type of value	Value options	Description
Regular Regex Extraction Switch	List of Values	ON OFF	Determines whether the plugin runs. The default value is ON.
Regular Regex Confidence Score	Integer	0 - 100	Acts as a multiplier for the confidence score calculated by matching regex.

In the Pattern column, enter the semicolon-separated set of words and the regular expression for each index field.

Troubleshooting

Error message	Possible root cause
Invalid input pattern sequence.	The pattern entered is not a valid regular expression, or does not match the proper format.
No FieldType data found from data base for document type	The FieldType column does not contain a valid value.
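Before entering a pattern, it can help to check it against the two causes listed for "Invalid input pattern sequence." The following Python sketch is a hypothetical pre-check, not part of the plugin. It assumes the last semicolon-separated part is the regular expression, and it compiles that part with Python's re module, whose syntax may differ in places from the regex engine the plugin uses.

```python
import re


def check_pattern(pattern_field):
    """Return a list of problems found in a pattern; empty if none."""
    problems = []
    # Assumption: words come first, the regular expression comes last.
    *keywords, regex = pattern_field.split(";")
    if not regex.strip():
        problems.append("missing regular expression")
    else:
        try:
            re.compile(regex)
        except re.error as exc:
            problems.append(f"invalid regular expression: {exc}")
    if any(not keyword.strip() for keyword in keywords):
        problems.append("empty word between semicolons")
    return problems


print(check_pattern(r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"))
print(check_pattern("Invoice;Date;[0-9"))
```

An empty list means the sketch found nothing wrong; it does not guarantee that the plugin will accept the pattern.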
