RecAPI
Processing legal documents

CSDK introduces algorithms to better handle pleading documents. These legal documents can be processed by any normal workflow and from any generally supported image input, and legal processing can affect the output content of every target conversion. To activate the legal algorithms, set Kernel.LegalPleading to TRUE.

Legal recognition addresses the following issues:

Pleading numbers

These appear vertically on the page, on one or both sides of the body text, to uniquely identify text lines. Sometimes every line is numbered, sometimes only some, for instance every fifth line. The numbers may be separated from the body text by a vertical line. Legal recognition tries to identify these elements and store them separately so they do not interfere with the body text. When the documents are exported to target files, the numbers can then be dropped or retained separately using the setting Converters.Text.xxxx.RetainLineNumberingZones, where xxxx is the output file type name. Saving to searchable PDF with pleading numbers retained is a special case: the numbers are retained in the page images but not in the searchable text, so a search for a number finds only its occurrences in the body text.
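The naming pattern of the two settings mentioned so far can be sketched as follows. This is only an illustration of how the setting names are composed. It does not call the CSDK. The helper names and the output type names "Rtf" and "Docx" are hypothetical placeholders, and the real output file type names should be taken from the converter documentation.

```python
def retain_numbering_setting(output_type: str) -> str:
    """Compose the setting name for one output file type name (the xxxx part)."""
    if not output_type or "." in output_type:
        raise ValueError("output_type must be a single, non-empty name")
    return f"Converters.Text.{output_type}.RetainLineNumberingZones"


def legal_settings(output_types, retain_numbers: bool) -> dict:
    """Collect the setting names and values an application would apply.

    The returned dictionary is a plain description of the intended values;
    applying them through the CSDK setting manager is left to the caller.
    """
    settings = {"Kernel.LegalPleading": True}
    for name in output_types:
        settings[retain_numbering_setting(name)] = retain_numbers
    return settings


if __name__ == "__main__":
    # "Rtf" and "Docx" are placeholder type names used for illustration only.
    for key, value in legal_settings(["Rtf", "Docx"], retain_numbers=False).items():
        print(key, "=", value)
```

Keeping the values in one mapping makes it easy to review which converters drop or retain the pleading numbers before the settings are applied.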

Recognition tries to retain the original page content as far as possible, but text wrapping in the output may disturb the precise alignment of line numbers and content. If precise synchronization is important, design your application to prevent text wrapping by inserting a line break at the end of each line as detected in the original pages.
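A minimal sketch of the line-break approach, assuming the application has already obtained the detected lines as (number, text) pairs, where the number is None for unnumbered lines. The input shape and the function name are assumptions made for this illustration and are not part of the CSDK.

```python
def render_with_hard_breaks(lines, width=2):
    """Emit one output line per detected line so numbers stay aligned.

    lines: iterable of (number, text) pairs; number is None when the
    original line carries no pleading number.
    width: column width reserved for the right-aligned number.
    """
    rendered = []
    for number, text in lines:
        label = "" if number is None else str(number)
        # Right-align the number, then two spaces, then the body text.
        rendered.append(f"{label:>{width}}  {text}")
    # An explicit line break after every detected line prevents rewrapping.
    return "\n".join(rendered)


if __name__ == "__main__":
    sample = [
        (1, "COMES NOW the plaintiff"),
        (None, "and alleges as follows:"),
        (5, "The defendant resides in this district."),
    ]
    print(render_with_hard_breaks(sample))
```

Because every detected line ends with its own break, the output has exactly as many lines as the original page, which keeps each number next to the text it identifies.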

Document identifier

This is usually a header table appearing at the top of the first page of a legal document, with a single row and three columns, naming the plaintiff, defendant and case identifiers. The formatting of table borders can vary widely; legal recognition attempts to render these tables with standard formatting.

Page header removal

The page header is a single-line case identifier that typically appears at the top of each page and is usually auto-generated by a document management system. When legal recognition is enabled, such header lines are detected and removed during preprocessing, so they are neither recognized nor exported. With general processing, these header lines remain on the pages.
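To illustrate what removing a repeated header line means, the sketch below strips a first line that recurs on most pages of already extracted text. It is not the detection algorithm used by the CSDK, which works during image preprocessing. The function name, the input shape (a list of pages, each a list of text lines) and the 0.8 threshold are assumptions chosen for the example.

```python
from collections import Counter


def strip_repeated_header(pages, min_share=0.8):
    """Drop the first line of each page when it repeats across pages.

    pages: list of pages, each a list of text lines.
    min_share: fraction of pages that must start with the same line
    before that line is treated as an auto-generated header.
    """
    first_lines = [page[0].strip() for page in pages if page]
    if len(pages) < 2 or not first_lines:
        return [list(page) for page in pages]
    candidate, count = Counter(first_lines).most_common(1)[0]
    if count / len(pages) < min_share:
        # No line is common enough; leave every page unchanged.
        return [list(page) for page in pages]
    return [
        page[1:] if page and page[0].strip() == candidate else list(page)
        for page in pages
    ]


if __name__ == "__main__":
    doc = [
        ["Case 2:20-cv-00001", "First page text"],
        ["Case 2:20-cv-00001", "Second page text"],
        ["Case 2:20-cv-00001", "Third page text"],
    ]
    print(strip_repeated_header(doc))
```

Real headers often contain a varying part such as a page counter, so an exact string comparison like this one is a simplification.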