Image cleanup methods
You can add one or more image cleanup methods to an image cleanup profile. In many cases, a combination of methods gives the best results.
For example, suppose a document has a lot of background noise, such as a watermarked pattern, and contains both light handwritten text and heavy machine-printed text. You could create an image cleanup profile that performs despeckling, fills line breaks, and applies a light thickening filter. The combination of these three methods clears the background, smooths the text, and thickens the handwritten content so it is more easily recognized.
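The idea of combining methods in a profile can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names (`despeckle`, `fill_line_breaks`, `light_thicken`, `apply_profile`) and the simple pixel rules are assumptions chosen for illustration, and the image is a tiny grid where 1 is an ink pixel and 0 is background.

```python
from typing import Callable, List

Image = List[List[int]]  # 1 = black (ink) pixel, 0 = white (background)


def _get(img: Image, r: int, c: int) -> int:
    """Return the pixel at (r, c), treating out-of-bounds as background."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def despeckle(img: Image) -> Image:
    """Hypothetical despeckle: drop ink pixels with no ink among their 8 neighbours."""
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            if img[r][c] == 1:
                neighbours = sum(
                    _get(img, r + dr, c + dc)
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                )
                if neighbours == 0:
                    out[r][c] = 0
    return out


def fill_line_breaks(img: Image) -> Image:
    """Hypothetical gap repair: fill a one-pixel gap with ink on two opposite sides."""
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            if img[r][c] == 0:
                horizontal = _get(img, r, c - 1) and _get(img, r, c + 1)
                vertical = _get(img, r - 1, c) and _get(img, r + 1, c)
                if horizontal or vertical:
                    out[r][c] = 1
    return out


def light_thicken(img: Image) -> Image:
    """Hypothetical light thickening: grow ink by one pixel up, down, left, and right."""
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            if img[r][c] == 0 and (
                _get(img, r - 1, c) or _get(img, r + 1, c)
                or _get(img, r, c - 1) or _get(img, r, c + 1)
            ):
                out[r][c] = 1
    return out


def apply_profile(img: Image, methods: List[Callable[[Image], Image]]) -> Image:
    """Run each cleanup method in order, feeding each result to the next."""
    for method in methods:
        img = method(img)
    return img


# A stray speck in the top-left corner and a stroke with a one-pixel gap.
noisy = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
profile = [despeckle, fill_line_breaks, light_thicken]
cleaned = apply_profile(noisy, profile)
for row in cleaned:
    print("".join("#" if p else "." for p in row))
```

In this sketch the order of the methods matters: if `light_thicken` ran before `despeckle`, the speck would grow into a larger mark that the single-dot rule no longer removes.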
The following methods are available for image cleanup profiles:
| Method | Description |
|---|---|
| Advanced Despeckle | Removes unwanted blobs from an image. A blob is a group of connected pixels, belonging to the image background or to stray marks, that scanning does not remove. You can layer several Advanced Despeckle methods on top of one another to remove items of different sizes from an image. You can also use this cleanup method to recognize the presence of handwritten markup and to remove boxes, repeating patterns, images, and other graphics from documents. Removing these items can significantly improve recognition rates. When configuring this image cleanup method, test a range of documents to ensure that you are not removing important text or data needed for extraction. |
| Character Smoothing | Smooths the edges of characters so they are easier to recognize. Character sharpness is often lost when a document is faxed or printed on a printer that is low on ink. This cleanup method examines the characters and intelligently fills in blank areas, removes jagged corners, and removes some background debris before recognition is performed. This can improve recognition and extraction results. Use this method for documents that contain machine-printed text that returns poor recognition results. It is commonly used with the Smooth and Clean or the Smooth, Clean and Preserve cleanup methods. |
| Despeckle | Removes single dots, also known as noise, from around characters so they are easier to recognize. Use this method for documents that contain watermark information or other noise that interferes with recognition results. |
| Fill Line Breaks | Locates and repairs gaps in characters so they are more recognizable. This method is commonly used when the scan quality of a document is poor. The recognition engine finds it difficult to recognize characters with holes or gaps, so this method may improve results. This method is sometimes used together with the Smooth and Clean or the Smooth, Clean and Preserve methods. |
| Fill Line Breaks and Preserve | Locates and repairs gaps in characters so they are more recognizable. This method also preserves both horizontal and vertical lines. It is commonly used when the scan quality of a document is poor. |
| Light Thicken Filter | Locates and repairs gaps in characters, and also expands objects so they are easier to recognize. This method is commonly used when a document contains hand-printed text or cursive handwriting. |
| Remove Lines | Removes horizontal and vertical lines so the content adjacent to them is easier to recognize. This method removes solid, dotted, and dashed lines. |
| Smooth and Clean | Removes smaller structures and smooths edges so content is easier to recognize. Use this method to smooth any jagged edges of machine-printed characters. Adjacent lines may be removed when this method is applied. If you want to preserve the lines, use the Smooth, Clean and Preserve method. |
| Smooth, Clean and Preserve | Removes smaller structures and smooths edges so content is easier to recognize, while preserving horizontal and vertical lines. Use this method to smooth any jagged edges of machine-printed characters without affecting any adjacent lines. |
| Thicken Filter | Fills gaps and expands objects so content is easier to recognize. This method makes characters thicker than the Light Thicken Filter does, and is commonly used for thin handwritten content. |
| Thinning Filter | Removes object edges and fine structures so content is easier to recognize. Use this method if a document contains very thick or bold text that is difficult to recognize. |
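
The notion of a blob as a group of connected pixels, and of layering several Advanced Despeckle methods to remove items of different sizes, can be sketched in Python as follows. This is an illustrative assumption, not the product's algorithm: `find_blobs`, `remove_blobs`, and the size thresholds are hypothetical, and the image is a small grid where 1 is an ink pixel and 0 is background.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black (ink) pixel, 0 = white (background)


def find_blobs(img: Image) -> List[List[Tuple[int, int]]]:
    """Group ink pixels into blobs of 8-connected pixels using a flood fill."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                stack, blob = [(r, c)], []
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] == 1 and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                blobs.append(blob)
    return blobs


def remove_blobs(img: Image, min_size: int, max_size: int) -> Image:
    """Erase every blob whose pixel count lies between min_size and max_size."""
    out = [row[:] for row in img]
    for blob in find_blobs(img):
        if min_size <= len(blob) <= max_size:
            for y, x in blob:
                out[y][x] = 0
    return out


# A 1-pixel speck, a 4-pixel "character", and a 12-pixel graphic.
page = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
]

# Two layered passes with hypothetical thresholds: small specks, then large graphics.
without_specks = remove_blobs(page, min_size=1, max_size=2)
result = remove_blobs(without_specks, min_size=10, max_size=100)
for row in result:
    print("".join("#" if p else "." for p in row))
```

Widening the first pass to sizes 1 through 4 would also erase the character in this sketch, which is why testing a range of documents before settling on a configuration is worthwhile.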