Kofax Clarity Page Recognition Profile Settings window
This recognition engine differs slightly from the other engines. It is specialized to recognize text in any image, including a picture of an ID badge worn by a person or a photo of a sign. In addition, OCR is not performed on the Kofax Transformation Toolkit server. Instead, it is performed on a remote server over an internet connection.
Communication between the Kofax Clarity recognition profile and its server uses port 443. This port must be open in your firewall settings.
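A quick way to confirm that outbound traffic on port 443 is not blocked is to attempt a plain TCP connection from the machine that runs the recognition. The following is a minimal sketch only: the function name `can_reach` is illustrative, and the host you pass in is an assumption, because this page does not state the address of the Kofax Clarity server. A successful connection shows only that the port is reachable, not that the service itself accepts requests.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks reachability only. It does not perform a TLS handshake
    or talk to any recognition service.
    """
    try:
        # create_connection resolves the host and connects, raising
        # OSError (or a subclass) on DNS failure, refusal, or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("clarity.example.com")` would test a placeholder host name. Replace it with the server address used in your environment.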
If you perform recognition using Kofax Clarity during testing in Project Builder, runtime licenses are consumed. If you want to avoid consuming too many runtime licenses during project configuration and testing, Kofax recommends that you test Kofax Clarity on selected images only, and that you perform OCR on other test images or training images using another recognition engine.
During configuration, the best practice is to run this recognition engine a few times without a fallback recognition engine configured. This ensures that everything is working as expected and that the proper internet access is available.
You can use this window to set the Kofax Clarity full-text OCR profile.
- Languages
-
This group enables you to select one or more specific languages, or allow the recognition engine to determine the language itself.
This group has the following options:
- Automatic language detection
-
Select this to allow the Kofax Clarity recognition engine to determine the language of a document. This option is selected by default.
- Select languages
-
Select this option if you want to explicitly specify which languages are used in your documents.
Once selected, the list of languages is enabled. Select one or more languages.
The list of available languages depends on which Recognition mode is selected.
Chinese, Greek, Hebrew, Japanese, Korean, and Thai are not supported by Document mode. These languages are supported by Text mode, but your documents may not be suitable for that mode. For the best results with documents in these languages, thoroughly test both modes, using the selected language as well as the Automatic language detection option, to see which combination performs best. Alternatively, select a different recognition engine.
Important: If you are not sure which languages are used in your documents, use the Automatic language detection option, because it provides better OCR results than selecting the wrong languages.
- General Settings
-
This group enables you to specify how a document is recognized with the Kofax Clarity engine.
This group has the following options:
- Recognition mode
-
Select one of the following modes of recognition.
-
Document mode.
Select this mode if your documents are classic paper documents, forms, or images with densely packed text, such as an invoice or a bank letter. This option is selected by default.
-
Text mode.
Select this mode to detect and extract text from images with a small amount of text, such as a photo ID card.
- Word separation characters
-
Use this field to define which characters may separate words. The default value for this option is /:()-# (forward slash, colon, open and close parentheses, hyphen, pound sign).
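To illustrate the effect of the Word separation characters field, the sketch below splits a string on the default characters /:()-# and on white space. It is an illustration of the concept only and does not reproduce how the Kofax Clarity engine separates words internally. The names `DEFAULT_SEPARATORS` and `split_words` are hypothetical.

```python
import re
from typing import List

# Default value documented for the Word separation characters field.
DEFAULT_SEPARATORS = "/:()-#"


def split_words(text: str, separators: str = DEFAULT_SEPARATORS) -> List[str]:
    """Split text on white space and on any of the separator characters."""
    # re.escape keeps characters such as "-" and "(" literal inside
    # the character class.
    pattern = "[" + re.escape(separators) + r"\s]+"
    # Drop empty strings produced by leading or trailing separators.
    return [word for word in re.split(pattern, text) if word]
```

With the default value, `split_words("INV-2024/001 (ref#42)")` returns `['INV', '2024', '001', 'ref', '42']`. With an empty separator list, the same call splits on white space only.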
- Fallback Profile
-
If the Kofax Clarity recognition profile is temporarily unavailable, you can configure which recognition engine is used instead. This ensures that a broken network connection does not hold up processing with failed OCR results.
- Recognition profile as fallback
-
Select a page recognition profile that performs OCR if the Kofax Clarity recognition profile is not available. The value for this option is set to <None> by default.
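The fallback behavior described above can be pictured as a simple try-then-fall-back pattern. The sketch below is conceptual and is not product code: `RecognitionUnavailable`, `recognize_with_fallback`, and the two engine callables are hypothetical names introduced for illustration. Passing `None` as the fallback corresponds to the default value <None>.

```python
from typing import Callable, Optional


class RecognitionUnavailable(Exception):
    """Hypothetical error: the remote recognition engine cannot be reached."""


def recognize_with_fallback(
    image: bytes,
    primary: Callable[[bytes], str],
    fallback: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Run the primary engine and use the fallback only if it is unavailable."""
    try:
        return primary(image)
    except RecognitionUnavailable:
        if fallback is None:
            # No fallback configured: the failure is reported to the caller.
            raise
        return fallback(image)
```

This also shows why running without a fallback is useful during configuration: with no fallback, a connection problem surfaces as an error instead of being silently handled by another engine.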