RecoStar recognition trigram modes

Trigrams are combinations of three consecutive letters. Some trigrams occur very often in a given language; for example, common English trigrams include ing and ion.
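As a rough illustration (not RecoStar code), the trigrams of a word can be listed with a sliding window three letters wide. This sketch lowercases the input, which is an assumption; how RecoStar treats letter case is not described here.

```python
def trigrams(word: str) -> list[str]:
    """Return every run of three consecutive letters in `word`, in order.

    Lowercasing is an assumption made for this sketch only."""
    w = word.lower()
    # A word shorter than three letters yields an empty range, so no trigrams.
    return [w[i:i + 3] for i in range(len(w) - 2)]


print(trigrams("Walking"))  # ['wal', 'alk', 'lki', 'kin', 'ing']
```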

Important: The RecoStar recognition engine will be deprecated in the next release of Kofax TotalAgility. Kofax therefore recommends that you use the OmniPage recognition profile for both page and zone recognition in all new projects. If you have existing projects that use one or more RecoStar profiles, Kofax also recommends that you modify those projects to use a comparable OmniPage profile.

RecoStar can use trigrams to improve recognition accuracy. Trigram analysis checks, and optionally repairs, letter combinations that have both a low confidence rating and a low frequency of occurrence.
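The condition described above requires both values to be low before a character is considered for repair. A minimal sketch of that rule follows; the threshold values and the `CharResult` and `needs_trigram_check` names are hypothetical and chosen for illustration only, since RecoStar's actual cut-offs are not documented here.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not RecoStar's real values.
CONFIDENCE_THRESHOLD = 0.6
FREQUENCY_THRESHOLD = 0.001


@dataclass
class CharResult:
    """Hypothetical per-character recognition result."""
    char: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)


def needs_trigram_check(result: CharResult, trigram_frequency: float) -> bool:
    """Flag a character only when BOTH its recognition confidence and the
    frequency of the trigram it belongs to fall below the thresholds."""
    return (result.confidence < CONFIDENCE_THRESHOLD
            and trigram_frequency < FREQUENCY_THRESHOLD)


# Low confidence in a rare trigram: flagged for checking.
print(needs_trigram_check(CharResult("l", 0.4), 0.00002))   # True
# High confidence: left alone, even though the trigram is rare.
print(needs_trigram_check(CharResult("l", 0.95), 0.00002))  # False
```

Requiring both conditions means a confidently recognized but unusual letter sequence is not "corrected" merely because it is rare.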

Consider these examples:

Original word | Initial recognition result | After trigram applied
Walking (faint) | Walking with the n rejected | Walking
Dictionary (with speckle) | Dictlonary | Dictionary

In the first case, the image of Walking suffers from drop-outs; in particular, the n is badly faded. The recognition engine cannot decide whether it is an r followed by an i or a single n, so the character is marked as rejected in the initial result. Trigram analysis is then applied to the initial result, and the recognition engine determines that the most likely three-letter combination in this case is ing.
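The choice between the two possible readings of the faded character can be sketched as picking the candidate whose trigrams are best attested in a reference sample. This is a simplified illustration under stated assumptions: the tiny word list stands in for a real trigram table, and scoring a word by its rarest trigram is a simplification chosen for this sketch, not RecoStar's documented method. The names `build_counts` and `best_reading` are hypothetical.

```python
from collections import Counter


def trigrams(word: str) -> list[str]:
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def build_counts(corpus_words: list[str]) -> Counter:
    """Count how often each trigram occurs in a list of reference words."""
    counts: Counter = Counter()
    for word in corpus_words:
        counts.update(trigrams(word))
    return counts


def best_reading(candidates: list[str], counts: Counter) -> str:
    """Pick the candidate whose rarest trigram is most common.

    Scoring by the weakest trigram is an assumption of this sketch.
    Candidates must be at least three letters long."""
    return max(candidates, key=lambda w: min(counts[t] for t in trigrams(w)))


# Tiny illustrative reference sample; a real table comes from far more text.
corpus = ["talking", "wall", "king", "sing", "thing", "ring"]
counts = build_counts(corpus)

# The faded character is either a single "n" or an "r" followed by an "i".
print(best_reading(["Walking", "Walkirig"], counts))  # Walking
```

Here "Walkirig" loses because kir, iri, and rig never occur in the sample, while every trigram of "Walking" does.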

In the second case, the image contains substantial noise, which causes the second i in Dictionary to be read as the letter l. Trigram analysis shows that ion is more likely than lon, so the word is corrected.

Keep in mind that trigram analysis is a statistical process. RecoStar ships with trigram tables for most supported languages. Each table lists possible three-letter combinations and their frequency of occurrence in that language. Although there are thousands of such combinations, many are almost never used, so their frequency of occurrence is near zero.
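A table of the kind described above can be approximated by counting three-letter runs in sample text and dividing each count by the total. This is a sketch under assumptions: the sample sentence is made up for illustration, the `trigram_table` name is hypothetical, and RecoStar's own tables are built from much larger language data in a format not described here. Combinations that never appear in the sample simply have no entry, which plays the role of a near-zero frequency.

```python
from collections import Counter


def trigram_table(text: str) -> dict[str, float]:
    """Map each three-letter combination to its relative frequency in `text`.

    Splits on whitespace and drops non-letter characters within each word."""
    counts: Counter = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        counts.update(letters[i:i + 3] for i in range(len(letters) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}


sample = "the station issued a notice about the location of the union meeting"
table = trigram_table(sample)

# "ion" occurs in station, location, and union; "lon" never occurs.
print(table.get("ion", 0.0) > table.get("lon", 0.0))  # True
```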

Occasionally, your data may contain many uncommon trigrams. For example, a list of Chicago radio stations might include WGN, WLS, WNVR, WKTAF, WZRD, WBEZ, or WXRT. Because trigram analysis favors common letter combinations, it can work against values like these. If you notice recognition problems with this kind of data, consider disabling trigrams in your recognition profile.
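One hypothetical way to judge whether your data falls into this category is to measure what share of its trigrams are unseen in a reference sample. The sketch below is an assumption-laden heuristic, not a RecoStar feature: `rare_trigram_share` and `min_count` are illustrative names, and the one-line reference text stands in for a real language table. The decision itself is still made by enabling or disabling trigrams in the recognition profile.

```python
from collections import Counter


def trigrams(word: str) -> list[str]:
    w = "".join(ch for ch in word.lower() if ch.isalpha())
    return [w[i:i + 3] for i in range(len(w) - 2)]


def rare_trigram_share(values: list[str], reference_counts: Counter,
                       min_count: int = 1) -> float:
    """Fraction of trigrams in `values` seen fewer than `min_count` times
    in the reference counts (hypothetical heuristic)."""
    found = [t for v in values for t in trigrams(v)]
    if not found:
        return 0.0
    rare = sum(1 for t in found if reference_counts[t] < min_count)
    return rare / len(found)


reference_text = "the radio station plays music all night long"
reference = Counter(t for w in reference_text.split() for t in trigrams(w))

stations = ["WGN", "WLS", "WNVR", "WKTAF", "WZRD", "WBEZ", "WXRT"]
share = rare_trigram_share(stations, reference)
print(f"{share:.0%} of trigrams in the station list are unseen")  # 100% ...
```

A share close to 100% suggests that trigram analysis has little reliable evidence to work with for that field.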