Configure the Adaptive Feature Classifier

Use the procedure in this topic to configure the top-level content classifier in your project.

Each project has one top-level content classifier only. This content classifier is an Adaptive Feature Classifier (AFC), and once it is enabled, you can configure this top-level AFC to meet your specific requirements.

In most cases, the default settings for the AFC are sufficient. For the best results, test classification with the current settings first, then make the necessary changes and compare the new results with the previous ones.

Although subtree classifiers are also Adaptive Feature Classifiers, changes that you make to this top-level classifier do not affect the subtree classifiers.

Before you begin

Before you configure the Adaptive Feature Classifier for content classification, initialize the content classifier.

Procedure

  1. On the Project tab, in the Configuration group, select Project Settings.
  2. Select the Classification tab to view the classification settings.
  3. In the Content Classification group, select Properties for the Adaptive Feature Classifier.

    The Adaptive Feature Classifier Properties window is displayed.

  4. Configure the following settings:

    Use digits

    This setting controls whether the classifier uses digits as features or ignores them during text filtering. (Default: Cleared)

    Max. number of features

    Limits the maximum number of internally generated features per class. (Default: 5000)

    Min. feature length

    Specifies the minimum number of characters in a feature. This value cannot be smaller than the Min. word length. (Default: 3)

    Max. feature length

    Specifies the maximum number of characters in a feature. This value should not be larger than 64. (Default: 50)

    Automatic selection of Min. feature frequency

    Enables the Min. feature frequency to be set automatically. If this setting is selected, you cannot manually assign a Min. feature frequency value. (Default: Cleared)

    Min. feature frequency

    Specifies how often a substring must occur in the training set of a class to be used as a feature for content classification. (Default: 2)

    Start features at begin of words

    Specifies that a feature substring must start at the beginning of a word. If this setting is cleared, the substring can start anywhere. (Default: Selected)

    Max. words per feature (0-n)

    Limits the number of words per feature. A value of zero means an unlimited number of words, although the total number of characters in a feature still cannot exceed the Max. feature length value. (Default: 2)

    Use fuzzy string match

    Enables fuzzy string matching, at the cost of slower classification performance. (Default: Cleared)

    Fuzzy length (5-10)

    Configures the fuzzy string comparison. (Default: 5)

    Automatic selection of Min. class entropy

    Enables the Min. class entropy to be set automatically. If this setting is selected, you cannot manually assign a Min. class entropy value. (Default: Cleared)

    Min. class entropy (0.0 - 1.0)

    Controls the importance of a feature, depending on the number of classes in which it occurs. A value of 1.0 requires that a feature occurs only in the sample documents of a single class; otherwise, it is not used for classification. The lower the value, the more classes can contain the feature in the training set. (Default: 0.600)

  5. Select OK.

    The Adaptive Feature Classifier Properties window is closed.

  6. Optional. Select OK to close the Project Settings window.
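
The value ranges and dependencies described in step 4 can be summarized as a small consistency check, which may be useful when you record candidate settings before entering them in the Properties window. The following Python sketch is illustrative only: the AfcSettings class, its field names, and the validate function are hypothetical and are not part of any product API. The defaults are copied from the list in step 4. The Min. word length value is configured elsewhere, so it is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AfcSettings:
    # Hypothetical field names; defaults copied from the settings list in step 4.
    use_digits: bool = False
    max_features: int = 5000
    min_feature_length: int = 3
    max_feature_length: int = 50
    auto_min_feature_frequency: bool = False
    min_feature_frequency: int = 2
    start_at_word_begin: bool = True
    max_words_per_feature: int = 2  # 0 means an unlimited number of words
    use_fuzzy_match: bool = False
    fuzzy_length: int = 5
    auto_min_class_entropy: bool = False
    min_class_entropy: float = 0.6


def validate(settings: AfcSettings, min_word_length: int) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems: List[str] = []
    if settings.min_feature_length < min_word_length:
        problems.append("Min. feature length is smaller than Min. word length")
    if settings.max_feature_length > 64:
        problems.append("Max. feature length is larger than 64")
    # Assumption (not stated in the topic): the minimum should not exceed the maximum.
    if settings.min_feature_length > settings.max_feature_length:
        problems.append("Min. feature length is larger than Max. feature length")
    if settings.max_words_per_feature < 0:
        problems.append("Max. words per feature must be 0 or greater")
    if not 5 <= settings.fuzzy_length <= 10:
        problems.append("Fuzzy length is outside the range 5-10")
    if not 0.0 <= settings.min_class_entropy <= 1.0:
        problems.append("Min. class entropy is outside the range 0.0 - 1.0")
    return problems
```

For example, calling validate(AfcSettings(), min_word_length=3) with an illustrative Min. word length of 3 returns an empty list, because the documented defaults satisfy every constraint that is checked here.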