Configure the Adaptive Feature Classifier
Use the procedure in this topic to configure the top-level content classifier in your project.
Each project has one top-level content classifier only. This content classifier is an Adaptive Feature Classifier (AFC), and once it is enabled, you can configure this top-level AFC to meet your specific requirements.
In most cases, the default settings for the AFC are sufficient. For the best results, test classification with the current settings before you change this classifier, then make the necessary changes, test again, and compare the new results with the previous ones.
Even though subtree classifiers are also Adaptive Feature Classifiers, any changes you make to this top-level classifier do not affect the subtree classifiers.
Before you begin
Before you can configure the Adaptive Feature Classifier for content classification, you must first initialize the content classifier.
Procedure
- On the Project tab, in the Configuration group, select Project Settings.
- Select the Classification tab to view the classification settings.
- In the Content Classification group, select Properties for the Adaptive Feature Classifier.
The Adaptive Feature Classifier Properties window is displayed.
- Configure the following settings:
Use digits
This setting controls whether the classifier uses digits as features or ignores them during text filtering. (Default: Cleared)
Max. number of features
Limits the number of internally generated features per class. (Default: 5000)
Min. feature length
Specifies the minimum number of characters that should be used for a feature. This value cannot be smaller than the Min. word length. (Default: 3)
Max. feature length
Specifies the maximum number of characters that are used for a feature. This value should not be larger than 64 characters. (Default: 50)
Automatic selection of Min. feature frequency
Enables the Min. feature frequency to be set automatically. If this setting is selected, you cannot manually assign a Min. feature frequency value. (Default: Cleared)
Min. feature frequency
Specifies how often a substring must occur in the training set of a class before it is used as a feature for content classification. (Default: 2)
Start features at begin of words
Specifies that a feature substring must start at the beginning of a word. If this setting is cleared, the substring can start anywhere. (Default: Selected)
Max. words per feature (0-n)
Limits the number of words per feature. A value of zero means unlimited words, although the total number of characters of the words per feature cannot exceed the "Max. feature length" property. (Default: 2)
Use fuzzy string match
Enables fuzzy string matching, at the cost of slower classification performance. (Default: Cleared)
Fuzzy length (5-10)
Configures the fuzzy string comparison. (Default: 5)
Automatic selection of Min. class entropy
Enables the Min. class entropy to be set automatically. If this setting is selected, you cannot manually assign a Min. class entropy value. (Default: Cleared)
Min. class entropy (0.0 - 1.0)
Controls the importance of a feature, depending on the number of classes in which it occurs. A value of 1.0 requires that a feature occurs only in the sample documents of a single class; otherwise, it is not used for classification. The lower the value, the more classes can contain the feature in the training set. (Default: 0.600)
- Select OK.
The Adaptive Feature Classifier Properties window is closed.
- Optional. Select OK to close the Project Settings window.
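If you keep a record of the values you plan to enter, it can help to check them against the documented ranges before opening the Properties window. The following Python sketch is only an illustration: the class AfcSettings, its field names, and the problems method are hypothetical and are not part of the product or any scripting interface. The defaults and limits are taken from the settings list above. The check that the minimum feature length does not exceed the maximum is an assumption, and the Min. word length value must be supplied by you because it is configured elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AfcSettings:
    """Hypothetical record of the AFC settings, using the documented defaults."""
    use_digits: bool = False
    max_features: int = 5000
    min_feature_length: int = 3
    max_feature_length: int = 50
    auto_min_feature_frequency: bool = False
    min_feature_frequency: int = 2
    start_at_word_begin: bool = True
    max_words_per_feature: int = 2      # 0 means unlimited words
    use_fuzzy_match: bool = False
    fuzzy_length: int = 5               # documented range: 5-10
    auto_min_class_entropy: bool = False
    min_class_entropy: float = 0.6      # documented range: 0.0-1.0

    def problems(self, min_word_length: Optional[int] = None) -> List[str]:
        """Return a list of messages for values outside the documented limits."""
        found = []
        if self.max_feature_length > 64:
            found.append("Max. feature length should not be larger than 64.")
        # Assumption: a minimum above the maximum cannot produce any feature.
        if self.min_feature_length > self.max_feature_length:
            found.append("Min. feature length exceeds Max. feature length.")
        # Min. word length is set elsewhere, so it is only checked if supplied.
        if min_word_length is not None and self.min_feature_length < min_word_length:
            found.append("Min. feature length is smaller than Min. word length.")
        if self.max_words_per_feature < 0:
            found.append("Max. words per feature cannot be negative.")
        if not 5 <= self.fuzzy_length <= 10:
            found.append("Fuzzy length must be between 5 and 10.")
        if not 0.0 <= self.min_class_entropy <= 1.0:
            found.append("Min. class entropy must be between 0.0 and 1.0.")
        return found


if __name__ == "__main__":
    planned = AfcSettings(max_feature_length=80, min_class_entropy=0.8)
    for message in planned.problems():
        print(message)
```

An empty list from problems means that every value is within the limits stated in this topic. It does not indicate whether the values give good classification results, which still requires testing and comparing results as described above.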