TDS Page Classifier Properties Window

You can configure the TDS Page Classifier properties with this window.

Important: You need to retrain the project before any changes made to these settings can take effect.

Text Filtering

This group has the following options:

Use digits

This option controls whether the classifier uses digits as features or ignores them during text filtering. This option is cleared by default.

Min. word length

Words that are shorter than this value are ignored during text filtering. Regardless of word length, features with a very low or very high frequency are also ignored. The value for this option is set to 3 by default.
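
As a rough illustration of how Use digits and Min. word length interact, the following Python sketch filters a text into candidate words. It is not the product's implementation: the function name filter_words and the tokenization rule are assumptions made for this example, and the frequency-based filtering mentioned above is left out.

```python
import re

def filter_words(text, use_digits=False, min_word_length=3):
    """Hypothetical text filter; not the classifier's actual code."""
    # Assumption: a token is a run of letters, optionally including digits.
    pattern = r"[A-Za-z0-9]+" if use_digits else r"[A-Za-z]+"
    tokens = re.findall(pattern, text.lower())
    # Drop every token that is shorter than the minimum word length.
    return [t for t in tokens if len(t) >= min_word_length]

print(filter_words("Invoice no 12345 is due on 5 May"))
# ['invoice', 'due', 'may']
print(filter_words("Invoice no 12345 is due on 5 May", use_digits=True))
# ['invoice', '12345', 'due', 'may']
```

With the defaults, the invoice number is dropped entirely. With Use digits selected, it survives because it is at least three characters long, while the single digit 5 is still removed.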

Training

This group has the following options:

Max. number of features

Limits the maximum number of internally generated features per class. The value for this option is set to 5000 by default.

Min. feature length

Specifies the minimum number of characters that are used for a feature. This value cannot be smaller than the Min. word length value. The value for this option is set to 3 by default.

Max. feature length

Specifies the maximum number of characters that are used for a feature. This value should not be larger than 64. The value for this option is set to 50 by default.
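
The length constraints described so far can be checked with a small helper such as the sketch below. The function check_lengths is hypothetical and is not part of the product. The first two checks come from the descriptions above; the third (maximum not smaller than minimum) is an assumption added for this example.

```python
def check_lengths(min_word_length=3, min_feature_length=3, max_feature_length=50):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if min_feature_length < min_word_length:
        problems.append("Min. feature length is smaller than Min. word length")
    if max_feature_length > 64:
        problems.append("Max. feature length is larger than 64")
    # Assumption: a maximum below the minimum leaves no valid feature length.
    if max_feature_length < min_feature_length:
        problems.append("Max. feature length is smaller than Min. feature length")
    return problems

print(check_lengths())                        # defaults are consistent: []
print(check_lengths(min_feature_length=2))    # one problem reported
```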

Automatic selection of Min. feature frequency

Enables the Min. feature frequency to be set automatically. If this option is selected, you cannot manually assign a Min. feature frequency value. This option is cleared by default.

Min. feature frequency

Specifies how often a substring must occur in the training set of a class to be used as a feature for content classification. The value for this option is set to 2 by default.
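
The effect of this threshold can be sketched as a simple count over the training documents of one class. The function frequent_features is hypothetical, and the sketch assumes that every occurrence is counted across all documents of the class; the product may count differently (for example, once per document).

```python
from collections import Counter

def frequent_features(candidates_per_document, min_feature_frequency=2):
    """Keep candidates that occur often enough in one class's training set."""
    counts = Counter()
    for candidates in candidates_per_document:
        counts.update(candidates)  # assumption: every occurrence counts
    return {f for f, n in counts.items() if n >= min_feature_frequency}

training_set = [["invoice", "total"], ["invoice", "date"]]
print(frequent_features(training_set))
# {'invoice'}
```

Here only "invoice" occurs twice, so it is the only candidate kept at the default value of 2.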

Start features at beginning of words

Specifies that a feature substring needs to start at the beginning of a word. If this option is cleared, the substring can start anywhere in a word. This option is selected by default.

Max. words per feature (0-n)

Limits the number of words per feature. A value of zero means unlimited words, although the total number of characters of the words per feature cannot exceed the "Max. feature length" property. The value for this option is set to 2 by default.
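
To make the interplay of Min. feature length, Max. feature length, Start features at beginning of words, and Max. words per feature more concrete, the following sketch enumerates candidate substrings from a list of filtered words. It is only an illustration under stated assumptions: the function candidate_features is hypothetical, words are joined by single spaces, and the space counts toward the feature length. The product's internal feature generation may differ.

```python
def candidate_features(words, min_len=3, max_len=50,
                       start_at_word=True, max_words=2):
    """Hypothetical enumeration of feature substrings from filtered words."""
    text = " ".join(words)
    # Character offset at which each word starts in the joined text.
    word_starts = []
    pos = 0
    for w in words:
        word_starts.append(pos)
        pos += len(w) + 1
    starts = word_starts if start_at_word else range(len(text))
    features = set()
    for s in starts:
        for e in range(s + min_len, min(s + max_len, len(text)) + 1):
            sub = text[s:e]
            if sub[0] == " " or sub[-1] == " ":
                continue  # skip substrings with a leading or trailing space
            if max_words and len(sub.split(" ")) > max_words:
                break  # longer substrings from this start span even more words
            features.add(sub)
    return features

words = ["invoice", "total"]
print(sorted(candidate_features(words, min_len=5, max_len=9)))
# ['invoi', 'invoic', 'invoice', 'invoice t', 'total']
print(sorted(candidate_features(words, min_len=5, max_len=9, max_words=1)))
# ['invoi', 'invoic', 'invoice', 'total']
```

Passing max_words=0 disables the word limit in this sketch, so only the length bounds apply, which mirrors the "zero means unlimited" rule described above.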

Use fuzzy string match

Enables fuzzy string matching, at the cost of slower classification performance. This option is cleared by default.

Fuzzy length (5-10)

Configures the fuzzy string comparison. The value for this option is set to 5 by default.

Automatic selection of Min. class entropy

Enables the Min. class entropy to be set automatically. If this option is selected, you cannot manually assign a Min. class entropy value. This option is cleared by default.

Min. class entropy (0.0-1.0)

Controls the importance of a feature, depending on the number of classes in which it occurs. A value of 1.0 requires that a feature occurs only in the sample documents of a single class; otherwise, it is not used for classification. The lower the value, the more classes can contain the feature in the training set. The value for this option is set to 0.600 by default.
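
The exact formula behind this setting is not given here. One measure that behaves as described (1.0 for a feature found in a single class, lower values as the feature spreads over more classes) is one minus the normalized Shannon entropy of the feature's distribution over the classes. The sketch below uses that measure purely as an assumption; the function class_selectivity and the sample counts are hypothetical.

```python
import math

def class_selectivity(class_counts):
    """Score a feature from its occurrence counts per class.

    Assumption: 1 minus the normalized Shannon entropy of the counts.
    This is an illustrative measure, not the product's documented formula.
    """
    n = len(class_counts)
    if n < 2:
        return 1.0  # with a single class, every feature is class-specific
    total = sum(class_counts)
    if total == 0:
        return 0.0  # feature never seen: no evidence for any class
    entropy = -sum((c / total) * math.log(c / total)
                   for c in class_counts if c > 0)
    return 1.0 - entropy / math.log(n)

min_class_entropy = 0.6
for counts in ([10, 0, 0, 0], [9, 1, 0, 0], [5, 5, 0, 0]):
    score = class_selectivity(counts)
    print(counts, round(score, 3), "kept" if score >= min_class_entropy else "dropped")
```

Under this assumed measure and the default threshold of 0.600, a feature split evenly between two of four classes scores 0.5 and is dropped, while a feature concentrated in one class is kept.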

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.