Verzakov Tree - Trainable Decision Tree Classifier
W = DTC(A, CRIT, CHISQSTOPVAL, PRUNE, T, NFEAT)
Object weights and class priors:
| A|| Training dataset.|
It is possible to assign individual weights to dataset
objects by using the 'weights' identifier:
A = SETIDENT(A, WEIGHTS, 'weights')
If weights are not defined, they are assumed to be equal to 1
for all objects. The actual weights used by the training routine
are computed from the supplied weights together with the class
priors (if priors are not set, the apparent priors are used):
ACTUAL_WEIGHTS(I) = M*(WEIGHTS(I)/CW(C))*(PRIOR(C)/SUM(PRIOR))
where M is the total number of (labelled) objects in A,
CW is the vector of per-class sums of the supplied weights, and
C is the class of object I. The sum of all actual weights is M.
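For illustration only, the actual weights described above could be
computed in plain MATLAB roughly as follows (a minimal sketch, not
part of the routine; LAB, W and PRIOR are assumed to hold the class
indices, supplied weights and class priors):
   lab   = lab(:); w = w(:); prior = prior(:);
   M     = numel(lab);                 % total number of labelled objects
   cw    = accumarray(lab, w);         % per-class sums of the supplied weights (CW)
   aw    = M * (w ./ cw(lab)) .* (prior(lab) / sum(prior));
   % sum(aw) equals M, as stated above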
Features are treated differently based on their domain information.
If the feature domain is empty or is a collection of intervals,
the feature is considered to be continuous and branches are
created by splitting at a threshold value.
Otherwise (the feature domain specifies a set of values or a set
of names), the feature is considered to be nominal and branches
corresponding to all feature values present are created.
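A hedged usage sketch (the exact domain encoding passed to
SETFEATDOM below is an assumption, not prescribed by this routine):
   % feature 1: continuous on the interval [0 10];
   % feature 2: nominal with the values 1, 2 and 3
   A = setfeatdom(A, {[0 10], [1 2 3]'});
   W = dtc(A);   % feature 1 is split on a threshold,
                 % feature 2 gets one branch per present value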
Unknown and 'non-applicable' values:
If a feature value is NaN, it is assumed to be unknown. In that
case the object with the unknown value is split into fractions
and sent down all branches.
The concept of a 'non-applicable' feature value is different
from the concept of an unknown (missing) value.
'Non-applicable' values of continuous features are encoded
as INF. If the feature domain is a (set of) interval(s),
then the INF value has to be explicitly added to the domain
definition. The 'non-applicable' value of nominal features
does not have a predefined encoding. If it is needed, the user
has to include such a value in the domain definition.
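A small sketch of the encodings described above (illustrative
values only):
   % NaN marks an unknown value, INF a 'non-applicable' value
   % of a continuous feature
   x = [2.7  NaN  Inf];
   % if a domain is defined for the third feature, INF must be
   % listed in that domain explicitly (see the discussion above)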
CRIT Splitting criterion name.
'igr' Information Gain Ratio (default):
As defined by Quinlan. A penalty on the number of distinct
values of a continuous feature is used. If the gain is zero or
negative due to such penalisation, the split is not performed.
This leads to smaller trees and may give a non-zero training error.
This criterion does not use costs. (Costs are used only at the
classification stage.)
'gini' Gini impurity index. More precisely, the change in this
index. The Gini index can be interpreted as the misclassification
rate of a stochastic classifier based on the priors, so costs are
naturally embedded. If the change in the (absolute) error is less
than or equal to 0.1 (or, when costs are defined, the change in
the cost is less than or equal to 0.1 of the minimal absolute
value of the non-zero costs), the split is not performed.
This leads to smaller trees and may give a non-zero training error.
'miscls' Classification error criterion.
To be used only for educational purposes because
it gives rather inferior results. Costs are naturally embedded.
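For example, the criterion is selected as the second argument of
DTC (a hedged usage sketch following the call signature above):
   W1 = dtc(A);            % default: information gain ratio
   W2 = dtc(A, 'gini');    % Gini impurity criterion
   e  = testc(A*W2);       % apparent error; may be non-zero, as noted above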
CHISQSTOPVAL Specificity = (1 - significance level) of the
chi-squared test on the difference between the original node class
distribution and the branch class distributions. The null
hypothesis is that the distribution does not change after the
split (in other words: class labels and splitting are independent).
If the null hypothesis is accepted, splitting is stopped and the
current node becomes a leaf. CHISQSTOPVAL should be in the [0, 1]
range. E.g., CHISQSTOPVAL = 0.95 means that the significance level
is 5% and there is only a 5% chance of incorrectly rejecting the
null hypothesis that there is no change in the class distribution
after the split. (optional; default: 0, which means that branches
are discarded only if they bring no change in the class
distribution)
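For example (a hedged sketch following the call signature above):
   % stop splitting a node when a chi-squared test at the 5%
   % significance level cannot reject independence of the class
   % labels and the split
   W = dtc(A, 'igr', 0.95);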
PRUNE Pruning type name
'prunep' - pessimistic (top-down) pruning as defined by Quinlan.
Using the 'prunep' option while training a tree on a dataset with
a defined cost matrix will cause an error.
'prunet' - test set (bottom-up) pruning using the (required)
test dataset T.
The implementations of both pruning algorithms may not be
exactly correct if there are unknown values in the datasets.
(optional; default: '' -- no pruning)
T Test dataset for the test set pruning ('prunet')
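A hedged pruning sketch (GENDAT, a standard PRTools routine, is
used here only to obtain a separate test set):
   [A, T] = gendat(A, 0.5);             % split the data into a training and a test part
   W1 = dtc(A, 'igr', 0, 'prunep');     % pessimistic pruning
   W2 = dtc(A, 'igr', 0, 'prunet', T);  % test set pruning on T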
NFEAT The number of features to be randomly selected to find a split
at each node; to be used to create a random forest.
If NFEAT ==  then all available features will be used.
(optional; default: )
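A hedged sketch of a simple random-forest-style ensemble built on
top of DTC (the resampling scheme and the number of trees are
illustrative choices, not part of the routine):
   ntrees = 10;
   trees  = cell(1, ntrees);
   for i = 1:ntrees
     B        = gendat(A, 0.7);                % random subsample (not a true bootstrap)
     trees{i} = dtc(B, 'gini', 0, '', [], 3);  % consider 3 randomly chosen features per node
   end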
| W|| Classifier mapping|
If (fully grown) branches of a (sub)tree do not improve the classification error (misclassification cost), they are immediately discarded. This may happen because regularised posteriors are used. As a result the algorithm is more stable and the trees are smaller, but splits on the training set may not be perfect.
J.R. Quinlan, Simplifying Decision Trees, International Journal of Man-Machine Studies, 27(3), pp. 221-234, 1987.
J.R. Quinlan, Improved Use of Continuous Attributes in C4.5, Journal of Artificial Intelligence Research, 4, pp. 77-90, 1996.