ClusterTools Contents

ClusterTools User Guide



Evaluate clusterings by various performance measures


 LABC  Index array, size [M,N]: indices of the cluster prototypes of M objects in N clusterings.
 LABT  Double array, size [M,1]: true object labels.
 A     PRTools dataset used to obtain LABC by some clustering procedure.
 TYPE  String with the desired performance measure, see below.

 E     Evaluation result for the performance measure given by TYPE. See below for the possibilities.


Computation of a set of cluster performance measures between the estimated cluster labels LABC and the true object labels LABT. If LABC is a multilevel clustering (N>1), the result E is a structure ready to be plotted by PLOTE. If E is omitted (no output argument), the result is plotted directly. Performance measures that do not generate a curve (see below) are displayed on the screen.

If LABC is a single clustering (N==1), only scalar results are returned (except when TYPE is 'roc', which generates two values, see CLUSTROC). If TYPE is omitted, the default measure for a multilevel clustering is 'actl'. For a single clustering, all measures are returned in a structure or printed on the screen.
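
As an illustration only, the two scalar ROC summaries in the list below (the area under the ROC curve and MIN(E1+E2)) can be sketched in Python, assuming the ROC is already available as two equal-length lists e1 and e2 of paired errors, one pair per operating point. The names roc_area and roc_min_sum are hypothetical and not part of ClusterTools, and integrating with the trapezoidal rule over the given points is an assumed choice; CLUSTROC may construct or close the curve differently.

```python
def roc_area(e1, e2):
    """Area under the curve e2(e1) by the trapezoidal rule.

    e1, e2: equal-length sequences of paired errors, one pair per
    operating point (assumed input format). Points are sorted by e1.
    """
    pts = sorted(zip(e1, e2))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def roc_min_sum(e1, e2):
    """MIN(E1+E2): the smallest summed error over all operating points."""
    return min(a + b for a, b in zip(e1, e2))


# Hypothetical ROC with four operating points.
e1 = [0.0, 0.1, 0.3, 1.0]
e2 = [1.0, 0.4, 0.1, 0.0]
print(roc_area(e1, e2))     # about 0.155
print(roc_min_sum(e1, e2))  # about 0.4, reached at e1 = 0.3, e2 = 0.1
```

A lower area and a lower MIN(E1+E2) both indicate a better clustering, since e1 and e2 are errors rather than hit rates.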

It is assumed that the cluster labels LABC are indices of cluster prototypes, whose true labels are given by the corresponding entries in LABT. The true labels LABT can be derived as doubles from a PRTools dataset A by LABT = GETNLAB(A);
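
A minimal Python sketch of this convention, assuming 1-based prototype indices as in MATLAB: the predicted label of object i is the true label of its prototype, LABT(LABC(i)), and the fraction of mismatches corresponds to the active-learning classification error in the list below. The functions prototype_labels and prototype_error are hypothetical names, not ClusterTools functions. For a multilevel LABC each column is evaluated separately, which gives one error per clustering level, i.e. a curve.

```python
def prototype_labels(labc, labt):
    """True label of each object's cluster prototype, i.e. LABT(LABC).

    labc: 1-based prototype indices, one per object (MATLAB convention).
    labt: true labels, one per object.
    """
    return [labt[k - 1] for k in labc]


def prototype_error(labc, labt):
    """Fraction of objects whose prototype has a different true label."""
    pred = prototype_labels(labc, labt)
    wrong = sum(p != t for p, t in zip(pred, labt))
    return wrong / len(labt)


# Six objects in two true classes.
labt = [1, 1, 1, 2, 2, 2]
# One clustering level: prototypes are objects 1 and 4; object 3 is
# grouped with prototype 4 and therefore receives the wrong label.
labc = [1, 1, 4, 4, 4, 4]
print(prototype_error(labc, labt))  # 1 of 6 objects wrong

# Multilevel case: one list per column of LABC gives one error per level.
levels = [[1, 1, 1, 1, 1, 1], labc]   # hypothetical 1- and 2-cluster levels
print([prototype_error(col, labt) for col in levels])
```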

The following performance measures are available:

  • Relative operating characteristics, see CLUSTROC.
  • Scalar, area under the ROC curve [e1,e2].
  • Scalar, MIN(E1+E2) with E1 and E2 the two errors of the ROC.
  • Classification error of assigning all objects to the true class of the cluster prototypes (active learning).
  • Classification error based on the true class of the cluster prototypes and combining multilevel cluster confidences.
  • Classification error based on the true class of the cluster prototypes after nested combining of cluster levels, see RECLUSTN.
  • The adjusted Rand index, see Wikipedia. It is 1 for consistent (identical) clusterings and close to 0 for random agreement; it can become negative.
  • Normalised mutual information between 0 and 1.
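
The two partition-agreement measures above can be sketched from the contingency table of two labelings. This is a plain-Python illustration, not the ClusterTools code: adjusted_rand_index uses the Hubert-Arabie form of the index, and normalized_mutual_info assumes normalisation by the geometric mean of the two entropies, which is one of several conventions in use. Both functions are hypothetical names and compare partitions only, so the label values themselves do not matter.

```python
from collections import Counter
from math import comb, log, sqrt


def _contingency(a, b):
    """Joint counts n_ij and the two marginal counts of labelings a and b."""
    return Counter(zip(a, b)), Counter(a), Counter(b)


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index (assumes at least two objects)."""
    nij, ai, bj = _contingency(a, b)
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in nij.values())
    sum_a = sum(comb(v, 2) for v in ai.values())
    sum_b = sum(comb(v, 2) for v in bj.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both partitions are the same trivial one
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(a, b):
    """Mutual information divided by sqrt(H(a) * H(b)) (assumed normalisation)."""
    nij, ai, bj = _contingency(a, b)
    n = len(a)
    mi = sum(v / n * log(n * v / (ai[i] * bj[j])) for (i, j), v in nij.items())
    ha = -sum(v / n * log(v / n) for v in ai.values())
    hb = -sum(v / n * log(v / n) for v in bj.values())
    if ha == 0 or hb == 0:  # at least one labeling has a single cluster
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


labt = [1, 1, 1, 2, 2, 2]
labc = [1, 1, 2, 2, 2, 2]  # one object moved to the other cluster
print(adjusted_rand_index(labt, labc), normalized_mutual_info(labt, labc))
print(adjusted_rand_index([1, 1, 2, 2], [1, 2, 1, 2]))  # about -0.5
```

Relabelling the clusters consistently (for example 1,1,2,2 against 5,5,7,7) leaves both measures at 1, while the crossed labelling in the last line scores below chance for the adjusted Rand index.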

See also

datasets, mappings, knnc, cluste, clusth, clustk, clustkh, clustm, clustf, clustr, dcluste, dclustf, dclusth, dclustk, dclustm, dclustr, reclustn, clustcerr, clustc, clustnum, clustroc, plote


This file has been automatically generated. If it is hard to read, use the help command in MATLAB.