ClusterTools Contents

ClusterTools User Guide

clustk

CLUSTK

Feature space centroid-based clustering with K prototypes:

kmeans, kcentres and kmedoids

    LAB = CLUSTK(A,K,TYPE,R,MSIZE)
    LAB = A*CLUSTK(K,TYPE,INIT,R,MSIZE)

Input
 A Feature based dataset with M objects, possibly doubles.
 K Vector with desired numbers of clusters, default sampling of [2:M]
 TYPE 'kmeans' (default), 'kcentres' or 'kmedoids'.
 INIT Vector of length max(K), indices of initial centres or medoids.
 Default INIT = []: systematic initialisation by CLUSTF.
 R Number of clustering trials based on random initialisations.  The best cluster result is returned.
 MSIZE Number of objects (M) above which the dataset is preclustered by
 CLUSTM, reducing it to MSIZE objects. Default MSIZE = 5000. Use
 MSIZE = inf to avoid preclustering.

Output
 LAB M*NUMEL(K) array with the results of the multilevel clustering for the M objects. Each column refers to one clustering and holds, for every object, the index of the prototype of the cluster it belongs to.

Description

An initial set of K prototypes is iteratively optimised such that the set of objects with the same nearest prototype (a 'cluster') has exactly this prototype as its mean, medoid or centre. The medoid is defined as the object in a cluster for which the mean distance to all other objects is minimum. The centre is defined as the object in a cluster for which the maximum distance to all other objects is minimum.
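The three prototype definitions above can be illustrated on a single small cluster. The following is a minimal sketch in plain Matlab, not the CLUSTK implementation: the data X and all variable names are made up for illustration, and Euclidean distances are assumed.

```matlab
% Illustrative sketch only, not the CLUSTK code.
% Hypothetical cluster of 5 objects in a 2-D feature space.
X  = [0 0; 2 0; 3 0; 4 0; 10 0];

% Euclidean distance matrix between all pairs of objects (assumed metric).
sq = sum(X.^2, 2);
D  = sqrt(max(bsxfun(@plus, sq, sq') - 2*(X*X'), 0));

% Mean: the average feature vector, generally not an object of the cluster.
cluster_mean = mean(X, 1);

% Medoid: the object with the minimum mean distance to all other objects.
% The row sum of D differs from that mean by a constant factor only.
[~, medoid] = min(sum(D, 2));

% Centre: the object with the minimum of its maximum distance to the others.
[~, centre] = min(max(D, [], 2));
```

For this data the three prototypes differ: the mean is [3.8 0], the medoid is object 3 and the centre is object 4, because the outlying object at [10 0] affects each definition to a different degree.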

The clustering is iterated until stability or is prematurely stopped by PRTIME. In case of random initialisations (R is an integer > 0) the clustering is repeated R times and the best result is returned. For R = 0 (special case) a systematic initialisation is performed and the resulting clustering is returned directly without optimisation.
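The loop structure described here, iterating until the assignments are stable and keeping the best of R random starts, might look as follows for kmeans. This is a hedged sketch in plain Matlab, not the CLUSTK code. The data, the variable names, the iteration cap max_iter (standing in for the PRTIME limit) and the choice of the within-cluster sum of squared distances as the criterion for "best" are all assumptions.

```matlab
% Illustrative sketch only, not the CLUSTK code.
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];  % hypothetical data, 6 objects
K = 2;                                     % number of clusters
R = 5;                                     % number of random initialisations
max_iter = 100;                            % assumed cap, stands in for a time limit
M = size(X, 1);

best_crit = inf;
for r = 1:R
    idx   = randperm(M);
    means = X(idx(1:K), :);                % K random objects as initial means
    lab   = zeros(M, 1);
    for it = 1:max_iter
        % Squared distances of all objects to all current means (M x K).
        D = zeros(M, K);
        for k = 1:K
            d = bsxfun(@minus, X, means(k, :));
            D(:, k) = sum(d.^2, 2);
        end
        [dmin, new_lab] = min(D, [], 2);   % nearest mean for every object
        if isequal(new_lab, lab)
            break;                         % stable: assignments did not change
        end
        lab = new_lab;
        for k = 1:K
            if any(lab == k)               % keep the old mean if a cluster is empty
                means(k, :) = mean(X(lab == k, :), 1);
            end
        end
    end
    crit = sum(dmin);                      % criterion at the last assignment step
    if crit < best_crit                    % keep the best of the R trials
        best_crit = crit;
        best_lab  = lab;
    end
end
```

Note that best_lab holds cluster numbers 1..K here, whereas CLUSTK is documented to return prototype object indices.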

LAB is a column vector of length M or an array with length(K) columns. It contains for every object and for every clustering the cluster indices. In case of kcentres or kmedoids they point to the objects that are found as the centres or medoids. In case of kmeans they point to the objects nearest to the cluster means.
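Because the labels are object indices rather than consecutive cluster numbers, it can be convenient to renumber them before counting or plotting. A minimal sketch using only standard Matlab functions follows. The label column is made-up data in the format described above, not real CLUSTK output.

```matlab
% Illustrative sketch only: hypothetical label column for 8 objects, where
% every entry is the index of the prototype object of the object's cluster.
lab = [3; 3; 3; 7; 7; 3; 7; 7];

% protos:     indices of the prototype objects, one per cluster
% cluster_no: consecutive cluster numbers 1..numel(protos) for every object
[protos, ~, cluster_no] = unique(lab);
cluster_no = cluster_no(:);              % force a column vector

% Number of objects per cluster.
sizes = accumarray(cluster_no, 1);
```

Here protos is [3; 7], cluster_no is [1; 1; 1; 2; 2; 1; 2; 2] and both clusters contain 4 objects. For a multi-column LAB the same steps would be applied to one column at a time.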

If K is given, its values are reduced to less than M/5 to keep the computation feasible. Moreover, if M > MSIZE the dataset A is preclustered by PRECLUST using CLUSTM. Unless specific values of K < 100 are needed, it is recommended for fast processing to use K = []. Speed may be further increased by using smaller values of MSIZE, e.g. MSIZE = 500.

Example(s)

 randreset;                     % take care of reproducibility
 data = gendatclust1(20000);    % generate 20000 objects in 10 clusters
                                % Run k-means clustering for 7 values of K
 lab = clustk(data,[2 5 10 18 30 50 100],'kmeans',[],2000);
                                % Show scatterplot for 10 clusters (third column)
 figure; scatn(lab(:,3),data,'K-Means');
 figure; clusteval(lab,data);   % Evaluation by active learning

See also

datasets, mappings, dclusth, cluste, clustf, clusth, clustkh, clustm, reclustn, preclust, clusteval, clustcerr, clustc

This file has been automatically generated. If badly readable, use the help-command in Matlab.