ct_intro: Introduction to clustering with ClusterTools

PRTools and ClusterTools should be on the MATLAB path.

A review of all ClusterTools examples.


Prepare environment

prtime(10)                   % restrict iterative optimisation to 10s
delfigs                      % delete existing figures
randreset;                   % takes care of reproducibility
prwarning(2)                 % show warnings

Define a dataset

We use some standard routines to create 8 two-dimensional clusters. After creation, the label information is removed.

m = 1000;
a = gendatclust2(m);
x = +a; % remove labels
scattern(x); axis equal
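
The size of the data and the number of objects per true cluster can be checked with a few lines. This is a minimal sketch, assuming that all objects in a are labelled, so that getnlab returns indices 1 to 8. accumarray is a base MATLAB function, and nlab and counts are names introduced here for illustration.

nlab   = getnlab(a);             % numeric true labels of the labelled dataset a
counts = accumarray(nlab,1)';    % number of objects per true cluster
disp(size(x))                    % [number of objects, number of features]
disp(counts)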

This dataset consists of two bars, two circles and four Gaussian blobs.

Find 8 clusters by five routines

First, each cluster routine is coded separately. The second experiment below uses a more compact coding.

labt = getnlab(a);
subplot(3,2,1); scatn(labt,x,'True Labels');
axis equal; drawnow

labk = clustk(x,8); e = clusteval(labk,a,'actl');
title = ['K-Means ' num2str(e,'%5.2f')];
subplot(3,2,2); scatn(labk,x,title);
axis equal; drawnow

labhs = clusth(x,8,'s'); e = clusteval(labhs,a,'actl');
title = ['Single Linkage ' num2str(e,'%5.2f')];
subplot(3,2,3); scatn(labhs,x,title);
axis equal; drawnow

labhc = clusth(x,8,'c'); e = clusteval(labhc,a,'actl');
title = ['Complete Linkage ' num2str(e,'%5.2f')];
subplot(3,2,4); scatn(labhc,x,title);
axis equal; drawnow

labm = clustm(x,8); e = clusteval(labm,a,'actl');
title = ['Mode Seeking ' num2str(e,'%5.2f')];
subplot(3,2,5); scatn(labm,x,title);
axis equal; drawnow

labe = cluste(x,8); e = clusteval(labe,a,'actl');
title = ['Exemplar ' num2str(e,'%5.2f')];
subplot(3,2,6); scatn(labe,x,title);
axis equal; drawnow
clear title
PR_Warning: EXEMPLAR: Examplar clustering updating stopped by PRTIME after 202 iterations

K-Means, Complete Linkage and Mode Seeking produce similar clusters that are roughly spherical and of similar size. Single Linkage shows an interesting result: it finds the two bars and the two circles as single clusters, but it merges the four Gaussian blobs. In addition, it yields three single-object clusters. Exemplar performs badly. Note that it generated a warning because its optimisation was stopped by the 10-second limit set by prtime.
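
Since the warning shows that Exemplar was stopped by the time limit set with prtime(10), one option is to rerun it with a larger budget. The sketch below assumes 60 seconds, an arbitrary value chosen for illustration. Whether this is enough for convergence, or improves the result at all, cannot be told beforehand.

prtime(60)                       % assumed value: allow 60s instead of 10s
labe = cluste(x,8);              % same call as above, with more time available
e = clusteval(labe,a,'actl');
disp(e)
prtime(10)                       % restore the limit used in the rest of this example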

Find 20 clusters by five routines

Coding here is more compact than above. It uses the fact that the cluster routines are programmed as PRTools mappings.
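
For a single routine, the two calling styles compare as follows. This sketch only uses calls that appear in this example; labk_direct, labk_map and w are names introduced here. As K-Means may depend on a random initialisation, the two results are not guaranteed to be identical.

labk_direct = clustk(x,8);       % data and number of clusters in one call
w = clustk(8);                   % no data given: a mapping, as in the cell array below
labk_map = x*w;                  % apply the mapping to the data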

k = 20;
w = {clustk(k) clusth(k,'s') clusth(k,'c') clustm(k) cluste(k)};
names = {'True Labels','K-Means','Single Linkage','Complete Linkage', ...
         'Mode Seeking','Exemplar'};
for i=1:6
  % for i==1 a trick is used to get ClusterTools labels from dataset
  if i == 1, lab = getnlab(a)*clust2proto(a); e = 0;
  else lab = x*w{i-1}; e = clusteval(lab,a,'actl'); end
  title = [names{i} ' ' num2str(e,'%5.2f')];
  subplot(3,2,i); scatn(lab,x,title); markcols(1);
  axis equal; drawnow
end
clear title
PR_Warning: EXEMPLAR: Examplar clustering updating stopped by PRTIME after 204 iterations

As there are more clusters than in the first experiment (20 instead of 8), the true labels of more objects are used. Each of these labels is assigned to all objects in the corresponding cluster, resulting in a better performance. Exemplar now performs well. It is frequently good, but very time-consuming, for larger datasets and more clusters.
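
The effect of assigning true labels per cluster can be illustrated in base MATLAB, without ClusterTools. The sketch below is not the clusteval implementation. It assumes that one object per cluster (here called a prototype) has its true label revealed, and that the result is scored as the fraction of objects that receive a wrong label. All data and names (labc, labt, proto, labp, err) are made up for illustration.

labc  = [1 1 1 2 2 2 3 3]';      % cluster index per object (made-up data)
labt  = [1 1 2 2 2 2 1 1]';      % true class per object (made-up data)
proto = [1 4 7];                 % assumed: one labelled object per cluster
labp  = zeros(size(labc));       % labels after propagation
for c = 1:numel(proto)
  labp(labc == c) = labt(proto(c));   % give the whole cluster its prototype's true label
end
err = mean(labp ~= labt);        % fraction of objects with a wrong label
fprintf('error: %5.3f\n', err);

With more clusters, more prototypes reveal their true label and each label is spread over fewer objects, which tends to lower this error.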