ClusterTools Contents

ClusterTools User Guide

modeclustf

MODECLUSTF

Fast KNN mode-clustering, based on overlapping cells

    [LAB,NNLAB,NDIST] = MODECLUSTF(A,C,K,NEST)

Input
 A Dataset of M objects.
 C Integer, complexity parameter 1 <= C <= M, default C = 2.
 K Vector with neighborhood sizes of interest, default is a geometric series between 1 and M.
 NEST Logical, if TRUE the output set of clusterings (columns of LAB) will be made nested by RECLUSTN. Default: TRUE.

Output
 LAB Indices of mode samples, size [M,N] with N the number of clusterings (NUMEL(K)).
 NNLAB [M,1] vector with indices of nearest neighbors.
 NDIST Total number of distance calculations.

Description

This is a fast version of MODECLUST, useful, and in practice essential, for very large datasets (more than a million objects). For every object it makes a rough estimate of the set of potential nearest neighbors, which should be larger than max(K). This set grows with C, so larger values of C give a slower but more accurate procedure. In many practical problems C = 6 appeared to be sufficient.
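
A call with the arguments described above might look as follows. This is a minimal sketch, not taken from the toolbox: it assumes PRTools and ClusterTools are on the MATLAB path, uses the PRTools generator GENDATB only as a convenient illustrative data source, and picks the values in K arbitrarily. For a set this small the routine may fall back to MODECLUST, as described in this section.

```matlab
% Sketch only: assumes PRTools and ClusterTools are installed and on the path.
A = gendatb([500 500]);   % illustrative data: 1000 objects (assumed choice)
C = 6;                    % complexity value reported as sufficient in many problems
K = [5 10 20 50];         % neighborhood sizes of interest (arbitrary example values)

% NEST = true asks for nested clusterings, one clustering per entry of K.
[lab, nnlab, ndist] = modeclustf(A, C, K, true);

% Each column of lab holds, per object, the index of its mode sample,
% so the number of distinct values in a column is the number of clusters.
for j = 1:numel(K)
    fprintf('K = %d: %d clusters\n', K(j), numel(unique(lab(:, j))));
end
fprintf('distance calculations: %d\n', ndist);
```

Counting distinct mode indices per column is one simple way to see how the number of clusters changes with the neighborhood size.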

The computational complexity of this algorithm, measured as the number of distances actually computed, is M x SQRT(M). For small M, however, it is not faster than MODECLUST. Therefore, it switches to that routine for small values of M as well as for C == 0.
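
To get a feeling for the M x SQRT(M) figure, the short calculation below compares it with a brute-force computation of all M x M pairwise distances. The brute-force count is used here only as a generic baseline and is an assumption for illustration; it is not a statement about the cost of MODECLUST itself.

```matlab
% Worked arithmetic for a dataset of one million objects.
M = 1e6;
fast = M * sqrt(M);   % distances computed at M x SQRT(M): 1e9
full = M^2;           % all pairwise distances (generic baseline): 1e12
ratio = full / fast;  % saving factor, equal to SQRT(M): 1000
fprintf('fast: %g, full: %g, ratio: %g\n', fast, full, ratio);
```

The saving factor equals SQRT(M), so it shrinks as M gets smaller.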

Reference(s)

R.P.W. Duin and S. Verzakov, Fast kNN mode seeking clustering applied to active learning, arXiv:1712.07454, 2017, 1-23.

See also

mappings, datasets, distm, proxm, clustm, dclustm, modeclust_batch, modeclust, reclustn

This file has been generated automatically. If it is hard to read, use the help command in MATLAB.