DisTools examples: Dimension reduction of the dissimilarity space

The dissimilarity space may have a very large dimensionality, which may cause severe computational problems. Here some techniques for dimension reduction are shown. It is assumed that readers are familiar with PRTools and will consult, where needed, the overview pages listed at the end of this section.

Intuitively one expects that having more objects available for training a classifier will result in a better classification performance. This is not true for a number of pattern recognition algorithms. The dissimilarity approach suffers in at least two ways from large training sets. If all training objects are used for representation as well, the dissimilarity matrix grows in two directions when more objects are added: a larger set of examples is represented in a vector space of growing dimensionality. Several procedures may thereby suffer from the curse of dimensionality (overtraining or peaking). In addition there is a computational problem: the number of dissimilarities to be computed increases with the square of the number of objects, e.g. 1000 training objects already yield a dissimilarity matrix of a million entries. The memory demands grow in the same way.

It is doubtful whether the dimensionality should really grow as fast as the number of objects. Problems have a finite intrinsic dimensionality. The clouds of objects represented in the dissimilarity space will not necessarily lie in linear subspaces, so some growth of the dimensionality may always be profitable, but as long as this growth is slower than the growth of the number of objects, the accuracy of the density estimates or classifiers will still improve.
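
The limited intrinsic dimensionality of a dissimilarity space can be made visible by PCA. The following lines are a minimal sketch, not part of the original example; they assume that pcam accepts a fraction (here 0.95) indicating the variance to be preserved. A 200-dimensional dissimilarity space built from a simple two-dimensional problem is then summarized by far fewer directions.

a = gendatb([100 100]);         % 2D banana set, 100 objects per class
d = distm(a);                   % 200 x 200 square Euclidean dissimilarity matrix
w = pcam(d,0.95);               % PCA mapping preserving 95% of the variance
size(w,2)                       % retained dimensions, much smaller than 200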

A very nice property of the dissimilarity representation is that, unlike features, dissimilarities to objects behave very similarly if a representation object is exchanged for another, similar one: scaling and densities will hardly be affected. For this reason a random selection of objects makes sense, in particular if larger sets are considered.

D = chickenpieces(15,60);          % load a chickenpieces dissimilarity dataset
D = setprior(D,getprior(D));       % set class priors to class frequencies
e = clevald(D,{knndc,knnc,fisherc},[1,2,3,5,7,10,15,20,30,50],[],10); % learning curves, 10 repetitions
plote(e,'noapperror')              % plot the curves, skip the apparent error

In this example some learning curves are computed. Training sets are selected at random. Representation sets are by default equal to the training sets and are thereby random as well.
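
The representation set may also be fixed explicitly, independent of the size of the training set. The following lines are a minimal sketch, not taken from the original example, selecting 20 random columns of D as prototypes; in a careful experiment the prototypes would be restricted to the training part of the data.

n = size(D,2);                  % number of available representation objects
J = randperm(n);                % random order of the prototypes (columns)
DR = D(:,J(1:20));              % 20-dimensional dissimilarity space
[T,S] = gendat(DR,0.5);         % random 50% of the objects for training
e = S*fisherc(T)*testc          % test error in the reduced space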

Several systematic selections are possible and have been studied in the literature: the use of feature selection procedures, cluster analysis, a careful uniform coverage of the objects, selecting objects in the tails of the distributions, far away from the decision boundary, close to the decision boundary, etcetera. So any routine for reducing a set of objects to a smaller set, as well as any routine for reducing a set of features, may be studied. Here are some examples.

U1 = protselfd('random');  U1 = setname(U1,'random');     % random prototype selection
U2 = protselfd('LOO');     U2 = setname(U2,'LOO NN');     % selection driven by the LOO NN error
U3 = protselfd('maxdist'); U3 = setname(U3,'KCentres');   % k-centres like selection (maximum distances)
U4 = pcam;                                                % PCA of the dissimilarity space
U5 = featseli('NN');       U5 = setname(U5,'IndFeatSel'); % individual feature selection, NN criterion
U = {U1,U2,U3,U4,U5};                                     % the reduction procedures to be compared
repeats = 5;                                              % number of repetitions
gendata = genddat([],0.5);                                % 50-50 splits of the dissimilarity data

e = clevalfs(D,U,fisherc,[1,2,3,5,7,10,15,20,30],gendata,repeats);
plote(e,'noapperror')                                     % plot the curves, skip the apparent error

Exercise

  1. Use a feasible subset of the polydism57 dataset to run the above experiment many times to find a stable set of curves.
  2. Compare the final results with the LOO NN error on the given dissimilarities (use nne; a possible starting point is sketched after this list).
  3. Is the PCA result useful?
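
As a starting point for the second exercise, the reference error can be computed directly on the complete dissimilarity matrix. This is a hedged sketch; it assumes that polydism57 loads the dataset of that name and that nne returns the leave-one-out nearest neighbor error, as suggested by the exercise.

D = polydism57;                 % polygon dissimilarities, assumed loader name
D = setprior(D,getprior(D));    % class priors equal to class frequencies
e_ref = nne(D)                  % leave-one-out nearest neighbor error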

elements: datasets datafiles cells and doubles mappings classifiers mapping types.
operations: datasets datafiles cells and doubles mappings classifiers stacked parallel sequential dyadic.
user commands: datasets representation classifiers evaluation clustering examples support routines.
introductory examples: Introduction Scatterplots Datasets Datafiles Mappings Classifiers Evaluation Learning curves Feature curves Dimension reduction Combining classifiers Dissimilarities.
advanced examples.
