DisTools introductory example, the dissimilarity space

Get rid of old figures. Take a dataset (other examples are kimia_shapes, chickens and protein)

delfigs
A = catcortex

Now we interpret this dataset as a set of objects represented by vectors (the rows of A) in a Euclidean space, the dissimilarity space. Each axis corresponds to the given dissimilarities to one of the objects (the columns), possibly including the object itself. This set of objects is called the representation set. Here, initially, all objects are used for representation.

For all but very small representation sets, the dissimilarities to the representation objects are usually highly correlated. One way to remove this correlation is by PCA. Let us look at the first two components:

scatterd(A*pca(A,2));
title('PCA');

The total set of eigenvalues gives an impression of the correlations. The cumulative relative fractions of the ranked eigenvalues can be computed and plotted by:

figure;
plot(pca(A,0));
title('Cumulative eigenvalue fraction');
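The idea behind the two plots above can be sketched outside of PRTools as well. The following Python/NumPy sketch is not the DisTools implementation: it uses a hypothetical stand-in dissimilarity matrix (Euclidean distances between random points, in place of catcortex) and performs PCA on its rows, i.e. on the objects as vectors in the dissimilarity space, including the cumulative eigenvalue fractions that pca(A,0) plots.

```python
import numpy as np

# Hypothetical stand-in for the catcortex data: pairwise Euclidean
# distances between random 5-D points form the dissimilarity matrix A.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # 40 x 40

# Each row of A is one object in the dissimilarity space: its
# coordinates are the dissimilarities to the representation objects.
# PCA on these vectors via the covariance eigendecomposition:
Ac = A - A.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Ac, rowvar=False))
order = np.argsort(eigvals)[::-1]            # rank eigenvalues, descending
proj2 = Ac @ eigvecs[:, order[:2]]           # first two principal components

# Cumulative relative fractions of the ranked eigenvalues,
# the curve that pca(A,0) produces:
frac = np.cumsum(eigvals[order]) / eigvals.sum()
print(proj2.shape)
```

The strong correlation between the columns shows up here as a cumulative fraction that rises steeply: a few components already explain most of the variance.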

Replace plot by semilogx for larger datasets. A nonlinear projection of the data can be made by multidimensional scaling:

figure;
options.st = 0;
scatterd(A*mds(A,2,options));
title('MDS')
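For illustration, here is a hedged Python/NumPy sketch of one MDS variant, classical (Torgerson) scaling, again on a hypothetical stand-in dissimilarity matrix. The DisTools mds routine uses iterative stress minimization instead, but the goal is the same: a low-dimensional map whose pairwise distances approximate the given dissimilarities.

```python
import numpy as np

# Hypothetical dissimilarity matrix (stand-in for A): Euclidean
# distances between random 4-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Classical MDS: double-center the squared dissimilarities to obtain a
# Gram matrix, then embed with the top eigenvectors.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
B = -0.5 * J @ (D ** 2) @ J             # Gram matrix of the embedding
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]     # two largest eigenvalues
Y = eigvecs[:, top] * np.sqrt(eigvals[top])   # 2-D configuration
print(Y.shape)
```

Because the stand-in dissimilarities are exactly Euclidean, the Gram matrix is positive semi-definite and the square roots are well defined; for general dissimilarity data this need not hold, which is one reason stress-based MDS is used there.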

Now we will perform some classification experiments. First, split the dataset into a training set AT and a test set AS. The routine used here, genddat, by default reduces the representation set to the training set, for the training set as well as for the test set.

[AT,AS] = genddat(A,0.5);
AT = setname(AT,'TrainSet')
AS = setname(AS,'TestSet')

Now a set of untrained classifiers U is defined, trained by AT, and evaluated on both AT and AS:

U = {nmc,ldc,loglc,knnc};
W = AT*U;
testc({AT,AS},W)
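To make the splitting convention concrete, here is a hedged Python/NumPy sketch, not the PRTools code, using hypothetical two-class data. It mimics genddat's behavior by indexing the dissimilarity matrix so that both the training and test vectors use only the training objects for representation, and then trains a simple nearest-mean classifier (the analogue of nmc); the other classifiers in U are not reproduced.

```python
import numpy as np

# Hypothetical two-class data; its distance matrix plays the role of A.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (25, 3)), rng.normal(3, 1, (25, 3))])
y = np.array([0] * 25 + [1] * 25)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Random 50/50 split; as with genddat, the representation set is
# reduced to the training objects for both sets.
perm = rng.permutation(50)
tr, te = perm[:25], perm[25:]
AT = D[np.ix_(tr, tr)]   # train objects x train representation set
AS = D[np.ix_(te, tr)]   # test objects  x train representation set

# Nearest-mean classifier in the dissimilarity space.
means = np.array([AT[y[tr] == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(np.linalg.norm(AS[:, None, :] - means[None], axis=2), axis=1)
err = np.mean(pred != y[te])
print(f"test error: {err:.2f}")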

The standard cross-validation routine crossval may be used for dissimilarity data as well. Three repeats of 5-fold cross-validation look like:

crossval(A,U,5,3)

In this way, however, the entire dataset is always used for representation. It is more natural to reduce the representation set to the training set, or to a random fraction of it. In the example below it is reduced to a random set of 20 objects from the training set.

crossvald(A,U,5,20,10)
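The essential difference from plain crossval can be sketched as follows. This is a hedged Python/NumPy illustration with hypothetical data and a hypothetical helper, not the crossvald implementation: inside each fold, a random subset of the training objects is chosen as representation set, and only those columns of the dissimilarity matrix are used.

```python
import numpy as np

# Hypothetical labeled dissimilarity data, as in the earlier sketches.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(3, 1, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def nmc_error(AT, yt, AS, ys):
    """Nearest-mean classifier error in the dissimilarity space."""
    means = np.array([AT[yt == c].mean(axis=0) for c in np.unique(yt)])
    pred = np.argmin(np.linalg.norm(AS[:, None] - means[None], axis=2), axis=1)
    return np.mean(pred != ys)

def crossval_dis(D, y, nfolds, nrep, rng, repsize=20):
    """n-fold cross-validation; per fold, the representation set is a
    random subset of repsize training objects (cf. crossvald)."""
    errs = []
    for _ in range(nrep):
        folds = np.array_split(rng.permutation(len(y)), nfolds)
        for k in range(nfolds):
            te = folds[k]
            tr = np.concatenate([folds[j] for j in range(nfolds) if j != k])
            rep = rng.choice(tr, size=repsize, replace=False)
            errs.append(nmc_error(D[np.ix_(tr, rep)], y[tr],
                                  D[np.ix_(te, rep)], y[te]))
    return np.mean(errs)

print(f"mean cv error: {crossval_dis(D, y, 5, 3, rng):.2f}")
```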

Learning curves may be computed in a similar way. Here we choose representation sets equal to the training set and average over 3 repetitions.

e = clevald(A,U,[],[],3);
figure;
plote(e,'noapperror')

Instead, we may use a representation set that is just a random 20% of the training set:

e = clevald(A,U,[],0.2,3);
figure;
plote(e,'noapperror')
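A learning-curve computation of the first kind (representation set equal to the training set, averaged over 3 repetitions) can be sketched in the same hedged Python/NumPy style, with hypothetical data and a hypothetical nearest-mean helper; clevald itself handles all classifiers in U and many more options.

```python
import numpy as np

# Hypothetical labeled dissimilarity data (same construction as before).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(3, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def nmc_error(AT, yt, AS, ys):
    """Nearest-mean classifier error in the dissimilarity space."""
    means = np.array([AT[yt == c].mean(axis=0) for c in np.unique(yt)])
    pred = np.argmin(np.linalg.norm(AS[:, None] - means[None], axis=2), axis=1)
    return np.mean(pred != ys)

# Learning curve: growing training sets, representation set equal to
# the training set, averaged over 3 repetitions (cf. clevald).
sizes = [10, 20, 40, 60]
curve = []
for n in sizes:
    reps = []
    for _ in range(3):
        perm = rng.permutation(len(y))
        tr, te = perm[:n], perm[n:]
        reps.append(nmc_error(D[np.ix_(tr, tr)], y[tr],
                              D[np.ix_(te, tr)], y[te]))
    curve.append(np.mean(reps))
print(dict(zip(sizes, np.round(curve, 2))))
```

Note that here the dimensionality of the dissimilarity space grows with the training set, which is exactly what distinguishes these curves from ordinary feature-space learning curves.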

Finally, make all figures visible.

showfigs
