DisTools introductory example, pseudo-Euclidean embedding

Get rid of old figures and take a dataset (other possible examples are kimia_shapes, chickens and protein):

delfigs
A = catcortex

Now we find an embedding into a pseudo-Euclidean (PE) space. This is only possible if A is symmetric and has a zero diagonal. The mapping is found by pe_em. The dimensionalities of the positive and negative subspaces can be retrieved by signature, and the dataset is mapped into the PE space.
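
Before calling pe_em, these two conditions can be checked on the raw dissimilarity matrix (a minimal sketch, not part of the original example; the tolerance is arbitrary):

D = +A;                               % raw dissimilarity matrix of the dataset
ok = all(all(abs(D-D') < 1e-12)) && all(abs(diag(D)) < 1e-12)  % true if symmetric with zero diagonal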

W = A*pe_em;
sig = signature(W);
X = setname(A*W,'PE Space');

The dimensions (features) of X are ranked such that X(:,1:sig(1)) contains the positive space, starting with the most significant direction (largest eigenvalue), and X(:,sig(1)+1:sig(1)+sig(2)) contains the negative space, starting with its most significant direction (most negative eigenvalue). A scatterplot of the two most significant dimensions (most positive and most negative) can therefore be made as follows.

scatterd(X(:,[1 sig(1)+1]));
title('Pseudo Euclidean Space')
xlabel('First positive dimension')
ylabel('First negative dimension')
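
As an aside, the index ranges of the two subspaces follow directly from the signature and can be printed for orientation (a small sketch using only sig as computed above):

fprintf('positive dims: 1..%d, negative dims: %d..%d\n', ...
        sig(1), sig(1)+1, sig(1)+sig(2));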

Compare with the first two positive directions:

figure;
scatterd(X(:,[1 2]));
title('Pseudo Euclidean Space')
xlabel('First positive dimension')
ylabel('Second positive dimension')

Classifiers in the PE space should use its specific definition of distances. For some classifiers this is possible, e.g. nmc, knnc and parzenc. The pseudo-Euclidean variants of the last two classifiers, however, should be identical to the corresponding classifiers for the dissimilarity data itself, knndc and parzenddc, if these representations are based on the same objects. Normal density based classifiers (ldc, udc and qdc) in the PE space do not depend on the signature, as it cancels in the computation of squared distances. The interpretation of normal densities in PE spaces is, however, disputable, as such densities have no proper definition in these spaces.
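
To make this specific distance definition concrete: in a PE space with signature sig = [p q], squared distances are computed with the fundamental symmetry J = diag([ones(p,1); -ones(q,1)]). A minimal sketch, using the embedded dataset X and the signature sig obtained above:

J  = diag([ones(sig(1),1); -ones(sig(2),1)]);  % fundamental symmetry of the PE space
Y  = +X;                                       % plain data matrix of the embedded objects
d2 = (Y(1,:)-Y(2,:))*J*(Y(1,:)-Y(2,:))'        % PE squared distance between objects 1 and 2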

In the following experiment these classifiers are compared for various spaces. The associated space ('ass'), the positive space ('pos') and the negative space ('neg') are all Euclidean. The pseudo-Euclidean classifiers (pe_nmc, pe_knnc, pe_parzenc) are for these spaces identical to the original Euclidean variants (nmc, knnc, parzenc).

Define the classifiers:

U = {pe_nmc,ldc,pe_knnc,pe_parzenc}

Define the mappings to the subspaces:

Wass = euspace(W,'ass');
Wpos = euspace(W,'pos');
Wneg = euspace(W,'neg');

Map the original dissimilarities to these subspaces and name the resulting datasets properly:

Xass = setname(A*Wass,'Ass Space');
Xpos = setname(A*Wpos,'Pos Space');
Xneg = setname(A*Wneg,'Neg Space');
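
Since these subspaces are Euclidean, the identity claimed above can be checked directly (a hedged sketch, assuming the pe_* routines accept such Euclidean datasets, as stated above): a PE classifier and its Euclidean counterpart should yield the same resubstitution error on, e.g., the associated space.

[Xass*(Xass*nmc)*testc, Xass*(Xass*pe_nmc)*testc]   % both errors should coincide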

Run a crossvalidation for all datasets on all classifiers:

crossval({X,Xass,Xpos,Xneg},U,5,5)
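
The call above prints the averaged crossvalidation errors. To process them further, they can also be captured in a variable (a small variant; the exact output layout may differ between PRTools versions):

e = crossval({X,Xass,Xpos,Xneg},U,5,5);   % one error estimate per dataset/classifier pair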

Compute for every space a set of learning curves as well:

figure;
e = cleval(A*W,U,[],5);
plote(e);
title('PE Space');

figure;
e = cleval(A*Wass,U,[],5);
plote(e);
title('Associated Space');

figure;
e = cleval(A*Wpos,U,[],5);
plote(e);
title('Positive Space');

figure;
e = cleval(A*Wneg,U,[],5);
plote(e);
title('Negative Space');

Finally, show all figures:

showfigs
