DisTools examples: Chickenpieces

Some exercises are defined on the basis of the Chickenpieces dataset. It is assumed that readers are familiar with PRTools and will consult its documentation where needed.

Some papers on the Chickenpieces dataset:

H. Bunke and U. Bühler, Applications of approximate string matching to 2D shape recognition, Pattern Recognition 26 (1993) 1797-1812.

B. Spillmann, Description of the Distance Matrices, Internal report, Computer Vision and Artificial Intelligence (FKI), Institute of Computer Science and Applied Mathematics, University of Bern, 2004.

E. Pekalska, A. Harol, R.P.W. Duin, B. Spillmann, and H. Bunke, Non-Euclidean or non-metric measures can be informative, Proc. SSSPR 2006, LNCS 4109, Springer, 2006, 871-880.

R.P.W. Duin and E. Pekalska, Non-Euclidean Dissimilarities: Causes and Informativeness, Proc. SSSPR 2010, LNCS 6218, Springer, 2010, 324-333.

First we load all 44 dissimilarity matrices (11 norms times 4 cost values) and compute for each of them the leave-one-out (LOO) 1NN classification error and the negative eigenfraction (NEF) as a measure of their non-Euclideanness.

D = chickenpieces('all');                  % cell array of all 44 dissimilarity matrices
norm = [5 7 10 15 20 25 29 30 31 35 40];   % string edit norms (rows of D)
cost = [45 60 90 120];                     % edit costs (columns of D)
E = zeros(size(D));                        % LOO 1NN errors
F = zeros(size(D));                        % negative eigenfractions
for i = 1:size(D,1)
  for j = 1:size(D,2)
    E(i,j) = nne(D{i,j});                  % leave-one-out 1NN error
    F(i,j) = nef(D{i,j}*makesym*pe_em);    % NEF after symmetrization and PE embedding
  end
end

Next the classification errors are plotted as a function of the norm, one curve per cost.

figure;
h = plot(norm,E);
set(h,'linewidth',2)
set(gca,'fontsize',12)
ylabel('Error')
xlabel('Norm')
title('1NN Error for chickenpieces dissimilarities')
legend('cost 45','cost 60','cost 90','cost 120')

Finally the NEF is plotted as a function of the norm.

figure;
h = plot(norm,F);
set(h,'linewidth',2)
set(gca,'fontsize',12)
ylabel('NEF')
xlabel('Norm')
title('Negative eigenfraction for chickenpieces dissimilarities')
legend('cost 45','cost 60','cost 90','cost 120')
showfigs

Note that the best results correspond to dissimilarity measures with a rather high NEF.
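
This can be checked directly from the matrices computed above (E, F, norm and cost), using the same row/column ordering as in the plots (norms along the rows, costs along the columns):

[emin,k] = min(E(:));         % lowest LOO 1NN error over all 44 matrices
[i,j] = ind2sub(size(E),k);   % corresponding norm (row) and cost (column) index
fprintf('best: norm %d, cost %d, error %5.3f, NEF %5.3f\n', ...
    norm(i),cost(j),emin,F(i,j))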

Exercises

  1. Compute the learning curve for the 1NN classifier of the given dissimilarities for norm = 29 and cost = 45 using nnerror2 (a possible starting point is sketched after this list). Select a training set size for which you want to beat the 1NN performance by a dissimilarity based classifier.
  2. Find a classifier in dissimilarity space that beats the performance selected above (see the second sketch after this list). Is the result significant in the statistical sense?
  3. Find a classifier in PE space that beats the performance selected above. Is the result significant in the statistical sense?
  4. Is it useful to use a transductive approach, i.e. to include the test set in the construction of the representation?
  5. Try to find a classifier based on more or on all dissimilarity matrices that beats, for the same training set size, your result based on a single dissimilarity matrix.
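
A possible starting point for exercise 1 is sketched below. It assumes that nnerror2 accepts a dissimilarity matrix and a vector of training set sizes per class; check the exact signature in your DisTools version.

D = chickenpieces(29,45);     % single dissimilarity matrix for norm = 29, cost = 45
sizes = [2 3 5 10 20 30];     % training set sizes per class (chosen for illustration)
e = nnerror2(D,sizes);        % expected 1NN learning curve (signature assumed)
figure; plot(sizes,e);
xlabel('Training objects per class')
ylabel('1NN error')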
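
For exercise 2, one route is to treat the dissimilarities to the representation objects as features and train a standard PRTools classifier on them. A minimal sketch, using a plain gendat split of the objects (rows); note that this keeps all objects in the representation set, which already touches on the transductive question of exercise 4:

[DT,DS] = gendat(D,0.5);      % split objects (rows) 50/50; columns remain the representation set
w = fisherc(DT);              % linear classifier trained in dissimilarity space
e = testc(DS*w)               % test error; compare with the 1NN learning curve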
