lcurves

Learning curves for Bayes-Normal, Nearest Mean and Nearest Neighbour on the Iris dataset. Averages over 100 repetitions.

PRTools and PRDataFiles should be in the path

Download the m-file from here. See http://37steps.com/prtools for more.

Contents

Show datasets and best classifier

delfigs
randreset(1)
a = iris;
a = setprior(a,0);
scattern(a*pcam(a,2));
title('Projection of the Iris dataset')
fontsize(14);

Show learning curve of qdc with apparent error

figure
prwarning off;
e = cleval(a,qdc,[6 8 10 14  20 30 40],100);
plote(e,'nolegend')
legend('Test error','Apparent error')
title('Learning curve Bayes-normal (QDA) on Iris data')
ylabel('Averaged error (100 exp.)');
fontsize(14);
axis([2.0000 40.0000 0 0.1200])

The two curves approximate each other to the performance of the best possible QDA model.

Show learning curves of nmc and k-nn

e2 = cleval(a,nmc,[2 3 4 5 6 8 10 14  20 30 40],100);
e3 = cleval(a,knnc,[2 3 4 5 6 8 10 14  20 30 40],100);
figure;
plote({e2,e3,e},'nolegend','noapperror')
title('Learning curves for Iris data')
ylabel('Averaged error (100 exp.)');
legend('Nearest Mean','k-Nearest Neighbor','Bayes Normal')
fontsize(14);
axis([2.0000 40.0000 0 0.1200])

For small training sets more simple classifiers are better. More complicated classifiers (more parameters to be estimated) are expected to perform better for larger training sets. The k-NN classifier improves slowly, but is expected to beat Bayes Normal at some point as it asymptotically approximates the Bayes classifier.

The learning curve for the Nearest Mean classifier shows surprisingly a minimum, a phenomenon discussed by Marco Loog et al.