combining classifiers

Introduction of stacked and parallel combining, and of fixed and trained combiners. It has to be realized that a combined set of classifiers can only be better than the best of the constituting base classifiers if this set does not contain an 'optimal' classifier that can solve the problem on its own.
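A minimal sketch on a toy two-class banana set (not the digit data used below) may illustrate the difference between the two combiner types: a fixed rule like meanc needs no training of its own, while a trained combiner like fisherc is itself trained on the confidence outputs of the base classifiers.

a = gendatb([50 50]);            % toy two-class banana set
[t,s] = gendat(a,0.5);           % 50-50 split in train and test set
w = t*([nmc ldc]*classc);        % train two stacked base classifiers
e1 = s*w*meanc*testc;            % fixed combiner: no training of its own
e2 = s*(t*(w*fisherc))*testc;    % trained combiner: fisherc is trained on
                                 % the base classifier confidences
[e1 e2]                          % test errors of the two combiners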

PRTools and PRDataSets should be in the path

Download the m-file from here. See http://37steps.com/prtools for more.
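For reference, a minimal path setup might look as follows, assuming PRTools and the dataset package were unpacked into hypothetical local directories:

addpath('prtools');      % hypothetical location of the PRTools toolbox
addpath('prdatasets');   % hypothetical location of the datasets (mfeat_kar, ...)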


delfigs;      % delete figures of previous runs
randreset;    % for reproducibility
reps = 10;    % repeat experiment reps times.

stacked combining

Three simple classifiers are combined for a 10-class (digits) 64-dimensional problem. Learning curves are computed for the three classifiers and three combiners, two fixed (mean and product) and one trained (fisher).

r = {nmc ldc qdc([],[],1e-2)}; % three untrained base classifiers
u = [r{:}]*classc; % untrained stacked combiner
a = mfeat_kar;     % the dataset, 10 x 200 objects in 64 dimensions
trainsize = [0.07 0.1 0.15 0.2 0.3 0.5 0.7 0.9999]; % fractions
e = zeros(reps,numel(trainsize),numel(r)+3);
for n=1:reps
  [tt,s] = gendat(a,0.5);          % 50-50 split in train and test set
  for i=1:numel(trainsize)
    randreset(n);                  % takes care that larger training sets
    t = gendat(tt,trainsize(i));   % include the smaller ones.
    w = t*u;                       % train combiner
    for j=1:numel(w.data)          % test base classifiers
      e(n,i,j) = s*w.data{j}*testc;
    end
    e(n,i,j+1) = s*w*meanc*testc;  % test mean combiner
    e(n,i,j+2) = s*w*prodc*testc;  % test product combiner
    e(n,i,j+3) = s*(t*(w*fisherc))*testc; % train and test fisher combiner
  end
end
% plot and annotate
tsizes = ceil(trainsize*size(t,1));
figure; plot(tsizes,squeeze(mean(e)));
set(gca,'xscale','log');
axis([min(tsizes),ceil(max(tsizes)),0.03,0.15]);
set(gca,'xtick',tsizes)
names = char(char(getname(r)),char('mean combiner','product combiner','fisher combiner'));
legend(names);
xlabel('training set size');
ylabel(['Average error ( ' num2str(reps) ' exp)']);
title(['Stacked combining on ' getname(a)])
fontsize(14);
linewidth(2);

The combiners perform better than the base classifiers almost everywhere. The product rule needs more accurate confidence estimates and thereby a larger training set. The fisher combiner, which in fact optimizes a weighted version of the mean combiner, needs more data as well.
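A tiny numeric illustration (hypothetical confidences, not taken from the experiment above) shows why the product rule is the more sensitive one: a single badly estimated, near-zero confidence dominates the product, while the mean is only moderately affected.

p1 = [0.60 0.40];   % classifier 1: posterior estimates for two classes
p2 = [0.05 0.95];   % classifier 2: a poorly estimated, near-zero confidence
mean([p1;p2])       % mean rule:    0.325  0.675
prod([p1;p2])       % product rule: 0.030  0.380, dominated by the 0.05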

parallel combining

A simple classifier is trained on each of three aligned feature sets of the same 10-class digit recognition problem (the feature sets differ in dimensionality). Learning curves are computed for the three resulting classifiers and for three combiners, two fixed (mean and product) and one trained (fisher).

a = {mfeat_kar; mfeat_zer; mfeat_fac}; % three feature sets
classf = qdc([],[],1e-2)*classc;       % the classifier, regularized qdc
trainsize = [0.07 0.1 0.15 0.2 0.3 0.5 0.7 0.9999]; % fractions
e = zeros(reps,numel(trainsize),numel(a)+3);
for n=1:reps
  [tt,s] = gendat(a,0.5);          % 50-50 split in train and test set
  for i=1:numel(trainsize)
    randreset(n);                  % takes care that larger training sets
    t = gendat(tt,trainsize(i));   % include the smaller ones.
    w = t*classf;                  % train the classifier for 3 feature sets
    for j=1:numel(a)               % test them
      e(n,i,j) = s{j}*w{j}*testc;
    end
    w = parallel(w);
    e(n,i,j+1) = [s{:}]*w*meanc*testc;  % test mean combiner
    e(n,i,j+2) = [s{:}]*w*prodc*testc;  % test product combiner
    e(n,i,j+3) = [s{:}]*([t{:}]*(w*fisherc))*testc; % train and test fisher combiner
  end
end
% plot and annotate
tsizes = ceil(trainsize*size(t{1},1));
figure; plot(tsizes,squeeze(mean(e)));
set(gca,'xscale','log');
axis([min(tsizes),ceil(max(tsizes)),0.01,0.35]);
set(gca,'xtick',tsizes)
names = char(char(getname(a)),char('mean combiner','product combiner','fisher combiner'));
legend(names);
xlabel('training set size');
ylabel(['Average error ( ' num2str(reps) ' exp)']);
title(['Parallel combining on MFEAT'])
fontsize(14);
linewidth(2);

The three feature sets perform rather differently. For sufficiently large training sets the combiners perform better than each of them.

Show that trained combining is not useful here

axis([400 1000 0.02 0.06])

This is the same plot zoomed in on the large training set sizes. It shows that even here the training set is still not sufficiently large for the trained fisher combiner: it does not improve on the results of the best individual feature set. This is understandable, as the fisher combiner operates on 30 inputs (three base classifiers times ten class confidences) and so needs considerably more training data than the fixed rules.
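A quick numeric check of this observation might look as follows (a sketch, assuming the error array e of the loop above is still in the workspace; columns 1-3 hold the feature set errors and column 6 the fisher combiner):

em = squeeze(mean(e));     % average errors: trainsize x methods
best = min(em(end,1:3));   % best individual feature set, largest training set
fprintf('best feature set: %.3f, fisher combiner: %.3f\n',best,em(end,6));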