
Examples of various feature selection procedures, organised per classifier

PRTools and PRDataSets should be in the path

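Feature curves are shown for 6 feature rankings, computed by 3 procedures (individual selection, forward selection and backward selection) and 2 criteria (the Mahalanobis distance, 'maha-s', and the NN performance, 'NN').

These criteria are computed on the entire dataset, before it is split. Each of the 6 plots shows the performance of one of the three classifiers for the 3 ranking procedures, compared with the original feature ranking. Classifier performances are based on 50-50 random splits of the dataset into training and test sets. The three classifiers are the 1-NN rule (knnc), the Fisher classifier (fisherc) and a support vector classifier (libsvc).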

Show dataset

The Breast Wisconsin dataset is based on 9 features and 683 objects in two classes of 444 and 239 objects.

a = breast;
a = setprior(a,0);
% scatter plot of the first two principal components
scatterd(a*pcam(a,2));
title(['PCA projection of the ' getname(a) ' dataset'])

Define classifiers

w1 = setname(knnc([],1),'1-NN');
w2 = setname(fisherc,'Fisher');
w3 = setname(libsvc,'LibSVC-1');
w = {w1,w2,w3};
nreps = 25;
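
The evaluation protocol used further on (repeated 50-50 random splits, with the test error recorded for the first k features of a ranking) can be illustrated without PRTools. The block below is a minimal sketch under stated assumptions: the data are synthetic Gaussians, the classifier is a hand-written 1-NN rule, and the names X, y, ranking, err and curve are hypothetical. It is not the implementation of clevalf. No random seed is set, so the numbers vary between runs.

```matlab
% Sketch only: synthetic data and a hand-written 1-NN rule, not PRTools.
n = 100; d = 4;                          % hypothetical sizes
X = [randn(n,d); randn(n,d) + 1.5];      % two Gaussian classes
y = [ones(n,1); 2*ones(n,1)];            % class labels
ranking = 1:d;                           % plays the role of the 'original ranking'
nreps = 5;                               % number of random splits
err = zeros(nreps,d);

for r = 1:nreps
    % 50-50 split, drawn separately within each class
    trn = false(2*n,1);
    for c = 1:2
        members = find(y == c);
        members = members(randperm(numel(members)));
        trn(members(1:floor(numel(members)/2))) = true;
    end
    ytr = y(trn); yte = y(~trn);

    for k = 1:d
        f = ranking(1:k);                % first k features of the ranking
        Xtr = X(trn,f); Xte = X(~trn,f);
        % squared Euclidean distances between test and training objects
        D = sum(Xte.^2,2) + sum(Xtr.^2,2)' - 2*(Xte*Xtr');
        [~,nn] = min(D,[],2);            % nearest training object per test object
        err(r,k) = mean(ytr(nn) ~= yte); % test error for k features
    end
end

curve = mean(err,1);                     % feature curve: mean error versus k
disp(curve)
```

Averaging over several splits smooths the curve. The number of repetitions trades computing time against the variance of the estimate.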

Define feature selectors using Mahalanobis distance

% define unit mapping
v0 = setname(prmapping,'Original ranking');
% individual selection
v1 = setname(featseli(a,'maha-s',size(a,2)),'Individual Selection');
% forward selection
v2 = setname(featself(a,'maha-s',size(a,2)),'Forward Selection');
% backward selection
[v3,r] = featselb(a,'maha-s',1);
v3 = setname(featsel(size(a,2),[+v3 abs(r(2:end,3))']),'Backward Selection');
v = {v0,v1,v2,v3};
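
To show how individual and forward selection differ, the block below ranks features with a Mahalanobis-type criterion in plain MATLAB. It is a sketch under assumptions: the two-class Mahalanobis distance with pooled covariance is used as a stand-in for 'maha-s', whose exact definition is not given here, and the data and the names crit, pooled, dmean, individual and selected are hypothetical.

```matlab
% Sketch only: synthetic data, not the PRTools featseli/featself code.
n = 200; d = 5;
X = [randn(n,d); randn(n,d)];
X(n+1:end,2) = X(n+1:end,2) + 3;         % feature 2 separates the classes strongly
X(n+1:end,4) = X(n+1:end,4) + 1;         % feature 4 separates them weakly
y = [ones(n,1); 2*ones(n,1)];
A = X(y==1,:); B = X(y==2,:);

% Assumed criterion: squared Mahalanobis distance between the class means,
% using the pooled covariance of the feature subset f
pooled = @(f) ((size(A,1)-1)*cov(A(:,f)) + (size(B,1)-1)*cov(B(:,f))) ...
              / (size(A,1) + size(B,1) - 2);
dmean  = @(f) mean(A(:,f),1) - mean(B(:,f),1);
crit   = @(f) (dmean(f) / pooled(f)) * dmean(f)';

% Individual selection: score every feature on its own, then sort
[~,individual] = sort(arrayfun(@(j) crit(j), 1:d), 'descend');

% Forward selection: greedily add the feature that maximises the criterion
selected = [];
remaining = 1:d;
while ~isempty(remaining)
    scores = zeros(1,numel(remaining));
    for i = 1:numel(remaining)
        scores(i) = crit([selected remaining(i)]);
    end
    [~,best] = max(scores);
    selected(end+1) = remaining(best);
    remaining(best) = [];
end

disp(individual); disp(selected)
```

Individual selection ignores dependencies between features, while forward selection scores each candidate together with the features already chosen, so the two rankings may differ on correlated data.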

Compute feature curves per classifier ranked for Mahalanobis distance

for j=1:numel(w)
  e = cell(1,numel(v));
  for i=1:numel(v)
    % same random splits for every ranking
    randreset; e{i} = clevalf(a*v{i},w{j},[],0.5,nreps);
    e{i}.names = getname(v{i});
  end
  figure; plote(e)
  title(['Feature curves for ' getname(w{j}) ', based on Mahalanobis distance']);
  set(gca,'xtick',1:size(a,2)); set(gca,'xticklabel',1:size(a,2));
end

Define feature selectors using NN performance

% define unit mapping
v0 = setname(prmapping,'Original ranking');
% individual selection
v1 = setname(featseli(a,'NN',size(a,2)),'Individual Selection');
% forward selection
v2 = setname(featself(a,'NN',size(a,2)),'Forward Selection');
% backward selection
[v3,r] = featselb(a,'NN',1);
v3 = setname(featsel(size(a,2),[+v3 abs(r(2:end,3))']),'Backward Selection');
v = {v0,v1,v2,v3};
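
The NN criterion can be pictured as a leave-one-out 1-NN accuracy: each object is classified by its nearest neighbour among all other objects. The block below is a sketch of that idea for scoring single features, under assumptions: the data are synthetic, the names score and ranking are hypothetical, and the criterion as implemented in PRTools may differ in detail.

```matlab
% Sketch only: leave-one-out 1-NN accuracy per feature on synthetic data.
n = 60; d = 3;
X = [randn(n,d); randn(n,d)];
X(n+1:end,1) = X(n+1:end,1) + 4;         % only feature 1 is informative
y = [ones(n,1); 2*ones(n,1)];

score = zeros(1,d);
for j = 1:d
    Xj = X(:,j);
    D = (Xj - Xj').^2;                   % pairwise squared distances
    D(1:size(D,1)+1:end) = Inf;          % an object may not be its own neighbour
    [~,nn] = min(D,[],2);                % nearest other object
    score(j) = mean(y(nn) == y);         % leave-one-out accuracy
end

[~,ranking] = sort(score,'descend');     % individual ranking, best feature first
disp(score); disp(ranking)
```

Setting the diagonal of the distance matrix to Inf is what turns a resubstitution estimate (always perfect for 1-NN) into a leave-one-out estimate.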

Compute feature curves per classifier ranked for NN performance

for j=1:numel(w)
  e = cell(1,numel(v));
  for i=1:numel(v)
    % same random splits for every ranking
    randreset; e{i} = clevalf(a*v{i},w{j},[],0.5,nreps);
    e{i}.names = getname(v{i});
  end
  figure; plote(e)
  title(['Feature curves for ' getname(w{j}) ', based on NN performance']);
  set(gca,'xtick',1:size(a,2)); set(gca,'xticklabel',1:size(a,2));
end