DisTools examples: Classifiers in pseudo-Euclidean space

Not all classifiers can be computed in a pseudo-Euclidean (PE) space. Some examples are discussed below. It is assumed that readers are familiar with PRTools and will consult its introductory pages where needed.

Classifiers based on distances can be defined in a PE space, as distances in this space are well defined. Some routines are available to facilitate this.

D = flowcytodis(1);                             % dissimilarity matrix of a flow cytometry problem
X = D*(pe_em*mapex);                            % embed all objects in a pseudo-Euclidean space
prcrossval(X,{pe_knnc,pe_parzenc,pe_nmc},2,5)   % 2-fold cross-validation, 5 repetitions

Density-based classifiers are not yet well defined for a PE space. However, procedures based on means and covariance matrices can still be used, as the signature cancels in the computation of the covariances. The interpretation in terms of normal distributions is not valid, as such distributions are not defined for a PE space.

D = flowcytodis(1);                 % same dissimilarity data as above
X = D*(pe_em*mapex);                % pseudo-Euclidean embedding of all objects
prcrossval(X,{ldc,udc,qdc},2,5)     % classifiers based on class means and covariances

Explain why udc and qdc yield the same result.

The support vector classifier is based on positive semi-definite kernels. Kernels in a PE space, however, are indefinite. Nevertheless, some implementations may still yield good results for some problems. DisTools offers two PE support vector classifiers: pe_svc and pe_libsvc. The latter is based on the PRTools libsvc and needs the libsvm package in the path.

D = flowcytodis(1);                                 % same dissimilarity data as above
X = D*(pe_em*mapex);                                % pseudo-Euclidean embedding of all objects
prcrossval(X,{pe_svc,pe_libsvc,svc,libsvc},2,5)     % PE support vector classifiers versus the standard ones

svc and libsvc do not compute PE kernels, but the positive semi-definite kernels of the associated Euclidean space. They may perform as well as, or even better than, the PE versions. None of them, however, is optimal in the sense that the margin in the PE space is maximized.
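
To make the distinction concrete, the following sketch shows in plain MATLAB, not by DisTools calls, how the two kernels differ for an embedding X with signature (p,q). The signature, the number of objects and the embedding itself are placeholders here, not values computed from the data.

p = 5; q = 2;                        % assumed signature: p 'positive' and q 'negative' dimensions
X = randn(20,p+q);                   % assumed embedding of 20 objects
J = diag([ones(1,p) -ones(1,q)]);    % fundamental symmetry of the PE space
Kpe  = X*J*X';                       % indefinite PE kernel, the kind used by pe_svc and pe_libsvc
Kass = X*X';                         % positive semi-definite kernel of the associated Euclidean space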

Exercise

In the above examples, training and test sets are generated in the PE space. Their representation is based on the entire given dissimilarity matrix.

  1. Perform for some PE classifiers a cross-validation experiment that first splits the data and computes the PE space from the training set alone (a sketch of a single split is given below this list).
  2. Compute for one of the chickenpieces datasets learning curves that compare the two approaches: PE spaces computed from the total dataset versus PE spaces determined by the training set alone.
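
The following is a minimal sketch of a single split for the first exercise, assuming the standard DisTools routines pe_em and pe_knnc; the 50/50 split and the choice of classifier are illustrative only.

D = flowcytodis(1);                 % labeled square dissimilarity matrix
n = size(D,1);                      % number of objects
R = randperm(n);                    % random 50/50 split of the objects
T = R(1:round(n/2));                % training objects
S = R(round(n/2)+1:end);            % test objects
w = pe_em(D(T,T));                  % PE space computed from the training set alone
Xtr = D(T,T)*w;                     % training set embedded in this PE space
Xte = D(S,T)*w;                     % test objects projected by their dissimilarities to the training objects
v = Xtr*pe_knnc;                    % train a PE classifier
e = Xte*v*testc                     % test error for this single split

Note that the test objects enter only through their dissimilarities to the training objects, so no information from the test set is used in constructing the PE space. Repeating such splits gives the requested cross-validation estimate.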


 
