DisTools examples: Generalized Dissimilarities

Instead of representing objects by their dissimilarities to other objects, they may be represented by their distances to models. Some simple examples are treated here. It is assumed that readers are familiar with PRTools and will consult the pages listed at the bottom of this page where needed.

 

Two ways to compute models on the training set are cluster analysis and the computation of subspaces. After a cluster analysis, objects may be represented by some distance measure defined for clusters, e.g. the minimum, the maximum or the mean of the distances to all objects in a cluster. Alternatively, each cluster may be represented by a central point or a subspace.
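The cluster distance measures mentioned above can be illustrated with a minimal sketch. It is written in plain Python rather than PRTools, and the function name cluster_distances and the toy numbers are assumptions made for illustration only, not part of DisTools.

```python
from statistics import mean

def cluster_distances(d_row, labels, reduce=min):
    """Map one object's dissimilarities to the training objects onto per-cluster distances.

    d_row  : dissimilarities from the object to each training object
    labels : cluster index of each training object (same length as d_row)
    reduce : min, max or statistics.mean (the three measures named in the text)
    """
    clusters = sorted(set(labels))
    return [reduce([d for d, lab in zip(d_row, labels) if lab == c])
            for c in clusters]

# Hypothetical toy case: four training objects grouped into two clusters.
d_row = [1.0, 3.0, 2.0, 6.0]
labels = [0, 0, 1, 1]

d_min = cluster_distances(d_row, labels, min)    # [1.0, 2.0]
d_max = cluster_distances(d_row, labels, max)    # [3.0, 6.0]
d_mean = cluster_distances(d_row, labels, mean)  # [2.0, 4.0]
print(d_min, d_max, d_mean)
```

Each object is thereby described by as many values as there are clusters, instead of one value per training object.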

Exercise

  1. Take a dissimilarity dataset, e.g. one of the chickenpieces datasets.
  2. Compute as a baseline approach its learning curve for the 1-NN rule in dissimilarity space (use clevald and knnc).
  3. Cluster the training set using a routine that accepts dissimilarities as input, e.g. kcentres, modeseek or hclust.
  4. Compute a cluster-based dissimilarity matrix by computing for every object its distance to each cluster.
  5. Compute learning curves for some classifiers on the new representation and compare them with the baseline approach.
  6. Repeat for various numbers of clusters.
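Steps 3 and 4 can be sketched as follows in plain Python, as a rough illustration only. The clustering here is a greedy farthest-first selection of centre objects, used as a simple stand-in and not the algorithm implemented by kcentres, modeseek or hclust. All function names and the toy data are hypothetical.

```python
from statistics import mean

def farthest_first_centres(D, k):
    """Pick k centre indices from a square dissimilarity matrix D (greedy stand-in)."""
    centres = [0]  # arbitrary start: the first object
    while len(centres) < k:
        # Next centre: the object farthest from its nearest existing centre.
        nxt = max(range(len(D)), key=lambda i: min(D[i][c] for c in centres))
        centres.append(nxt)
    return centres

def assign_clusters(D, centres):
    """Label each training object with the position of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: D[i][centres[j]])
            for i in range(len(D))]

def cluster_representation(D_rows, labels, k, reduce=mean):
    """Turn rows of dissimilarities to training objects into rows of k cluster distances.

    Assumes every cluster has at least one member (true when objects are distinct).
    """
    return [[reduce([d for d, lab in zip(row, labels) if lab == c])
             for c in range(k)]
            for row in D_rows]

# Hypothetical toy data: points on a line, dissimilarity = absolute difference.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [[abs(a - b) for b in points] for a in points]

k = 2
centres = farthest_first_centres(D, k)
labels = assign_clusters(D, centres)
R = cluster_representation(D, labels, k)  # new n-by-k representation
print(centres, labels)
for row in R:
    print(row)
```

For test objects, the same cluster_representation call would be applied to their rows of dissimilarities to the training objects, reusing the training labels. Repeating the sketch for several values of k corresponds to step 6.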

elements: datasets datafiles cells and doubles mappings classifiers mapping types.
operations: datasets datafiles cells and doubles mappings classifiers stacked parallel sequential dyadic.
user commands: datasets representation classifiers evaluation clustering examples support routines.
introductory examples: Introduction Scatterplots Datasets Datafiles Mappings Classifiers Evaluation Learning curves Feature curves Dimension reduction Combining classifiers Dissimilarities.
advanced examples.

 
