Every problem has its own best classifier, and every classifier has at least one dataset for which it is the best. So there is no end to pattern recognition research as long as there are problems that differ, even slightly, from all the ones that have been studied so far.

The reason for this is that every training rule is based on some explicit or implicit assumptions. For datasets that exactly fulfill these assumptions, the classifier is the best one. Classifiers that are more general, i.e. make fewer assumptions, need more training examples to compensate for the lack of prior knowledge. Classifiers that make assumptions that are not true perform worse.
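
As a hedged illustration (not from the original post), the following Python sketch, assuming scikit-learn is available, compares a classifier with strong assumptions (Gaussian naive Bayes) against an almost assumption-free one (nearest neighbor) on a small synthetic training set that fulfills the naive Bayes assumptions:

```python
# Sketch: Gaussian naive Bayes assumes class-conditionally independent
# features; nearest neighbor assumes almost nothing. When the independence
# assumption holds and training data is scarce, the stronger assumptions
# tend to pay off.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_data(n, d=10):
    # Two Gaussian classes with independent features (naive Bayes' assumption)
    X0 = rng.normal(0.0, 1.0, (n, d))
    X1 = rng.normal(0.5, 1.0, (n, d))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_data(15)    # small training set, 15 objects per class
X_test, y_test = make_data(2000)    # large test set

for clf in (GaussianNB(), KNeighborsClassifier(n_neighbors=1)):
    acc = clf.fit(X_train, y_train).score(X_test, y_test)
    print(type(clf).__name__, round(acc, 3))
# Typically GaussianNB wins here; with much more training data the gap closes.
```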

The above is illustrated by one of the solutions submitted to the 2010 classification competition of the ICPR. About 20 standard PRTools classifiers were studied over 300 datasets. It appeared that every classifier in the set was the best one for at least one of the datasets. This showed that the collection of datasets defined for this competition reflected sufficiently well the set of problems PRTools was designed for.
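
A minimal sketch of such a study, assuming scikit-learn and a handful of its built-in datasets as stand-ins for the competition data (the actual ICPR 2010 protocol and datasets differ):

```python
# Cross-validate a pool of standard classifiers over several datasets and
# count, for each classifier, on how many datasets it ranks first.
from sklearn.datasets import load_iris, load_wine, load_breast_cancer, load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "nb": GaussianNB(),
    "1nn": KNeighborsClassifier(n_neighbors=1),
    "lda": LinearDiscriminantAnalysis(),
    "tree": DecisionTreeClassifier(random_state=0),
}
datasets = {f.__name__: f(return_X_y=True)
            for f in (load_iris, load_wine, load_breast_cancer, load_digits)}

wins = {name: 0 for name in classifiers}
for dname, (X, y) in datasets.items():
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in classifiers.items()}
    wins[max(scores, key=scores.get)] += 1

print(wins)  # each classifier may well win on at least one dataset
```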

In short: there is no such thing as the best classifier. The notion only has meaning for a particular problem or for a sharply defined set of problems.

Consequences for writing and reviewing papers

The above may seem obvious and is accepted by many researchers in the field. However, it is often not recognized in the present culture of writing and reviewing papers that propose new or modified classifiers. As every classifier has its own problems for which it is the best, it is no surprise that such classifiers exist. The interesting point is to characterize these problems, which is not always easy. One way to do this is to create a well-defined artificial problem for which the classifier is optimal, i.e. one that exactly fulfills the appropriate conditions and assumptions.
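
For instance, a linear discriminant is Bayes-optimal for two Gaussian classes with equal covariance matrices. A hypothetical sketch of such a constructed problem, assuming scipy and scikit-learn:

```python
# For two spherical Gaussian classes with equal covariance, the linear
# discriminant is the Bayes rule, so LDA should approach the analytic
# Bayes error while a classifier built on other assumptions lags behind.
import numpy as np
from scipy.stats import norm
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
d, delta = 5, 1.0                   # dimension, per-feature mean shift

def sample(n):
    X0 = rng.normal(0.0, 1.0, (n, d))
    X1 = rng.normal(delta, 1.0, (n, d))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

X_tr, y_tr = sample(50)
X_te, y_te = sample(5000)

# Bayes error for equal priors: Phi(-Mahalanobis distance / 2)
bayes_err = norm.cdf(-np.sqrt(d) * delta / 2)
print("Bayes error:", round(bayes_err, 3))
for clf in (LinearDiscriminantAnalysis(), DecisionTreeClassifier(random_state=0)):
    err = 1 - clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(clf).__name__, round(err, 3))
```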

In addition, it is of interest to the community to present at least a single real-world example for which it is the best as well. This will prove that such problems exist and that the proposal has practical significance. Showing the performance over a large set of public-domain datasets is not really needed, but might be of interest. The average performance over such a set has no meaning, as it is based on an arbitrary collection of problems that will not coincide with any real-world environment.
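
A hedged sketch of such a single real-world check, using scikit-learn's breast cancer data purely as an assumed stand-in for a relevant application:

```python
# Evaluate on one specific real-world dataset instead of averaging over
# an arbitrary collection of problems.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
for clf in (GaussianNB(), KNeighborsClassifier()):
    score = cross_val_score(clf, X, y, cv=10).mean()
    print(type(clf).__name__, round(score, 3))
```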

Finally, the characteristics of the classifier may additionally be illustrated by showing examples for which it definitely fails. Such examples should exist, as in some applications the classifier's assumptions are not fulfilled at all. Finding or creating them makes it even clearer how to position the proposed classifier.
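
As an assumed illustration, a linear classifier on an XOR-style problem fails by construction, since its linear-separability assumption does not hold at all:

```python
# Four Gaussian clusters at the corners of a square, labeled in an XOR
# pattern: no linear boundary separates the classes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
n = 500
X = rng.normal(0, 1, (4 * n, 2)) * 0.3
centers = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
X += np.repeat(centers, n, axis=0)
y = np.array([0] * (2 * n) + [1] * (2 * n))   # XOR labeling of the corners

for clf in (LinearDiscriminantAnalysis(), KNeighborsClassifier()):
    # training accuracy is enough to show the failure here
    print(type(clf).__name__, round(clf.fit(X, y).score(X, y), 3))
# LDA stays near chance (0.5); k-NN handles the problem easily.
```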

All comparisons should be made with published classifiers that are close in nature to the one under study, and preferably with some general standard classifier as well.

Summary

  1. Make the assumptions and conditions behind the proposal as explicit as possible.
  2. Illustrate the superior performance with at least one well-defined artificial example.
  3. Find a real-world problem on which the classifier performs best.
  4. Find a well-defined problem on which the classifier performs badly.
  5. Base all comparisons on classifiers that, in design, are close to the proposal.

 
