Why does the logdens command improve classification?

Note that this question is less relevant for PRTools 4.2.0 and later, as the logdens routine is now called automatically whenever applicable, i.e. when classc is applied to the classifier so that posteriors are used instead of densities. Not all help files have been updated yet.

Classifiers following the Bayes classification rule may profit from using log-densities instead of densities if they are based on normal distributions. The improvement is not fundamental: it is just a computational trick that overcomes some limitations of the finite word length of computers. Here is a short explanation.

The multi-class Bayes classifier between classes \omega \in \Omega can be written as (see the glossary)

\hat{\omega}(x) = argmax_{\omega \in \Omega}{P(x|\omega)P(\omega)}

The result (the class with the largest posterior probability) does not change if a monotonically increasing transformation is applied inside the argmax. Let us take the logarithm:

\hat{\omega}(x) = argmax_{\omega \in \Omega}{\log(P(x|\omega)P(\omega))}

For normal distributions with P(x|\omega) = C_\omega \exp(-\frac{1}{2} (x-\mu_{\omega})^T {\Sigma_\omega}^{-1} (x-\mu_{\omega})) this is equivalent to

\hat{\omega}(x) = argmax_{\omega \in \Omega}{-\frac{1}{2} (x-\mu_{\omega})^T {\Sigma_\omega}^{-1} (x-\mu_{\omega}) + D_\omega}

as the logarithm cancels the exponent and all terms that are independent of x can be collected in a single constant D_\omega = \log C_\omega + \log P(\omega).

The above shows that the logarithmic formulation of the Bayes classifier is equivalent to the original one. Numerical implementations, however, may give different results in high-dimensional spaces. PRTools tries to compute proper densities in the procedures based on the Bayes classifier. So +testset*qdc(trainset) shows the densities of the objects in testset estimated from trainset.
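
As a minimal sketch of this calling convention (assuming PRTools is on the path; gendatb, which generates an artificial two-class banana set, is used here only to have some data):

    trainset = gendatb([50 50]);   % 50 objects per class for training
    testset  = gendatb([20 20]);   % independent test objects
    w = qdc(trainset);             % quadratic classifier based on normal densities
    dens = +(testset*w);           % class density estimates for the test objects
    post = +(testset*w*classc);    % posteriors; recent PRTools versions switch
                                   % to log-densities automatically in this mode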

In high-dimensional spaces, however, these densities can become very small. Due to the finite word length the exponent-based density estimates may become identical (and eventually even zero) for different classes, so objects are no longer optimally classified. Avoiding the exponent can be profitable in the tails of the distributions. This also holds for density estimates based on sums of exponents, as in mogc and parzenc. Formally the logarithm does not cancel the exponents in a sum of exponents. In practice, however, the contribution of a single exponent dominates in the tail of the total distribution, so all others can be neglected.
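
The effect is easy to reproduce in plain MATLAB, without PRTools. The dimensionality, means and log-values below are purely illustrative:

    d  = 500;                              % dimensionality
    x  = 3*ones(d,1);                      % test point far out in the tail
    m1 = zeros(d,1);                       % class 1 mean, identity covariance
    m2 = 0.1*ones(d,1);                    % class 2 mean, identity covariance
    % direct densities underflow to exactly 0 for both classes: a tie
    p1 = (2*pi)^(-d/2)*exp(-0.5*sum((x-m1).^2))    % 0
    p2 = (2*pi)^(-d/2)*exp(-0.5*sum((x-m2).^2))    % 0
    % log-densities stay finite and still rank the classes correctly
    logp1 = -0.5*d*log(2*pi) - 0.5*sum((x-m1).^2)  % about -2709.5
    logp2 = -0.5*d*log(2*pi) - 0.5*sum((x-m2).^2)  % about -2562.0

    % For a sum of exponents (as in mogc and parzenc) the same idea is the
    % standard log-sum-exp trick: in the tail the largest term dominates
    v      = [-2000 -2010 -2050];          % component log-densities
    naive  = log(sum(exp(v)))              % -Inf: every exp underflows
    vmax   = max(v);
    stable = vmax + log(sum(exp(v-vmax)))  % about -2000.00005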

In PRTools, density based classifiers can be called in two modes: without or with classc. In the first case proper densities are estimated, and taking the logarithm would spoil this. In the second case classc takes care that posteriors are computed instead of densities. The computation of

P(\omega|x) = \exp(\log(P(x|\omega)P(\omega)) - \log(P(x)))

instead of

P(\omega|x) = \frac{P(x|\omega)P(\omega)}{P(x)}

is included in the call to classc in recent versions of PRTools. Users don't have to call logdens themselves if they call classc. The example prex_logdens shows the difference between classifiers with and without logdens when classc is not used.
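
Written out in plain MATLAB, this computation amounts to the following sketch (the log-values are hypothetical two-class numbers, with the class priors assumed to be absorbed into them):

    logpx = [-2709.5 -2562.0];                    % log(P(x|w)P(w)) per class
    m     = max(logpx);                           % shift so the largest term is 0
    post  = exp(logpx - m) / sum(exp(logpx - m))  % [~1e-64 1]: still informative
    % computing exp(logpx) first would give [0 0] and an undefined posterior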
