Problem
One problem with the Naive Bayes classifier is that the confidence of the model
in its predictions can change even when no new information has been gained.
Let $p(x_i \mid y = j)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$ be the probability
distributions defining a Naive Bayes model. There are $m$ features and $y$
can take on $n$ possible classes. Assume the prior over labels $p(y)$
is uniform.
Show that if every feature is repeated exactly once, the model becomes more
confident in its predicted output class for an arbitrary example.
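One way to make "confident" precise (my reading, consistent with the solution below) is as the posterior probability the model assigns to its predicted class, which under the uniform prior is
\begin{align}
p(y = j \mid x_1, \dots, x_m) = \frac{\prod_{i=1}^m p(x_i \mid y = j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)}.
\end{align}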
Solution
Let $j$ be the index of the class label that the Naive Bayes model predicts
for a given example. In this case
\begin{align}
\prod_{i=1}^m p(x_i \mid y = j)p(y=j) > \prod_{i=1}^m p(x_i \mid y = k)p(y=k) \quad \text{for} \quad k \ne j.
\end{align}
Since the prior is uniform we have
\begin{align}
\prod_{i=1}^m p(x_i \mid y = j) > \prod_{i=1}^m p(x_i \mid y = k) \quad \text{for} \quad k \ne j. \tag{1}
\end{align}
Now consider the probability that the model with every feature repeated
assigns to the same label:
\begin{align}
&\frac{\prod_{i=1}^m p(x_i \mid y = j)^2 p(y=j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)^2 p(y=k)} \\
&= \frac{\prod_{i=1}^m p(x_i \mid y = j)^2}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)^2} \\
&\ge \frac{\prod_{i=1}^m p(x_i \mid y = j)^2}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)\, p(x_i \mid y = j)} \\
&= \frac{\prod_{i=1}^m p(x_i \mid y = j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)}.
\end{align}
This shows that the probability the new model assigns to the predicted class
is greater than or equal to the probability assigned by the old model. The
first step uses the fact that $p(y)$ is uniform, so the prior cancels. The
second step uses inequality (1): for each $k$,
$\prod_{i=1}^m p(x_i \mid y = k)^2 \le \prod_{i=1}^m p(x_i \mid y = k)\, p(x_i \mid y = j)$,
so the denominator can only grow. The last step factors
$\prod_{i=1}^m p(x_i \mid y = j)$ out of the numerator and denominator. The
inequality is strict whenever some competing class $k \ne j$ has nonzero
likelihood, which is the sense in which the model becomes more confident.
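As a sanity check, here is a small numerical sketch in Python (my own illustration; the probability table is made up) of the effect for $m = 3$ features and $n = 2$ classes:

```python
import numpy as np

def posterior(likelihoods):
    """Posterior over classes given per-class likelihoods; with a
    uniform prior, the prior cancels and we just normalize."""
    return likelihoods / likelihoods.sum()

# p[j, i] = p(x_i = observed value | y = j) for one fixed example.
# These values are made up for illustration.
p = np.array([[0.9, 0.6, 0.7],   # class j = 0
              [0.4, 0.5, 0.8]])  # class j = 1

original = posterior(p.prod(axis=1))           # each feature seen once
duplicated = posterior((p ** 2).prod(axis=1))  # each feature repeated

j = original.argmax()
print("predicted class:", j)                     # 0
print("confidence, original:  ", original[j])    # ~0.703
print("confidence, duplicated:", duplicated[j])  # ~0.848
```

Squaring the per-class likelihoods pulls the normalized posterior toward the predicted class, matching the inequality above.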
This fact was pointed out to me in Section 2.2.3 of An Introduction to
Conditional Random
Fields.