Problem
One problem with the Naive Bayes classifier is that the confidence of the model
in its predictions can change even when no new information has been gained.
Let $p(x_i \mid y = j)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$ be the probability
distributions defining a Naive Bayes model. There are $m$ features and $y$
can take on $n$ possible classes. Assume the prior over labels $p(y)$
is uniform.
Show that if every feature is repeated exactly once, the model becomes more
confident in its predicted output class for an arbitrary example.
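One way to make "confident" precise (my reading, consistent with the solution below) is as the posterior probability the model assigns to its predicted class, which under the uniform prior is
\begin{align}
p(y = j \mid x_1, \dots, x_m) = \frac{\prod_{i=1}^m p(x_i \mid y = j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)}.
\end{align}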
Solution
Let $j$ be the index of the class label that the Naive Bayes model predicts
for a given example. In this case
\begin{align}
\prod_{i=1}^m p(x_i \mid y = j)p(y=j) > \prod_{i=1}^m p(x_i \mid y = k)p(y=k) \quad \text{for} \quad k \ne j.
\end{align}
Since the prior is uniform we have
\begin{align}
\prod_{i=1}^m p(x_i \mid y = j) > \prod_{i=1}^m p(x_i \mid y = k) \quad \text{for} \quad k \ne j. \tag{1}
\end{align}
Now consider the probability that the model with every feature repeated
assigns to the same label:
\begin{align}
&\frac{\prod_{i=1}^m p(x_i \mid y = j)^2 p(y=j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)^2 p(y=k)} \\
&= \frac{\prod_{i=1}^m p(x_i \mid y = j)^2}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)^2} \\
&\ge \frac{\prod_{i=1}^m p(x_i \mid y = j)^2}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)\, p(x_i \mid y = j)} \\
&= \frac{\prod_{i=1}^m p(x_i \mid y = j)}{\sum_{k=1}^n \prod_{i=1}^m p(x_i \mid y = k)}.
\end{align}
This shows that the probability the new model assigns to the predicted class
is greater than or equal to the probability assigned by the old model. The
first step uses the fact that $p(y)$ is uniform, so the prior cancels. The
second step uses inequality (1): for each $k$,
$\prod_{i=1}^m p(x_i \mid y = k)^2 \le \prod_{i=1}^m p(x_i \mid y = k)\, p(x_i \mid y = j)$,
so the denominator can only grow. The last step factors
$\prod_{i=1}^m p(x_i \mid y = j)$ out of the numerator and denominator. The
inequality is strict whenever some competing class $k \ne j$ has nonzero
likelihood, which is the sense in which the model becomes more confident.
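As a sanity check, here is a small numerical sketch in Python (my own illustration; the probability table is made up) of the effect for $m = 3$ features and $n = 2$ classes:

```python
import numpy as np

def posterior(likelihoods):
    """Posterior over classes given per-class likelihoods; with a
    uniform prior, the prior cancels and we just normalize."""
    return likelihoods / likelihoods.sum()

# p[j, i] = p(x_i = observed value | y = j) for one fixed example.
# These values are made up for illustration.
p = np.array([[0.9, 0.6, 0.7],   # class j = 0
              [0.4, 0.5, 0.8]])  # class j = 1

original = posterior(p.prod(axis=1))           # each feature seen once
duplicated = posterior((p ** 2).prod(axis=1))  # each feature repeated

j = original.argmax()
print("predicted class:", j)                     # 0
print("confidence, original:  ", original[j])    # ~0.703
print("confidence, duplicated:", duplicated[j])  # ~0.848
```

Squaring the per-class likelihoods pulls the normalized posterior toward the predicted class, matching the inequality above.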
This fact was pointed out to me in Section 2.2.3 of An Introduction to
Conditional Random
Fields.