Problem
We have a training set of $N$ examples $(x_1, y_1), \dots, (x_N, y_N)$ and a test set of $M$ examples $(\tilde{x}_1, \tilde{y}_1), \dots, (\tilde{x}_M, \tilde{y}_M)$, both sampled randomly from the same distribution. Let $\beta$ be the parameters of a linear model fit by minimizing the sum of squared errors on the training set.
Prove that
\begin{align}
\mathbb{E} \left[ \frac{1}{N} \sum_{i=1}^N (\beta^T x_i - y_i)^2 \right] \le
\mathbb{E} \left[ \frac{1}{M} \sum_{i=1}^M (\beta^T \tilde{x}_i - \tilde{y}_i)^2 \right]
\end{align}
where the expectations are taken over the training and test samples.
Solution
The expected squared error on any single test example is the same constant, so the expectation over the test set does not depend on $M$. Without loss of generality we can set $M = N$.
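Spelling this step out: since the test pairs $(\tilde{x}_i, \tilde{y}_i)$ are identically distributed (and independent of $\beta$), linearity of expectation gives
\begin{align}
\mathbb{E} \left[ \frac{1}{M} \sum_{i=1}^M (\beta^T \tilde{x}_i - \tilde{y}_i)^2 \right]
= \frac{1}{M} \sum_{i=1}^M \mathbb{E} \left[ (\beta^T \tilde{x}_i - \tilde{y}_i)^2 \right]
= \mathbb{E} \left[ (\beta^T \tilde{x}_1 - \tilde{y}_1)^2 \right],
\end{align}
which does not involve $M$.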
Now let $\tilde{\beta}$ be the parameters which minimize the sum of squared errors on the test set. Since the test set comes from the same distribution as the training set and $M = N$, the test set together with $\tilde{\beta}$ has the same joint distribution as the training set together with $\beta$, so
\begin{align}
\mathbb{E} \left[ \frac{1}{N} \sum_{i=1}^N (\beta^T x_i - y_i)^2 \right] =
\mathbb{E} \left[ \frac{1}{M} \sum_{i=1}^M (\tilde{\beta}^T \tilde{x}_i - \tilde{y}_i)^2 \right].
\end{align}
Since $\tilde{\beta}$ minimizes the sum of squared errors on the test set, the inequality below holds pointwise for every realization of the samples, and taking expectations preserves it:
\begin{align}
\mathbb{E} \left[ \frac{1}{M} \sum_{i=1}^M (\tilde{\beta}^T \tilde{x}_i - \tilde{y}_i)^2 \right] \le
\mathbb{E} \left[ \frac{1}{M} \sum_{i=1}^M (\beta^T \tilde{x}_i - \tilde{y}_i)^2 \right]
\end{align}
Chaining this with the equality above proves the claim.
This is Exercise 2.9 in The Elements of Statistical Learning.
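As a quick numerical sanity check (not part of the proof above), here is a small simulation sketch: it fits ordinary least squares on synthetic training data and compares the average squared error on the training set against the error on a fresh test sample, averaged over many replications. The data-generating model, sample sizes, and seed below are illustrative assumptions, not anything fixed by the exercise.

```python
import numpy as np

# Illustrative setup: a linear model with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
N, M, p, reps = 50, 200, 5, 2000
beta_true = rng.normal(size=p)

train_errs, test_errs = [], []
for _ in range(reps):
    # Fresh training and test samples from the same distribution.
    X = rng.normal(size=(N, p))
    y = X @ beta_true + rng.normal(size=N)
    X_test = rng.normal(size=(M, p))
    y_test = X_test @ beta_true + rng.normal(size=M)

    # Least-squares fit on the training set only.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    train_errs.append(np.mean((X @ beta_hat - y) ** 2))
    test_errs.append(np.mean((X_test @ beta_hat - y_test) ** 2))

print(f"mean train MSE: {np.mean(train_errs):.4f}")
print(f"mean test MSE:  {np.mean(test_errs):.4f}")
# The averaged training error should come out below the averaged
# test error, in line with the inequality proved above.
```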