Squared error

Problem

One of the most commonly used algorithms in machine learning is linear regression. The loss function used in linear regression is the squared error, given by \begin{align} \mathcal{L}(X,y) = \frac{1}{2} ||X^T \theta - y||^2 = \frac{1}{2} \sum_{i=1}^n (\theta^T x_i - y_i)^2, \end{align} where $n$ is the number of examples, $x_i$ are the columns of the matrix $X$, and $y_i$ are the elements of the vector $y$.
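As a quick sanity check on the two forms of the loss, here is a minimal NumPy sketch (the shapes, names, and random data are assumptions for illustration) that stores the examples as the columns of $X$ and verifies that the matrix form and the sum form agree:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 100                       # feature dimension, number of examples
X = rng.normal(size=(d, n))         # examples are the columns of X
theta = rng.normal(size=d)
y = rng.normal(size=n)

# Matrix form: (1/2) ||X^T theta - y||^2
loss_matrix = 0.5 * np.linalg.norm(X.T @ theta - y) ** 2

# Sum form: (1/2) sum_i (theta^T x_i - y_i)^2
loss_sum = 0.5 * sum((theta @ X[:, i] - y[i]) ** 2 for i in range(n))

assert np.isclose(loss_matrix, loss_sum)
```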

Assume the targets are generated via a linear transformation of the inputs plus some zero-mean Gaussian noise. In other words, \begin{align} y_i = \theta^T x_i + \epsilon_i, \end{align} where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
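Under this generative model, synthetic targets can be drawn as follows (a sketch; the noise scale $\sigma$ and all names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 100
sigma = 0.1                               # assumed noise standard deviation
X = rng.normal(size=(d, n))               # examples as columns of X
theta_true = rng.normal(size=d)           # ground-truth parameters
eps = rng.normal(scale=sigma, size=n)     # epsilon_i ~ N(0, sigma^2)
y = X.T @ theta_true + eps                # y_i = theta^T x_i + epsilon_i
```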

Show that maximizing the likelihood of the parameters is equivalent to minimizing the squared error above.

Solution

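A sketch of the standard argument, assuming the noise terms $\epsilon_i$ are independent across examples. Under the model above, each target is Gaussian, $y_i \sim \mathcal{N}(\theta^T x_i, \sigma^2)$, so the likelihood of the parameters factorizes as \begin{align} p(y \mid X, \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\theta^T x_i - y_i)^2}{2\sigma^2} \right). \end{align} Taking the logarithm, which is monotone and therefore does not change the maximizer, gives \begin{align} \log p(y \mid X, \theta) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\theta^T x_i - y_i)^2. \end{align} The first term and the factor $1/\sigma^2$ do not depend on $\theta$, so maximizing the log-likelihood over $\theta$ is equivalent to minimizing \begin{align} \frac{1}{2} \sum_{i=1}^n (\theta^T x_i - y_i)^2 = \mathcal{L}(X, y), \end{align} which is exactly the squared error above.

As a numerical sanity check, here is a minimal sketch (the data, names, and noise scale are assumptions) confirming that the least-squares solution is also a maximizer of the Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 3, 200, 0.1
X = rng.normal(size=(d, n))                 # examples as columns of X
theta_true = rng.normal(size=d)
y = X.T @ theta_true + rng.normal(scale=sigma, size=n)

# Least-squares solution: argmin_theta (1/2) ||X^T theta - y||^2
theta_hat, *_ = np.linalg.lstsq(X.T, y, rcond=None)

def log_likelihood(theta):
    r = X.T @ theta - y
    return -n / 2 * np.log(2 * np.pi * sigma**2) - (r @ r) / (2 * sigma**2)

# Random perturbations of theta_hat never increase the likelihood,
# since the log-likelihood is concave with its maximum at the LS solution
for _ in range(100):
    theta_perturbed = theta_hat + rng.normal(scale=0.05, size=d)
    assert log_likelihood(theta_perturbed) <= log_likelihood(theta_hat) + 1e-9
```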