Problem
The loss function for a single example-label pair $(x, y)$ in logistic
regression can be written as
\begin{align}
\mathcal{L}(\theta) = y\log \sigma(\theta^Tx) + (1 - y) \log (1-\sigma(\theta^Tx))
\end{align}
where $x$ is the feature vector, $y \in \{0, 1\}$ is the label, and
$\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.
Show that the gradient with respect to the parameters is given by
\begin{align}
\nabla_\theta \mathcal{L}(\theta) = (y - \sigma(\theta^Tx)) x.
\end{align}
Solution
First let $z = \theta^Tx$ and take the derivative
$\frac{d}{dz}\mathcal{L}(\theta)$. This gives
\begin{align}
\frac{d}{dz} \mathcal{L}(\theta) &= \frac{y}{\sigma(z)}\frac{d}{dz}\sigma(z) - \frac{(1 - y)}{(1-\sigma(z))} \frac{d}{dz}\sigma(z) \\
&= \frac{y}{\sigma(z)}\sigma(z)(1-\sigma(z)) - \frac{(1-y)}{(1-\sigma(z))} \sigma(z) (1-\sigma(z)) \\
&= y(1-\sigma(z)) - (1-y)\sigma(z) \\
&= y - \sigma(z)
\end{align}
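The sigmoid derivative identity used above is easy to sanity-check numerically. The following sketch compares $\sigma(z)(1-\sigma(z))$ against a central finite-difference estimate of $\frac{d}{dz}\sigma(z)$ at an arbitrary test point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7      # arbitrary test point
eps = 1e-6   # finite-difference step

# Central finite-difference estimate of d/dz sigma(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Closed-form identity: sigma'(z) = sigma(z)(1 - sigma(z))
identity = sigmoid(z) * (1 - sigmoid(z))

print(abs(numeric - identity) < 1e-8)  # True
```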
We used the identity $\frac{d}{dz}\sigma(z) = \sigma(z)(1 - \sigma(z))$ from an
earlier problem. Now to compute the gradient w.r.t.\ $\theta$ we just apply
the chain rule
\begin{align}
\nabla_\theta \mathcal{L}(\theta) &= \frac{d}{dz} \mathcal{L}(\theta) \nabla_\theta (\theta^Tx) \\
&= (y - \sigma(z)) x
\end{align}
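The full gradient formula can likewise be verified with a finite-difference check. This is a minimal sketch with arbitrary example values for $\theta$, $x$, and $y$ (not taken from the problem):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, x, y):
    # Loss for a single (x, y) example
    p = sigmoid(theta @ x)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

rng = np.random.default_rng(0)
theta = rng.normal(size=3)  # arbitrary parameters
x = rng.normal(size=3)      # arbitrary features
y = 1.0                     # arbitrary label

# Closed-form gradient from the derivation: (y - sigma(theta^T x)) x
grad = (y - sigmoid(theta @ x)) * x

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta)
    e[i] = eps
    numeric[i] = (loss(theta + e, x, y) - loss(theta - e, x, y)) / (2 * eps)

print(np.allclose(grad, numeric, atol=1e-6))  # True
```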