Problem
The Hidden Markov Model (HMM) is a generative model. An HMM models the joint
probability of two sequences as
\begin{align}
p(X, Y) = \prod_{t=1}^T p(x_t \mid y_t) p(y_t \mid y_{t-1}).
\end{align}
The Conditional Random Field (CRF) is a discriminative model. A linear-chain
CRF models the conditional
probability of $Y$ given $X$ as
\begin{align}
p(Y \mid X) = \frac{1}{Z} \prod_{t=1}^T \exp
\left\lbrace \sum_{m=1}^M \mu_m f_m(x_t, y_t, y_{t-1}) \right\rbrace.
\end{align}
A nice property of the linear-chain CRF is that it is a strict generalization
of the HMM. Any distribution representable by an HMM can be modeled by a CRF.
Show that this is true.
Solution
We can convert an HMM into a CRF as follows. Assume the variable $x_t$ can
take on $p$ values and $y_t$ can take on $q$ values. Thus we need
$M = pq + q^2$ factors. The factors will be
\begin{align}
g_{i, j}(x_t, y_t, y_{t-1}) &= \mathbb{1}\lbrace x_t = i, y_t = j \rbrace \\
h_{j, k}(x_t, y_t, y_{t-1}) &= \mathbb{1}\lbrace y_t = j, y_{t-1} = k \rbrace.
\end{align}
The weights will be
\begin{align}
\lambda_{i, j} &= \log p(x_t = i \mid y_t = j) \\
\theta_{j, k} &= \log p(y_t = j \mid y_{t-1} = k)
\end{align}
and let $Z = 1$, since the HMM is already normalized and needs no extra
normalization constant.
Now, writing $i$, $j$, $k$ for the values taken by $x_t$, $y_t$, $y_{t-1}$ at
each step $t$, the HMM joint probability can be written as
\begin{align}
&\prod_{t=1}^T p(x_t = i \mid y_t = j) p(y_t = j \mid y_{t-1} = k) \\
= &\prod_{t=1}^T \exp \left\lbrace \lambda_{i, j} + \theta_{j, k} \right\rbrace \\
= &\prod_{t=1}^T \exp \left\lbrace \sum_{i,j} \lambda_{i, j} \mathbb{1} \lbrace x_t = i, y_t = j \rbrace + \sum_{j, k} \theta_{j, k} \mathbb{1} \lbrace y_t = j, y_{t-1} = k \rbrace \right\rbrace\\
= &\frac{1}{Z} \prod_{t=1}^T \exp \left\lbrace \sum_{i,j} \lambda_{i,j} g_{i,j}(x_t, y_t, y_{t-1}) + \sum_{j,k} \theta_{j,k} h_{j,k}(x_t, y_t, y_{t-1}) \right\rbrace \\
= &\frac{1}{Z} \prod_{t=1}^T \exp \left\lbrace \sum_{m = 1}^M \mu_{m} f_{m}(x_t, y_t, y_{t-1}) \right\rbrace.
\end{align}
In the final step we combine all the factors into a single summation to show
the equivalence to a CRF. In this case the factors are
\begin{align}
f_{i + p (j - 1)}(\cdot) &= g_{i,j}(\cdot) \\
f_{pq + j + q (k - 1)}(\cdot) &= h_{j, k}(\cdot)
\end{align}
and the weights are
\begin{align}
\mu_{i + p (j - 1)} &= \lambda_{i,j} \\
\mu_{pq + j + q (k - 1)} &= \theta_{j, k}.
\end{align}
This fact was pointed out to me in Section 2.3 of \emph{An Introduction to
Conditional Random Fields}.