CRFs Generalize HMMs

Problem

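The Hidden Markov Model (HMM) is a generative model. It models the joint probability of an observation sequence \(X = (x_1, \dots, x_T)\) and a hidden state sequence \(Y = (y_1, \dots, y_T)\) as \begin{align} p(X, Y) = \prod_{t=1}^T p(x_t \mid y_t) p(y_t \mid y_{t-1}), \end{align} where \(y_0\) is a fixed start state, so \(p(y_1 \mid y_0)\) plays the role of the initial state distribution.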
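To make the factorization concrete, here is a minimal sketch that evaluates \(p(X, Y)\) for a hypothetical two-state HMM. The states "H" and "C", the observations "a" and "b", the `START` marker for \(y_0\), and the `TRANS`/`EMIT` tables are all illustrative assumptions, not part of the original problem.

```python
START = "START"  # assumed fixed start state y_0; p(y_1 | y_0) acts as the initial distribution

# Hypothetical HMM parameters (illustrative only).
# TRANS[(i, j)] = p(y_t = i | y_{t-1} = j); EMIT[(o, i)] = p(x_t = o | y_t = i).
TRANS = {("H", START): 0.6, ("C", START): 0.4,
         ("H", "H"): 0.7, ("C", "H"): 0.3,
         ("H", "C"): 0.4, ("C", "C"): 0.6}
EMIT = {("a", "H"): 0.9, ("b", "H"): 0.1,
        ("a", "C"): 0.2, ("b", "C"): 0.8}


def hmm_joint(xs, ys, emit=EMIT, trans=TRANS):
    """p(X, Y) = prod_t p(x_t | y_t) p(y_t | y_{t-1}), with y_0 = START."""
    p, prev = 1.0, START
    for x, y in zip(xs, ys):
        p *= emit[(x, y)] * trans[(y, prev)]  # emission times transition at step t
        prev = y
    return p


# Usage: p(a|H) p(H|START) * p(b|C) p(C|H)
print(hmm_joint(["a", "b"], ["H", "C"]))
```

Because every row of `TRANS` and `EMIT` sums to 1, summing `hmm_joint` over all observation and state sequences of a fixed length gives 1, which is a quick sanity check on the parameters.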

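The Conditional Random Field (CRF) is a discriminative model. A linear-chain CRF models the conditional probability of \(Y\) given \(X\) as \begin{align} p(Y \mid X) = \frac{1}{Z(X)} \prod_{t=1}^T \exp \left\lbrace \sum_{m=1}^M \mu_m f_m(x_t, y_t, y_{t-1}) \right\rbrace, \end{align} where the \(f_m\) are feature functions, the \(\mu_m\) are real-valued weights, and \(Z(X)\) normalizes the distribution by summing the same product over all possible state sequences \(Y\).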
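A minimal sketch of the linear-chain CRF conditional, assuming discrete labels and computing the normalizer \(Z(X)\) by brute-force enumeration of every label sequence. The two feature functions and their weights are hypothetical, chosen only to illustrate the interface \(f_m(x_t, y_t, y_{t-1})\); `START` stands in for the fixed \(y_0\).

```python
from itertools import product
from math import exp

START = "START"  # assumed fixed initial label y_0


def crf_score(xs, ys, features, weights):
    """Unnormalized score: prod_t exp{ sum_m mu_m f_m(x_t, y_t, y_{t-1}) }."""
    total, prev = 0.0, START
    for x, y in zip(xs, ys):
        total += sum(mu * f(x, y, prev) for mu, f in zip(weights, features))
        prev = y
    return exp(total)


def crf_conditional(ys, xs, labels, features, weights):
    """p(Y | X) = score(X, Y) / Z(X), with Z(X) summed over every label sequence."""
    z = sum(crf_score(xs, cand, features, weights)
            for cand in product(labels, repeat=len(xs)))
    return crf_score(xs, ys, features, weights) / z


# Hypothetical features: an observation/label match and a "label repeats" indicator.
features = [
    lambda x, y, y_prev: float(x == "a" and y == "H"),
    lambda x, y, y_prev: float(y == y_prev),
]
weights = [1.5, 0.5]
print(crf_conditional(("H", "H"), ("a", "a"), ["H", "C"], features, weights))
```

Enumeration costs time exponential in \(T\); in practice \(Z(X)\) is computed with the forward algorithm in \(O(T K^2)\) time for \(K\) labels, but brute force keeps the correspondence with the formula easy to read.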

A nice property of the linear-chain CRF is that it strictly generalizes the HMM: for any HMM, the conditional distribution \(p(Y \mid X)\) it induces can be written exactly as a linear-chain CRF. Show that this is true.

Solution

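A small numerical check of the standard construction, in which each transition pair and each state-observation pair gets an indicator feature weighted by the log of the corresponding HMM probability. The two-state HMM below (states "H" and "C", observations "a" and "b", start marker `START`) is hypothetical, all its probabilities are assumed positive, and \(Z(X)\) and \(p(X)\) are both computed by brute-force enumeration.

```python
from itertools import product
from math import exp, log

STATES = ["H", "C"]
OBS = ["a", "b"]
START = "START"  # assumed fixed start state y_0

# Hypothetical HMM: TRANS[(i, j)] = p(y_t = i | y_{t-1} = j), EMIT[(o, i)] = p(x_t = o | y_t = i).
TRANS = {("H", START): 0.6, ("C", START): 0.4,
         ("H", "H"): 0.7, ("C", "H"): 0.3,
         ("H", "C"): 0.4, ("C", "C"): 0.6}
EMIT = {("a", "H"): 0.9, ("b", "H"): 0.1,
        ("a", "C"): 0.2, ("b", "C"): 0.8}


def hmm_joint(xs, ys):
    """HMM joint p(X, Y) with y_0 = START."""
    p, prev = 1.0, START
    for x, y in zip(xs, ys):
        p *= EMIT[(x, y)] * TRANS[(y, prev)]
        prev = y
    return p


# CRF features f(x, y, y_prev): one indicator per transition pair and per
# emission pair, each weighted by the log of the matching HMM probability.
FEATURES, WEIGHTS = [], []
for (i, j), prob in TRANS.items():
    FEATURES.append(lambda x, y, yp, i=i, j=j: float(y == i and yp == j))
    WEIGHTS.append(log(prob))
for (o, i), prob in EMIT.items():
    FEATURES.append(lambda x, y, yp, o=o, i=i: float(y == i and x == o))
    WEIGHTS.append(log(prob))


def crf_score(xs, ys):
    """Unnormalized CRF score prod_t exp{ sum_m mu_m f_m(x_t, y_t, y_{t-1}) }."""
    total, prev = 0.0, START
    for x, y in zip(xs, ys):
        total += sum(mu * f(x, y, prev) for mu, f in zip(WEIGHTS, FEATURES))
        prev = y
    return exp(total)


def crf_conditional(ys, xs):
    z = sum(crf_score(xs, cand) for cand in product(STATES, repeat=len(xs)))
    return crf_score(xs, ys) / z


def hmm_conditional(ys, xs):
    px = sum(hmm_joint(xs, cand) for cand in product(STATES, repeat=len(xs)))
    return hmm_joint(xs, ys) / px


def max_gap(length=3):
    """Largest |p_CRF(Y | X) - p_HMM(Y | X)| over all sequences of the given length."""
    return max(abs(crf_conditional(ys, xs) - hmm_conditional(ys, xs))
               for xs in product(OBS, repeat=length)
               for ys in product(STATES, repeat=length))


print(max_gap())  # expected to be at floating-point rounding level
```

The CRF score and the HMM joint agree term by term, so the two conditionals should differ only by rounding error.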

This fact was pointed out to me in Section 2.3 of An Introduction to Conditional Random Fields.