Problem
The Hidden Markov Model (HMM) is a generative model. An HMM models the joint
probability of two sequences as
\begin{align}
p(X, Y) = \prod_{t=1}^T p(x_t \mid y_t) p(y_t \mid y_{t-1}).
\end{align}
The Conditional Random Field (CRF) is a discriminative model. A linear-chain
CRF models the conditional
probability of $Y$ given $X$ as
\begin{align}
p(Y \mid X) = \frac{1}{Z} \prod_{t=1}^T \exp
\left\lbrace \sum_{m=1}^M \mu_m f_m(x_t, y_t, y_{t-1}) \right\rbrace.
\end{align}
A nice property of the linear-chain CRF is that it is a strict generalization
of the HMM. Any distribution representable by an HMM can be modeled by a CRF.
Show that this is true.
Solution
We can convert an HMM into a CRF as follows. Assume the variable $x_t$ can
take on $p$ values and $y_t$ can take on $q$ values. Thus we need
$M = pq + q^2$ factors. The factors will be
\begin{align}
g_{i, j}(x_t, y_t, y_{t-1}) &= \mathbb{1}\lbrace x_t = i, y_t = j \rbrace \\
h_{j, k}(x_t, y_t, y_{t-1}) &= \mathbb{1}\lbrace y_t = j, y_{t-1} = k \rbrace.
\end{align}
The weights will be
\begin{align}
\lambda_{i, j} &= \log p(x_t = i \mid y_t = j) \\
\theta_{j, k} &= \log p(y_t = j \mid y_{t-1} = k)
\end{align}
and let $Z = 1$, since the HMM is already normalized and needs no extra
normalization constant.
Now, writing $i$, $j$, $k$ for the values taken by $x_t$, $y_t$, $y_{t-1}$ at
each step $t$, the HMM joint probability can be written as
\begin{align}
&\prod_{t=1}^T p(x_t = i \mid y_t = j) p(y_t = j \mid y_{t-1} = k) \\
= &\prod_{t=1}^T \exp \left\lbrace \lambda_{i, j} + \theta_{j, k} \right\rbrace \\
= &\prod_{t=1}^T \exp \left\lbrace \sum_{i,j} \lambda_{i, j} \mathbb{1} \lbrace x_t = i, y_t = j \rbrace + \sum_{j, k} \theta_{j, k} \mathbb{1} \lbrace y_t = j, y_{t-1} = k \rbrace \right\rbrace\\
= &\frac{1}{Z} \prod_{t=1}^T \exp \left\lbrace \sum_{i,j} \lambda_{i,j} g_{i,j}(x_t, y_t, y_{t-1}) + \sum_{j,k} \theta_{j,k} h_{j,k}(x_t, y_t, y_{t-1}) \right\rbrace \\
= &\frac{1}{Z} \prod_{t=1}^T \exp \left\lbrace \sum_{m = 1}^M \mu_{m} f_{m}(x_t, y_t, y_{t-1}) \right\rbrace.
\end{align}
In the final step we combine all the factors into a single summation to show
the equivalence to a CRF. In this case the factors are
\begin{align}
f_{i + p (j - 1)}(\cdot) &= g_{i,j}(\cdot) \\
f_{pq + j + q (k - 1)}(\cdot) &= h_{j, k}(\cdot)
\end{align}
and the weights are
\begin{align}
\mu_{i + p (j - 1)} &= \lambda_{i,j} \\
\mu_{pq + j + q (k - 1)} &= \theta_{j, k}.
\end{align}
This fact was pointed out to me in Section 2.3 of \emph{An Introduction to
Conditional Random Fields}.