Chapter 5: Hidden Markov Model (HMM) Fundamentals

Learning Objectives
  • Understand the structure of Hidden Markov Models
  • Master the relationship between observation processes and hidden state processes
  • Learn the Forward-Backward algorithm
  • Understand the Viterbi algorithm and Baum-Welch algorithm

Knowledge Summary

1. Basic Structure of Hidden Markov Models

A Hidden Markov Model (HMM) is a doubly stochastic process:

  • Hidden State Sequence $\{X_t\}$: an unobservable Markov chain
  • Observation Sequence $\{Y_t\}$: an observable stochastic process

2. Three Basic Elements of HMM

Transition Probability Matrix $A$: $A = \{a_{ij}\}, \quad a_{ij} = P(X_{t+1} = j | X_t = i)$

Emission Probability Matrix $B$: $B = \{b_j(k)\}, \quad b_j(k) = P(Y_t = v_k | X_t = j)$

Initial State Probability $\pi$: $\pi = \{\pi_i\}, \quad \pi_i = P(X_1 = i)$
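
Each row of $A$ and $B$, and the vector $\pi$, must be a valid probability distribution. A quick sanity-check sketch (using the same weather-example parameters that Example 1 below defines):

import numpy as np

# Weather example: states 0=Sunny, 1=Rainy; observations 0=Dry, 1=Damp, 2=Wet
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])        # a_ij = P(X_{t+1}=j | X_t=i)
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.4, 0.5]])   # b_j(k) = P(Y_t=v_k | X_t=j)
pi = np.array([0.6, 0.4])         # pi_i = P(X_1=i)

# Every row of A and B, and pi itself, must sum to 1
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)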

Three Basic Assumptions of HMM
  1. Markov Assumption: $P(X_t | X_{t-1}, X_{t-2}, \ldots, X_1) = P(X_t | X_{t-1})$
  2. Observation Independence Assumption: $P(Y_t | X_t, X_{t-1}, \ldots, Y_{t-1}, \ldots) = P(Y_t | X_t)$
  3. Homogeneity Assumption: Transition and emission probabilities don’t change over time

3. Three Basic Problems of HMM

  1. Evaluation Problem: Given the model parameters, compute the probability of an observation sequence (solved by the Forward algorithm)
  2. Decoding Problem: Given an observation sequence, find the most likely hidden state sequence (solved by the Viterbi algorithm)
  3. Learning Problem: Given an observation sequence, estimate the model parameters (solved by the Baum-Welch algorithm)

Example Code

Example 1: HMM Model Class Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logsumexp

class HiddenMarkovModel:
    """
    Hidden Markov Model Implementation
    """

    def __init__(self, n_states, n_observations):
        """
        Initialize HMM model

        Parameters:
        n_states: Number of hidden states
        n_observations: Number of observation values
        """
        self.n_states = n_states
        self.n_observations = n_observations

        # Initialize parameters
        self.transition_matrix = None  # Transition probability matrix A
        self.emission_matrix = None    # Emission probability matrix B
        self.initial_probs = None      # Initial probability distribution π

    def initialize_parameters(self, random_state=42):
        """
        Randomly initialize model parameters
        """
        np.random.seed(random_state)

        # Transition probability matrix (rows sum to 1)
        self.transition_matrix = np.random.dirichlet(
            np.ones(self.n_states), size=self.n_states
        )

        # Emission probability matrix (rows sum to 1)
        self.emission_matrix = np.random.dirichlet(
            np.ones(self.n_observations), size=self.n_states
        )

        # Initial state probability
        self.initial_probs = np.random.dirichlet(np.ones(self.n_states))

    def set_parameters(self, A, B, pi):
        """
        Set model parameters

        Parameters:
        A: Transition probability matrix
        B: Emission probability matrix
        pi: Initial state probability
        """
        self.transition_matrix = A
        self.emission_matrix = B
        self.initial_probs = pi

    def generate_sequence(self, length):
        """
        Generate observation sequence and corresponding hidden state sequence

        Parameters:
        length: Sequence length

        Returns:
        observations: Observation sequence
        states: Hidden state sequence
        """
        states = np.zeros(length, dtype=int)
        observations = np.zeros(length, dtype=int)

        # Initial state
        states[0] = np.random.choice(self.n_states, p=self.initial_probs)
        observations[0] = np.random.choice(
            self.n_observations,
            p=self.emission_matrix[states[0]]
        )

        # Subsequent states
        for t in range(1, length):
            states[t] = np.random.choice(
                self.n_states,
                p=self.transition_matrix[states[t-1]]
            )
            observations[t] = np.random.choice(
                self.n_observations,
                p=self.emission_matrix[states[t]]
            )

        return observations, states

# Create example HMM model
np.random.seed(42)

# Define model parameters (weather prediction example)
# States: 0=Sunny, 1=Rainy
# Observations: 0=Dry, 1=Damp, 2=Wet

A = np.array([
    [0.7, 0.3],  # Sunny->Sunny 0.7, Sunny->Rainy 0.3
    [0.4, 0.6]   # Rainy->Sunny 0.4, Rainy->Rainy 0.6
])

B = np.array([
    [0.6, 0.3, 0.1],  # Sunny: Dry 0.6, Damp 0.3, Wet 0.1
    [0.1, 0.4, 0.5]   # Rainy: Dry 0.1, Damp 0.4, Wet 0.5
])

pi = np.array([0.6, 0.4])  # Initial probability: Sunny 0.6, Rainy 0.4

# Create HMM instance
hmm = HiddenMarkovModel(n_states=2, n_observations=3)
hmm.set_parameters(A, B, pi)

# Generate example sequence
observations, true_states = hmm.generate_sequence(20)

print("HMM Model Parameters:")
print(f"Transition matrix A:\n{A}")
print(f"Emission matrix B:\n{B}")
print(f"Initial probability π: {pi}")
print(f"\nObservation sequence: {observations}")
print(f"True states: {true_states}")

Example 2: Forward and Backward Algorithm Implementation

def forward_algorithm(hmm, observations):
    """
    Forward algorithm: Calculate probability of observation sequence

    Parameters:
    hmm: HMM model
    observations: Observation sequence

    Returns:
    alpha: Forward probability matrix
    log_probability: Log probability of observation sequence
    """
    T = len(observations)
    N = hmm.n_states

    # Initialize forward probability matrix (use log probabilities to avoid underflow)
    log_alpha = np.zeros((T, N))

    # Initialization: t=0
    log_alpha[0] = (np.log(hmm.initial_probs) +
                   np.log(hmm.emission_matrix[:, observations[0]]))

    # Recursion: t=1,2,...,T-1
    for t in range(1, T):
        for j in range(N):
            # log(sum(exp(log_values))) = logsumexp(log_values)
            log_alpha[t, j] = logsumexp(
                log_alpha[t-1] + np.log(hmm.transition_matrix[:, j])
            ) + np.log(hmm.emission_matrix[j, observations[t]])

    # Calculate total probability of observation sequence
    log_probability = logsumexp(log_alpha[T-1])

    return np.exp(log_alpha), log_probability

def backward_algorithm(hmm, observations):
    """
    Backward algorithm

    Parameters:
    hmm: HMM model
    observations: Observation sequence

    Returns:
    beta: Backward probability matrix
    """
    T = len(observations)
    N = hmm.n_states

    # Initialize backward probability matrix
    log_beta = np.zeros((T, N))

    # Initialization: t=T-1
    log_beta[T-1] = 0  # log(1) = 0

    # Recursion: t=T-2,T-3,...,0
    for t in range(T-2, -1, -1):
        for i in range(N):
            log_beta[t, i] = logsumexp(
                np.log(hmm.transition_matrix[i]) +
                np.log(hmm.emission_matrix[:, observations[t+1]]) +
                log_beta[t+1]
            )

    return np.exp(log_beta)

# Calculate forward and backward probabilities
alpha, log_prob = forward_algorithm(hmm, observations)
beta = backward_algorithm(hmm, observations)

print(f"\nLog probability of observation sequence: {log_prob:.4f}")
print(f"Probability of observation sequence: {np.exp(log_prob):.2e}")

# Verify forward-backward consistency: for every t, sum_i alpha_t(i) * beta_t(i) = P(O)
consistency_check = np.allclose(
    (alpha * beta).sum(axis=1),
    alpha[-1].sum()
)
print(f"Forward-backward algorithm consistency check: {consistency_check}")

Example 3: Viterbi Algorithm Implementation

def viterbi_algorithm(hmm, observations):
    """
    Viterbi algorithm: Find the most likely hidden state sequence

    Parameters:
    hmm: HMM model
    observations: Observation sequence

    Returns:
    best_path: Most likely state sequence
    best_prob: Maximum probability
    """
    T = len(observations)
    N = hmm.n_states

    # Initialize
    log_delta = np.zeros((T, N))  # Viterbi probability
    psi = np.zeros((T, N), dtype=int)  # Backtrack path

    # Initialization: t=0
    log_delta[0] = (np.log(hmm.initial_probs) +
                   np.log(hmm.emission_matrix[:, observations[0]]))

    # Recursion: t=1,2,...,T-1
    for t in range(1, T):
        for j in range(N):
            # Find maximum probability path to state j
            transition_scores = (log_delta[t-1] +
                               np.log(hmm.transition_matrix[:, j]))

            # Record maximum probability and corresponding previous state
            psi[t, j] = np.argmax(transition_scores)
            log_delta[t, j] = (np.max(transition_scores) +
                              np.log(hmm.emission_matrix[j, observations[t]]))

    # Termination: Find optimal state at final time
    best_final_state = np.argmax(log_delta[T-1])
    best_prob = np.max(log_delta[T-1])

    # Backtrack: Reconstruct optimal path
    best_path = np.zeros(T, dtype=int)
    best_path[T-1] = best_final_state

    for t in range(T-2, -1, -1):
        best_path[t] = psi[t+1, best_path[t+1]]

    return best_path, best_prob

# Use Viterbi algorithm for decoding
predicted_states, max_log_prob = viterbi_algorithm(hmm, observations)

print(f"\nViterbi decoding results:")
print(f"Predicted state sequence: {predicted_states}")
print(f"True state sequence: {true_states}")
print(f"Maximum path probability: {np.exp(max_log_prob):.2e}")

# Calculate decoding accuracy
accuracy = np.mean(predicted_states == true_states)
print(f"Decoding accuracy: {accuracy:.2%}")

# Visualization results
plt.figure(figsize=(12, 8))

# Subplot 1: Observation sequence
plt.subplot(3, 1, 1)
plt.plot(observations, 'bo-', linewidth=2, markersize=8)
plt.ylabel('Observation')
plt.title('Observation Sequence')
plt.grid(True, alpha=0.3)
plt.ylim(-0.5, 2.5)
plt.yticks([0, 1, 2], ['Dry', 'Damp', 'Wet'])

# Subplot 2: True states
plt.subplot(3, 1, 2)
plt.plot(true_states, 'ro-', linewidth=2, markersize=8, label='True States')
plt.ylabel('State')
plt.title('True Hidden State Sequence')
plt.grid(True, alpha=0.3)
plt.ylim(-0.5, 1.5)
plt.yticks([0, 1], ['Sunny', 'Rainy'])

# Subplot 3: Predicted states
plt.subplot(3, 1, 3)
plt.plot(predicted_states, 'go-', linewidth=2, markersize=8, label='Predicted States')
plt.xlabel('Time')
plt.ylabel('State')
plt.title('Viterbi Predicted State Sequence')
plt.grid(True, alpha=0.3)
plt.ylim(-0.5, 1.5)
plt.yticks([0, 1], ['Sunny', 'Rainy'])

plt.tight_layout()
plt.show()

Theoretical Analysis

Mathematical Representation of HMM

Let the HMM be $\lambda = (A, B, \pi)$, the observation sequence be $O = \{o_1, o_2, \ldots, o_T\}$, and the hidden state sequence be $Q = \{q_1, q_2, \ldots, q_T\}$.

Joint Probability: $P(O, Q | \lambda) = \pi_{q_1} \prod_{t=2}^T a_{q_{t-1} q_t} \prod_{t=1}^T b_{q_t}(o_t)$
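
For a concrete illustration, here is a minimal sketch that evaluates this product in log space for a given state path, reusing the hmm object, observations, and true_states from Example 1:

def joint_log_probability(hmm, observations, states):
    """Log of P(O, Q | lambda) for a given state path Q and observation sequence O."""
    log_p = np.log(hmm.initial_probs[states[0]])
    log_p += np.log(hmm.emission_matrix[states[0], observations[0]])
    for t in range(1, len(observations)):
        log_p += np.log(hmm.transition_matrix[states[t-1], states[t]])    # a_{q_{t-1} q_t}
        log_p += np.log(hmm.emission_matrix[states[t], observations[t]])  # b_{q_t}(o_t)
    return log_p

# Joint log-probability of the sampled observations together with their true states
print(joint_log_probability(hmm, observations, true_states))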

Mathematical Principles of Forward-Backward Algorithm

Forward Probability $\alpha_t(i)$: $\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = i | \lambda)$

Recursive Relation: $\alpha_{t+1}(j) = \left[\sum_{i=1}^N \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$

Backward Probability $\beta_t(i)$: $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T | q_t = i, \lambda)$
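
Both recursions compute the same sequence probability from opposite ends: summing $\pi_i\, b_i(o_1)\, \beta_1(i)$ over states must give $P(O|\lambda)$, matching the forward result. A small check, assuming the beta and log_prob values computed in Example 2:

# P(O | lambda) from the backward side: sum_i pi_i * b_i(o_1) * beta_1(i)
prob_from_beta = np.sum(
    hmm.initial_probs * hmm.emission_matrix[:, observations[0]] * beta[0]
)
print(np.isclose(prob_from_beta, np.exp(log_prob)))  # expected: True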

Dynamic Programming in Viterbi Algorithm

Optimal Path Probability $\delta_t(i)$: $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t | \lambda)$

Recursive Relation: $\delta_{t+1}(j) = \max_i [\delta_t(i) a_{ij}]\, b_j(o_{t+1})$
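
As a worked example with the weather parameters from Example 1 and a hypothetical two-step observation (Dry, Damp): $\delta_1(\text{Sunny}) = 0.6 \times 0.6 = 0.36$ and $\delta_1(\text{Rainy}) = 0.4 \times 0.1 = 0.04$; then $\delta_2(\text{Sunny}) = \max(0.36 \times 0.7,\ 0.04 \times 0.4) \times 0.3 = 0.0756$ and $\delta_2(\text{Rainy}) = \max(0.36 \times 0.3,\ 0.04 \times 0.6) \times 0.4 = 0.0432$, so the best length-2 path ending in either state passes through Sunny at $t = 1$.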

EM Framework of Baum-Welch Algorithm

E Step: Calculate expected sufficient statistics

  • $\gamma_t(i) = P(q_t = i | O, \lambda)$
  • $\xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, \lambda)$

M Step: Update parameters $\hat{\pi}_i = \gamma_1(i), \quad \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad \hat{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
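
The examples above do not implement Baum-Welch, so here is a minimal sketch of a single re-estimation iteration, assuming the forward_algorithm and backward_algorithm functions from Example 2 and working in the probability domain (adequate for short sequences such as the 20-step weather example):

def baum_welch_step(hmm, observations):
    """One EM iteration of Baum-Welch; returns re-estimated (A, B, pi)."""
    T, N, M = len(observations), hmm.n_states, hmm.n_observations
    alpha, _ = forward_algorithm(hmm, observations)
    beta = backward_algorithm(hmm, observations)

    # E step: state posteriors gamma_t(i) and transition posteriors xi_t(i, j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # gamma_t(i) = P(q_t = i | O, lambda)

    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * hmm.transition_matrix *
                 hmm.emission_matrix[:, observations[t + 1]] * beta[t + 1])
        xi[t] /= xi[t].sum()                    # xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, lambda)

    # M step: re-estimate parameters from the expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):
        new_B[:, k] = gamma[observations == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi

# One illustrative update, starting from the true weather parameters
A_new, B_new, pi_new = baum_welch_step(hmm, observations)
print(f"Re-estimated transition matrix:\n{A_new}")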

Mathematical Formula Summary

  1. Forward probability recursion: $\alpha_{t+1}(j) = \left[\sum_{i=1}^N \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$

  2. Backward probability recursion: $\beta_t(i) = \sum_{j=1}^N a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$

  3. Viterbi recursion: $\delta_{t+1}(j) = \max_i [\delta_t(i) a_{ij}]\, b_j(o_{t+1})$

  4. State posterior probability: $\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^N \alpha_t(j)\, \beta_t(j)}$

  5. Transition posterior probability: $\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O|\lambda)}$

Implementation Notes
  • Use log probabilities to avoid numerical underflow
  • Baum-Welch algorithm may converge to local optima
  • Model selection (number of states) significantly affects performance
  • Initial parameter selection affects convergence speed and results