Chapter 3: Fundamental Principles of Kalman Filtering

Haiyue

Learning Objectives
  • Understand the core ideas and working principles of Kalman filtering
  • Master the mathematical derivation of prediction and update steps
  • Understand optimality proofs and the Bayesian framework

Core Ideas

The Kalman filter is a recursive state estimation algorithm whose core idea is to obtain optimal estimates of system states through a predict-update cycle, utilizing the system’s dynamic model and observation data.

Basic Philosophy

Kalman filtering is based on the following core concepts:

  1. State-space representation: The system can be fully described by state vectors
  2. Linear Gaussian assumption: System dynamics and observations are linear, with Gaussian noise
  3. Bayesian update: Optimal estimation using prior information and observation data
  4. Recursive processing: Each step builds on the previous result, suitable for real-time processing

Mathematical Framework

State-Space Model

Kalman filtering is based on the following state-space model:

State transition equation: $x_k = F_k x_{k-1} + B_k u_k + w_k$

Observation equation: $z_k = H_k x_k + v_k$

Where:

  • $x_k \in \mathbb{R}^n$: State vector at time $k$
  • $z_k \in \mathbb{R}^m$: Observation vector at time $k$
  • $F_k \in \mathbb{R}^{n \times n}$: State transition matrix
  • $H_k \in \mathbb{R}^{m \times n}$: Observation matrix
  • $B_k \in \mathbb{R}^{n \times l}$: Control input matrix
  • $u_k \in \mathbb{R}^l$: Control input vector
  • $w_k \sim \mathcal{N}(0, Q_k)$: Process noise
  • $v_k \sim \mathcal{N}(0, R_k)$: Observation noise
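
As a concrete illustration, the matrices below sketch a one-dimensional constant-velocity tracking model that fits this framework; the sampling interval and noise levels are illustrative assumptions rather than values from any particular application:

import numpy as np

dt = 1.0                         # sampling interval (assumed)

# State x = [position, velocity]; only the position is observed
F = np.array([[1.0, dt],
              [0.0, 1.0]])       # constant-velocity state transition
H = np.array([[1.0, 0.0]])       # observation matrix
B = np.zeros((2, 1))             # no control input in this example
Q = 0.01 * np.eye(2)             # process noise covariance (assumed level)
R = np.array([[0.1]])            # observation noise covariance (assumed level)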

Probabilistic Representation

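In probabilistic terms, the linear Gaussian state-space model above amounts to specifying a Gaussian transition density, a Gaussian observation density, and a Gaussian initial prior:

Transition density: $p(x_k \mid x_{k-1}) = \mathcal{N}(F_k x_{k-1} + B_k u_k,\; Q_k)$

Observation density: $p(z_k \mid x_k) = \mathcal{N}(H_k x_k,\; R_k)$

Initial prior: $p(x_0) = \mathcal{N}(\hat{x}_{0|0},\; P_{0|0})$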

Algorithm Derivation

Prediction Step

Based on the posterior estimate from the previous time step, predict the current state:

State prediction: $\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k$

Covariance prediction: $P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$
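
This follows directly from the model: the prediction error is $e_{k|k-1} = x_k - \hat{x}_{k|k-1} = F_k e_{k-1|k-1} + w_k$; taking the covariance of this expression, with the previous estimation error and the process noise uncorrelated, gives exactly $F_k P_{k-1|k-1} F_k^T + Q_k$.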

Update Step

Correct the prediction using observation data:

Kalman gain: $K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$

State update: $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H_k \hat{x}_{k|k-1})$

Covariance update: $P_{k|k} = (I - K_k H_k) P_{k|k-1}$
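
Where the gain expression comes from can be seen with a standard argument, sketched here for completeness: for an arbitrary gain $K$, the updated covariance takes the Joseph form $P_{k|k} = (I - K H_k) P_{k|k-1} (I - K H_k)^T + K R_k K^T$ (used again in the numerical-stability section below), and minimizing its trace over $K$ gives

$\frac{\partial\, \mathrm{tr}(P_{k|k})}{\partial K} = -2 (I - K H_k) P_{k|k-1} H_k^T + 2 K R_k = 0 \;\Rightarrow\; K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$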

Innovation Sequence

The innovation is the difference between observation and prediction: $\nu_k = z_k - H_k \hat{x}_{k|k-1}$

Innovation covariance: $S_k = H_k P_{k|k-1} H_k^T + R_k$

Derivation in the Bayesian Framework

Application of Bayes’ Theorem

Kalman filtering is essentially a special case of Bayesian estimation. For linear Gaussian systems:

$p(x_k|z_{1:k}) \propto p(z_k|x_k)\, p(x_k|z_{1:k-1})$

Where:

  • $p(z_k|x_k)$: Likelihood function
  • $p(x_k|z_{1:k-1})$: Prior distribution
  • $p(x_k|z_{1:k})$: Posterior distribution

Closure Property of Gaussian Distributions

Because the Gaussian family is closed under linear transformation and Bayesian conditioning, every distribution in the recursion remains Gaussian:

  1. Predictive distribution: $p(x_k|z_{1:k-1}) = \mathcal{N}(\hat{x}_{k|k-1}, P_{k|k-1})$
  2. Likelihood function: $p(z_k|x_k) = \mathcal{N}(H_k x_k, R_k)$
  3. Posterior distribution: $p(x_k|z_{1:k}) = \mathcal{N}(\hat{x}_{k|k}, P_{k|k})$
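
Carrying out the multiplication of the Gaussian prior and likelihood explicitly gives the posterior parameters in information (inverse-covariance) form, which is algebraically equivalent to the Kalman gain update above:

$P_{k|k}^{-1} = P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k, \qquad \hat{x}_{k|k} = P_{k|k} \left( P_{k|k-1}^{-1} \hat{x}_{k|k-1} + H_k^T R_k^{-1} z_k \right)$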

Algorithm Flow

import numpy as np

def kalman_filter_step(x_prev, P_prev, F, B, u, Q, H, R, z):
    """
    Single step Kalman filter update

    Parameters:
    x_prev: Previous state estimate
    P_prev: Previous covariance matrix
    F: State transition matrix
    B: Control input matrix
    u: Control input
    Q: Process noise covariance
    H: Observation matrix
    R: Observation noise covariance
    z: Current observation
    """

    # Prediction step
    x_pred = F @ x_prev + B @ u  # State prediction
    P_pred = F @ P_prev @ F.T + Q  # Covariance prediction

    # Update step
    y = z - H @ x_pred  # Innovation (residual)
    S = H @ P_pred @ H.T + R  # Innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain

    x_updated = x_pred + K @ y  # State update
    P_updated = (np.eye(len(x_pred)) - K @ H) @ P_pred  # Covariance update

    return x_updated, P_updated, K, y, S
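
A minimal usage sketch of this function; the matrices, measurements, and initial values below are assumptions chosen only for demonstration:

import numpy as np

# Illustrative 1D constant-velocity setup (all values are assumptions)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.zeros((2, 1)); u = np.zeros(1)
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2); R = np.array([[0.1]])

x = np.zeros(2)            # initial state estimate
P = 10.0 * np.eye(2)       # large initial covariance: high initial uncertainty

for z in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:  # fake position measurements
    x, P, K, y, S = kalman_filter_step(x, P, F, B, u, Q, H, R, z)

print(x)  # estimated [position, velocity] after three updates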

Optimality Proof

Minimum Mean Square Error Criterion

Kalman filtering is optimal in the following sense:

$\hat{x}_{k|k} = \arg\min_{\hat{x}} E\left[(x_k - \hat{x})^T (x_k - \hat{x}) \mid z_{1:k}\right]$

The minimizer of this conditional mean square error is the conditional mean, $\hat{x}_{k|k} = E[x_k \mid z_{1:k}]$, which is exactly what the update equations compute under the linear Gaussian assumptions.

Optimality Conditions

Under linear Gaussian assumptions, Kalman filtering provides:

  1. Minimum variance unbiased estimator: Smallest estimation error variance among all unbiased estimators
  2. Maximum a posteriori estimator: Under the Gaussian assumption the posterior mode coincides with the posterior mean
  3. Minimum mean square error estimator: Smallest mean square error among all estimators, linear or nonlinear

Orthogonal Projection Interpretation

From a geometric perspective, the Kalman filter estimate is the orthogonal projection of the state vector onto the linear space spanned by the observations:

# Geometric interpretation of orthogonal projection
import numpy as np

def geometric_interpretation(x_pred, P_pred, H, R, z):
    """
    Geometric interpretation of Kalman filtering:
    State estimate = Prior estimate + Kalman gain × Innovation

    This is equivalent to an orthogonal projection in Hilbert space.
    """
    # Innovation: the part of the observation that the prior cannot explain
    innovation = z - H @ x_pred

    # Kalman gain: determines how the innovation is projected back into state space
    kalman_gain = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)

    # Final estimate: the prior plus the projected innovation
    x_posterior = x_pred + kalman_gain @ innovation

    return x_posterior
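
A precise statement of this projection property, for the gain given above, is that the updated estimation error is uncorrelated with the innovation that produced it:

$E\left[(x_k - \hat{x}_{k|k})\, \nu_k^T\right] = (I - K_k H_k) P_{k|k-1} H_k^T - K_k R_k = 0$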

Information-Theoretic Perspective

Information Fusion

Kalman filtering can be understood as an information fusion process:

  1. Prior information: Prediction from the system model
  2. Observation information: Measurements from sensors
  3. Fusion weights: Determined by their respective uncertainties

Entropy Reduction

Each observation reduces the uncertainty of the system state:

$H(x_k|z_{1:k}) \leq H(x_k|z_{1:k-1})$

where $H(\cdot)$ denotes differential entropy.
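
For the Gaussian distributions involved here, the differential entropy depends only on the covariance,

$H\big(\mathcal{N}(\mu, P)\big) = \tfrac{1}{2} \ln\!\left((2\pi e)^n \det P\right),$

so the entropy reduction is a direct consequence of the covariance monotonicity property discussed in the next section.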

Properties of the Covariance Matrix

Symmetry and Positive Definiteness

The covariance matrix $P_k$ has the following important properties:

  1. Symmetry: $P_k = P_k^T$
  2. Positive semi-definiteness: $P_k \succeq 0$
  3. Monotonicity: $P_{k|k} \preceq P_{k|k-1}$ (incorporating an observation never increases uncertainty)

Numerical Stability

For numerical stability, the following form is commonly used:

import numpy as np

def joseph_form_update(P_pred, H, K, R):
    """
    Joseph form covariance update; it preserves symmetry and positive
    semi-definiteness of the covariance, which improves numerical stability.
    """
    I_KH = np.eye(len(P_pred)) - K @ H
    P_updated = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return P_updated

Practical Application Considerations

Initialization

Importance of Initialization

Kalman filter performance largely depends on the choice of the initial state estimate $\hat{x}_{0|0}$ and the initial covariance $P_{0|0}$.

Common initialization strategies:

  1. State initialization: Based on prior knowledge or first few observations
  2. Covariance initialization: Reflects initial uncertainty; typically choose large values (see the sketch below)
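
A minimal initialization sketch, assuming a position-only observation model as in the earlier example; the specific numbers are illustrative assumptions:

import numpy as np

H = np.array([[1.0, 0.0]])            # position-only observation model (assumed)
z0 = np.array([1.2])                  # first observation (assumed)

# State initialization: least-squares fit to the first observation,
# unobserved components (here the velocity) default to zero
x0 = np.linalg.pinv(H) @ z0           # -> array([1.2, 0.0])

# Diffuse covariance initialization: large values encode high initial uncertainty
P0 = np.diag([10.0, 100.0])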

Filter Divergence

Filter divergence may occur when the model is mismatched or the noise parameters are poorly chosen:

  • Symptoms: The actual estimation error grows while the reported covariance stays small; the innovations become large relative to $S_k$
  • Causes: $Q$ chosen too small (an overconfident process model), a mis-specified $R$, or structural model errors
  • Solutions: Parameter tuning, robustness improvements, and monitoring of the innovation sequence (see the check below)
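
One common monitoring idea, sketched here under the assumption that the innovation and its covariance are recorded at each step, is to track the normalized innovation squared (NIS); the window and threshold factor below are illustrative assumptions, not prescriptions:

import numpy as np

def nis(innovation, S):
    """Normalized innovation squared for one step; its expected value is dim(z)
    when the filter model and noise statistics are consistent."""
    return float(innovation.T @ np.linalg.inv(S) @ innovation)

def divergence_suspected(recent_nis_values, obs_dim, factor=3.0):
    """Flag divergence if the average NIS over a recent window is far above dim(z)."""
    return np.mean(recent_nis_values) > factor * obs_dim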

Preview of Financial Applications

Typical applications of Kalman filtering in finance include:

  1. State variables: Implied volatility, risk factors, market trends
  2. Observation variables: Stock prices, interest rates, option prices
  3. Application scenarios: Parameter estimation, risk management, portfolio optimization

Preview of Next Chapter

In the next chapter, we will study the implementation of the linear Kalman filter in detail, including a programming implementation of the five core equations and the associated numerical techniques.