Chapter 3: Fundamental Principles of Kalman Filtering

Haiyue

Learning Objectives
  • Understand the core ideas and working principles of Kalman filtering
  • Master the mathematical derivation of prediction and update steps
  • Understand optimality proofs and the Bayesian framework

Core Ideas

The Kalman filter is a recursive state estimation algorithm whose core idea is to obtain optimal estimates of system states through a predict-update cycle, utilizing the system’s dynamic model and observation data.

Basic Philosophy

Kalman filtering is based on the following core concepts:

  1. State-space representation: The system can be fully described by state vectors
  2. Linear Gaussian assumption: System dynamics and observations are linear, with Gaussian noise
  3. Bayesian update: Optimal estimation using prior information and observation data
  4. Recursive processing: Each step builds on the previous result, suitable for real-time processing

Mathematical Framework

State-Space Model

Kalman filtering is based on the following state-space model:

State transition equation: $x_k = F_k x_{k-1} + B_k u_k + w_k$

Observation equation: $z_k = H_k x_k + v_k$

Where:

  • $x_k \in \mathbb{R}^n$: State vector at time $k$
  • $z_k \in \mathbb{R}^m$: Observation vector at time $k$
  • $F_k \in \mathbb{R}^{n \times n}$: State transition matrix
  • $H_k \in \mathbb{R}^{m \times n}$: Observation matrix
  • $B_k \in \mathbb{R}^{n \times l}$: Control input matrix
  • $u_k \in \mathbb{R}^l$: Control input vector
  • $w_k \sim \mathcal{N}(0, Q_k)$: Process noise
  • $v_k \sim \mathcal{N}(0, R_k)$: Observation noise
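
As a concrete illustration, the matrices below sketch a one-dimensional constant-velocity tracking model that fits this framework; the sampling interval and noise levels are illustrative assumptions rather than values from any particular application:

import numpy as np

dt = 1.0                         # sampling interval (assumed)

# State x = [position, velocity]; only the position is observed
F = np.array([[1.0, dt],
              [0.0, 1.0]])       # constant-velocity state transition
H = np.array([[1.0, 0.0]])       # observation matrix
B = np.zeros((2, 1))             # no control input in this example
Q = 0.01 * np.eye(2)             # process noise covariance (assumed level)
R = np.array([[0.1]])            # observation noise covariance (assumed level)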

Probabilistic Representation

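In probabilistic terms, the linear Gaussian state-space model above amounts to specifying a Gaussian transition density, a Gaussian observation density, and a Gaussian initial prior:

Transition density: $p(x_k \mid x_{k-1}) = \mathcal{N}(F_k x_{k-1} + B_k u_k,\; Q_k)$

Observation density: $p(z_k \mid x_k) = \mathcal{N}(H_k x_k,\; R_k)$

Initial prior: $p(x_0) = \mathcal{N}(\hat{x}_{0|0},\; P_{0|0})$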

Algorithm Derivation

Prediction Step

Based on the posterior estimate from the previous time step, predict the current state:

State prediction: $\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k$

Covariance prediction: $P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$
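
This follows directly from the model: the prediction error is $e_{k|k-1} = x_k - \hat{x}_{k|k-1} = F_k e_{k-1|k-1} + w_k$; taking the covariance of this expression, with the previous estimation error and the process noise uncorrelated, gives exactly $F_k P_{k-1|k-1} F_k^T + Q_k$.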

Update Step

Correct the prediction using observation data:

Kalman gain: $K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$

State update: $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H_k \hat{x}_{k|k-1})$

Covariance update: $P_{k|k} = (I - K_k H_k) P_{k|k-1}$
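
Where the gain expression comes from can be seen with a standard argument, sketched here for completeness: for an arbitrary gain $K$, the updated covariance takes the Joseph form $P_{k|k} = (I - K H_k) P_{k|k-1} (I - K H_k)^T + K R_k K^T$ (used again in the numerical-stability section below), and minimizing its trace over $K$ gives

$\frac{\partial\, \mathrm{tr}(P_{k|k})}{\partial K} = -2 (I - K H_k) P_{k|k-1} H_k^T + 2 K R_k = 0 \;\Rightarrow\; K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$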

Innovation Sequence

The innovation is the difference between observation and prediction: $\nu_k = z_k - H_k \hat{x}_{k|k-1}$

Innovation covariance: $S_k = H_k P_{k|k-1} H_k^T + R_k$

Derivation in the Bayesian Framework

Application of Bayes’ Theorem

Kalman filtering is essentially a special case of Bayesian estimation. For linear Gaussian systems:

$p(x_k|z_{1:k}) \propto p(z_k|x_k)\, p(x_k|z_{1:k-1})$

Where:

  • $p(z_k|x_k)$: Likelihood function
  • $p(x_k|z_{1:k-1})$: Prior distribution
  • $p(x_k|z_{1:k})$: Posterior distribution

Closure Property of Gaussian Distributions

Because the Gaussian family is closed under linear transformation and Bayesian conditioning, every distribution in the recursion remains Gaussian:

  1. Predictive distribution: $p(x_k|z_{1:k-1}) = \mathcal{N}(\hat{x}_{k|k-1}, P_{k|k-1})$
  2. Likelihood function: $p(z_k|x_k) = \mathcal{N}(H_k x_k, R_k)$
  3. Posterior distribution: $p(x_k|z_{1:k}) = \mathcal{N}(\hat{x}_{k|k}, P_{k|k})$
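
Carrying out the multiplication of the Gaussian prior and likelihood explicitly gives the posterior parameters in information (inverse-covariance) form, which is algebraically equivalent to the Kalman gain update above:

$P_{k|k}^{-1} = P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k, \qquad \hat{x}_{k|k} = P_{k|k} \left( P_{k|k-1}^{-1} \hat{x}_{k|k-1} + H_k^T R_k^{-1} z_k \right)$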

Algorithm Flow

import numpy as np

def kalman_filter_step(x_prev, P_prev, F, B, u, Q, H, R, z):
    """
    Single step Kalman filter update

    Parameters:
    x_prev: Previous state estimate
    P_prev: Previous covariance matrix
    F: State transition matrix
    B: Control input matrix
    u: Control input
    Q: Process noise covariance
    H: Observation matrix
    R: Observation noise covariance
    z: Current observation
    """

    # Prediction step
    x_pred = F @ x_prev + B @ u  # State prediction
    P_pred = F @ P_prev @ F.T + Q  # Covariance prediction

    # Update step
    y = z - H @ x_pred  # Innovation (residual)
    S = H @ P_pred @ H.T + R  # Innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain

    x_updated = x_pred + K @ y  # State update
    P_updated = (np.eye(len(x_pred)) - K @ H) @ P_pred  # Covariance update

    return x_updated, P_updated, K, y, S
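
A minimal usage sketch of this function; the matrices, measurements, and initial values below are assumptions chosen only for demonstration:

import numpy as np

# Illustrative 1D constant-velocity setup (all values are assumptions)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.zeros((2, 1)); u = np.zeros(1)
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2); R = np.array([[0.1]])

x = np.zeros(2)            # initial state estimate
P = 10.0 * np.eye(2)       # large initial covariance: high initial uncertainty

for z in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:  # fake position measurements
    x, P, K, y, S = kalman_filter_step(x, P, F, B, u, Q, H, R, z)

print(x)  # estimated [position, velocity] after three updates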

Optimality Proof

Minimum Mean Square Error Criterion

Kalman filtering is optimal in the following sense:

$\hat{x}_{k|k} = \arg\min_{\hat{x}} E\left[(x_k - \hat{x})^T (x_k - \hat{x}) \mid z_{1:k}\right]$

The minimizer of this conditional mean square error is the conditional mean, $\hat{x}_{k|k} = E[x_k \mid z_{1:k}]$, which is exactly what the update equations compute under the linear Gaussian assumptions.

Optimality Conditions

Under linear Gaussian assumptions, Kalman filtering provides:

  1. Minimum variance unbiased estimator: Smallest estimation error variance among all unbiased estimators
  2. Maximum a posteriori estimator: Under the Gaussian assumption the posterior mode coincides with the posterior mean
  3. Minimum mean square error estimator: Smallest mean square error among all estimators, linear or nonlinear

Orthogonal Projection Interpretation

From a geometric perspective, the Kalman filter estimate is the orthogonal projection of the state vector onto the linear space spanned by the observations:

# Geometric interpretation of orthogonal projection
import numpy as np

def geometric_interpretation(x_pred, P_pred, H, R, z):
    """
    Geometric interpretation of Kalman filtering:
    State estimate = Prior estimate + Kalman gain × Innovation

    This is equivalent to an orthogonal projection in Hilbert space.
    """
    # Innovation: the part of the observation that the prior cannot explain
    innovation = z - H @ x_pred

    # Kalman gain: determines how the innovation is projected back into state space
    kalman_gain = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)

    # Final estimate: the prior plus the projected innovation
    x_posterior = x_pred + kalman_gain @ innovation

    return x_posterior
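
A precise statement of this projection property, for the gain given above, is that the updated estimation error is uncorrelated with the innovation that produced it:

$E\left[(x_k - \hat{x}_{k|k})\, \nu_k^T\right] = (I - K_k H_k) P_{k|k-1} H_k^T - K_k R_k = 0$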

Information-Theoretic Perspective

Information Fusion

Kalman filtering can be understood as an information fusion process:

  1. Prior information: Prediction from the system model
  2. Observation information: Measurements from sensors
  3. Fusion weights: Determined by their respective uncertainties

Entropy Reduction

Each observation reduces the uncertainty of the system state:

$H(x_k|z_{1:k}) \leq H(x_k|z_{1:k-1})$

where $H(\cdot)$ denotes differential entropy.
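
For the Gaussian distributions involved here, the differential entropy depends only on the covariance,

$H\big(\mathcal{N}(\mu, P)\big) = \tfrac{1}{2} \ln\!\left((2\pi e)^n \det P\right),$

so the entropy reduction is a direct consequence of the covariance monotonicity property discussed in the next section.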

Properties of the Covariance Matrix

Symmetry and Positive Definiteness

The covariance matrix $P_k$ has the following important properties:

  1. Symmetry: $P_k = P_k^T$
  2. Positive semi-definiteness: $P_k \succeq 0$
  3. Monotonicity: $P_{k|k} \preceq P_{k|k-1}$ (incorporating an observation never increases uncertainty)

Numerical Stability

For numerical stability, the following form is commonly used:

import numpy as np

def joseph_form_update(P_pred, H, K, R):
    """
    Joseph form covariance update; it preserves symmetry and positive
    semi-definiteness of the covariance, which improves numerical stability.
    """
    I_KH = np.eye(len(P_pred)) - K @ H
    P_updated = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return P_updated

Practical Application Considerations

Initialization

Importance of Initialization

Kalman filter performance largely depends on the choice of the initial state estimate $\hat{x}_{0|0}$ and the initial covariance $P_{0|0}$.

Common initialization strategies:

  1. State initialization: Based on prior knowledge or first few observations
  2. Covariance initialization: Reflects initial uncertainty; typically choose large values (see the sketch below)
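
A minimal initialization sketch, assuming a position-only observation model as in the earlier example; the specific numbers are illustrative assumptions:

import numpy as np

H = np.array([[1.0, 0.0]])            # position-only observation model (assumed)
z0 = np.array([1.2])                  # first observation (assumed)

# State initialization: least-squares fit to the first observation,
# unobserved components (here the velocity) default to zero
x0 = np.linalg.pinv(H) @ z0           # -> array([1.2, 0.0])

# Diffuse covariance initialization: large values encode high initial uncertainty
P0 = np.diag([10.0, 100.0])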

Filter Divergence

Filter divergence may occur when the model is mismatched or the noise parameters are poorly chosen:

  • Symptoms: The actual estimation error grows while the reported covariance stays small; the innovations become large relative to $S_k$
  • Causes: $Q$ chosen too small (an overconfident process model), a mis-specified $R$, or structural model errors
  • Solutions: Parameter tuning, robustness improvements, and monitoring of the innovation sequence (see the check below)
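
One common monitoring idea, sketched here under the assumption that the innovation and its covariance are recorded at each step, is to track the normalized innovation squared (NIS); the window and threshold factor below are illustrative assumptions, not prescriptions:

import numpy as np

def nis(innovation, S):
    """Normalized innovation squared for one step; its expected value is dim(z)
    when the filter model and noise statistics are consistent."""
    return float(innovation.T @ np.linalg.inv(S) @ innovation)

def divergence_suspected(recent_nis_values, obs_dim, factor=3.0):
    """Flag divergence if the average NIS over a recent window is far above dim(z)."""
    return np.mean(recent_nis_values) > factor * obs_dim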

Preview of Financial Applications

Typical applications of Kalman filtering in finance include:

  1. State variables: Implied volatility, risk factors, market trends
  2. Observation variables: Stock prices, interest rates, option prices
  3. Application scenarios: Parameter estimation, risk management, portfolio optimization

Preview of Next Chapter

In the next chapter, we will study the implementation of the linear Kalman filter in detail, including a programming implementation of the five core equations and the associated numerical techniques.