Kalman Filter
This is a note from my study of the Kalman filter. It does NOT derive the Kalman filter but explains it to the best of my ability. Handwritten Python code for estimating the Kalman filter is provided because I'd like to practice its implementation, but I recommend using established software for better numerical stability and performance.
Intuition
Imagine we are estimating the true inflation rate. At any point in time, we have two sources of information:

- A prediction using past information, $\hat{x}_{\text{pred}}$: based on a model that projects inflation trends.
- A new observation, $z$: a noisy indicator like a market survey or a bond yield linked to inflation.

But both have limitations. The model's prediction can be uncertain, and the observation includes noise. A simple yet elegant solution is to combine the two by weighting them based on their accuracy, thereby updating the estimate of the true inflation rate:

$$\hat{x} = (1 - k)\,\hat{x}_{\text{pred}} + k\,z \tag{1}$$

where $k \in [0, 1]$ is the weight placed on the observation.

We will almost surely place more weight on whichever component is more precise! $k \to 1$ if our measurement is known to be very good, but the model's prediction is not; $k \to 0$ if our measurement is very noisy, but the model's prediction is quite accurate. In this way, $k$ balances how much we trust the model against how much we trust the measurement.

Also, we can rearrange Equation 1 to get

$$\hat{x} = \hat{x}_{\text{pred}} + k\,(z - \hat{x}_{\text{pred}}) \tag{2}$$

This means that the updated estimate is our best model estimate based on past information plus an adjustment for the difference between the observation and the model estimate.
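To make this concrete, here is a toy illustration with made-up numbers showing that Equation 1 and its rearranged form (Equation 2) give the same update:

```python
# Toy illustration of Equations 1 and 2 (all numbers are made up):
# a model prediction of 2.0% inflation, a survey reading of 3.0%,
# and a weight k = 0.3 on the observation.
x_pred = 2.0   # model prediction
z = 3.0        # noisy observation
k = 0.3        # weight on the observation

x_upd_eq1 = (1 - k) * x_pred + k * z    # Equation 1: weighted average
x_upd_eq2 = x_pred + k * (z - x_pred)   # Equation 2: prediction + correction

print(x_upd_eq1, x_upd_eq2)  # both give 2.3 -- the two forms are identical
```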
Note, however, that the measurement $z$ does not necessarily live on the same scale as the state. In our example, the true inflation rate may be an annualized percentage, but our measurement could be, for instance, the proportion of survey responses that believe inflation is higher than target, or even the trading volume of Bitcoin! They are therefore not directly comparable, and the raw difference $z - \hat{x}_{\text{pred}}$ is not meaningful until the state is mapped onto the measurement's scale. We will return to this point when the scaling factor is introduced below.
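To see why the scales matter, here is a tiny sketch with a hypothetical scaling factor of 10 between the state and the indicator (this is the role the factor $H$ plays below); all numbers are made up:

```python
# The state is an annualized percentage; suppose the indicator reads roughly
# 10x the state (H = 10, a made-up scaling). The raw gap z - x_pred mixes two
# scales, while z - H * x_pred compares like with like.
x_pred = 2.0           # predicted inflation, in percent
H = 10.0               # hypothetical scaling from state to indicator
z = 21.5               # observed indicator reading

print(z - x_pred)      # 19.5 -- meaningless: mixes two different scales
print(z - H * x_pred)  # 1.5  -- a meaningful, measurement-space discrepancy
```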
A more elaborate presentation

Now let's move on to a more elaborate presentation and proceed to the basic Kalman filter. For pedagogical simplicity, we start with the univariate (or 1-dimensional) case, i.e., there is only one unobservable state variable and one measurement.

The setup

First, there are two equations, namely the state transition equation (Equation 3) and the measurement equation (Equation 4).

State transition equation:

$$x_t = F x_{t-1} + w_t \tag{3}$$

This equation reflects our model, or how we perceive the unobservable state to evolve over time. However, there is some uncertainty $w_t$ in this evolution.

- $x_t$: the unobserved true state (e.g., the true inflation rate).
- $F$: the factor linking the previous state to the current state.
- $w_t$: process noise (uncertainty in the model), $w_t \sim N(0, Q)$.

Measurement equation:

$$z_t = H x_t + v_t \tag{4}$$

This equation describes how our observed measurement is related to the unobservable true state. Because it is a measurement, there is also uncertainty or noise $v_t$.

- $z_t$: the observed measurement (e.g., a survey or bond yield).
- $H$: the factor linking the true state to the observation.
- $v_t$: measurement noise (error in the observed data), $v_t \sim N(0, R)$.

We assume the process noise $w_t$ and the measurement noise $v_t$ are independent of each other.
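As a quick sanity check, the two equations can be simulated directly. A minimal sketch, assuming the same illustrative values ($F = 1$, $H = 1$, $Q = 0.5$, $R = 1$) as the full Python example later:

```python
import numpy as np

rng = np.random.default_rng(0)
F, H, Q, R = 1.0, 1.0, 0.5, 1.0   # illustrative scalar parameters
n_steps = 100

x = np.zeros(n_steps)  # unobserved true state
z = np.zeros(n_steps)  # noisy measurements
for t in range(1, n_steps):
    x[t] = F * x[t-1] + rng.normal(0.0, np.sqrt(Q))  # state transition (Equation 3)
    z[t] = H * x[t] + rng.normal(0.0, np.sqrt(R))    # measurement (Equation 4)
```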
Predicting the state

At time $t-1$, suppose our best estimate of the state is $\hat{x}_{t-1|t-1}$ with uncertainty $P_{t-1|t-1}$.1 When making the model prediction at time $t$, we push this estimate through the state transition equation:

$$\hat{x}_{t|t-1} = F\,\hat{x}_{t-1|t-1} \tag{5}$$

This should not be confused with the model estimate before updating at time $t-1$: the subscript $t|t-1$ reads "the estimate for time $t$ given information up to time $t-1$."

1 Note that the uncertainty here is expressed as a variance; $P_{t-1|t-1}$ is the variance of the estimate $\hat{x}_{t-1|t-1}$.

While the state evolves predictably through $F$, the process noise adds uncertainty to the prediction. Let $P_{t|t-1}$ be the variance of the predicted state:2

$$P_{t|t-1} = F^2 P_{t-1|t-1} + Q \tag{6}$$

2 Note that $P_{t|t-1} \geq Q$: even a perfectly known previous state leaves us with at least the process noise variance.
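A quick numeric sketch of Equation 6: with $F = 1$ and no measurements arriving, each prediction adds $Q$ to the variance (values are illustrative):

```python
# Repeated predictions without measurements: with F = 1, Equation 6 applied
# recursively lets the variance grow by Q at every step.
F, Q = 1.0, 0.5
P = 1.0
for _ in range(3):
    P = F**2 * P + Q
    print(P)  # 1.5, 2.0, 2.5 -- uncertainty accumulates until data arrives
```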
State update

Also at time $t$, a new measurement $z_t$ arrives. When the measurement comes in, we compare it with what the model expects, $H\,\hat{x}_{t|t-1}$, and form the innovation:3

$$y_t = z_t - H\,\hat{x}_{t|t-1} \tag{7}$$

3 As discussed earlier, because the measurement does not necessarily have the same scale as the state, we cannot directly use $z_t - \hat{x}_{t|t-1}$; the predicted state must first be mapped into the measurement space through $H$. For example, the estimated inflation is an annualized percentage, but our measurement could be the frequency of newspapers mentioning the word "inflation" or even the trading volume of Bitcoin.

This term quantifies how much the measurement differs from what the model expects. However, it is noisy, and its uncertainty is:

$$S_t = H^2 P_{t|t-1} + R \tag{8}$$

From the intuition section, we know that the updated state estimate should be the prediction plus a weighted adjustment for the innovation:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\,y_t \tag{9}$$

This $K_t$ is called the Kalman gain. It is very much like the weight $k$ in Equation 2, except that it must also convert the innovation from the measurement scale back to the state scale.
Kalman gain

The Kalman gain is:

$$K_t = \frac{P_{t|t-1}\,H}{S_t} = \frac{P_{t|t-1}\,H}{H^2 P_{t|t-1} + R} \tag{10}$$

- $P_{t|t-1}$ is the variance of the predicted state as in Equation 6.
- $H$ is the scaling factor that maps the state to the measurement space, which appears in Equation 4.
- $S_t$ is the variance of the innovation as in Equation 8.

So, the numerator $P_{t|t-1}\,H$ captures how much of the prediction uncertainty shows up in the measurement space, while the denominator $S_t$ is the total uncertainty of the innovation. Therefore,

- If the measurement noise $R$ is small, the Kalman gain is larger, so we trust the measurement more.
- If the predicted uncertainty $P_{t|t-1}$ is large, $K_t$ is also larger, prioritizing the new measurement.
- Conversely, if $R$ is large (noisy measurement), $K_t$ is small, favoring the prediction.
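A minimal numeric sketch of these three cases (the scalar formula from Equation 10, with made-up values):

```python
def kalman_gain(P_pred, H, R):
    """Scalar Kalman gain K = P_pred * H / (H**2 * P_pred + R) (Equation 10)."""
    return P_pred * H / (H**2 * P_pred + R)

P_pred, H = 1.0, 1.0
print(kalman_gain(P_pred, H, R=0.01))  # ~0.99: precise measurement, trust it
print(kalman_gain(P_pred, H, R=100))   # ~0.01: noisy measurement, trust the model
print(kalman_gain(10.0, H, R=1.0))     # ~0.91: uncertain prediction, lean on data
```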
Variance update

Further, a new measurement not only helps us update and refine our estimate of the state; it also reduces the uncertainty of our estimate. Once the Kalman gain $K_t$ is computed, the variance is updated as:

$$P_{t|t} = (1 - K_t H)\,P_{t|t-1} \tag{11}$$

In the state update (Equation 9), the correction uses $K_t$ alone; here it uses $K_t H$:

- State update: the correction is applied directly to the predicted state. $K_t$ alone is sufficient because it determines how much of the innovation (in measurement space) should influence the state estimate.
- Variance update: the correction adjusts the uncertainty in the predicted state. $K_t H$ is needed because uncertainty is transformed differently: it depends on both the Kalman gain and how strongly the state affects the measurement (via $H$).
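Putting Equations 5 to 11 together, one predict/update cycle in the scalar case can be sketched as follows (illustrative numbers; note the posterior variance ends up smaller than the predicted one):

```python
F, H, Q, R = 1.0, 1.0, 0.5, 1.0   # illustrative scalar parameters
x, P = 0.0, 1.0                   # previous estimate and its variance
z = 1.2                           # a new (made-up) measurement

# Predict (Equations 5 and 6)
x_pred = F * x
P_pred = F**2 * P + Q

# Update (Equations 7 to 11)
y = z - H * x_pred                # innovation
S = H**2 * P_pred + R             # innovation variance
K = P_pred * H / S                # Kalman gain
x = x_pred + K * y                # updated state
P = (1 - K * H) * P_pred          # updated variance

print(P < P_pred)  # True: the measurement reduced our uncertainty
```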
Kalman filter
Now I'm switching to matrix notation to allow for multiple states and measurements.

To begin with, a system's internal dynamics and its observed outputs are treated separately in systems theory and in the Kalman filter.
State space
The state space represents the system's internal, unobserved dynamics: the variables we want to estimate but cannot directly measure. These are often called hidden states or latent states.4

4 Suppose we are tracking the true inflation rate. This is the "state" of the system. It evolves over time due to economic forces, but we cannot observe it directly.

The state equation describes how the state evolves over time, often including random noise to reflect uncertainty:

$$\mathbf{x}_t = \mathbf{F}\,\mathbf{x}_{t-1} + \mathbf{w}_t$$

- $\mathbf{x}_t$: the vector of true states at time $t$.
- $\mathbf{F}$: the transition matrix describing how the states evolve.
- $\mathbf{w}_t$: process noise with zero mean and covariance matrix $\mathbf{Q}$.

The state space contains all the information needed to describe the system's internal behavior.
Measurement space
The measurement space is the observable part of the system: the variables we can measure directly, but which may be noisy or only indirectly related to the state.5

5 Suppose we measure inflation using a price index. This is the "measurement" and is influenced by the true inflation rate, but it also includes random noise.

The measurement equation describes how observed measurements are related to the unobservable states:

$$\mathbf{z}_t = \mathbf{H}\,\mathbf{x}_t + \mathbf{v}_t$$

- $\mathbf{z}_t$: the observed measurement at time $t$.
- $\mathbf{H}$: the matrix relating the state to the measurement (e.g., a scaling factor).
- $\mathbf{v}_t$: measurement noise with zero mean and covariance matrix $\mathbf{R}$.

The measurement space is what we observe directly, but it may not fully capture the true state due to noise or incomplete representation.
The Kalman filter works by reconciling the state space (hidden, unobservable variables) with the measurement space (noisy observations) to improve our estimate of the state. It does so through two main steps:
- Prediction: Uses the state equation to estimate the next state and its uncertainty, based on prior information.
- Update: Refines this estimate by incorporating new observations from the measurement equation, weighing them based on their accuracy.
Prediction step
In the prediction step, the Kalman filter estimates the state at time $t$ using only past information:

$$\hat{\mathbf{x}}_{t|t-1} = \mathbf{F}\,\hat{\mathbf{x}}_{t-1|t-1}$$

- $\hat{\mathbf{x}}_{t|t-1}$: the predicted state based on the system's dynamics, or a prior state estimate before incorporating the new measurement.

The associated uncertainty (covariance) of the prediction is updated as:

$$\mathbf{P}_{t|t-1} = \mathbf{F}\,\mathbf{P}_{t-1|t-1}\,\mathbf{F}^\top + \mathbf{Q}$$

- $\mathbf{P}_{t|t-1}$: the predicted covariance matrix of the state.
Update step
The update step adjusts the predicted state using the new measurement:

$$\hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t\,\mathbf{y}_t, \qquad \mathbf{y}_t = \mathbf{z}_t - \mathbf{H}\,\hat{\mathbf{x}}_{t|t-1}$$

- $\mathbf{y}_t$: the innovation or measurement residual, representing the difference between the actual and predicted measurement.
- $\mathbf{K}_t$: the Kalman gain, which determines how much weight is given to the innovation.

The Kalman gain is computed as:

$$\mathbf{K}_t = \mathbf{P}_{t|t-1}\,\mathbf{H}^\top \mathbf{S}_t^{-1}, \qquad \mathbf{S}_t = \mathbf{H}\,\mathbf{P}_{t|t-1}\,\mathbf{H}^\top + \mathbf{R}$$
Updated covariance
Once the state is updated, the covariance of the state estimate is updated as well:

$$\mathbf{P}_{t|t} = (\mathbf{I} - \mathbf{K}_t \mathbf{H})\,\mathbf{P}_{t|t-1}$$

Here, $\mathbf{I}$ is the identity matrix.
This ensures that the updated covariance reflects the combined effect of the prediction and measurement steps.
The Kalman filter is recursive, meaning it processes data one step at a time, making it computationally efficient. Each new measurement refines the state estimate and covariance, which are then used for the next prediction.
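For reference, here is a minimal NumPy sketch of the two steps in matrix form, applied to a hypothetical two-state model (a level plus a trend) observed through a single noisy measurement of the level. The functions mirror, in matrix notation, the scalar steps above; the model and its numbers are assumptions for illustration only:

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction step: propagate the state estimate and its covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: fold a new measurement z into the prediction."""
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_upd = x_pred + K @ y
    P_upd = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_upd, P_upd

# Hypothetical two-state model: level + trend, observing only the level
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = np.array([[1.0]])

x, P = np.zeros((2, 1)), np.eye(2)               # initial state and covariance
x, P = kf_predict(x, P, F, Q)                    # prediction step
x, P = kf_update(x, P, np.array([[0.8]]), H, R)  # update with one observation
```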
Estimation of the Kalman filter
General steps
After we have parameterized the system, using the Kalman filter to estimate states means that we are to estimate:

- $\mathbf{F}$: state transition matrix
- $\mathbf{H}$: measurement matrix
- $\mathbf{Q}$: covariance of process noise
- $\mathbf{R}$: covariance of measurement noise

Additionally, we need to set initial conditions:

- Initial state estimate $\hat{\mathbf{x}}_0$: the starting point of the state variable, chosen based on prior knowledge or historical averages.
- Initial covariance matrix $\mathbf{P}_0$.

In some cases, $\mathbf{F}$ and $\mathbf{H}$ are fixed by the model, and the noise covariances $\mathbf{Q}$ and $\mathbf{R}$ must be estimated from the data, typically by maximum likelihood:

- Forward filtering:
  - Apply the Kalman filter to compute the likelihood of the observed data at each time step.
  - The likelihood depends on the measurement residual $\mathbf{y}_t$ and its variance $\mathbf{S}_t$ (see the numeric check after this list):

$$\log L = -\frac{1}{2} \sum_{t=1}^{T} \left( d \log 2\pi + \log \det \mathbf{S}_t + \mathbf{y}_t^\top \mathbf{S}_t^{-1}\,\mathbf{y}_t \right)$$

where $d$ is the dimension of the observation.6

6 For example, a system's unobservable true states may be 10-dimensional, i.e., it has 10 state variables at any given time. Our measurement may be 5-dimensional, i.e., we observe 5 different metrics of the system. In this case, $d = 5$.
- Parameter optimization:
  - Adjust $\mathbf{Q}$ and $\mathbf{R}$ to maximize $\log L$.
  - Numerical optimization methods like gradient ascent or expectation-maximization (EM) are commonly used.
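Each term in the log-likelihood above is just a Gaussian log-density of the innovation, so the formula can be cross-checked against `scipy.stats.multivariate_normal`. A small sketch with made-up numbers for a single time step:

```python
import numpy as np
from scipy.stats import multivariate_normal

# One time step, 2-D observation (made-up values)
y = np.array([0.5, -0.2])               # innovation z_t - H @ x_pred
S = np.array([[1.0, 0.3],
              [0.3, 2.0]])              # innovation covariance

d = len(y)
manual = -0.5 * (d * np.log(2 * np.pi)
                 + np.log(np.linalg.det(S))
                 + y @ np.linalg.solve(S, y))
reference = multivariate_normal.logpdf(y, mean=np.zeros(d), cov=S)
print(np.isclose(manual, reference))    # True
```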
Example and code
Suppose we model the 1-D true state as a random walk:

$$x_t = x_{t-1} + w_t, \qquad w_t \sim N(0, Q)$$

And the 1-D observation as the state plus noise:

$$z_t = x_t + v_t, \qquad v_t \sim N(0, R)$$

Steps to estimate $Q$ and $R$:

- Start with initial guesses for $Q$ and $R$.
- Run the Kalman filter to compute the likelihood of the observed $z_t$ values.
- Adjust $Q$ and $R$ to maximize the likelihood.

Python

Here's a simple handwritten example of MLE in Python:
```python
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import det, inv
from scipy.optimize import minimize

rng = np.random.default_rng(1024)

# Simulate data
dim_x = 1      # Number of unobservable true states
dim_z = 1      # Dimension of observations
n_steps = 100  # Number of time steps
F = np.array([[1.0]])  # State transition
H = np.array([[1.0]])  # Observation model
Q = np.array([[0.5]])  # Process noise covariance
R = np.array([[1.0]])  # Measurement noise covariance

# Generate true states and noisy observations
X = np.zeros((dim_x, n_steps))
Z = np.zeros((dim_z, n_steps))
for t in range(1, n_steps):
    X[:, t] = F @ X[:, t-1] + rng.multivariate_normal(np.zeros(dim_x), Q)
    Z[:, t] = H @ X[:, t] + rng.multivariate_normal(np.zeros(dim_z), R)

# Define the Kalman filter functions
def kalman_predict(x, P, F, Q):
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ inv(S)
    x_upd = x_pred + K @ (z - H @ x_pred)
    P_upd = (np.eye(len(P_pred)) - K @ H) @ P_pred
    return x_upd, P_upd, S

def run_kalman_filter(Q, R, F, H, x0, P0, Z):
    Q = np.array([[Q]])  # Process noise covariance
    R = np.array([[R]])  # Measurement noise covariance
    x, P = x0, P0

    log_likelihood = 0
    log_2pi = np.log(2 * np.pi)
    for z in Z[0, :]:
        x_pred, P_pred = kalman_predict(x, P, F, Q)
        x, P, S = kalman_update(x_pred, P_pred, z, H, R)
        innovation = z - H @ x_pred
        l = -0.5 * (dim_z * log_2pi + np.log(det(S)) + innovation.T @ inv(S) @ innovation)
        log_likelihood += l
    # Negative log-likelihood to minimize (as a scalar for scipy.optimize)
    return -log_likelihood.item()

# Define objective function for parameter estimation
def objective(params, F, H, x, P, Z):
    Q, R = params
    return run_kalman_filter(Q, R, F, H, x, P, Z)

# Initial guess for the parameters: [Q, R]
initial_params = [0.1, 0.1]
x0 = np.array([[0]])  # Initial state estimate
P0 = np.array([[1]])  # Initial covariance

# Minimize the negative log-likelihood with bounds to ensure Q and R are positive
result = minimize(
    objective,
    initial_params,
    args=(F, H, x0, P0, Z,),
    method='L-BFGS-B',
    bounds=[(1e-5, None), (1e-5, None)]
)
estimated_Q, estimated_R = result.x

# Run the Kalman filter with the estimated parameters
estimated_Q = np.array([[estimated_Q]])
estimated_R = np.array([[estimated_R]])
x_est = np.zeros((dim_x, n_steps))
P_est = np.zeros((dim_x, dim_x, n_steps))
x, P = x0, P0
for t in range(n_steps):
    x_pred, P_pred = kalman_predict(x, P, F, estimated_Q)
    x, P, _ = kalman_update(x_pred, P_pred, Z[:, t], H, estimated_R)
    x_est[:, t] = x
    P_est[:, :, t] = P

# Plot the true states and estimates
plt.figure(figsize=(9, 6))
plt.plot(Z[0], 'o', label='Observations', color='b', alpha=0.3)
plt.plot(X[0], label='True State', color='b', linestyle='dashed')
plt.plot(x_est[0], label='Estimated State', color='r')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.title('Kalman Filter: True State, Observations, and Estimated State')
plt.annotate(f'True Q: {Q[0,0]:.2f}', xy=(0.05, 0.95), xycoords='axes fraction', fontsize=12, verticalalignment='top')
plt.annotate(f'True R: {R[0,0]:.2f}', xy=(0.05, 0.90), xycoords='axes fraction', fontsize=12, verticalalignment='top')
plt.annotate(f'Estimated Q: {estimated_Q[0,0]:.2f}', xy=(0.05, 0.85), xycoords='axes fraction', fontsize=12, verticalalignment='top')
plt.annotate(f'Estimated R: {estimated_R[0,0]:.2f}', xy=(0.05, 0.80), xycoords='axes fraction', fontsize=12, verticalalignment='top')
plt.show()
```
Matlab
I also tested three ways of estimating the parameters of the Kalman filter in Matlab:

- Handwritten.
- `ss` model with a handwritten log-likelihood.
- `ssm` model without a handwritten log-likelihood.

The resulting estimates are basically the same, so I can comfortably use my handwritten version, which is easier to understand and extend (for example, if the noises are correlated).
```matlab
%% Kalman Filter Parameter Estimation Comparison with Log-Likelihood
% Clear workspace and set random seed for reproducibility
clear; clc; close all;
rng(1024);

%% Data Generation (Used by All Sections)
% Parameters
dim_x = 1;     % Number of unobservable true states
dim_z = 1;     % Dimension of observations
n_steps = 100; % Number of time steps
F = [1.0];     % State transition matrix
H = [1.0];     % Observation matrix
Q_true = [1.5]; % True process noise covariance
R_true = [2.8]; % True measurement noise covariance

% Generate true states and noisy observations
X = zeros(dim_x, n_steps); % True states
Z = zeros(dim_z, n_steps); % Observations
for t = 2:n_steps
    % True state evolution
    X(:, t) = F * X(:, t-1) + mvnrnd(zeros(1, dim_x), Q_true)';
    % Observation
    Z(:, t) = H * X(:, t) + mvnrnd(zeros(1, dim_z), R_true)';
end

% Initial state and covariance
x0 = zeros(dim_x, 1); % Initial state estimate
P0 = eye(dim_x);      % Initial covariance estimate

%% Section 1: Handwritten Kalman Filter and Parameter Estimation
% Handwritten Kalman Filter functions
% Prediction step
function [x_pred, P_pred] = kalman_predict(x, P, F, Q)
    x_pred = F * x;
    P_pred = F * P * F' + Q;
end

% Update step
function [x_upd, P_upd, S] = kalman_update(x_pred, P_pred, z, H, R)
    S = H * P_pred * H' + R;
    K = P_pred * H' / S;
    x_upd = x_pred + K * (z - H * x_pred);
    P_upd = (eye(size(P_pred)) - K * H) * P_pred;
end

% Negative log-likelihood function with explicit log(2*pi)
function nll = kalman_neg_log_likelihood_handwritten(params, Z, F, H, x0, P0)
    Q = params(1);
    R = params(2);
    if Q <= 0 || R <= 0
        nll = Inf;
        return;
    end
    dim_z = size(Z, 1); % Dimension of observations
    n_steps = size(Z, 2);
    x = x0;
    P = P0;
    nll = 0;
    for t = 1:n_steps
        % Prediction
        [x_pred, P_pred] = kalman_predict(x, P, F, Q);
        % Innovation
        y = Z(:, t) - H * x_pred;
        S = H * P_pred * H' + R;
        % Update negative log-likelihood (including log(2*pi))
        nll = nll + 0.5 * (dim_z * log(2 * pi) + log(det(S)) + y' * (S \ y));
        % Update state
        [x, P, ~] = kalman_update(x_pred, P_pred, Z(:, t), H, R);
    end
end

% Parameter estimation using fminsearch
initial_params = [1; 1]; % Initial guesses for [Q; R]
options = optimset('Display', 'off', 'TolX', 1e-6);
[estimated_params_handwritten, nll_handwritten] = fminsearch(@(params) kalman_neg_log_likelihood_handwritten(params, Z, F, H, x0, P0), initial_params, options);

% Extract estimated Q and R
Q_est_handwritten = estimated_params_handwritten(1);
R_est_handwritten = estimated_params_handwritten(2);

% Run Kalman filter with estimated parameters
x_est_handwritten = zeros(dim_x, n_steps);
P_est_handwritten = zeros(dim_x, dim_x, n_steps);
x = x0;
P = P0;
for t = 1:n_steps
    % Prediction
    [x_pred, P_pred] = kalman_predict(x, P, F, Q_est_handwritten);
    % Update
    [x, P, ~] = kalman_update(x_pred, P_pred, Z(:, t), H, R_est_handwritten);
    % Store estimates
    x_est_handwritten(:, t) = x;
    P_est_handwritten(:, :, t) = P;
end

%% Section 2: Kalman Filter Using ss Model and Handwritten Log-Likelihood
% Define the state-space system using ss
sys = ss(F, [], H, [], 1); % Discrete-time system with sample time 1

% Negative log-likelihood function with explicit log(2*pi)
function nll = kalman_neg_log_likelihood_ss(params, sys, x0, P0, Z)
    Q = params(1);
    R = params(2);
    if Q <= 0 || R <= 0
        nll = Inf;
        return;
    end
    dim_z = size(Z, 1); % Dimension of observations
    x = x0;
    P = P0;
    nll = 0;
    for t = 1:size(Z, 2)
        % Prediction
        x_pred = sys.A * x;
        P_pred = sys.A * P * sys.A' + Q;
        % Innovation
        y = Z(:, t) - sys.C * x_pred;
        S = sys.C * P_pred * sys.C' + R;
        % Update negative log-likelihood (including log(2*pi))
        nll = nll + 0.5 * (dim_z * log(2 * pi) + log(det(S)) + y' * (S \ y));
        % Update state
        K = P_pred * sys.C' / S;
        x = x_pred + K * y;
        P = (eye(size(P)) - K * sys.C) * P_pred;
    end
end

% Parameter estimation using fminsearch
initial_params = [1; 1]; % Initial guesses for [Q; R]
[estimated_params_ss, nll_ss] = fminsearch(@(params) kalman_neg_log_likelihood_ss(params, sys, x0, P0, Z), initial_params, options);

% Extract estimated Q and R
Q_est_ss = estimated_params_ss(1);
R_est_ss = estimated_params_ss(2);

% Run Kalman filter with estimated parameters
x_est_ss = zeros(dim_x, n_steps);
P_est_ss = zeros(dim_x, dim_x, n_steps);
x = x0;
P = P0;
for t = 1:n_steps
    % Prediction
    x_pred = sys.A * x;
    P_pred = sys.A * P * sys.A' + Q_est_ss;
    % Innovation
    y = Z(:, t) - sys.C * x_pred;
    S = sys.C * P_pred * sys.C' + R_est_ss;
    % Update state
    K = P_pred * sys.C' / S;
    x = x_pred + K * y;
    P = (eye(size(P)) - K * sys.C) * P_pred;
    % Store estimate
    x_est_ss(:, t) = x;
    P_est_ss(:, :, t) = P;
end

%% Section 3: Kalman Filter Using ssm and Built-in Functions
% Define parameter-to-matrix mapping function
function [A, B, C, D, Mean0, Cov0, StateType] = LocalStateSpaceModel(params)
    Q = params(1);
    R = params(2);
    % State-space matrices
    A = 1;      % State transition matrix
    B = sqrt(Q); % Process noise effect
    C = 1;      % Observation matrix
    D = sqrt(R); % Measurement noise effect
    % Initial state
    Mean0 = 0;
    Cov0 = 1;
    % State type (0 for stationary)
    StateType = 0;
end

% Initial parameter guesses
Param0 = [1; 1]; % Initial guesses for [Q; R]

% Create ssm model using the mapping function
Mdl = ssm(@LocalStateSpaceModel);

% Estimate the parameters using built-in functions
Z_transpose = Z';
[EstMdl, EstParams, EstParamCov, logL] = estimate(Mdl, Z_transpose, Param0, 'Display', 'off');

% Extract estimated Q and R
Q_est_ssm = EstParams(1);
R_est_ssm = EstParams(2);

% Extract log-likelihood value
nll_ssm = -logL; % Negative log-likelihood

% Run Kalman filter with estimated parameters
x_est_ssm = zeros(dim_x, n_steps);
P_est_ssm = zeros(dim_x, dim_x, n_steps);
x = EstMdl.Mean0;
P = EstMdl.Cov0;

% Extract estimated state-space matrices
A = EstMdl.A;
C = EstMdl.C;
Q_est_matrix = EstMdl.B * EstMdl.B'; % Process noise covariance
R_est_matrix = EstMdl.D * EstMdl.D'; % Measurement noise covariance
for t = 1:n_steps
    % Prediction
    x_pred = A * x;
    P_pred = A * P * A' + Q_est_matrix;
    % Innovation
    y = Z(:, t) - C * x_pred;
    S = C * P_pred * C' + R_est_matrix;
    % Update state
    K = P_pred * C' / S;
    x = x_pred + K * y;
    P = (eye(dim_x) - K * C) * P_pred;
    % Store estimates
    x_est_ssm(:, t) = x;
    P_est_ssm(:, :, t) = P;
end

%% Comparison of Results
% Plot the true state, observations, and estimated states from all methods
figure;
plot(1:n_steps, X(1, :), 'k-', 'LineWidth', 2, 'DisplayName', 'True State');
hold on;
plot(1:n_steps, Z(1, :), 'ko', 'MarkerSize', 4, 'DisplayName', 'Observations');
plot(1:n_steps, x_est_handwritten(1, :), 'r--', 'LineWidth', 1.5, 'DisplayName', 'Estimated State (Handwritten)');
plot(1:n_steps, x_est_ss(1, :), 'b-.', 'LineWidth', 1.5, 'DisplayName', 'Estimated State (ss Model)');
plot(1:n_steps, x_est_ssm(1, :), 'g:', 'LineWidth', 1.5, 'DisplayName', 'Estimated State (ssm Model)');
xlabel('Time Step');
ylabel('State Value');
title('Kalman Filter: Parameter Estimation with Different Methods');
legend('Location', 'best');
grid on;
hold off;

% Display estimated parameters and negative log-likelihoods from all methods
fprintf('--- Parameter Estimation Results ---\n');
fprintf('True Process Noise Covariance Q_true: %.4f\n', Q_true);
fprintf('Estimated Q (Handwritten): %.4f\n', Q_est_handwritten);
fprintf('Estimated Q (ss Model): %.4f\n', Q_est_ss);
fprintf('Estimated Q (ssm Model): %.4f\n\n', Q_est_ssm);
fprintf('True Measurement Noise Covariance R_true: %.4f\n', R_true);
fprintf('Estimated R (Handwritten): %.4f\n', R_est_handwritten);
fprintf('Estimated R (ss Model): %.4f\n', R_est_ss);
fprintf('Estimated R (ssm Model): %.4f\n\n', R_est_ssm);
fprintf('--- Negative Log-Likelihoods ---\n');
fprintf('Negative Log-Likelihood (Handwritten): %.4f\n', nll_handwritten);
fprintf('Negative Log-Likelihood (ss Model): %.4f\n', nll_ss);
fprintf('Negative Log-Likelihood (ssm Model): %.4f\n', nll_ssm);
```