 "EngNotes" are my engineering notes.  This is my way of creating a digital version of an engineering notebook.  This particular entry is on Kalman Filtering.
"EngNotes" are my engineering notes.  This is my way of creating a digital version of an engineering notebook.  This particular entry is on Kalman Filtering.
A Kalman filter is a filtering mechanism that estimates the state of something using a process that describes how the state changes over time and uses measurements that are mathematically related to the state to update the estimate. WORK IN A DESCRIPTION OF PROCESS. REMINDER THE PROCESS INCLUDES THE 'A' MATRIX BUT ALSO IN A MORE GENERAL SENSE INCLUDES THE CONTROL INPUT AND PROCESS NOISE.
A common application of a Kalman filter is radar target tracking. In this application, the state being estimated is the target being tracked. The process that describes how the state changes is a set of kinematic equations that relate position, velocity, acceleration, etc. Common measurements a radar periodically makes that are mathematically related to the state/target are range, azimuth, elevation.
The Kalman filter process is recursive. Only the previous filter estimates and a new measurement are needed to make new filter estimates. As a result, the Kalman filter is computationally efficient, making it attractive for many real-time applications. Other estimation methods (example: linear least squares fitting) require all previous measurements to estimate a state which can be computationally and memory intensive.
Types of Kalman Filters
There are a number of Kalman filters implementations. They all perform the same function but differ in the types of process and measurement relationships they support.
- "Plain" Kalman Filter (KF)
- Extended Kalman Filter (EKF)
- Unscented Kalman Filter (UKF)
"Plain" Kalman Filter
The simplest, the Kalman Filter, assumes that a state's process and the relationship between the state and the measurements used to update the state estimate are linear.  This makes the process of updating state estimates very easy because applying linear operations to the state estimate (a Gaussian random variable) is a linear operation itself that results in another Gaussian random variable.  Simple enough.
A graphical way to think of a linear transformation of a Gaussian random variable is in the context of a coordinate system transformation. The linear transformation is changing the coordinate system of the variables. Below are explicit transformations, both linear and non-linear, of a Gaussian random variable.
The Extended Kalman Filter and the Unscented Kalman Filter are extensions of the Kalman Filter that do not require linear relationships. These filter types allow the process and/or the measurement relationship to the state to be a non-linear function. Since many real-world relationships are non linear, these filter types are common and extremely popular. Often and confusingly, the adjective Extended or Unscented is frequently left off when using these filter types.
Both the EKF and the UKF work very well when the transformed distribution is nearly Gaussian. Loosely defined, non-linear transformations of Gaussian random variables result in random variables with nearly Gaussian distributions when the transformation is approximately linear near the mean of the variable being transformed and the variance of the variable being transformed is small such that its values are concentrated in a small neighborhood near the mean.
Linear Operation On Gaussian
Assume $\vec{x}$ is a multivariate Gaussian random variable:
$\vec{x} = \mathcal{N}\left ( \vec{\mu} ,V \right )$ 
And $\vec{y}$ is a linear transformation of $\vec{x}$:
$\vec{y}=\vec{a}+B\vec{x}$
Then expected value of $\vec{y}$ is:
$E\left [ \vec{y} \right] = \vec{a}+B\vec{\mu}$
And the covariance of $\vec{y}$ is:
$COV\left [ \vec{y} \right ]=BVB^{T}$
Proof of covariance:
$\begin{matrix}
COV\left [ \vec{y}  \right ] & = & E\left [ \left ( \vec{y} - \mu_{\vec{y}} \right ) \left ( \vec{y}-\mu_{\vec{y}} \right )^{T} \right ] \\
 & = & E\left [ \left( \left( \vec{a}+B\vec{x} \right)-\left( \vec{a}+B\mu_{\vec{x}} \right) \right) \left( \left( \vec{a}+B\vec{x} \right)-\left( \vec{a}+B\mu_{\vec{x}} \right) \right)^{T} \right] \\ 
 & = & E \left[ \left( B \left(\vec{x}-\mu_{\vec{x}} \right) \right) \left( B \left(\vec{x}-\mu_{\vec{x}} \right) \right) ^{T} \right] \\ 
 & = & E \left[ B \left(\vec{x}-\mu_{\vec{x}} \right) \left( \vec{x}-\mu_{\vec{x}} \right)^{T} B^{T} \right] \\ 
 & = & B E \left[ \left(\vec{x}-\mu_{\vec{x}} \right) \left( \vec{x}-\mu_{\vec{x}} \right)^{T} \right] B^{T} \\ 
 & = & BVB^{T} 
\end{matrix}$
A graphical way to think of a linear transformation of a Gaussian random variable is in the context of a coordinate system transformation. The linear transformation is changing the coordinate system of the variables. Below are explicit transformations, both linear and non-linear, of a Gaussian random variable.
Linear Transformation of Gaussian Random Variable X: $y=Bx$
$f_{X} \left( x \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}}$
$x=\frac{y}{B}$
$\left| \frac{dx}{dy} \right| = \frac{1}{B}$
$\begin{matrix} f_{Y} \left( y \right) & = & f_{X}(x(y))\left|\frac{dx}{dy}\right| \\ & = & \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(\frac{y}{B}-\mu\right)^{2}}{2\sigma^{2}}}\frac{1}{\left|B\right|} \\ & = & \frac{1}{\sqrt{2\pi(B\sigma)^{2}}}e^{-\frac{(y-B\mu)^{2}}{2\left(B\sigma\right)^{2}}} \end{matrix}$
The result is a Gaussian with mean $B\mu$ and variance $B^{2}\sigma^{2}$
$\left| \frac{dx}{dy} \right| = \frac{1}{B}$
$\begin{matrix} f_{Y} \left( y \right) & = & f_{X}(x(y))\left|\frac{dx}{dy}\right| \\ & = & \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(\frac{y}{B}-\mu\right)^{2}}{2\sigma^{2}}}\frac{1}{\left|B\right|} \\ & = & \frac{1}{\sqrt{2\pi(B\sigma)^{2}}}e^{-\frac{(y-B\mu)^{2}}{2\left(B\sigma\right)^{2}}} \end{matrix}$
The result is a Gaussian with mean $B\mu$ and variance $B^{2}\sigma^{2}$
Nonlinear Transformation of Gaussian Random Variable X: $y=x^{2}$
$f_{X} \left( x \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}}$
$x=\pm\sqrt{y}\,,\;y>0$
$\left| \frac{dx}{dy} \right| = \frac{1}{2\sqrt{y}}$
$\begin{matrix} f_{Y} \left( y \right) & = & f_{X}(x(y))\left|\frac{dx}{dy}\right| \\ & = & 2\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(\sqrt{y}-\mu\right)^{2}}{2\sigma^{2}}}\frac{1}{2\sqrt{y}} \\ & = & \frac{1}{\sqrt{2\pi\sigma^{2}y}}e^{-\frac{\left(\sqrt{y}-\mu\right)^{2}}{2\sigma^{2}}} \end{matrix}$
The result is not a Gaussian. In this case, it is a non-central (non-zero mean) chi-squared distribution.
$\left| \frac{dx}{dy} \right| = \frac{1}{2\sqrt{y}}$
$\begin{matrix} f_{Y} \left( y \right) & = & f_{X}(x(y))\left|\frac{dx}{dy}\right| \\ & = & 2\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{\left(\sqrt{y}-\mu\right)^{2}}{2\sigma^{2}}}\frac{1}{2\sqrt{y}} \\ & = & \frac{1}{\sqrt{2\pi\sigma^{2}y}}e^{-\frac{\left(\sqrt{y}-\mu\right)^{2}}{2\sigma^{2}}} \end{matrix}$
The result is not a Gaussian. In this case, it is a non-central (non-zero mean) chi-squared distribution.
Extended & Unscented Kalman Filters
The Kalman filter works with linear relationships. Applying non-linear operations/transformations to Gaussian random variables does not (in general) result in new Gaussian random variables. What happens when the process and/or the measurement relationship to the state are non-linear? We must linearize the problem. This is where the EKF and the UKF come into play.The Extended Kalman Filter and the Unscented Kalman Filter are extensions of the Kalman Filter that do not require linear relationships. These filter types allow the process and/or the measurement relationship to the state to be a non-linear function. Since many real-world relationships are non linear, these filter types are common and extremely popular. Often and confusingly, the adjective Extended or Unscented is frequently left off when using these filter types.
Both the EKF and the UKF work very well when the transformed distribution is nearly Gaussian. Loosely defined, non-linear transformations of Gaussian random variables result in random variables with nearly Gaussian distributions when the transformation is approximately linear near the mean of the variable being transformed and the variance of the variable being transformed is small such that its values are concentrated in a small neighborhood near the mean.
Extended Kalman Filter
The Extended Kalman filter uses derivatives (Calculus, gasp!) to linearize the process and/or the measurement relationship to the state at the current estimate. This approach tends to work well for relationships that are close to linear over the timescales involved but can break down if they are not.Unscented Kalman Filter
The Unscented Kalman Filter uses what is known as an Unscented Transform to calculate the statistics of a random variable that undergoes a nonlinear transformation. Samples of the state undergoing a non-linear transformation are chosen such that the sample mean and sample covariance match the currently predicted mean and covariance of the process. The non-linear process is applied to each of the sample points and the resulting mean and covariance of the transformed points is used to represent a Gaussian approximation of the transformed Gaussian random variable. Note, this method approximates the transformed random variable's distribution whereas the EKF approximates the non-linear function/transformation.Simple Visual Description
Visually thinking about what a Kalman Filter is doing can be very handy. I find it helps see the forest through the trees. The filter can be boiled down into a data fusion problem. A filter starts with an initial seed that has a relatively high uncertainty. Measurements that have their own uncertainty are taken that give us an estimate of the state. Each measurement enhances the filter's knowledge of the state. How much it enhances the state estimate depends on the noise (covariance) associated with the measurements. At the beginning of the filter process, the measurements and the state estimates may have similar certainty. If designed correctly, as more measurements are made, the state estimate certainty should increase and the state prediction should become more accurate. Below is animation of this process. In this simple case, the filter is attempting to estimate a signal whose true value is 100.
Below are animations of the same process as above except I've introduced some "issues" with the filters.  The filter producing the illustration on the left has a poor estimate of the measurement covariance that is input to the filter.  In the case illustrated, the measurement covariance is underestimated which causes the state estimate to vary rapidly and to a much larger degree than the state's estimated covariance suggests.  The illustration on the right shows the impact of a bias or error on the initial state estimate or equivalently, an improbable measurement seeding the initial state estimate.  As more and more measurements are taken and used to update the filter, the state estimate bias slowly shrinks.
| Kalman Filter With Incorrect Measurement Covariance | Kalman Filter With Bias on Initial State Estimate | 
|  |  | 
Visual Example of Linear and Non-Linear Transformations
Visualizing how a transformation changes a random variable's probability density function is helpful in intuitively understanding how the KF works.  After all, state and measurement random variables are constantly being transformed by linear or non-linear functions as part of the KF.  Below is a derivation of how to determine the transformed random variable's probability density function.
Derivation of Probability Density Function Transformation
X is a random variable with pdf f(x) 
$ P \left( a \leqslant X < b \right) = \int_{a}^{b} f\left( x \right) dx $
Y is a random variable derived from X
$ Y = y \left( X \right) $
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = P \left( a \leqslant X < b \right) $
Using integration by substitution
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = \int_{y \left( a \right)}^{y \left(b\right)} f\left(x\left(y\right)\right)\frac{dx}{dy}dy $
If we define g(y) as
$ g \left( y \right) = f \left ( x \left ( y \right ) \right ) \frac{dx}{dy} $
Then we can re-write x.x
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = \int_{y \left( a \right)}^{y \left(b\right)} g \left( y \right) dy $
By definition $ g \left( y \right) $ is the transformed pdf associated with the random variable Y $ g \left ( y \right ) = f \left ( x \left ( y \right ) \right ) \left | \frac{dx}{dy} \right | $
$ P \left( a \leqslant X < b \right) = \int_{a}^{b} f\left( x \right) dx $
Y is a random variable derived from X
$ Y = y \left( X \right) $
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = P \left( a \leqslant X < b \right) $
Using integration by substitution
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = \int_{y \left( a \right)}^{y \left(b\right)} f\left(x\left(y\right)\right)\frac{dx}{dy}dy $
If we define g(y) as
$ g \left( y \right) = f \left ( x \left ( y \right ) \right ) \frac{dx}{dy} $
Then we can re-write x.x
$ P \left( y \left( a \right) \leqslant Y < y \left( b \right) \right) = \int_{y \left( a \right)}^{y \left(b\right)} g \left( y \right) dy $
By definition $ g \left( y \right) $ is the transformed pdf associated with the random variable Y $ g \left ( y \right ) = f \left ( x \left ( y \right ) \right ) \left | \frac{dx}{dy} \right | $
Example illustrations of transformed pdfs are shown below. The grey dotted lines across the subplots illustrate how points in the original pdf are stretched/scaled by the transformation function and subsequently mapped to the new pdf. The first example is a linear transformation of a Gaussian random variable X. Since the transformation is linear, the resulting transformed variable Y is also Gaussian. $ \mathcal{N}(0,0.25) \rightarrow \mathcal{N}(0,0.5) $
Visualizing How the EKF and UKF Linearize the Transformation Process
The EKF and the UKF approximate non-linear transformations of a Gaussian random variable.  In general, the true distribution of the resulting random variable is no longer Gaussian.  However, both the EKF and the UKF linearize the problem and estimate the transformed variable as a Gaussian random variable.  The EKF estimates the transformed distribution by approximating the non-linear function as a linear function and then applying that linear transformation to the original Gaussian distribution.  The UKF creates sample points (sigma points) of the original distribution, explicitly evaluates the non-linear function on these sample points, and then estimates a transformed Gaussian distribution using the resulting mean and standard deviation of the transformed sample points.
Below is an illustration of a cosine transformation function on a Gaussian random variable.  The resulting true distribution, linearized distribution estimate (akin to EKF), and sampled distribution estimate (akin to UKF) are shown.  There are noticeable differences between the estimates and the true distribution.   
The illustration below is the same as above except the random variable being transformed has a smaller variance.  The resulting transformed pdfs more closely match each other compared to the above illustration.  This behavior is intuitive.  As the variance of the variable being transformed shrinks the functional span of the non-linear transformation shrinks and more closely approximates a linear transformation.   
Algorithm Descriptions
Kalman Filter
| Variable | Size | Description | 
|---|---|---|
| x | n | State predictions | 
| z | m | Measurements | 
| R | m x m | Measurement covariance | 
| A | n x n | Process (or difference equation), relates state at previous step to state are the current step | 
| H | m x n | Measurement equation relates measurements to the state | 
| P | n x n | State estimate covariance | 
| K | n x m | Kalman filter gain, weight that controls how much measurements vs predictions influence the filter estimate | 
| u | l | Control signal | 
| B | n x l | Control process, describes how the control signal changes the state | 
| Q | n x n | Process noise covariance | 
State Prediction $\hat{x}_{\hat{k}} = A\hat{x}_{k-1}+Bu_{k-1}$
State Covariance Prediction $P_{\hat{k}}=AP_{k-1}A^{T}+Q$
Filter Gain Calculation $K_{k}=P_{\hat{k}}H^{T}(HP_{\hat{k}}H^{T}+R)^{-1}$
Updated State Estimate $\hat{x}=\hat{x}_{\hat{k}}+K_{k}(z_{k}-H\hat{x}_{\hat{k}})$
Updated State Estimate Covariance $P_{k}=(I-K_{k}H)P_{\hat{k}}$
Extended Kalman Filter
| Variable | Size | Description | 
|---|---|---|
| x | n | State predictions | 
| z | m | Measurements | 
| R | m x m | Measurement covariance | 
| a | Non-linear process (or difference) equations that relate state at previous step to state are the current step | |
| A | n x n | Jacobian of the process (or difference equations), matrix of partial derivatives that provide a linear approximation of the process functions at the current state prediction | 
| h | Non-linear measurement equations that relate the measurements to the state | |
| H | m x n | Jacobian of the measurement equations, matrix of partial derivatives that provide a linear approximation of the measurement functions at the current state predictions | 
| P | n x n | State estimate covariance | 
| K | n x m | Kalman filter gain, weight that controls how much measurements influence the state | 
| u | l | Control signal | 
| B | n x l | Control process, describes how the control signal changes the state | 
| Q | n x n | Process noise covariance | 
| W | n x n | Process noise jacobian | 
State Prediction $\hat{x}_{\hat{k}} = a \left ( \hat{x}_{k-1},u_{k-1}, 0 \right )$
State Covariance Prediction $P_{\hat{k}}=A_{k}P_{k-1}A^{T}_{k}+W_{k}Q_{k-1}W^{T}_{k}$ Filter Gain Calculation $K_{k}=P_{\hat{k}}H^{T}_{k}(H_{k}P_{\hat{k}}H^{T}_{k}+V_{k}R_{k}V^{T}_{k})^{-1}$
State Prediction Update $\hat{x}_{k}=\hat{x}_{\hat{k}}+K_{k}(z_{k}-h \left( \hat{x}_{\hat{k}},0 \right) )$
State Covariance Prediction Update $P_{k}=(I-K_{k}H_{k})P_{\hat{k}}$
Unscented Kalman Filter
| Variable | Size | Description | 
|---|---|---|
| x | n | State predictions | 
| X | n x 2n+1 | Sigma points, a "cloud" of points that sample the random variable around its estimate, undergo a non-linear transformation, then used to estimate the transformed random variable's mean and covariance | 
| mW | 2n+1 | Weight vector used to calculate the transformed distribution's mean from the transformed sigma points | 
| cW | 2n+1 | Weight vector used to calculate the transformed distribution's covariance from the transformed sigma points | 
| z | m | Measurements | 
| R | m x m | Measurement covariance | 
| a | Non-linear process (or difference) equations that relate state at previous step to state are the current step | |
| h | Non-linear measurement equations that relate the measurements to the state | |
| P | n x n | State estimate covariance | 
| K | n x m | Kalman filter gain, weight that controls how much measurements influence the state | 
| u | l | Control signal | 
| B | n x l | Control process, describes how the control signal changes the state | 
| Q | n x n | Process noise covariance | 
| W | n x n | Process noise jacobian | 
$\begin{matrix} x_{0} & = & \mu_{x} & W_{0} & = & \frac{k}{n+k}\\ x_{i} & = & \mu_{x}+\left(\sqrt{\left( n+k \right)P_{x}} \right) & W_{i} & = & \frac{1}{2(n+k)} \\ x_{i+n} & = & \mu_{x}-\left(\sqrt{\left( n+k \right)P_{x}} \right) & W_{i+n} & = & \frac{1}{2(n+k)} \end{matrix}$ $y_{i}=f \left(x_{i} \right)$ $\hat{y}=\sum_{i=0}^{2n}W_{i}y_{i}$ $P_{y}=\sum_{i=0}^{2n} \left( y_{i}-\hat{y} \right ) \left( y_{i}-\hat{y} \right )^{T}$
In the equations below $x$ represents a random variable and $\theta$ represents the set of parameters that describe the distribution.
Probability Density Function:
$ f \left ( x | \theta \right ) $
The probability of observing $x$ given a distribution described by $\theta$
Joint Probability Density Function:
$ f \left ( x_{1},...,x_{n} | \theta \right ) = P \left ( X_{1} = x_{1},...,X_{n}=x_{n} | \theta \right ) = f \left ( x_{1} | \theta \right ) \cdot ... f \left ( x_{n} | \theta \right ) = \prod \limits_{i=1}^{n} f \left ( x_{i} | \theta \right ) $
The probability of observing ($x_{i}$) multiple independent and identically distributed random variables ($X_{i}$) given by a distribution described by $\theta$.
Likelihood Function:
$ L\left ( \theta |x \right ) = \prod\limits_{i=1}^{n} f \left ( x_{i} | \theta \right ) $
If we swap the independent and dependent variables from the joint probability density function above we arrive at what is called the likelihood function. In this equation, the distribution parameters ($\theta$) are treated as the unknowns and the observations ($x_{i}$) are known.
In the likelihood function, we want to find the distribution parameters ($\theta$) that maximize the
A trick normally used is to perform the above steps against the natural log of the likelihood function, also known as the loglikelihood function. This is a trick because the loglikelihood often has a much simpler form than the likelihood, is also often easier to differentiate, and because the natural log is an increasing function, maximizing the loglikelihood is the same as maximizing the likelihood.
Loglikelihood Function:
$ L_{ln}\left ( \theta |x \right ) = ln \left ( \prod\limits_{i=1}^{n} f \left ( x_{i} | \theta \right ) \right ) $
For a Gaussian distribution, $\theta$ is comprised of a mean ($\mu$) and variance ($\sigma$), $\theta = \left( \mu,\sigma \right )$
In the above equations, it is assumed there are multiple iid (independent and identically distributed) random variables (measurements).
The Kalman filter combines or fuses two Gaussian distributed data sources, the filter's predicted measurements and the actual measurements, to estimate the filter's current state. Combining these two sources is done by calculating the joint probability distribution of the two sources and then maximizing the resulting likelihood function
prediction of a measurement the previous state estimate's (a Guassian distribution) with that of a
, the estimate of the filter state is accomplished by
the random variables are independent but are usually not identically distributed. The state estimated measurement(s) and the sensor measurement(s) are independent Gaussian random variables however their pdfs are rarely the same. In this case the likelihood looks like the following:
$ L\left ( \theta |x \right ) = f_{pdf} \left ( x_{meas}, \mu, \sigma_{meas} \right ) \times f_{pdf} \left ( x_{stateEstMeas}, \mu, \sigma_{stateEstMeas} \right ) $
where “log” means natural log (logarithm to the base e).
Code
GitHub
Good References
http://cecas.clemson.edu/~ahoover/ece854/ [1] An Introduction to the Kalman Filter [Local Mirror]http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf
[2] Understanding the Basis of the Kalman Filter [Local Mirror]
http://www.cl.cam.ac.uk/~rmf25/papers/Und erstanding the Basis of the Kalman Filter.pdf
[3] Kalman Filter For Dummies [Local Mirror]
http://bilgin.esme.org/BitsBytes/KalmanFilterforDummies.aspx
[4] First-Hand:The Unscented Transform [Local Mirror]
http://ethw.org/First-Hand:The_Unscented_Transform
[5] The Unscented Kalman Filter for Nonlinear Estimation [Local Mirror]
http://www.cslu.ogi.edu/nsel/ukf/
[6] Reduced Sigma Point Filters for the Propagation of Means and Covariances Through Nonlinear Transformations [Local Mirror]
[7] A New Extension of the Kalman Filter to Nonlinear Systems [Local Mirror]
[8] Unscented Filtering and Nonlinear Estimation [Local Mirror]
[9] A more robust unscented transform [Local Mirror]
https://www.mitre.org/sites/default/files/pdf/vanzandt_unscented.pdf
[10] How a Kalman filter works, in pictures [Local Mirror]
http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/
 






 
Hey, David.
ReplyDeleteCan you fix the Code link for this illustration?
I am wondering what the difference will be like for some other functions of Y (other than cos).
Would be better I can play with your code.
Cheers
Arthur