Talk:Kalman filter


Suggestion on notation between deterministic and stochastic variables

Not all variables are stochastic. An estimate is a deterministic value, but its estimator, which is often written with the same notation, is a stochastic variable, right? The input is usually seen as deterministic, as are the process and observation matrices (hence they can be taken outside the expected value). But what about the state vector? The state is well defined (deterministic, I mean), isn't it? But at the same time it is governed by a stochastic evolution in time (process noise). I'm confused. --Preceding unsigned comment added by (talk) 14:52, 5 June 2008 (UTC)

Suggestion on new sub-topic for Kalman Filter

The only mention the KF gets in the area of econometrics is a note in the reference section at the bottom of the page saying that it is used there. Can we make a new topic on the page, stating how it is used, and why? There is not much in the literature about this, probably because the companies using it are secretive about their work (I guess), but there are some good papers (GROENEWOLD & FRASER). The usage seems to center on the fact that CAPM beta is unstable across time, and that using the KF can lead to improved stability, which is pretty important if you are taking your trading signals from Mod Beta. Any experts on this topic here?
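
A minimal sketch of the idea, assuming beta follows a random walk and the asset return equals beta times the market return plus noise. The function name, the noise variances `q` and `r`, and the return series are illustrative assumptions, not taken from Groenewold & Fraser.

```python
def track_beta(market, asset, q=1e-4, r=1e-4, beta0=1.0, p0=1.0):
    """Scalar Kalman filter for asset_t = beta_t * market_t + noise.

    Assumed model: beta_t = beta_{t-1} + w_t (random walk) with
    Var(w) = q and observation noise variance r. Values are illustrative.
    """
    beta, p = beta0, p0
    history = []
    for m, a in zip(market, asset):
        # Predict: a random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: the market return m plays the role of the observation matrix.
        s = m * p * m + r              # innovation variance
        k = p * m / s                  # Kalman gain
        beta = beta + k * (a - m * beta)
        p = (1.0 - k * m) * p
        history.append(beta)
    return history


# Hypothetical, noise-free returns generated with a true beta of 1.5.
market = [0.01, -0.02, 0.015, 0.03, -0.01, 0.02, -0.025, 0.01] * 50
asset = [1.5 * m for m in market]
betas = track_beta(market, asset)
print(betas[0], betas[-1])
```

With `q > 0` the filter never stops adapting, which is the design choice that lets it follow a beta that drifts over time; setting `q = 0` would reduce it to recursive least squares for a constant beta.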

On the Unscented Kalman Filter

In the UKF, what is the significance of the term "unscented"? If not embarrassing, it should be included in the description...

Nice KF page overall - it gives enough on each point for readers to investigate further, or to grab the equations if you just need them in a hurry.

I have some comments on UKF though. Personally, I think it needs a separate page - if nothing else but for the number of different unscented transforms that are described by both Julier and Van der Merwe etc.

The other thing that isn't really clear is how the cross-covariance matrix is calculated (it isn't a cross-correlation matrix; sorry, I made that change to covariance there). The problem is that the state vector and the observation vector often do not have the same length (e.g. GPS/INS integration). So over which indices are you summing, and which weights do you use: the Wc calculated from the sigma points of the state, those calculated from the sigma points of the observation, or some combination? Or do you need to explicitly augment the state **and** observation noises into the augmented state vector, something that I didn't think was absolutely necessary (though most papers tend to do it that way anyway)? Damien d 12:04, 30 April 2007 (UTC)

Excellent page

This is a quite lucid explanation of a (potentially) challenging topic/set of concepts. As such it deserves recognition ... and inclusion, maybe, in the CD version of Wikipedia? 17:52, 9 August 2006 (UTC)

Totally agree with you shampoo (talk) 12:25, 6 July 2008 (UTC)

Being a total noob, this page was the best resource I had found on the topic and I would have agreed with you until I found this: . Perhaps a more general explanation/emphasis of /what/ the filter is doing (recursively refining a series of uncertain measurements over time based on a variance-weighted average) using graphs like the conditional probability densities (i.e. Fig 1.7 in that linked chapter) would be a good way to start this off. Explaining it in the one-dimensional case first, then introducing the matrix version would be good, too. One thing that really throws me off is the priority of the section "Underlying dynamic system model." I think one should be able to understand the Kalman filter before one understands Markov chains, etc (I sure don't). This may be really important to the theory underlying the filter or the precision of the mathematical definition, but it's confusing if you've just seen a reference to this topic and want to know what the heck is going on.

I don't really feel qualified to make huge changes to this page, but maybe I'll put up some graphs based on that Maybeck chapter if I get around to it and nobody objects. Kronick (talk) 23:30, 23 January 2010 (UTC)
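
The one-dimensional case mentioned above can be sketched in a few lines. This is only an illustration, assuming a constant scalar state, independent measurements with a known variance, and made-up numbers; `fuse` and `filter_constant` are hypothetical names.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity."""
    k = var_a / (var_a + var_b)              # weight given to estimate b
    mean = mean_a + k * (mean_b - mean_a)    # variance-weighted average
    var = (1.0 - k) * var_a                  # never larger than var_a or var_b
    return mean, var


def filter_constant(measurements, meas_var, mean0, var0):
    """Recursively refine the estimate of a constant scalar."""
    mean, var = mean0, var0
    for z in measurements:
        mean, var = fuse(mean, var, z, meas_var)
    return mean, var


# A very uncertain prior followed by four noisy readings (made-up values).
mean, var = filter_constant([10.2, 9.7, 10.1, 9.9], meas_var=0.25,
                            mean0=0.0, var0=1e6)
print(mean, var)
```

With such a vague prior the result is essentially the sample mean of the readings, and the variance shrinks roughly as `meas_var / n`. The matrix version replaces the scalar weight `k` with the Kalman gain.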

Many thanks for this exemplary explanation. The different perspectives and the comparison with HMM helped me to put it in the right context. — Preceding unsigned comment added by (talk) 16:29, 21 June 2011 (UTC)


In the UKF section, there are four or five occurrences of sums from 1 to N to reconstruct the estimate and its covariance from the samples. These sums should run from 0 to 2L (see the original paper). I'm not really aware of wikipedia protocol etc. or I'd fix it myself. Best wishes, Nathaniel

It seems you are indeed right that the summing indices are wrong (or at least unclear). I've attempted to correct the sums to run from 0 to 2L now. Please review my changes to make sure that they have been successfully corrected.
Please feel free to make your own edits in Wikipedia in the future. It's easy; just read Wikipedia:How to edit a page to learn about wiki-formatting. But it would be preferable if you registered a username and logged in prior to making edits, since that makes it easier to trace article history and users' contributions. --Fredrik Orderud 18:32, 25 May 2006 (UTC)
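
For reference, here is a small sketch of the index range under discussion: 2L+1 sigma points, numbered 0 to 2L, with both the mean and the covariance sums running over all of them. It uses the original kappa weighting and, purely as a simplifying assumption to avoid a matrix square root, a diagonal covariance and a scalar-valued function; `unscented_transform` is a hypothetical name.

```python
import math


def unscented_transform(mean, var_diag, g, kappa=1.0):
    """Propagate a Gaussian with diagonal covariance through a scalar g.

    Weights: W_0 = kappa / (L + kappa), W_i = 1 / (2 (L + kappa)).
    The diagonal covariance is an assumption made for this sketch.
    """
    L = len(mean)
    scale = L + kappa
    points = [list(mean)]                        # sigma point 0
    for j in range(L):
        step = math.sqrt(scale * var_diag[j])
        plus, minus = list(mean), list(mean)
        plus[j] += step
        minus[j] -= step
        points += [plus, minus]                  # sigma points 1..2L
    weights = [kappa / scale] + [0.5 / scale] * (2 * L)

    ys = [g(pt) for pt in points]
    # Both sums run over i = 0..2L, i.e. over all 2L+1 sigma points.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# Square of a standard normal variable, with L + kappa = 3.
y_mean, y_var = unscented_transform([0.0], [1.0], lambda x: x[0] ** 2,
                                    kappa=2.0)
print(y_mean, y_var)
```

Dropping the i = 0 term, or stopping the sum early, would make the weights no longer add up to one, which is the practical consequence of getting the index range wrong.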

In the section titled "Underlying dynamic system model", there appears to be a typo in the following line: "B_k is the control-input model which is applied to the control vector u_k". I believe this should be "... control vector u_{k-1}". Dave E. --Preceding unsigned comment added by (talkcontribs)

Fixed. --Drizzd (talk) 16:20, 2 July 2008 (UTC)

What does HMM mean?

In section Relationship to recursive Bayesian estimation: "Using these assumptions the probability distribution over all states of the HMM can be written simply as: ..." Can somebody explain (and update in the article)?

HMM is Hidden Markov model. The Kalman filter model can be considered as a HMM, since it is both hidden and Markov. It is hidden because the state (x-vector) is only indirectly observable through the observation model; and it is Markov because the current state only depends on the previous state and is therefore conditionally independent of any state before the previous state. --Fredrik Orderud 01:52, 20 November 2005 (UTC)
This is my first time contributing to a Wikipedia page, so I don't want to tread on any toes. But I think it's a little confusing to say that the Kalman Filter is a type of HMM. Both are latent variable models, for sure, in that the state variables are hidden or latent. But the normal definition of the HMM is that the latent variables are discrete, and the time update of the latent variables is governed by a transition probability matrix. With the Kalman Filter, the latent variables are continuous, and the time update is governed by the state matrix, which represents a linear dynamical system. In principle, an HMM that modelled the Kalman filter to arbitrary accuracy could be calculated by discretizing the state variables, and then constructing a transition matrix that modelled the dynamics and the noise term. However, in practice this would need a huge number of states for a Kalman filter where the dimension of state space is much greater than one, due to the Curse of Dimensionality. Perhaps it would be clearer to give a more expanded discussion of the relationship between Kalman filters and the HMM. I haven't yet figured out how to put a reference into a Wiki discussion page, but a very good summary paper is A unifying review of Linear Gaussian Models by Roweis and Ghahramani (Neural Computation Vol 11 No 2, 1999). Possibly a more accurate summary sentence would be to say that the Kalman filter is analogous to the Hidden Markov Model, where the state variables and distributions are continuous valued, rather than discrete. --Alan1507 09:07, 9 May 2006 (UTC)

I've edited the section on "Underlying Dynamical System" to attempt to clarify the relationship between Kalman Filters and HMMs. However, I am still not happy with the section on Relationship to recursive Bayesian estimation, particularly the sentence that states that the measurements are the observed states of the Hidden Markov Model. The observations in a Hidden Markov Model are used to infer the values of the Hidden States, which, by definition are not directly observed. So I do not understand this statement at all - the states of the Hidden Markov Model are hidden, or latent variables, and these are analogous to the system variables in the Kalman Filter. But it's possible the author had some other meaning in mind - perhaps this could be clarified? --Alan1507 20:42, 9 May 2006 (UTC)

The Relationship to recursive Bayesian estimation section was written by User:Chrislloyd in February 2005, and has remained pretty much untouched since. The section is important, since it relates (or attempts to relate) Kalman filtering to the bigger picture of sequential state estimation, which it is a part of, but it could probably be formulated more clearly. Any help in improving the section is therefore greatly appreciated. --Fredrik Orderud 00:23, 10 May 2006 (UTC)

I'll have a go when I've time - need to think it through carefully ;-) --Alan1507 07:51, 10 May 2006 (UTC)

Relationship to recursive Bayesian estimation

User:Chrislloyd, you added this section (which was then titled "Derivation") around February 2/10. Now that we have a derivation section contributed by User:Orderud, is it still necessary? What does it add? Thanks! — ciphergoth 10:18, 2005 Apr 28 (UTC)

I think the section is still important, since it relates Kalman filtering to the "bigger picture" of recursive Bayesian estimation (which Kalman filtering is a part of). --Fredrik Orderud 20:08, 28 Apr 2005 (UTC)
In that case I think it needs substantial work to make its point clear, since I've tried very hard to understand it and come up with very little. Is p(X) the probability density function of X? It doesn't link to probability density function and that latter doesn't mention p(X) having that meaning. How is PDF defined for vectors and joint distributions? I think I can guess, but it's not discussed in probability density function making it a bit demanding to infer. Even Lebesgue integration only defines integration from reals to reals, leaving one to infer how integration of functions such as p: (R x R) -> R is defined (though it seems straightforward to extend it to any real function whose domain is measurable). What does "The probability distribution of updated" mean? What is the denominator unimportant to? What do the probability density functions given at the end mean? How does it all tie together to say something cohesive and substantial? — ciphergoth 22:21, 2005 Apr 28 (UTC)
You're probably right that it's poorly written (I hadn't read it until now myself), but it's still very important. The variable p(x) is, as you thought, the probability distribution of the state x. The Kalman filter replaces p(x) with a Gaussian distribution parametrized by a state estimate and a covariance. IEEE SignalProc. had a quite straightforward tutorial in 2002, containing the derivation of the Kalman filter from a recursive Bayesian estimator. It is absolutely worth a read. --Fredrik Orderud 22:39, 28 Apr 2005 (UTC)
Thanks, that helps a lot! — ciphergoth 23:05, 2005 Apr 28 (UTC)
At a glance it looks like that paper is the basis of this section. I can follow the paper much better, since I can see what it's trying to get at. Unfortunately, it doesn't AFAICT actually prove its assertions about the Kalman filter at all - it just states "if you do this, you get the correct conditional PDFs". If we're going to do the same, we should make it explicit that we're stating without proof that the Kalman filter gives the correct PDFs. I think I can see how to do this. (Also, it's a pity the equations are bitmaps rather than scalable fonts in the PDF of the paper!) -- ciphergoth 21:46, 2005 Apr 29 (UTC)
This section is not there to prove the optimality of the Kalman filter. The "proof" section already does that. Its main intent is to demonstrate how recursive Bayesian estimation can be simplified into tractable linear equations with Gaussian PDFs when dealing with linear state-space models subject to Gaussian noise. The derivations are pretty standard, and found in many Kalman textbooks. You can also find the paper on IEEE Xplore in much higher quality, but this requires a subscription. --Fredrik Orderud 11:03, 30 Apr 2005 (UTC)
OK, but the paper makes it look as if our proof is insufficiently precise, because it talks about expected values, covariance and so forth without talking about what they're conditioned on. Is it
? It feels as if there are big gaps in our proof that the Kalman filter is valid... -- ciphergoth 17:38, 2005 Apr 30 (UTC)
I'm pretty sure , since the Kalman filter is a causal recursive estimator which incorporates the latest measurements available into its estimates. --Fredrik Orderud 11:43, 1 May 2005 (UTC)

Underlying dynamical system

I removed the reference to a Markov Chain, and replaced it with Probabilistic Graphical Model, as I think this introduces less confusion - although Markov Chains can be defined on continuous variables, it seems the most widely understood definition is as a Finite state machine, as, for example at . Hidden Markov Models and Kalman Filters are derived from the same Probabilistic Graphical Model. When time permits, I might write a section illustrating the duality between HMM and the Kalman Filter.

I disagree.
Probabilistic graphical model is a rarely used "nonsense" term that does not say anything about the specific Bayesian network encountered in Kalman filtering. The process and observation models yield a Bayesian network in a special sequential/recursive form, consisting of 1st-order Markov chains for state propagation and 0th-order Markov chains for the measurements. This form is better known as a Hidden Markov Model with continuous state.
Please go change the Markov chain article first if you think that Markov chains are somehow restricted to systems with discrete state. (no pun intended)
Alternatively, you can use the term "Markov process" instead [1], which undoubtedly covers systems with continuous state. --Fredrik Orderud 19:15, 10 May 2006 (UTC)
I agree with Fredrik Orderud - just because the state modelled by a Kalman filter is continuous, doesn't mean it's not a Markov model. I know that it's more usual to use Markov model to refer to things with discrete state but it's not the only application. Please stop removing these references from the article! — ciphergoth 19:40, 10 May 2006 (UTC)

---Untrue that Kalman filter requires Gaussian Noise model--- I have developed Kalman filters for NASA and taught a graduate-level course on the topic. I think the literature on the subject, including this Wiki article, is horrible. For example, it is untrue that the model underlying the Kalman filter depends on Gaussian noise. The derivation minimizes the generalized state variance (opaquely referred to as the "covariance matrix") using the data and its uncertainty. The data does not need to be Gaussian distributed; its expected error needs to be known (or, as is usual, guessed). --Preceding unsigned comment added by (talk) 17:12, 7 September 2008 (UTC)

I think the point is that the Kalman filter is an optimal linear filter if the noises are Gaussian (normally distributed). It is true (and a very important point), however, that they do not need to be Gaussian for the Kalman filter to still work. I think the only required assumptions on the noise are that it has zero mean and is uncorrelated in time (white noise), with a known covariance. An assumptions section (with citations) stating clearly what the Kalman filter requires would definitely make the article more understandable. I hope there is somebody with deep enough knowledge of the subject to add one! Yebbey (talk) 14:28, 3 December 2010 (UTC)
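
A quick numerical check of this point, sketched with a scalar random walk driven by uniform (non-Gaussian) noise. The model, the noise ranges and the seed are arbitrary assumptions; the only thing the filter is told is the noise variances.

```python
import random


def run(n=2000, seed=0):
    rng = random.Random(seed)
    q = 0.1 ** 2 / 3.0     # variance of uniform(-0.1, 0.1) process noise
    r = 1.0 ** 2 / 3.0     # variance of uniform(-1, 1) measurement noise
    x, est, p = 0.0, 0.0, 1.0
    se_filter = se_raw = 0.0
    for _ in range(n):
        x += rng.uniform(-0.1, 0.1)        # true state: random walk
        z = x + rng.uniform(-1.0, 1.0)     # non-Gaussian measurement
        p += q                             # predict
        k = p / (p + r)                    # update
        est += k * (z - est)
        p *= 1.0 - k
        se_filter += (est - x) ** 2
        se_raw += (z - x) ** 2
    return se_filter / n, se_raw / n, p


mse_filter, mse_raw, p_final = run()
print(mse_filter, mse_raw, p_final)
```

The filtered error should come out well below the raw measurement error even though nothing here is Gaussian. What is lost without Gaussianity is the guarantee that no nonlinear estimator could do better.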

Inferring backwards?

A Kalman model will use today's observation to estimate today's state. What do you use when you want to use today's observation to improve your estimate of yesterday's state? — ciphergoth 09:59, 30 July 2006 (UTC)

Well one could simply run the Kalman model in reverse, though if the dynamics aren't reversible you might have to modify your choice of state space and dynamics so you can invert the matrix that evolves the system forward in time (I think this should always be doable by adding 'dummy' state variables). Then you can just apply the Kalman model starting from the last time point and evolving it backwards in time.
Now if you mean to ask something more involved, namely how you can use both past and future information together (using all the info together) to improve estimates, the question is much harder. One might try to use the Kalman method both in forward and reverse together, for instance by using the output of the forward Kalman method, instead of the real measurements, as input when running it in reverse. However, I think it very likely this sort of technique will not work, but I don't really know. In fact my motivation for answering this question was curiosity about whether this sort of use is possible. Logicnazi 22:36, 10 August 2006 (UTC)
There are definitely well-understood techniques that use both past and future information together. I don't think they're based on the sort of Kalman variants you suggest. I just don't know how to look for them because I don't know what they're called. — ciphergoth 10:49, 11 August 2006 (UTC)
Estimation using both past and future information is "interpolation". There is a 1960 paper by Kalman at that has some references. Jrvz 20:56, 31 August 2006 (UTC)
"Smoothing" has been utilized extensively in actual applications, and software based on the technique is currently in use at DoD test ranges. I see that smoothing is mentioned in the "examples" paragraph -- it might be enough to include references from the 70's and 80's that developed the algorithms involved. The ones I am aware of came from Bierman, who used the "information filter" formulation for the forward pass, and then performed a backward pass for the smoothing. I'll try to locate the references for possible inclusion.
It occurs to me that the other thing done back then by Bierman and others was to apply factorization techniques to reduce the ranges of numerical values being manipulated (since manipulation was of square roots of quantities instead of the quantities themselves). It might be worthwhile to also include a brief discussion, and references, related to this. paul.wilfong at
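
To make the forward-pass/backward-pass idea concrete, here is a minimal scalar sketch. Note that it uses the Rauch-Tung-Striebel (RTS) form of the fixed-interval smoother with an ordinary covariance-form forward filter, not Bierman's information-filter or square-root formulation; the model parameters and data are made up.

```python
def kalman_forward(zs, f, q, r, x0, p0):
    """Scalar Kalman filter; returns predicted and filtered sequences."""
    x, p = x0, p0
    xp, pp, xf, pf = [], [], [], []
    for z in zs:
        x, p = f * x, f * p * f + q              # predict
        xp.append(x)
        pp.append(p)
        k = p / (p + r)                          # update (H = 1)
        x, p = x + k * (z - x), (1.0 - k) * p
        xf.append(x)
        pf.append(p)
    return xp, pp, xf, pf


def rts_smooth(zs, f, q, r, x0, p0):
    """Rauch-Tung-Striebel fixed-interval smoother for a scalar model."""
    xp, pp, xf, pf = kalman_forward(zs, f, q, r, x0, p0)
    xs, ps = xf[:], pf[:]
    for k in range(len(zs) - 2, -1, -1):         # backward pass
        c = pf[k] * f / pp[k + 1]                # smoother gain
        xs[k] = xf[k] + c * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + c * c * (ps[k + 1] - pp[k + 1])
    return xs, ps, xf, pf


xs, ps, xf, pf = rts_smooth([1.1, 0.9, 1.2, 1.0, 0.8],
                            f=1.0, q=0.01, r=0.25, x0=0.0, p0=10.0)
print(ps[0], pf[0])
```

The smoothed variance at each step is no larger than the filtered one, and the two coincide at the final step, where no future data exist. This also answers the "yesterday's state" question above: `xs[k]` is the estimate at step k given all measurements, including later ones.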

Peter Swerling

Peter Swerling was a radar engineer who is most famous for the "Swerling Models" and had many contributions to the field of electrical and electronic engineering. The fact that he discovered the Kalman filter (and published it) before Kalman is simply an interesting side note in his life. Why "Peter Swerling" gets re-directed to this page on the Kalman filter is beyond me. He should have a page of his own with a biography and so on.

recursive link

The link entitled "Peter Swerling" leads back to this Kalman Filter Page! Carrionluggage 21:42, 1 August 2006 (UTC)

Data Fusion - Combining Information from Related Observations

I've been asked to design a Kalman Filter where we can observe several states of the process (some of which have relationships) and to use the Kalman filter to combine related observations to get a better estimate of each.

Some texts I've been reading seem to indicate instead of making a prediction and measurement and using these to form the best estimate, two measurements are combined to form the best estimate (of one of the measurements). In examples, the two measurements seem to be usually a state and its derivative.

I find all of this quite confusing - but if it's a technique used in Kalman filtering, perhaps it needs mentioning?

Let's say, for example, we can measure v_x, v_y, v_z and v, where v_x, v_y, v_z represent the speed in 3 dimensions and v is the total velocity - i.e. v = \sqrt{v_x^2 + v_y^2 + v_z^2}.

I guess the rates of change of these variables are also observable. How could the Kalman filter be used here (where functions to make predictions of the next step are unknown)...

Something else that would really be terrific would be a worked example of an EKF with a nonlinear system - many people seem to have difficulty understanding this (myself included, and I've been reading about them for over a year now!). --Ultimâ 20:28, 16 September 2006 (UTC)
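
A small worked EKF, offered as a sketch only. The scalar system below (transition `x + 0.1 sin x`, measurement `x^2`) is invented for illustration, as are the noise variances; the measurements are noise-free so that the convergence is easy to see.

```python
import math


def f(x):                # assumed nonlinear state transition
    return x + 0.1 * math.sin(x)


def f_jac(x):            # its derivative: the EKF's "F" at the estimate
    return 1.0 + 0.1 * math.cos(x)


def h(x):                # assumed nonlinear measurement
    return x * x


def h_jac(x):            # its derivative: the EKF's "H" at the prediction
    return 2.0 * x


def ekf_step(est, p, z, q, r):
    # Predict with the nonlinear function; propagate P with the Jacobian.
    F = f_jac(est)
    est, p = f(est), F * p * F + q
    # Update, linearising h around the predicted estimate.
    H = h_jac(est)
    k = p * H / (H * p * H + r)
    return est + k * (z - h(est)), (1.0 - k * H) * p


truth, est, p = 1.0, 2.0, 1.0
for _ in range(50):
    truth = f(truth)
    z = h(truth)                     # noise-free measurement for clarity
    est, p = ekf_step(est, p, z, q=0.01, r=0.1)
print(truth, est, p)
```

The point to notice is that the nonlinear functions themselves move the estimate and form the innovation, while their Jacobians, re-evaluated at every step, are used only for the covariance and the gain.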

The sensor fusion process in the 'update' section of the article can be used to blend together multiple sensors. The simulation is just another sensor. Blending order doesn't matter. That's not clear in the article, but the sensor fusion used here has properties identical to addition (eventually I'll update the article to explain why).

'State' is anything that has memory (and is worth the trouble to model). Objects generally 'remember' their velocity (derivative of position) so where your text measures the position and velocity it's probably just making one measurement of the state vector.

Unless an object remembers its acceleration from step to step and you feel like modeling the acceleration, it should probably just be left in the noise model.
Sukisuki (talk) 11:55, 8 January 2010 (UTC)
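
One way to see the "identical to addition" property is to write each estimate in information (inverse-variance) form, where fusing independent estimates really is a sum. The sketch below assumes scalar, independent Gaussian estimates of the same quantity; the sensor values are hypothetical.

```python
from itertools import permutations


def to_information(mean, var):
    return mean / var, 1.0 / var          # (information vector, information)


def from_information(info_vec, info):
    return info_vec / info, 1.0 / info    # back to (mean, variance)


def fuse(estimates):
    # Independent estimates fuse by adding their information terms,
    # so the result cannot depend on the order of the sum.
    info_vec = sum(to_information(m, v)[0] for m, v in estimates)
    info = sum(to_information(m, v)[1] for m, v in estimates)
    return from_information(info_vec, info)


sensors = [(10.0, 4.0), (12.0, 1.0), (11.0, 2.0)]   # (mean, variance)
results = [fuse(order) for order in permutations(sensors)]
print(results[0])
```

All six orderings give the same mean and variance up to floating-point rounding. Feeding the same readings one at a time through the usual gain-form update gives the same answer again.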

Relationship between Digital and Kalman Filters

The Wikipedia article for Digital Filter has a reference to the Kalman Filter article. Neither has a discussion about the relationship between them, and I know from experience that it would be very helpful to include a brief discussion about the relationship. I have added such, and hope it is acceptable.

paul.wilong at

equation typo

I changed the (innovation (or residual) covariance) equation as it looked as if the R's and P's got switched. If someone who is better versed in state space modeling could double check that this is a correct change I would appreciate it. I'm still a newbie at this state space stuff. Much appreciated.

--(Reply)-- Perhaps some prankster came along and switched the variables. The equation

S_k = H_k P_{k|k-1} H_k^T + R_k

looks the same as it was last March. Ultimâ 11:57, 14 November 2006 (UTC)

Kalman filter implementation

It would be a good improvement to add an implementation section to the article. It would be useful to have some guidance on a practical algorithm for implementing the KF or EKF so that it is robust against ill-conditioning, and in particular on which matrix decomposition is best suited to that. Thanks Mauro Oliverone 19:19, 3 April 2007 (UTC)oliverone.
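
One common safeguard, sketched here in scalar form, is the Joseph form of the covariance update. It is not a matrix decomposition (square-root and U-D factorised filters address that and are not shown), but it illustrates the kind of robustness being asked about: it stays non-negative even when the gain is slightly off. The numbers are arbitrary.

```python
def gain(p, h, r):
    return p * h / (h * p * h + r)


def update_simple(p, k, h):
    # Short form; only valid when k is exactly the optimal gain.
    return (1.0 - k * h) * p


def update_joseph(p, k, h, r):
    # Joseph form; a sum of two non-negative terms for any gain k.
    a = 1.0 - k * h
    return a * p * a + k * r * k


p, h, r = 2.0, 1.0, 0.5
k_opt = gain(p, h, r)
k_bad = 1.5 * k_opt       # a deliberately perturbed gain

print(update_simple(p, k_opt, h), update_joseph(p, k_opt, h, r))
print(update_simple(p, k_bad, h), update_joseph(p, k_bad, h, r))
```

With the optimal gain both forms agree; with the perturbed gain the short form returns a negative "variance", while the Joseph form returns the actual error variance of the estimator that uses that gain.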

Unscented Kalman Filter Questions

I'm new to UKF. I read the section on it, but I'm having a hard time understanding the notation. Take, for example, this step in the UKF prediction:

Here's where I'm stuck. is a function of , the state vector. Let's call its dimension . On the other hand, the sigma points, , have dimension (if I understand everything correctly). So I don't quite understand how to evaluate . Is it really (to borrow MATLAB notation)?

The same thing applies when evaluating in the UKF update.

Equation Fix Suggestion

I think unnecessary confusion is caused by using F in both the linear and non-linear Kalman filter equations. 'A' should be used for the first instance as the state transition matrix (this also means amending the diagram). In the second instance, F is the Jacobian of the state transition matrix, the state matrix and the output matrix, i.e. it should not be referred to as the state transition matrix. --Ultimâ (talk) 10:57, 19 May 2008 (UTC)

Really, the whole page should be made to use the same notation as the linear-quadratic-Gaussian control article. —Preceding unsigned comment added by Dave.rudolf (talkcontribs) 18:14, 24 June 2008 (UTC)

Missing term in UKF equations?

I think that there is a missing term in this equation:-

Comparing it with Box 3.1 in Julier's original paper ("A New Extension of the Kalman Filter to Nonlinear Systems"), there is a missing which should be added on to this. I think that it should read:-

-- (talk) 06:51, 18 June 2008 (UTC)

I think the notion of dynamic system model equation is wrong

I think the formula is wrong.

I have found many books that discuss this equation, such as "Kalman Filtering: Theory and Practice Using MATLAB", second edition, by M. S. Grewal and A. P. Andrews. Another book is "Optimal State Estimation" by D. Simon. And the pdf tutorials in

They all use the linear equation

So, I think it's necessary to change the equations to eliminate ambiguity. Thank you --Ollydbg (talk) 02:51, 29 July 2008 (UTC)

A. Gelb's 'Applied Optimal Estimation' and others (e.g. Trawny's) use the formula
where G is the noise matrix.
They also make a clear distinction between continuous time matrices (which they call F, L and G), and their discrete time counterparts, which they denote with Greek letters , and .
Our F is their state transition matrix and is the solution of the differential equation
with .
Our B is their control matrix and determined by
Our G is their noise matrix and is calculated as


The noise covariance matrix is calculated as


It should also be noted that, when going from continuous-time covariances to discrete-time covariances , factors of have to be introduced. Trawny's paper has examples and cites R.O. Allen and D.H. Chang, "Performance Testing of the Systron Donner Quartz Gyro (...)", JPL, Tech. Rep. ENGINEERING MEMORANDUM EM #343-1297, 1993
but I don't have access to this paper. —Preceding unsigned comment added by (talk) 11:42, 7 February 2010 (UTC)
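
As a concrete scalar illustration of the continuous-to-discrete conversion, assume the system dx/dt = a x + w with white noise w of spectral density q_c. The sampling interval `dt` is presumably the factor referred to above; the function name and numbers are made up.

```python
import math


def discretize(a, q_c, dt):
    """Discretise dx/dt = a*x + w, with w white noise of spectral density q_c.

    Returns (phi, q_d) for x_k = phi * x_{k-1} + w_{k-1}, Var(w_{k-1}) = q_d.
    """
    phi = math.exp(a * dt)                  # solves dphi/dt = a*phi, phi(0) = 1
    if abs(a) < 1e-12:
        q_d = q_c * dt                      # limit a -> 0
    else:
        # q_d = q_c * integral_0^dt exp(2*a*tau) dtau
        q_d = q_c * (math.exp(2.0 * a * dt) - 1.0) / (2.0 * a)
    return phi, q_d


phi, q_d = discretize(a=-1.0, q_c=2.0, dt=0.01)
print(phi, q_d, 2.0 * 0.01)
```

For small `dt` the discrete process noise variance is close to `q_c * dt`, so a continuous-time noise intensity picks up one factor of the sampling interval on the way to a discrete-time covariance.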

I agree that x_k is dependent on u_{k-1} and w_{k-1}, not on u_k and w_k. This is in accordance with what I was taught by Howard Rosenbrock, who worked closely with Kalman, and with what I wrote in my own dissertation.
PS - I took the liberty of indenting the previous editor's comments. Martinvl (talk) 09:04, 8 June 2011 (UTC)
Kalman filter model2.svg
Is this about how you would do it? I previously made some comments against making this change, but I now think it is a good idea. I think it is more intuitive, and also more consistent with the current way the EKF is described. Does anyone think it would be a bad idea to change this? --Headlessplatter (talk) 21:28, 8 June 2011 (UTC)

Error with diagram?

Model underlying the Kalman filter. Circles are vectors, squares are matrices, and stars represent Gaussian noise with the associated covariance matrix at the lower right.

Does this diagram imply x(k) and x(k+1) are the same? —Preceding unsigned comment added by (talk) 21:50, 29 January 2009 (UTC)

The fact that it passes through F is enough to tell me that they are different. Ideally, x should be subscripted with k-1 and k respectively, however. -- (talk) 21:26, 27 February 2009 (UTC)

This diagram implies that F, B, and H are all hidden. In fact, these three functions must be known to the user because they are part of the update equations. This diagram needs to be redrawn to indicate that only x, R, v, Q, and w are hidden.-- (talk) 22:01, 25 February 2009 (UTC)

Actually, R and Q should be visible too. They are specified by the user as parameters, although several techniques exist to estimate them. Only x, v, and w should really be shown in the hidden area. -- (talk) 23:28, 2 March 2009 (UTC)

Can someone explain the differences between the model equation in the "Underlying dynamic system model" section and the prediction equation in "The Kalman filter" section? B[k] -> B[k-1] and u[k] -> u[k-1], and the process noise term w[k] is dropped. Does the diagram need to be changed by moving the B matrix and u vector to the k-1 region? Thanks. --Preceding unsigned comment added by (talk) 17:29, 31 March 2009 (UTC)
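
On the dropped noise term: a short derivation, assuming w_k is zero-mean with covariance Q_k and independent of the earlier measurements, and using the convention in which u_k enters x_k (the argument is unchanged if the control is indexed k-1 instead).

```latex
\begin{align}
x_k &= F_k x_{k-1} + B_k u_k + w_k,
      \qquad \mathrm{E}[w_k] = 0,\quad \operatorname{Cov}(w_k) = Q_k \\
\hat{x}_{k|k-1} &= \mathrm{E}\!\left[x_k \mid z_1,\dots,z_{k-1}\right]
      = F_k \hat{x}_{k-1|k-1} + B_k u_k \\
P_{k|k-1} &= F_k P_{k-1|k-1} F_k^{\mathsf{T}} + Q_k
\end{align}
```

The prediction is an expected value, so the zero-mean noise vanishes from the state equation; its effect shows up only as the added Q_k in the predicted covariance.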

Model underlying the Kalman filter. Squares represent matrices, ellipses represent multivariate normal distributions (with the mean and covariance matrix enclosed), unenclosed values are vectors.

Here's a candidate diagram that I made to replace that old one. My diagram expresses more information, it more accurately represents the Kalman filter, and I think it addresses the problems mentioned here. --Headlessplatter (talk) 18:44, 7 April 2009 (UTC)

v and w should also be subscripted, because they represent different values in each time frame.-- (talk) 18:59, 7 April 2009 (UTC)
Okay, I fixed that. --Headlessplatter (talk) 17:17, 8 April 2009 (UTC)
It's been nearly two months without comment, so I went ahead and updated that diagram. --Headlessplatter (talk) 16:59, 5 June 2009 (UTC)

There is another area in the diagram with which I am not happy, and which links up to earlier comments. I think that the input u_{k-1} and the matrix B should feed into the oval containing P_k and x_k, not the oval containing P_{k-1} and x_{k-1}. After all, this is part of the predictor section of the algorithm, not the corrector part. Martinvl (talk) 13:55, 4 March 2010 (UTC)

This really seems to be a matter of deciding when "time" is incremented. If I understand your complaint, you would like time to advance after u feeds into B, but before B influences P and x. The easiest way to implement this change would be to slide the shaded time-sections to the right a little bit. Unfortunately, the formula in the "Predict" section seems to agree with the diagram. That is, x_k is a function of u_k, not u_{k-1}. I am also somewhat confident that the scholarly literature on the subject agrees with these formulas. So, I think the convention is consistent with the way the diagram shows the advancement of time, and there is too much momentum behind this convention to change it now. (It's like electricity flowing from - to +. It's too established to fix.)--Headlessplatter (talk) 22:28, 16 December 2010 (UTC)
Basic concept of Kalman filtering.svg

I added another diagram which tries to give a higher level view on the subject, explain the different variables etc. -- Petteri Aimonen (talk) 18:20, 25 November 2011 (UTC)

Rejection of data

I had the delusion that the original Kalman filter (Kalman (1960)) included a method for culling (or rejecting, or abandoning, or filtering) outliers. Where did I get this from? Albmont (talk) 21:44, 10 February 2009 (UTC)

The original Kalman filter does culling in an analog sort of way, in that an input that tends to be an outlier tends to get a high covariance and a low weight so the value ends up impacting the results by a small amount. But to do this in an "on vs. off" sort of way by "using vs. omitting" an input from the Kalman filter needs to be done by logic that is outside the Kalman filter. A large spike, pulse, or step in a measurement is not handled well by the analog-like response of the original Kalman filter, so this would need to be handled by separate logic.
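
A common form of such "separate logic" is an innovation gate: compare the squared innovation with its predicted variance and skip the update when the ratio is implausibly large. The sketch below is scalar, and the threshold and data are illustrative assumptions rather than anything from Kalman (1960).

```python
def gated_update(x, p, z, r, gate=9.0):
    """Scalar Kalman update that skips measurements failing a gate test.

    gate is a threshold on the squared normalised innovation; 9.0
    (three standard deviations) is an illustrative choice.
    """
    innovation = z - x
    s = p + r                                   # innovation variance (H = 1)
    if innovation * innovation / s > gate:
        return x, p, False                      # rejected: keep the prediction
    k = p / s
    return x + k * innovation, (1.0 - k) * p, True


x, p, q, r = 10.0, 1.0, 0.01, 0.25
accepted = []
for z in [10.2, 9.9, 50.0, 10.1]:               # third reading is a spike
    p += q                                      # predict (constant-state model)
    x, p, ok = gated_update(x, p, z, r)
    accepted.append(ok)
print(accepted, x)
```

The test sits outside the filter equations proper: when a reading is rejected, the filter simply carries its prediction forward and the uncertainty keeps growing until a plausible measurement arrives.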
In Wikipedia, the articles Data validation and Bounds checking are not oriented to data validation of real-time measurements or signals. The article Robust statistics describes methods of dealing with outliers, some of which could be adapted to real-time measurements or signals. But I couldn't find a Wikipedia article that addressed the issue of culling (or rejecting, or abandoning, or filtering) outliers in real-time measurements or signals. Based on my experience with real-time data acquisition systems, I included below a list of methods to flag data as questionable or invalid. Using a Kalman filter with robust handling of outliers would fall in the category of "Propagation through calculated values (complex)". Simple smoothing (e.g., exponential filter) would fall in the category of "Propagation through calculated values (simple)".
While I was working at a nuclear power plant, a co-worker tested the use of a Kalman filter to calculate data, in this case to combine the 4 redundant sensors measuring the power output of the reactor into a single value. The goal was to compare the Kalman filter with simpler forms of sensor fusion (e.g., taking the maximum valid value). The results were that most of the time the Kalman filter produced improved outputs, but that there were occasional anomalous situations that disturbed the output of the Kalman filter and made it worse than simpler forms of sensor fusion. I think the problem in this application was exactly the same as your question: what is the proper way to deal with outliers in data? The outliers need to be flagged as questionable or invalid data, and the calculation needs to handle these flagged values appropriately.
These methods can be used with real-time data acquisition systems to flag data as questionable or invalid:
    • Sensor range error - the measuring and/or data acquisition system reported a value that is out of range high or low, which can be used to flag data as invalid
    • No value - the measuring and/or data acquisition system provided no value within the expected timeframe, which can be used to flag data as questionable or invalid depending on how stale the data is
    • Change in value - a measurement or signal whose value changed by an amount that is not reasonable (high or low) since the previous value can be used to flag data as questionable or invalid
    • Statistical validation - as a byproduct of curve fitting to smooth data and calculate rates of change, the goodness of fit can be used to flag data as questionable or invalid
    • Redundant sensors - two or more sensors measuring the same value can be compared in order to flag data as questionable or invalid
    • Redundant data paths - the same sensor that gets to the system by more than one path can be compared in order to flag data as questionable or invalid
    • Cross-validation of sensors - two or more sensors that are not measuring the same thing, but can be related by a calculation, can be compared in order to flag data as questionable or invalid
    • Propagation through calculated values (simple) - questionable or invalid data used as the input to a calculation is used to flag the results of the calculation as questionable or invalid; if any input is questionable then the output is flagged as questionable, and if any input is invalid then the output is flagged as invalid (invalid overrides questionable)
    • Propagation through calculated values (complex) - a calculation or algorithm uses a set of tests, logic, and voting to use or omit values; the tests and logic look at the questionable and invalid flags of the input data and the intermediate results in the calculation to determine the questionable or invalid flags of the outputs
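A few of the methods above can be sketched in code. This is only an illustration under assumed names: the `Quality` enum, the helper functions, and all thresholds are hypothetical and not taken from any particular data acquisition system.

```python
from enum import IntEnum

class Quality(IntEnum):
    GOOD = 0
    QUESTIONABLE = 1
    INVALID = 2

def range_check(value, low, high):
    """Sensor range error: an out-of-range value is flagged invalid."""
    return Quality.GOOD if low <= value <= high else Quality.INVALID

def change_check(value, previous, max_step):
    """Change in value: an unreasonable jump is flagged questionable."""
    return Quality.GOOD if abs(value - previous) <= max_step else Quality.QUESTIONABLE

def propagate(*flags):
    """Simple propagation: the worst input flag wins (invalid overrides questionable)."""
    return max(flags, default=Quality.GOOD)

def flagged_mean(readings):
    """A small 'complex' propagation: omit invalid inputs, average the rest,
    and flag the result with the worst flag among the inputs actually used."""
    usable = [(v, f) for v, f in readings if f != Quality.INVALID]
    if not usable:
        return None, Quality.INVALID
    mean = sum(v for v, _ in usable) / len(usable)
    return mean, propagate(*(f for _, f in usable))


readings = [
    (99.8, range_check(99.8, 0, 120)),
    (100.1, change_check(100.1, 99.9, 1.0)),
    (250.0, range_check(250.0, 0, 120)),        # out of range, omitted
    (104.0, change_check(104.0, 100.0, 1.0)),   # jumped too far, kept but questionable
]
value, flag = flagged_mean(readings)
print(value, flag.name)
```

Here the out-of-range reading is left out of the average, and the questionable reading is used but taints the output flag.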
Obankston (talk) 20:11, 19 February 2010 (UTC)

Example is inconsistent?

The example suggests that the B matrix is to be ignored: "There are no controls on the truck, so we ignore Bk and uk" However, a couple of lines down we see:

From Newton's laws of motion we conclude that


I find this very confusing. Since the is used, it suggests that the acceleration is measured, and used to update the predicted state of the truck. Maybe the description of what is going on needs to be updated?—Preceding unsigned comment added by (talk) 09:35, 16 December 2009 (UTC)

I agree that statement does not make any sense. Other than that, the example seems fine though. --Drizzd (talk) 19:14, 18 December 2009 (UTC)

@above: I would much rather use uk and Xk|k-1 to keep with the same notation. — Preceding unsigned comment added by Milte001 (talkcontribs) 10:58, 1 April 2011 (UTC)

Also, in the example the matrix H is given as a 1x2 matrix. That doesn't make sense. In order to be used in the residual covariance equation Sk = HkPk|k-1Hk^T+Rk, wouldn't it need to be 2x2 like P? Also, doesn't that make R from the example need to be a 2x2 matrix as well? --Madprog (talk) 18:35, 5 January 2010 (UTC)

@Madprog: Yes, it does: works perfectly.
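The dimensions can be checked directly. A small sketch in plain Python, with illustrative numbers for P and R (these values are assumptions, not the ones from the article's example): with a 1x2 H and a 2x2 P, the product H P H^T is 1x1, so R is 1x1 as well and the gain K is 2x1.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def shape(a):
    return (len(a), len(a[0]))

H = [[1.0, 0.0]]                  # 1x2: only the position is measured
P = [[2.0, 0.5], [0.5, 1.0]]      # 2x2 prior covariance (illustrative)
R = [[0.25]]                      # 1x1 measurement noise variance (illustrative)

HPHt = matmul(matmul(H, P), transpose(H))                    # (1x2)(2x2)(2x1) = 1x1
S = [[HPHt[0][0] + R[0][0]]]                                 # 1x1 residual covariance
K = [[row[0] / S[0][0]] for row in matmul(P, transpose(H))]  # 2x1 gain

print(shape(HPHt), shape(S), shape(K))
```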

The older version, which had G instead of B was (half) right and should be reinstated, but only after the noise matrix G has been properly introduced (see 'notion of dynamic system' above).

But due to the confusion of continuous and discrete time variables there is still a problem, because the formula for the covariance Q turns out to be wrong.

To do it properly, start with the continuous system (I use lowercase letters for the continuous time matrices instead of uppercase letters as in the sources cited above to avoid confusion):

The dynamics of position x, velocity and acceleration of the truck are described by the linear differential equation

where the dot denotes the derivative with respect to time,

is the system matrix,

is the noise matrix and is a random acceleration with characteristics and .

Integration of the homogeneous differential equation yields the state transition matrix

and, assuming during the integration interval,

is the discrete time inhomogeneous driving force.

Together, this comprises the discrete time system model

The discrete time noise covariance matrix is calculated as


Assuming during the integration interval, we get


With this resolves to

where is the position error and is the velocity error.

At each time step, a noisy measurement of the true position of the truck is made. Let us suppose the measurement noise vk is also normally distributed, with mean 0 and standard deviation σz.



—Preceding unsigned comment added by (talk) 15:26, 7 February 2010 (UTC)

Your statement of the problem differs from the one in the article: you have a constantly-changing acceleration, accelerations at any two different times being uncorrelated, whereas the article has acceleration that's constant on each of the discrete time intervals (hence nonzero correlation when tau < delta-t). I think Q in the article is correct, given how the problem is stated there. I've fixed the B-versus-G confusion in the article and left Q as it was. Gareth McCaughan (talk) 14:06, 1 March 2010 (UTC)
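The two problem statements can be compared numerically. The sketch below uses the standard textbook forms for a position/velocity state, which are assumptions here because the formulas did not survive on this page: for an acceleration held constant over each step, Q = G G^T σa² with G = [Δt²/2, Δt]^T; for continuous white-noise acceleration with spectral density q, Q is the integral of F(τ) g q g^T F(τ)^T over one step, with F(τ) g = [τ, 1]^T. The function names are hypothetical.

```python
def q_piecewise_constant(dt, sigma_a):
    """Q = G G^T sigma_a^2, acceleration constant over each discrete interval."""
    g = [0.5 * dt**2, dt]
    return [[gi * gj * sigma_a**2 for gj in g] for gi in g]

def q_white_noise(dt, q, steps=1000):
    """Q = integral over [0, dt] of (F(tau) g) q (F(tau) g)^T, by the midpoint rule."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    h = dt / steps
    for n in range(steps):
        tau = (n + 0.5) * h
        v = [tau, 1.0]            # F(tau) g for the position/velocity model
        for i in range(2):
            for j in range(2):
                acc[i][j] += q * v[i] * v[j] * h
    return acc

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

dt = 0.5
Qa = q_piecewise_constant(dt, sigma_a=1.0)
Qw = q_white_noise(dt, q=1.0)
print(Qa, det2(Qa))
print(Qw, det2(Qw))
```

The first matrix has rank one (zero determinant) because a single random number drives both position and velocity over the step. The second has full rank and its entries scale as Δt³/3, Δt²/2 and Δt. Note that σa² is a variance while q is a spectral density, so the two parameters do not even share units.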

This argument seems wrong to me: the derivation of the discrete time model does assume a constant acceleration during the integration interval. When you consider the discrete system, you can and do not make statements about the autocorrelation during the integration interval. But even if we assume for a moment, that there is a different function at work, the formula in the article still does not work out, because if we set

with some unknown , and take the derivative on both sides, it follows, that there exists no , which can solve this matrix equation.

(Former talk with a different IP)

All that means is that the discrete model is not a discretization of a well-behaved continuous-time system. Specifically, the process noise is not happening in continuous time. Instead, it affects the acceleration which will then apply (constant, no further noise) over each time interval. So Q(t) consists of delta-function spikes and it makes no sense to take its derivative. Yes, this is a very artificial setup, but it has to be if it's going to give an opportunity to explain the discrete-time Kalman filter using a real-world-looking example. Gareth McCaughan (talk) 11:29, 9 March 2010 (UTC)

This argument makes no sense to me. Q(t) is never differentiated, only integrals involving Q(t):

Looking at the (2, 2) component of the matrix equation, . This integral is well defined, however nasty Q(t) is, and it follows, that , a perfectly well-behaved function (indeed, it looks like the covariance of a random walk). With Q(t) now known, the other components of the matrix equation can be checked: and . Ooops!

First: you're right that my comment about differentiating Q(t) was nonsense. Sorry about that. However, about the main point at issue, I still think you're simply wrong. You are assuming that , which assumes that the (co)variances are additive, which is true only when they are independent, which is not true for the system specified in the example because the process noise is forced to be constant on our time intervals of length . In other words, the system described in the article is of the right form for the application of a discrete-time Kalman filter but is *not* of the right form for the application of a continuous-time Kalman filter, because the process noise isn't white. That would be a problem if the example purported to show how the continuous-time Kalman filter works, but it doesn't; it's an example of a discrete-time Kalman filter, and it seems to me to serve that purpose just fine. Gareth McCaughan (talk) 21:25, 21 March 2010 (UTC)

Huh? The formulas above aren't about a continuous-time Kalman filter. It is about a dynamic system, which is initially described as a linear differential equation system, excited by Gaussian random noise. Using time-honored methods, the discrete time system dynamics is determined, as well as a Covariance matrix. Everything works out perfectly, with one exception: the results for cannot be reconciled. Now, as nobody has shown an error in the deduction itself, I have to ask:

Where does come from, and are you sure that this formula is correct?

[Note: the two paragraphs above were unsigned; they were posted from; I assume it's the same person as before.]

is the formula for the covariance matrix of a multivariate Gaussian obtained by taking a standard Gaussian distribution (in this case ) and applying a linear transformation (in this case G) to it.

As you rightly say, there doesn't appear to be an error in the derivation in the article. As I've pointed out, however, there is an error in your derivation: it assumes that , which is not true in this case because the at different t are correlated: it's the acceleration, not merely the distribution of the acceleration, that's constant on each time interval. Gareth McCaughan (talk) 23:06, 23 March 2010 (UTC)

After some deliberation I think I now understand enough of the matter to be able to reconcile the differences.

Executive summary: The integral formulas above are correct and valid for arbitrary values of , but need some clarification (the introduction of was a bad idea). The expression is an approximation, which is valid only for "small" . The discrepancies arise, when the approximation approaches the limit of its validity.

For the explanation, let's reiterate the relevant formulas and the results for the example system and write instead of , as I should have from the beginning:

is a spectral density. The corresponding discrete time covariance is defined via


which is equal to the result of the article, as long as terms of can be neglected.

Where do the integrals come from? To answer that, the coarse-grained time scale with period , on which the discrete Kalman filter operates, is overlaid by a fine-grained timescale with period . With each tick on the tiny timescale our system is excited by a randomly distributed acceleration, which propagates according to the system equation until the next tick of the coarse-grained timescale.

is the sum of all those tiny accelerations, and can be treated, as if there was only one random acceleration event at each tick of the coarse-grained timescale, provided it is multiplied with a suitable matrix . But in general, quoting from A. Gelb's book, only the product is uniquely defined, not the individual terms and . This is even more relevant for , which should be treated as some fancy glyph, not a matrix expression.

The right-hand side of is derived as the expectation of an uncorrelated random sequence :

via the integral definition of (for details, see chapter 3.6 of Gelb's book).

The const-ness of during the integration interval means, that the distribution of the individual random accelerations on the tiny timescale does not change during the integration.

The term , on the other hand, could maybe be construed as the covariance of a random walk (which is what you get, if you integrate random Gaussians) while sweeping across the integration interval, but that's for somebody else to figure out.

Anyway, thanks for the learning experience! Peter Pöschl ( (talk) 10:48, 2 April 2010 (UTC)) and other anonymous comments above.

"Technical" template

This article is listed in Category:Articles needing expert attention from January 2010 and Category:Wikipedia articles that are too technical. I created a more accessible lead and shuffled some sections and paragraphs so that the information is approximately sorted in order of ascending technical content. Is the article ready to remove the {{technical|date=January 2010}} template? Obankston (talk) 23:50, 16 February 2010 (UTC)

The Too Technical template was removed in [2]. Obankston (talk) 02:44, 17 March 2010 (UTC)

Are the UKF function dimensions correct?

It looks to me like there's a problem with the dimensions of the functions in the UKF discussion.

Under the EKF section, the argument to the non-linear function, f, is a vector of dimension n. In the UKF discussion, the argument to the function f is passed a vector of larger dimension, because the vector is augmented. What do I misunderstand? (talk) 23:10, 15 March 2010 (UTC)
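One common reading of the augmented formulation (an assumption here, since the article's exact notation is not reproduced on this page) is that each sigma point stacks the state and the process noise, and the function is evaluated as f(x, w) on the two parts, returning a vector of the original state dimension n. A sketch with a hypothetical position/velocity model, diagonal covariances, and an illustrative scaling parameter `lam`:

```python
import math

def f(x, w, T=0.5):
    """Hypothetical process model: state (position, velocity), noise (acceleration)."""
    pos, vel = x
    (acc,) = w
    return [pos + T * vel + 0.5 * T**2 * acc, vel + T * acc]

def augmented_sigma_points(x_mean, p_diag, q_diag, lam=1.0):
    """Sigma points of the augmented vector [x; w] for diagonal covariances.

    Returns 2L+1 points of dimension L = n + q, and their mean weights.
    """
    mean = list(x_mean) + [0.0] * len(q_diag)   # the noise has zero mean
    var = list(p_diag) + list(q_diag)
    L = len(mean)
    points = [mean[:]]
    for i in range(L):
        spread = math.sqrt((L + lam) * var[i])  # matrix square root of a diagonal matrix
        for sign in (1.0, -1.0):
            chi = mean[:]
            chi[i] += sign * spread
            points.append(chi)
    w0 = lam / (L + lam)
    wi = 1.0 / (2.0 * (L + lam))
    return points, [w0] + [wi] * (2 * L)

n = 2
points, weights = augmented_sigma_points([0.0, 2.0], [1.0, 0.25], [0.09])
# Split each augmented point into its state part and its noise part before calling f.
propagated = [f(chi[:n], chi[n:]) for chi in points]
x_pred = [sum(wt * y[j] for wt, y in zip(weights, propagated)) for j in range(n)]
print(len(points), len(points[0]), len(propagated[0]), x_pred)
```

Under this reading, the sigma points have dimension n + q while the propagated points and the predicted mean have dimension n, so f in the UKF is the same model as in the EKF with the noise passed in explicitly.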

Unit mismatch in the Kalman gain derivation section

How can it be meaningful to speak of minimizing the trace of the covariance matrix? The diagonals in the covariance matrix will normally have mixed units, i.e., m² and (m/s)² for a simple constant-velocity two-state Kalman filter (the Kalman analogue of the alpha beta filter). Adding those has no meaning from a physical-unit point of view. Thus, one would have to normalize the state vector by the physical units, e.g., [m] and [m/s], for this trace minimization to be meaningful.

But that brings me to another doubt: I can normalize lengths with nm or light years, which will yield covariance matrices with very different numerical values in the diagonal depending on the normalization unit. As a matter of fact, it seems unclear to me why it is the difference between the true and filtered target state vectors which should be minimized, and not the difference normalized with some typical scale of the problem in each dimension of the target state vector.

I was thinking that it should be possible to make a maximum likelihood estimate of the Kalman gain matrix instead, but I just cannot figure out quite how to do that. In the ML-view, the optimal Kalman gain should drop out as a matrix weight in a weighted mean between multivariate normally distributed random vectors, which gives the maximum likelihood matrix weighted addition between a new and the a priori covariance mapped to the observation space via the observation matrix.

The multivariates have this nice unit-less trace term in the log-likelihood

Which seems more concise to work with for me.

I would love to see such a ML derivation instead in that section, as I think it could be "neater". If only I could figure out how to do it the MLE way, I would do it myself......

--Slaunger (talk) 07:45, 29 June 2010 (UTC)

Your concerns are answered in chapter 4 of A. Gelb's book: They start with a weighted scalar sum of the diagonal elements of the covariance matrix, so the cost function to be minimized is

where S is an arbitrary positive semidefinite matrix. The important fact is that the location of the minimum of the cost function is independent of S, therefore S can be chosen to be the unit matrix, which reduces the cost function to the trace of the covariance matrix.
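This independence can be checked numerically. The sketch below assumes the cost has the form tr(W P(K)), where W is the weighting matrix (called W in the code to avoid confusion with the residual covariance) and P(K) is the posterior covariance for an arbitrary gain K in Joseph form. All numbers are illustrative. The weights are chosen positive definite so that the inequality is strict, and two of them mimic very different unit choices.

```python
def posterior_cov(K, P, H, R):
    """Joseph form (I - K H) P (I - K H)^T + K R K^T for a scalar measurement.

    K and H are lists of length n, P is an n x n nested list, R is a scalar.
    """
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - K[i] * H[j] for j in range(n)] for i in range(n)]
    AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(AP[i][k] * A[j][k] for k in range(n)) + K[i] * R * K[j]
             for j in range(n)] for i in range(n)]

def weighted_trace(W, P):
    """tr(W P)."""
    n = len(P)
    return sum(W[i][k] * P[k][i] for i in range(n) for k in range(n))

P = [[4.0, 1.0], [1.0, 2.0]]   # prior covariance (illustrative)
H = [1.0, 0.0]                 # position-only measurement
R = 1.0

# Usual Kalman gain K = P H^T / (H P H^T + R).
PHt = [sum(P[i][j] * H[j] for j in range(2)) for i in range(2)]
s = sum(H[i] * PHt[i] for i in range(2)) + R
K_opt = [v / s for v in PHt]

weights = [
    [[1.0, 0.0], [0.0, 1.0]],     # plain trace
    [[1.0, 0.0], [0.0, 1.0e6]],   # velocity in much smaller units
    [[1.0e-6, 0.0], [0.0, 1.0]],  # position in much larger units
    [[2.0, 1.0], [1.0, 3.0]],     # non-diagonal weight
]
perturbations = [[0.1, 0.0], [0.0, 0.1], [-0.05, 0.2]]

always_worse = all(
    weighted_trace(W, posterior_cov([k + d for k, d in zip(K_opt, dK)], P, H, R))
    > weighted_trace(W, posterior_cov(K_opt, P, H, R))
    for W in weights
    for dK in perturbations
)
print(K_opt, always_worse)
```

For every weight tried, moving the gain away from the usual Kalman gain increases the weighted cost, so rescaling the units changes the value of the cost but not the gain that minimizes it.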

(Peter Pöschl) (talk) 21:29, 5 July 2010 (UTC)

Hi Peter, thank you for that explanation, which sheds some light on my question. I suppose S is the inverse covariance matrix (the precision matrix) in the expression from Gelb? Still, replacing a matrix with different physical units in the individual elements with a unitless identity matrix is really ugly IMO. I have done some internal training of colleagues in the Kalman filter recently where we went through the derivation as it is stated here, and two commented that the derivation of the optimal Kalman gain did not make sense due to the mixed physical units, i.e., they are left with a feeling of a dummy derivation whose sole purpose is to arrive at the correct result without being a stringent derivation. I suppose though that I could still use the argument and replace with a diagonal matrix, which just has values of one times the natural unit of the element in the matrix, such that the cost function is without physical dimensions. --Slaunger (talk) 07:58, 16 July 2010 (UTC)

Kalman gain derivation not obvious

I am especially confused as to how

is expanded to the monster below it. Some step-by-step seems in order here. — Preceding unsigned comment added by Dave.rudolf (talkcontribs) 00:53, 12 January 2011 (UTC)

Erroneous Modified Bryson-Frazier formula ?

I think the formula

really should be

This fits better with the reference, and seems to give the correct numerical results for me... — Preceding unsigned comment added by (talk) 00:02, 22 June 2011 (UTC)

Hybrid Kalman filter

The section Hybrid Kalman filter has broken math code. Does anyone know how to fix it? Chris857 (talk) 17:51, 5 October 2011 (UTC)

Intro sentence

Its purpose is to use measurements observed over time, containing noise (random variations) and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated values.

I'm sure a lot of collective work went into this sentence, but I'm afraid it is still not very clear. The second half sentence has too many "values". What is a "true value of a measurement", and what "associated calculated values" are we talking about? It seems the author wants to avoid talking about estimating the system's state, which is fine at this early stage, but "state of the system" and "calculated values associated to measurements" are not the same thing, and the latter isn't any simpler than the former. Further, the sentence states that the produced values are "closer to" something, raising the question: closer than what?

Let's start brainstorming. How about:

Its purpose is to use a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produce estimates that tend to be closer to the true unknown values than those that would be based on a single measurement alone.

AxelBoldt (talk) 03:23, 29 December 2011 (UTC)

Your suggestion is better than the original, though both are ok. It would be good to get a comment from someone *not* familiar with the subject.
Petteri Aimonen (talk) 04:35, 29 December 2011 (UTC)
Ok, I'll put it in; if somebody complains we can always switch back. Cheers, AxelBoldt (talk) 21:03, 2 January 2012 (UTC)

I've rewritten the introduction in an attempt to clarify and unify it. After this edit it read like two introductions mashed together. However, I'm a student just learning about the filter and a newcomer to this page, so it would be great if someone with more experience could review it. The introduction also should be significantly shorter - perhaps some of the explanatory paragraphs should be merged into the main article. StevenBell (talk) 18:23, 20 February 2012 (UTC)

Example Animation

Click for animation

I made this animation. Does anyone have some constructive criticism? Maybe it should go in the example application section? Is this the right way to link a custom thumbnail to an animation? Thanks for the feedback. --Suki907 (talk) 22:44, 4 March 2012 (UTC)

Nice animation. It could be useful to write the corresponding model in the page next to the animation. Custom thumbnail is not possible, unless such a feature has been added just recently.
--Petteri Aimonen (talk) 08:11, 5 March 2012 (UTC)
That should be easy, it's just a few 2x2 matrices, I'll also put the parameters into the comments page for the file, with the next version. --Suki907 (talk) 14:13, 11 March 2012 (UTC)
Great work! But you have spelled "measurement" wrong. Also, I would add the system function to the pic, just in case. And it would be much easier to understand if the scales were fixed. I've watched it several times now and I'm still not sure if it's right or if the star that represents the actual system state jumps nonsensically. -- (talk) 23:38, 5 March 2012 (UTC)
Ok. So in the updated version I fixed the spelling, and stopped shearing the axes with the update step (you're not the first person to find that this makes it less clear).
The reason I have the star jumping is because of the process noise. The Kalman filter models the system update as a (linear transform) + (Gaussian process noise). In this example the system I'm filtering meets the expectations of the Kalman filter: the system is stepped forward with the shear, then Gaussian process noise is added.
The same noise covariance is also added to the filter state (in a well-calibrated filter, the noise value used in the filter should match the noise in the actual system).
So on the "add noise" frame you see two red ellipses, one adding the noise to the filter, the other adding the same noise to the system state (that's when the star gets re-sampled). I'm pretty sure it's correct, it just might not be the best way to show it.
Thanks. --Suki907 (talk) 14:13, 11 March 2012 (UTC)

Fixed spelling of Title above. Михал Орела 17:55, 13 March 2012 (UTC)

References for Applications

The applications section is unconvincing (yes, I know the Kalman filter is important) because there are no references. Further, the applications link to other wiki pages that often do not mention the Kalman filter. This makes the present page look a little bit hyped up. Jfgrcar (talk) 06:46, 20 April 2012 (UTC)

x or x dot ?

In some of the sections, the state update equation is x = Fx + Bu + w and in other sections it is x' = Fx + Bu + w where x' is x(dot) , presumably the time derivative.

Can these both be right ?

I can see two variations, $\hat{x}$ ("x hat") and $\dot{x}$ ("x dot"). The hat merely underlines the fact that $\hat{x}$ is an estimate. It is quite common to leave the hat off, though of course it would be good to have it used either everywhere in the article or not at all.
On the other hand, x dot means time derivative like you guessed. It is only used in the continuous-time case, and this difference has a logical reason: in discrete case, we update at one moment , in continuous case the x is changing continuously with the instantaneous derivative . The F matrix for these cases is different, and can be found as a solution to the differential equation . The solution is usually approximated with Taylor series as (if I recall the equation correctly), where T is the timestep.
Usually continuous time Kalman filters are not very nice computationally, but they are more closely related to the actual physics and can be easily converted to discrete version for calculations.
-- Petteri Aimonen (talk) 15:31, 4 June 2012 (UTC)
OK, so if you have your previous estimate x(k-1) and then you calculate your estimated rate of change x(dot), then you calculate your a priori estimate of the current position x(k), which would be something like xhat(k) = x(k-1) + (delta_t × x(dot)), and then you would adjust this with all the other matrix stuff. But I don't see this equation, or anything like it, anywhere. I just see: calculate x(dot), and then you get your xhat(k) from nowhere. Is this bit just assumed to occur because it is obvious? Or is it something completely different? Eregli bob (talk) 06:26, 13 January 2013 (UTC)
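The step asked about here is hidden inside the discrete-time transition matrix. For a linear continuous-time model with system matrix A (written A here to keep it apart from the discrete F), the standard relation is F = exp(A T) = I + A T + (A T)²/2! + ..., and keeping only the first two terms gives x(k) ≈ x(k-1) + T · x(dot). A sketch in plain Python; the function name `expm_taylor` and both example models are illustrative assumptions.

```python
import math

def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def expm_taylor(A, T, terms=12):
    """Approximate exp(A*T) by the truncated series I + AT + (AT)^2/2! + ..."""
    n = len(A)
    AT = [[A[i][j] * T for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (AT)^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, AT)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

T = 0.1

# Constant-velocity model: d(pos)/dt = vel, d(vel)/dt = 0.
A_cv = [[0.0, 1.0], [0.0, 0.0]]
F_cv = expm_taylor(A_cv, T)

# Velocity with linear damping c: here the first-order step is only approximate.
c = 0.5
A_damped = [[0.0, 1.0], [0.0, -c]]
F_damped = expm_taylor(A_damped, T)
F_exact = [[1.0, (1.0 - math.exp(-c * T)) / c], [0.0, math.exp(-c * T)]]

print(F_cv)
print(F_damped)
```

For the constant-velocity model the series stops after the linear term, so F is exactly [[1, T], [0, 1]], and multiplying by it performs pos(k) = pos(k-1) + T · vel(k-1). That is the "previous estimate plus delta_t times rate of change" step, written as a matrix product.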

Organization and rating

I recently read through this whole article top to bottom. There's good information here. I've found some problems and I don't believe this article quite merits a B rating. --Kvng (talk) 13:02, 1 July 2012 (UTC)

  • There is unhelpful repetition of material
  • Some of the special topics are not well integrated into the article
  • Some of the math needs supporting copy

Original research in Kalman’s original filter derivation

The whole section "Kalman’s original filter derivation" appears to be original research by user Hikenstuff aka Mr. J. W. Bell. Significant portions of the referenced publication (single author, only published last month!) were copied directly to the Wikipedia article. I'd suggest removing this section altogether, and adding it back only if the results are verified in the future by other researchers. — Preceding unsigned comment added by (talk) 14:57, 30 November 2012 (UTC)

The research articles cited in that section are self-published, they are not reviewed nor do they seem serious. They are all from the same author. There is no mention of the "MFOE" anywhere, except in Mr. Bell's (self-published) articles. Given the relevance of the article, I think this section should be removed as soon as possible. (talk) 23:41, 2 December 2012 (UTC)

I concur with the previous criticisms. I have published original research in first-rate journals on topics related to Kalman filtering. Delete this section ASAP. At best, it represents the personal opinion of Mr. Bell alone and, at worst, the conclusions are simply incorrect. Must we provide a proof of the latter before deleting this section? Butala (talk) 23:43, 14 December 2012 (UTC)

I just deleted the section. Original research does not belong in the article, especially if the claims are dubious at best. Butala (talk) 23:43, 14 December 2012 (UTC)

Hats over variables

Why does x deserve a hat and P not? — Preceding unsigned comment added by (talk) 12:11, 20 May 2013 (UTC)


Notations should be consistent. Kkddkkdd (talk) 15:39, 15 June 2013 (UTC)

information filter

I think the formulas of information filter should be improved to allow non-invertible Q.

My suggestions are following:

in the section of RTS smoother should be also corrected. Kkddkkdd (talk) 09:43, 14 July 2013 (UTC)

If we can use , my other suggestions for are following:

Kkddkkdd (talk) 14:45, 21 July 2013 (UTC)


I think the subtitle of "Invariants" should be "Unbiasedness." Kkddkkdd (talk) 11:42, 20 July 2013 (UTC)

Noise Matrix

The noise matrix G, which is used in the section "Example application, technical", should be introduced in the section "Underlying dynamic system model". And the dynamic model should be:

Kkddkkdd (talk) 12:21, 21 July 2013 (UTC)