Multidimensional data can be collected by means of many different physical processes. For example, images may be collected by a video camera, by radar systems, by sonar systems, by infrared systems, by astronomical observations of star systems, by medical imaging (using x-rays with dynamic image recording, magnetic resonance imaging, or ultrasound), or by any other technology capable of generating an image of physical objects. The image data may then be analyzed in order to track targets of interest. Tracking is the recursive estimation of a sequence of states that best explains a sequence of observations. The states are specifications of the configuration of a model which is designed to explain the observations.
As an example, in tracking a human figure in a sequence of video frames, a human "stick model" can be used. A line having both length and orientation can be used to represent each major skeletal bone, such as lower arm, upper arm, lower leg, upper leg, trunk, head, shoulder girdle, etc. The figure in a particular frame of the video can be specified by giving the length, position, and orientation of each of the lines used in the stick model. The "state" is the collection of data required to completely specify the model. The state is used to compute a predicted image in the next video frame, and the recursive estimation process is used to refine the state values by comparing the predicted image with the data gathered by the video camera. As a further example, radar, acoustic, x-ray, etc. data can be used to generate images of the physical objects being observed, and a model of the objects can be used to aid in computation of a predicted image. The state is the set of data required to completely specify the model, for example the location of each aircraft in a radar-produced image for air traffic control purposes.
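As an illustration only, the stick-model state described above might be represented as follows. This is a minimal sketch, not a prescribed implementation: the class name Segment, the function state_vector, the pixel units, and the two-dimensional layout are all assumptions introduced here for clarity.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One line of the stick model (hypothetical representation)."""
    name: str
    x: float       # position of the segment's starting joint, in pixels (assumed units)
    y: float
    length: float  # segment length, in pixels (assumed units)
    angle: float   # orientation in radians, measured from the +x axis

    def end_point(self):
        """Return the far end of the segment implied by its state values."""
        return (self.x + self.length * math.cos(self.angle),
                self.y + self.length * math.sin(self.angle))


def state_vector(segments):
    """Flatten the model into the list of numbers that fully specifies it."""
    values = []
    for s in segments:
        values.extend([s.x, s.y, s.length, s.angle])
    return values


# Two connected segments: the lower arm starts where the upper arm ends.
upper_arm = Segment("upper arm", x=100.0, y=50.0, length=30.0, angle=math.pi / 2)
lower_arm = Segment("lower arm", *upper_arm.end_point(), length=25.0, angle=0.0)
state = state_vector([upper_arm, lower_arm])
```

In this sketch the list returned by state_vector plays the role of the "state": a tracker would adjust these numbers from frame to frame so that the image predicted from them agrees with the camera data.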
Modern detectors often return a very large amount of data. For example, a simple video camera produces approximately 30 frames per second (depending on the video protocol), with each frame having approximately 300 pixels horizontally across the image and 200 rows of pixels vertically in the image, to yield 60,000 pixels in each image (again, the details depend upon the video protocol). It is a very computationally intensive process to generate a predicted image for each frame and to compare the predicted image with the actual data in order to refine the state of a model for tracking purposes.
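The data volume implied by these approximate figures can be worked out directly. The short calculation below uses only the numbers given above; the variable names are illustrative.

```python
# Approximate figures from the text; actual values depend on the video protocol.
FRAMES_PER_SECOND = 30
WIDTH_PIXELS = 300
HEIGHT_PIXELS = 200

pixels_per_frame = WIDTH_PIXELS * HEIGHT_PIXELS
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND

print(pixels_per_frame)   # 60000
print(pixels_per_second)  # 1800000
```

At these rates a tracker that compares a predicted image with every frame must process on the order of 1.8 million pixel values per second.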
Kalman filter tracking has been successful as a tool for refining the parameters of a model in cases where a probability density function is sufficiently simple. Kalman filters are described by Eli Brookner in the book Tracking and Kalman Filtering Made Easy, published by John Wiley & Sons, Inc., in 1998, all disclosures of which are incorporated herein by reference. However, as data gathered by detectors becomes more complex, and the complex data requires the models to distinguish between ambiguous representations of the data, the simple approach to tracking by Kalman filtering breaks down.
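For orientation, a minimal scalar Kalman filter is sketched below. It is an illustration under stated assumptions and is not taken from the Brookner reference: the state is a single stationary value, the belief about that value is summarized by one Gaussian (a mean and a variance), and the function names kalman_step and track and all noise parameters are hypothetical.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle for a scalar state assumed to stay constant."""
    # Predict: the model says the state does not move, so only the
    # uncertainty grows, by the assumed process noise.
    pred_mean = mean
    pred_var = var + process_var

    # Update: blend prediction and measurement according to their variances.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def track(measurements, mean=0.0, var=1000.0,
          process_var=0.01, measurement_var=1.0):
    """Recursively refine the estimate, one observation at a time."""
    estimates = []
    for z in measurements:
        mean, var = kalman_step(mean, var, z, process_var, measurement_var)
        estimates.append(mean)
    return estimates, var


estimates, final_var = track([5.0] * 20)
```

Because the belief is carried as a single mean and variance, a filter of this kind cannot hold two competing explanations of the data at once, which is one way to read the breakdown described above when observations are ambiguous.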
There is needed an improved method for refining the state of a model of objects, where predictions of the model are compared with the large amounts of data produced by modern detectors.