In its simplest form, the tracking of an object or target in a sequence of video images can be described as a system's ability to produce a series of position/location estimates in sequential images, given a target in an image, an initial position/location of the target, and a sequence of subsequent images. In constructing a video tracking system, one may want to consider several issues, including the features of the object to be tracked, how the object can be identified (e.g., by color, shape, or appearance), the viewing conditions under which the object is expected to appear and be tracked, whether position estimates of the object will be produced in real time, and whether the system will handle situations in which an object temporarily disappears from view. Moreover, tracking becomes more complex when multiple objects are present in a scene.
To address these issues, many tracking systems include a prediction mechanism. Such a mechanism helps to define the search space in subsequent frames, and if the object becomes occluded, the prediction mechanism can help relocate the object.
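As a minimal sketch of how a prediction mechanism can define a search space, the following Python assumes a constant-velocity motion model and a square search window clamped to the image bounds. The function names, the `half_size` parameter, and the constant-velocity assumption are illustrative only and are not taken from any particular tracking system.

```python
def predict_position(prev, curr):
    """Extrapolate the next (x, y) position, assuming constant velocity
    between consecutive frames (an illustrative assumption)."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


def search_window(center, half_size, frame_size):
    """Return (x0, y0, x1, y1): a square window around `center`,
    clamped to an image of `frame_size` = (width, height)."""
    cx, cy = center
    width, height = frame_size
    x0 = max(0, cx - half_size)
    y0 = max(0, cy - half_size)
    x1 = min(width - 1, cx + half_size)
    y1 = min(height - 1, cy + half_size)
    return (x0, y0, x1, y1)
```

In such a scheme, when the object is occluded and no detection is available, the predicted position could stand in for the missing measurement, and `half_size` could be enlarged on each missed frame so that the object is more likely to fall inside the window when it reappears.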
Kalman filters have been widely used to track objects in data signals such as video, radar, and process control signals. A Kalman filter, which belongs to a class of Bayesian filtering techniques, uses a state-space model of the problem that is represented by two basic equations: a state transition equation and a measurement update equation. The state transition equation models how the state of the system evolves through time, and the measurement update equation models how the measurement of the system relates to the underlying state. In Kalman filters, the state transition and measurement update equations are constrained to have linear transfer functions and Gaussian noise models. An algorithm known in the art as the condensation algorithm uses stochastic sampling to overcome these constraints. The condensation algorithm describes a sample-based representation of a recursive Bayesian filter. It uses ‘factored sampling’, in which the probability distribution of possible interpretations is represented by a randomly generated set of samples.
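The two Kalman filter equations can be illustrated with a small sketch. The Python below assumes a one-dimensional target with state (position, velocity), a constant-velocity state transition, a measurement of position only, and scalar noise levels `q` and `r`. All of these choices, and the function names, are assumptions made for illustration rather than the model of any specific system.

```python
def kalman_predict(state, cov, dt=1.0, q=1e-3):
    """State transition step: x' = F x, P' = F P F^T + Q,
    with F = [[1, dt], [0, 1]] and Q = q * I (assumed process noise)."""
    p, v = state
    (a, b), (c, d) = cov
    state_pred = (p + dt * v, v)
    cov_pred = (
        (a + dt * (b + c) + dt * dt * d + q, b + dt * d),
        (c + dt * d, d + q),
    )
    return state_pred, cov_pred


def kalman_update(state, cov, z, r=1.0):
    """Measurement update step with H = [1, 0]: only position is observed,
    and r is the assumed measurement noise variance."""
    p, v = state
    (a, b), (c, d) = cov
    innovation = z - p            # measurement minus predicted measurement
    s = a + r                     # innovation variance
    k0, k1 = a / s, c / s         # Kalman gain for position and velocity
    state_new = (p + k0 * innovation, v + k1 * innovation)
    cov_new = (                   # (I - K H) P
        ((1 - k0) * a, (1 - k0) * b),
        (c - k1 * a, d - k1 * b),
    )
    return state_new, cov_new
```

A tracker built this way would call `kalman_predict` once per frame and `kalman_update` whenever a position measurement is available. Both steps are linear and assume Gaussian noise, which is the constraint that sample-based methods such as the condensation algorithm relax.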
As previously mentioned, features such as the color, shape, and appearance of an object can be used to track the object in video data. In one technique, a video motion detection (VMD) algorithm detects blobs (i.e., shapes) that are moving in a sequence of image frames. The video motion detection algorithm does this by learning the static background of the scene. One or more blobs from the VMD are then tracked through the frames of the video sequence. Another known technique involves the manual identification of an object and tracking thereof using the color(s) of the object. Tracking that relies only on motion blobs has difficulty with large objects that move slowly in the field of vision, and with start-and-stop motion. In these cases, the object is often split into multiple blobs, which complicates tracking. In the case of purely color-based tracking, initialization of tracks is difficult. The initialization is usually manual or performed through a method of object segmentation.
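A background-learning motion detector of the kind described above can be sketched as follows. This Python assumes grayscale frames stored as lists of rows, a running-average background model, a fixed difference threshold, and 4-connected grouping of foreground pixels into blobs. These are illustrative choices, and the function names and default parameter values are hypothetical rather than those of any specific VMD algorithm.

```python
from collections import deque


def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def find_blobs(mask):
    """Group 4-connected foreground pixels and return one bounding box
    (min_row, min_col, max_row, max_col) per blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((r0, c0, r1, c1))
    return blobs
```

With a running-average model like this one, an object that moves slowly or stops is gradually blended into the background. That is one plausible way for a single object to break into several smaller blobs.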
Many of the known video tracking techniques still have difficulty tracking objects in real time, under changing viewing conditions, and when objects disappear from and reappear in a scene. Therefore, the video tracking art is in need of a system that can adequately address these issues.