Tracking the position of image structures or objects in video is important for video understanding, especially in the domain of surveillance, where objects need to be followed over time. In general, a visual tracking system has two major components: target representation and localization. One way to represent a target object is to use a contour, or contour segments, of the object and to associate the same contour(s) across consecutive video frames. Accurate tracking of a contour is therefore an essential building block for robust object tracking.
One approach to contour tracking is the Conditional Density Propagation (Condensation) algorithm, which tracks a B-spline representation of a contour with a particle filter. In an initialisation step, the Condensation algorithm approximates the input contour with a B-spline curve and initialises the tracking system with a number of identical copies of that curve, each copy being a particle. During the tracking stage, a three-step process is then employed. First, a prediction step, also known as Dynamics, hypothesizes the future state of each particle based on a dynamical model of system evolution. Second, an observation step, also known as Observation, computes a likelihood score for each particle by drawing fixed-length normal lines centred on measurement points along the spline curve and measuring the proximity of each measurement point to the strongest and nearest edge in the image along its normal line. Third, in a resampling step, a fixed number of particles are selected to repeat the same tracking process in the next frame.
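The predict, observe and resample cycle described above can be illustrated with a minimal particle filter sketch. It is not the Condensation implementation: the particle state is reduced to a 2-D translation instead of B-spline parameters, the dynamical model is assumed to be a random walk, and the observation score is assumed to be a Gaussian function of the distance to a single measured edge position. The names `predict`, `observe`, `resample` and `track`, and all parameters, are hypothetical.

```python
import math
import random


def predict(particles, noise_std, rng):
    """Dynamics: diffuse each particle with an assumed random-walk model."""
    return [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
            for x, y in particles]


def observe(particles, measurement, sigma):
    """Observation: score each particle by its proximity to a measured edge.

    Assumption: a Gaussian likelihood on Euclidean distance stands in for
    the normal-line edge search described in the text.
    """
    mx, my = measurement
    weights = [math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2.0 * sigma ** 2))
               for x, y in particles]
    total = sum(weights)
    if total == 0.0:
        # All particles are far from the measurement: fall back to uniform.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Resampling: draw a fixed number of particles in proportion to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def track(initial_state, measurements, n_particles=200,
          noise_std=1.0, sigma=2.0, seed=0):
    """Run the three-step cycle once per frame; return one estimate per frame."""
    rng = random.Random(seed)
    # Initialisation: every particle starts as a copy of the same state.
    particles = [initial_state] * n_particles
    estimates = []
    for z in measurements:
        particles = predict(particles, noise_std, rng)
        weights = observe(particles, z, sigma)
        # Weighted mean of the particle set as the state estimate.
        estimates.append((sum(w * p[0] for w, p in zip(weights, particles)),
                          sum(w * p[1] for w, p in zip(weights, particles))))
        particles = resample(particles, weights, rng)
    return estimates
```

For example, `track((0.0, 0.0), [(5.0, 5.0)] * 30)` starts every particle at the origin and, over 30 frames, pulls the estimate towards the measured position. In a contour tracker the state would hold the curve parameters and `observe` would combine scores from many normal lines.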
Condensation works effectively on contours that enclose the whole object. However, when it is applied to tracking contour segments of a non-rigid object, e.g. the outline of a person's head and shoulders or part of a person's leg, Condensation easily confuses a tracked contour segment with background structures present in a cluttered scene.
One way to overcome the problem of confusing a tracked contour segment with background structures is to employ an adaptive normal line scanning method at the Observation stage. More specifically, each normal line is adaptive in two respects: its length and its centre. The line length is made to grow and shrink based on the pose variance, and the amount of growth and shrinkage is learnt in an off-line training phase. The line centre is made adaptive by applying a distance transform. There are two downsides to this approach. First, the off-line training process makes the tracking system application-specific and dependent on the training samples provided. Second, the centre and length of every normal line on every contour hypothesis are recomputed for each input frame. This is computationally expensive, making such a technique unsuitable for real-time tracking applications.
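To illustrate what a distance transform provides, the sketch below computes, for every pixel of a binary edge map, the distance to the nearest edge pixel. The text does not say which distance transform is used or how its output moves the line centre, so the two-pass city-block transform here is an assumption chosen for simplicity, and the function name `distance_transform` is hypothetical.

```python
def distance_transform(edges):
    """Two-pass city-block distance from each pixel to the nearest edge pixel.

    `edges` is a list of rows; a truthy entry marks an edge pixel.
    """
    rows, cols = len(edges), len(edges[0])
    inf = rows + cols  # larger than any city-block distance inside the grid
    dist = [[0 if edges[r][c] else inf for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist
```

The resulting map has to be rebuilt whenever the edge map changes, which gives a sense of the per-frame cost noted above.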
What is needed is a method that tracks contours and contour segments, particularly against a cluttered background, that is independent of application-specific training data, and that is practical to use in a real-time application.