Tracking of objects in video streams is an important feature of intelligent video analysis systems. Conventional systems first detect objects in the frames of a video and then relate objects in different frames to one another. This relating of objects across frames is called tracking. One example of object tracking is a system that identifies people crossing the field of view of a video camera and determines the tracks of these people in order to count how many people cross that field of view in a defined time period.
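The detect-then-relate pipeline and the people-counting example can be sketched as follows. This is a minimal illustration, not the method of this document. It assumes that detections are already reduced to one centroid per object per frame, and that relating objects means greedy nearest-neighbour matching within a hypothetical distance threshold `max_dist`. A person is counted when their track crosses a hypothetical vertical counting line `line_x` from left to right. The function names `associate` and `count_crossings` are illustrative only.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one detected object


def associate(tracks: Dict[int, Point], detections: List[Point],
              max_dist: float, next_id: int) -> Tuple[Dict[int, Point], int]:
    """Greedily match each detection to the nearest still-unmatched track
    within max_dist. Unmatched detections start new tracks; unmatched
    tracks are dropped (no occlusion handling in this sketch)."""
    updated: Dict[int, Point] = {}
    free = dict(tracks)  # tracks not yet claimed by a detection in this frame
    for det in detections:
        best_id: Optional[int] = None
        best_d = max_dist
        for tid, pos in free.items():
            d = math.dist(pos, det)
            if d <= best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id = next_id  # no track close enough: start a new one
            next_id += 1
        else:
            del free[best_id]  # each track may be matched only once
        updated[best_id] = det
    return updated, next_id


def count_crossings(frames: List[List[Point]], line_x: float,
                    max_dist: float = 20.0) -> int:
    """Count tracks that move from x < line_x to x >= line_x between frames."""
    tracks: Dict[int, Point] = {}
    next_id = 0
    crossings = 0
    for detections in frames:
        new_tracks, next_id = associate(tracks, detections, max_dist, next_id)
        for tid, (x, _) in new_tracks.items():
            if tid in tracks and tracks[tid][0] < line_x <= x:
                crossings += 1
        tracks = new_tracks
    return crossings


# Two people walking left to right past a counting line at x = 15.
frames = [[(0, 0), (0, 50)], [(10, 0), (10, 50)], [(20, 0), (20, 50)]]
print(count_crossings(frames, line_x=15))  # 2
```

Greedy per-frame matching is the cheapest possible way to relate objects across frames. As the following paragraphs note, approaches this simple break down as soon as people pass close to or occlude one another.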
Typically, tracking methods predict the position of the tracked object in the frame at time t from a history of known positions of the object and from earlier predictions of its position made before time t.
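One common way to combine known positions with earlier predictions is an alpha-beta filter. It is shown here as a stand-in for illustration, not as the prediction scheme of this document. At each frame the filter predicts a position from its previous estimate and velocity. It then corrects that prediction using the newly detected position. The class name and the gain values `alpha` and `beta` are assumptions chosen for illustration. One instance handles a single coordinate, so a 2-D tracker would use one instance for x and one for y.

```python
from dataclasses import dataclass


@dataclass
class AlphaBetaTracker:
    """1-D alpha-beta filter (illustrative gains; one instance per coordinate)."""
    x: float              # current position estimate
    v: float = 0.0        # current velocity estimate, in units per frame
    alpha: float = 0.85   # weight given to the position residual
    beta: float = 0.005   # weight given to the velocity correction

    def predict(self, dt: float = 1.0) -> float:
        """Predicted position after dt frames, based only on past estimates."""
        return self.x + self.v * dt

    def update(self, measured: float, dt: float = 1.0) -> float:
        """Blend the prediction with the detected position; return the new estimate."""
        predicted = self.predict(dt)
        residual = measured - predicted  # how far the detection is from the prediction
        self.x = predicted + self.alpha * residual
        self.v = self.v + (self.beta / dt) * residual
        return self.x


# An object moving at a constant 2 units per frame is predicted exactly.
tx = AlphaBetaTracker(x=0.0, v=2.0)
for z in [2.0, 4.0, 6.0]:
    tx.update(z)
print(tx.predict())  # 8.0
```

Larger `alpha` and `beta` make the filter follow detections more closely, including their noise. Smaller values trust the prediction more and smooth out inconsistent detections.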
There are various types of tracking, including point-based tracking, in which an object is represented by one or more points, and kernel-based tracking, in which an object is represented by a template.
Several factors affect the quality of tracking:
- Input quality (object detection results). If objects are not detected correctly, this affects the feature values on which tracking is based, e.g., the centre of gravity. This is especially troublesome if the detection error is not consistent over succeeding frames.
- Content complexity. This includes full or partial occlusion or a change in appearance of the tracked object, a change in direction or velocity, a change in depth, and so on.
- Object interaction. Objects may merge (e.g., somebody picking up a suitcase) or split (e.g., somebody stepping out of a car).
- Availability of computational resources.
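The effect of input quality on a point-based feature can be made concrete by computing the centre of gravity of a detection. The sketch below assumes that a detection is a binary foreground mask, given as a list of rows. The helper names `centre_of_gravity` and `box_mask` are hypothetical. When the detector misses part of an object, the centroid shifts even though the object has not moved. The tracker then sees this shift as apparent motion.

```python
from typing import List, Optional, Tuple


def centre_of_gravity(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Return the mean (x, y) of foreground pixels, or None if the mask is empty."""
    xs = ys = n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                xs += x
                ys += y
                n += 1
    if n == 0:
        return None
    return xs / n, ys / n


def box_mask(width: int, height: int, x0: int, x1: int, y0: int, y1: int) -> List[List[int]]:
    """Hypothetical helper: a mask with a filled rectangle [x0..x1] x [y0..y1]."""
    return [[1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
            for y in range(height)]


full = box_mask(8, 4, 2, 5, 0, 1)     # object fully detected
partial = box_mask(8, 4, 2, 3, 0, 1)  # right half missed by the detector
print(centre_of_gravity(full))     # (3.5, 0.5)
print(centre_of_gravity(partial))  # (2.5, 0.5)
```

If the detector alternates between the full and the partial mask over succeeding frames, a stationary object appears to jump by one pixel back and forth. This is the inconsistent detection error that the list above describes as especially troublesome.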
A large number of tracking methods have been proposed. Computationally inexpensive methods can be incorporated as part of a camera. However, such methods are restricted to simple situations (e.g., one person walking slowly in the field of view) and typically cannot cope with the more complex situations that occur in practice (e.g., several people passing and occluding one another).
Tracking methods that are relatively robust to input quality, content complexity and object interaction require computational resources that are not practical to embed in devices such as cameras. Therefore, processing usually takes place on a separate computer system, such as a Personal Computer (PC). Such a setup puts additional constraints on a tracking system. For example, considerable bandwidth is generally needed to transmit the video stream to the separate computer system.