Learning to recognize key objects in video data and then extracting the pixels that compose those objects is a component of content-based video processing. Some object detection methods directly detect each occurrence of an object based on the pixel-wise or block-wise color difference between consecutive frames. Other procedures first detect several occurrences (samples) of the object and then learn a template (usually a frame) for the object by extracting common characteristics from the acquired samples. For example, some methods use motion features and apply dynamic programming to match the object's movement. The extracted object template is then used to scan the whole video to find all occurrences of the object. Each occurrence of the object should contain the same foreground pixels (i.e., object pixels) but different background pixels (i.e., non-object pixels). Thus, the foreground pixels may be extracted to represent the object itself, whereas the background pixels may not describe the object and may introduce noise.
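The observation that occurrences of an object share foreground pixels but differ in background pixels can be illustrated with a minimal sketch. It is not taken from any particular method described here. It assumes that the samples are already aligned grayscale frames of equal size, and the function name `stable_pixel_mask` and the threshold `max_spread` are hypothetical names chosen for illustration.

```python
from typing import List

# Assumption: a frame is a list of rows of grayscale intensities (0..255).
Frame = List[List[int]]


def stable_pixel_mask(samples: List[Frame], max_spread: int = 10) -> List[List[bool]]:
    """Mark pixels whose intensity stays nearly constant across all samples.

    A pixel is treated as a foreground candidate when the difference between
    its largest and smallest value over the samples is at most max_spread.
    """
    if not samples:
        raise ValueError("at least one sample frame is required")
    height, width = len(samples[0]), len(samples[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in samples]
            row.append(max(values) - min(values) <= max_spread)
        mask.append(row)
    return mask


# Three aligned 2x3 samples: the middle column is stable (object pixels),
# while the outer columns change from sample to sample (background).
samples = [
    [[10, 200, 90], [40, 205, 15]],
    [[120, 202, 30], [75, 198, 220]],
    [[60, 199, 170], [5, 203, 110]],
]
mask = stable_pixel_mask(samples)
print(mask)  # [[False, True, False], [False, True, False]]
```

In this sketch a static background would also be marked as stable, so the approach only separates foreground from background when the background actually varies between occurrences.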
Still other methods may be used to extract the foreground pixels of an object. Some assume that the object is highlighted and located at the center of the frame, and thus extract only the bright pixels near the center. Other methods use motion information, assuming that pixels that move faster than others are foreground pixels. However, such methods may also return pixels around the margin of the object, and these marginal pixels do not provide accurate information about the object.
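The center-and-brightness heuristic can be sketched as follows. This is only an illustrative sketch under stated assumptions: the frame is a grayscale image stored as a list of rows, and the function name `center_bright_mask` and the parameters `min_brightness` and `center_fraction` are hypothetical and do not come from any specific method.

```python
from typing import List

# Assumption: a frame is a list of rows of grayscale intensities (0..255).
Frame = List[List[int]]


def center_bright_mask(
    frame: Frame, min_brightness: int = 128, center_fraction: float = 0.5
) -> List[List[bool]]:
    """Keep pixels that are both bright and inside a central window.

    center_fraction is the share of each dimension covered by the window;
    a pixel belongs to the window when its center point lies inside it.
    """
    height, width = len(frame), len(frame[0])
    y_margin = height * (1 - center_fraction) / 2
    x_margin = width * (1 - center_fraction) / 2
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            in_center = (
                y_margin <= y + 0.5 <= height - y_margin
                and x_margin <= x + 0.5 <= width - x_margin
            )
            row.append(in_center and frame[y][x] >= min_brightness)
        mask.append(row)
    return mask


# 4x4 frame: bright pixels in the corners and at the edge are rejected
# because they fall outside the central 2x2 window.
frame = [
    [250, 250, 10, 10],
    [10, 200, 180, 10],
    [10, 90, 220, 250],
    [10, 10, 10, 10],
]
center_mask = center_bright_mask(frame)
for mask_row in center_mask:
    print(mask_row)
```

The sketch makes the limitation of the heuristic visible: a pixel is accepted purely by position and brightness, so dark object pixels are dropped and bright non-object pixels near the center are kept.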