1. Field of the Invention
This invention relates generally to the field of computer vision, and more specifically to a system, method and apparatus for detecting and tracking a selected object in a video sequence.
2. Discussion of the Related Art
Computer vision systems are known in the art. Such systems may track objects through a series of digital frames. However, many of the presently utilized systems track images only in the red-green-blue (“RGB”) colorspace. Such systems are poor at tracking objects through frames in which lighting conditions are changing.
Digital images include at least one picture element (“pixel”). Pixels are the small discrete elements that together constitute digital images. Each pixel of a digital image may be displayed on a computer monitor, or the like. Each pixel may be classified according to the amount of each of the primary colors of visible light (red, green and blue, together the “RGB colorspace”) that is present in the pixel. If 8 bits of information are used to represent the amount of light for each of the primary colors for each pixel, then with respect to the red component of an RGB image, the brightest red would be represented by the number 255 (in binary, 11111111) and a complete absence of red would be represented by the number 0 (in binary, 00000000). The amounts of green and blue in the pixel are represented in the same way.
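By way of illustration only, the 8-bit representation described above may be sketched as follows. The helper name channel_bits and the sample pixel are hypothetical and are not part of any system described herein.

```python
def channel_bits(level):
    """Return the 8-bit binary string for one color channel level (0 to 255)."""
    if not 0 <= level <= 255:
        raise ValueError("an 8-bit channel holds values from 0 to 255")
    return format(level, "08b")

# Hypothetical pixel: brightest red, no green, no blue.
pixel = (255, 0, 0)
bits = [channel_bits(level) for level in pixel]
print(bits)  # ['11111111', '00000000', '00000000']
```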
However, the amounts of red, green and blue in an image represented in the RGB colorspace may change in different lighting conditions. For example, in a digital photograph of a red sweater, the red component of the RGB colorspace might have a level of “110” in medium lighting, “200” in bright lighting conditions, and “40” in dim lighting, even though the sweater has not been altered; only the lighting has changed. Therefore, since each of the RGB components is influenced by lighting conditions, it is problematic to keep track of a colored object in the RGB colorspace.
Another colorspace is the Hue-Saturation-Value (HSV) colorspace. The HSV colorspace, in contrast to the RGB colorspace, better represents what humans see. In the HSV colorspace, each pixel may be classified according to its Hue, the Saturation of its Hue, and the brightness (Value) in a pixel. Hue represents the wavelength of light present in the pixel. In the HSV colorspace, each of the visible colors of light is represented. Each pixel of an image has a Hue represented by an angular coordinate between 0° and 359°. Red is represented by coordinates around 0°. Yellow is represented by coordinates around 60°. Green is represented by coordinates around 120°. Blue is represented by coordinates around 240°.
Saturation represents the amount of Hue present in a pixel. If Saturation is represented on a scale between 0 and 1, a Saturation of 0.5 for a red Hue would be a medium red. A “very red” pixel would be represented by a Saturation of close to 1. A very red pixel would have so much red that it would, in fact, appear to be glowing red. A pixel with a red Hue that is not very red would be represented by a Saturation close to 0. Hues with Saturations close to zero appear to be mostly gray with only a slight amount of that Hue present.
Value is utilized to represent the amount of brightness in the pixel. Value is typically represented on a scale from 0 to 1, with 1 representing the greatest amount of brightness, and 0 representing the least amount of brightness. Pixels with Value near 0 are very dark, almost black. Pixels with Value near 1 are very bright, almost white. If the Saturation is 0, then Value by itself represents the grayscale.
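The relationship between the RGB and HSV representations may be illustrated with the following minimal sketch, which uses the rgb_to_hsv function of the Python standard library colorsys module. The sample pixel levels are hypothetical, and the sketch assumes an idealized lighting change that scales all three RGB components by the same factor; real lighting changes generally do not behave so uniformly. The helper name rgb8_to_hsv is likewise an assumption made for illustration.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB levels to (Hue in degrees, Saturation 0-1, Value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

# Hypothetical red-sweater pixel under three lighting levels.
# Assumption: dimmer lighting scales every RGB component by the same factor.
samples = {
    "bright": (200, 40, 40),
    "medium": (110, 22, 22),
    "dim": (40, 8, 8),
}

results = {name: rgb8_to_hsv(*rgb) for name, rgb in samples.items()}
for name, (hue, sat, val) in results.items():
    print(f"{name:>6}: hue={hue:.1f} sat={sat:.2f} val={val:.2f}")
```

Under this assumption, the red component varies from 200 down to 40, yet the Hue and Saturation of the three samples are the same and only the Value differs.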
Object tracking systems in the art are deficient in that they are typically only able to accurately track objects under well-known conditions, such as within a range of illumination and with constraints on the fidelity of the camera.
Many current tracking systems convert a colored image into a binary image, the binary image being an image in which each pixel is represented by a “1” or “0”. Each “1” represents a pixel that might be a part of the object to be tracked. Such systems utilize processes to find the largest connected object within the binary image, the largest connected object being determined to be the tracked object. Such algorithms are very time-consuming and generally use system resources inefficiently.
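A process of the general kind described above may be sketched as follows. This is a minimal illustration only and does not reproduce any particular prior system; the use of a breadth-first flood fill, the choice of 4-connectivity (up, down, left and right neighbors), the function name largest_connected_component and the sample mask are all assumptions made for the example.

```python
from collections import deque

def largest_connected_component(binary):
    """Return the (row, col) pixels of the largest 4-connected region of 1s."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited "1" pixel.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the largest region found so far.
            if len(region) > len(best):
                best = region
    return best

# Hypothetical binary image containing three separate regions of 1s.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
blob = largest_connected_component(mask)
print(len(blob))  # 5
```

Even in this small sketch, every pixel of the binary image is examined and a separate visited map is maintained, so the work grows with the size of each frame.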