From consumer digital cameras for photography enthusiasts to high-end computer vision systems, digital imaging is a fast-growing technology that is becoming an integral part of everyday life. In its most basic definition, a digital image is a computer-readable representation of an image of a subject taken by a digital imaging device, e.g., a camera, video camera, or the like. A computer-readable representation, or digital image, typically includes a number of picture elements, or pixels, arranged in an image file or document according to one of many available graphic formats. For example, graphic file formats include, without limitation, bitmap, Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG) format, and the like. A subject is anything that can be imaged, i.e., photographed, videotaped, or the like. In general, a subject may be an object or part thereof, a person or part thereof, a scenic view, an animal, or the like. An image of a subject is typically acquired under viewing conditions that, to some extent, make the image unique. In imaging, viewing conditions typically refer to the relative orientation between the camera and the object (i.e., the pose) and the external illumination under which the images are acquired.
Motion video is generally captured as a series of still images, or frames. Of particular interest and utility is the ability to track the location of an object of interest across the successive frames that make up a motion video, a concept generally referred to as visual tracking. Example applications include, without limitation, intelligence gathering, in which the location and description of the target object over time are of interest, and robotics, in which a machine may be directed to perform certain actions based upon the perceived location of a target object.
The non-stationary nature of both the target object and the background within the overall image complicates the design of visual tracking methods. Conventional algorithms may be able to track objects, whether previously viewed or not, over short spans of time and in well-controlled environments. However, these algorithms usually fail to follow the object's motion or eventually drift significantly, due either to a drastic change in the object's appearance or to large lighting variation. Although these problems have been partially addressed, most visual tracking algorithms still operate on the premise that the target object does not change drastically over time. Consequently, these algorithms build a static model of the target object at the outset, without accounting for changes in its appearance, e.g., large variation in pose or facial expression, or in its surroundings, e.g., lighting variation. Such an approach is prone to instability whenever the target or its surroundings change.
From the above, there is a need for an improved, robust method for visual tracking that learns and adapts to intrinsic changes of the target object itself, e.g., variation in pose or shape, as well as to extrinsic changes, e.g., in camera orientation, illumination, or background.