Visual tracking is a key capability for many systems that analyze objects in dynamic environments. It has been researched intensively over the last decade, leading to applications in fields such as surveillance, collision avoidance, and trajectory evaluation.
As described in detail in Aaron Edsinger and Charles C. Kemp, “Toward Robot Learning of Tool Manipulation from Human Demonstration,” Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Mass. (specifically with reference to FIGS. 1 and 10), a robot may forward the results of visual tracking to a visual servoing unit. The visual servoing unit controls the position and orientation of the tracked object in the field of view of a camera by controlling actuators that adapt the position and orientation of the camera.
Visual servoing may involve the use of one or more cameras and a computer vision system to control the position of a robotic end-effector, such as a manipulator, relative to an object being tracked.
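The feedback idea behind visual servoing can be illustrated with a minimal sketch. It assumes a simple proportional control law and an idealized camera in which each correction shifts the target's image position by exactly that many pixels. The names `servo_step` and `simulate`, the gain value, and the pixel-based actuator model are hypothetical and are not taken from the cited work.

```python
def servo_step(target_px, image_size, gain=0.2):
    """Return the pixel error and a proportional correction toward the image center.

    target_px  -- (x, y) image position of the tracked object, in pixels
    image_size -- (width, height) of the camera image, in pixels
    gain       -- proportional gain (assumed value, chosen for illustration)
    """
    center_x = image_size[0] / 2.0
    center_y = image_size[1] / 2.0
    error = (target_px[0] - center_x, target_px[1] - center_y)
    correction = (gain * error[0], gain * error[1])
    return error, correction


def simulate(target_px, image_size, steps=50, gain=0.2):
    """Toy closed loop under the idealized assumption that applying a
    correction moves the target's image position by exactly that amount."""
    x, y = target_px
    for _ in range(steps):
        _, (dx, dy) = servo_step((x, y), image_size, gain)
        x, y = x - dx, y - dy
    return x, y


if __name__ == "__main__":
    # The target starts off-center and is driven toward the image center.
    print(simulate((500.0, 100.0), (640, 480)))
```

In a real system the mapping from pixel error to actuator commands depends on the camera geometry and the actuator kinematics, so the correction here should be read as a placeholder for that mapping.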
One of the problems in tracking systems is the need for accurate internal models of the objects being tracked. For example, in a traffic scene, three dimensional models of cars, represented as either volumetric or surface models, must be used to indicate the exact physical dimensions of the cars. The models are then fitted to match the sensor signals received by a camera. Alternatively, tracking systems locate the objects based on specific information about the appearance of an object, such as color. In the general case, however, the objects to be tracked are not determined in advance, and therefore an accurate 3D model or other specific information is unavailable. In this case, tracking systems have to rely on estimates of the objects' state, including a three dimensional position and a three dimensional velocity, based on a combination of several distinct cues and measurements.
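The passage does not say how the distinct cues are combined. As one possible illustration, the sketch below fuses independent three dimensional position measurements by inverse-variance weighting and derives a velocity by finite differences. Both techniques are assumptions introduced here for illustration, as are the function names `fuse_estimates` and `finite_difference_velocity`.

```python
def fuse_estimates(measurements):
    """Fuse independent 3D position measurements by inverse-variance weighting.

    measurements -- list of ((x, y, z), variance) tuples, one per cue
    Returns the fused (x, y, z) position and the variance of the fused estimate.
    """
    total_weight = sum(1.0 / variance for _, variance in measurements)
    fused = tuple(
        sum(position[i] / variance for position, variance in measurements)
        / total_weight
        for i in range(3)
    )
    return fused, 1.0 / total_weight


def finite_difference_velocity(previous_position, current_position, dt):
    """Approximate the 3D velocity from two successive position estimates."""
    return tuple(
        (current - previous) / dt
        for previous, current in zip(previous_position, current_position)
    )


if __name__ == "__main__":
    # Two hypothetical cues that disagree along x and are equally reliable.
    cues = [((0.0, 0.0, 10.0), 1.0), ((2.0, 0.0, 10.0), 1.0)]
    position, variance = fuse_estimates(cues)
    print(position, variance)
    print(finite_difference_velocity((0.0, 0.0, 10.0), position, 0.5))
```

A cue with a smaller variance pulls the fused position toward its own measurement, which is why unreliable cues can still contribute without dominating the estimate.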
Visual tracking is the capability to visually identify and follow a real-world object over time, based on a signal provided by cameras, despite changes in the object's dynamical parameters (for example, position and velocity) and in its two dimensional appearance as captured by the camera. The appearance as captured by the camera is two dimensional due to the perspective projection of the three dimensional real-world objects onto a two dimensional screen. The appearance of the object may change considerably due to different surface effects attributable to variable external conditions, such as the external light spectrum, repositioning of light sources, reflectance and shading effects, as well as internal properties such as object deformations. Alternatively, the appearance of the object may change simply because of the rotation of the object or linear movement of the object.
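The perspective projection mentioned above can be sketched with an ideal pinhole camera model. The model, the function name `project_point`, and the unit focal length are assumptions made for illustration. The example shows how linear movement of an object along the viewing axis alone changes its two dimensional appearance.

```python
def project_point(point_3d, focal_length=1.0):
    """Project a 3D point (x, y, z), given in camera coordinates, onto the
    image plane of an ideal pinhole camera (no lens distortion assumed)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


if __name__ == "__main__":
    # The same point seen at two distances: doubling the depth halves
    # its image coordinates, so the object appears smaller.
    print(project_point((1.0, 0.5, 2.0)))
    print(project_point((1.0, 0.5, 4.0)))
```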