This invention relates generally to real-time camera tracking systems, and more specifically to a single camera tracking system with a controllable pan and tilt mechanism for providing real time, narrow field-of-view, high resolution and on target images.
The use of pan/tilt camera systems with computer vision (CV) techniques is a burgeoning field. Pan/tilt camera systems offer a broader field of view (FOV) for capturing dynamic systems and CV techniques provide a means for automatically detecting objects of interest.
Applying CV and pan/tilt/zoom (PTZ) camera systems to event detection, surveillance and similar functions is the focus of much recent research. There is a continuing need to develop robust, real-time tracking methods that combine CV techniques with PTZ cameras, and to integrate them into other areas such as measurement and navigation. Real-time algorithms are preferred for tracking, but current tracking algorithms, although robust and increasingly fast through the use of parallel processing techniques, are generally still not computationally efficient enough for truly real-time frame rates.
Prior art single camera systems for tracking moving objects are limited to using a wide field of view, low resolution image in order to successfully track a moving object in real time within the computational speed limitations of currently available commercial camera and computer components.
Tracking, as opposed to using a very wide field-of-view, low resolution image, allows for higher resolution images of an object. For example, a tracking system can have the object span 600 pixels of a one megapixel image (1,000 pixels on an edge). If the object moves across a span 10 times its width, a static 9 megapixel camera (3,000 pixels on an edge) covering that span would have only 300 pixels across the object, resulting in half the effective resolution. The larger the span of the object's movement, the more the resolution of a static camera drops. Resolution of a tracking camera would not change based on the span of the object's movement.
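The arithmetic in this example can be checked with a short sketch. It assumes, as the example implies, that the static camera's field of view must cover the full span of the object's movement along one image edge; the function and variable names are illustrative only.

```python
def pixels_on_object_static(sensor_edge_px, motion_span_in_object_widths):
    """Pixels across the object when a static camera's edge must cover the
    whole motion span (assumption: the span is measured in object widths)."""
    return sensor_edge_px / motion_span_in_object_widths


# Tracking camera from the example: object spans 600 of 1,000 pixels.
tracking_px = 600

# Static 9 megapixel camera: 3,000 pixels on an edge, motion span of 10 widths.
static_px = pixels_on_object_static(3000, 10)

# Effective resolution of the static camera relative to the tracking camera.
ratio = static_px / tracking_px

print(static_px, ratio)
```

Doubling the motion span halves the static camera's pixels on the object again, while the tracking figure is unaffected.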
Another reason for obtaining higher resolution images of moving objects is for improved post-processing. Lower resolution images degrade post processing performance.
In the early days of photography, a camera needed to be static to obtain clear images. As shutter speeds increased, moving video recorders became feasible and were operated by trained professionals. Training was required so that final videos were high quality in both aesthetic terms (such as the evocative element of the image) and technical terms (such as framing of a subject and minimal jitter). Current technology ameliorates some of the concerns of an untrained operator, but the general need for live operators is a concern for many applications.
Computer-based tracking uses computer algorithms to automatically process images from a camera to extract useful data. In the simplest sense, video from a wide field-of-view camera can be processed to produce a video cropped tightly around an object of interest. This can be done in real time (while the video is being taken) or by post-processing (any time after the video is captured). If the system is real-time, then the extracted data can be used to interact with the object, such as using a pan-tilt unit to track an object of interest.
Many algorithms are based on using a static camera to direct a pan-tilt unit so a separate camera can track an object, or are based on matching a template, describing the object of interest, to a portion of the image. Basing tracking on another camera can result in poor framing of the object in the image because there is no direct feedback to indicate the quality of the images from the tracking camera. If either camera is misaligned, then the system can have a catastrophic failure. Template matching requires a training phase to identify the appearance of an object of interest. If the object changes appearance, such as a person putting on a ski mask when a facial template is used, the algorithm may not recognize the appearance as a match.
There are general purpose methods that can locate generic moving objects from a moving camera. These, however, use observed motion rather than commanded motion. When a camera moves, stationary objects will appear to move across the image. Many methods use sequential images to determine an image transform which creates the best match between the images. The disparities between the transformed images represent motion or changes of an object. These methods require that the background be feature rich (as opposed to an untextured wall) and take up the majority of the image, detracting from the objective of having the object fill a plurality of the image. In addition, errors in determining camera motion propagate into errors in determining object motion.
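The observed-motion approach described above can be illustrated with a minimal sketch. It is not any particular prior art method. It assumes the image transform is a pure integer translation with wraparound, estimated by phase correlation, whereas practical methods fit more general transforms. The function names and the synthetic frames are hypothetical.

```python
import numpy as np


def estimate_shift(prev, curr):
    """Estimate the integer (dy, dx) translation from prev to curr using
    phase correlation (assumes a circular shift of a textured background)."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real         # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    if dy > h // 2:                         # map wrapped indices to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def residual_after_alignment(prev, curr):
    """Undo the estimated camera motion, then difference the two frames.
    Nonzero residual marks pixels that changed for another reason."""
    dy, dx = estimate_shift(prev, curr)
    aligned = np.roll(curr, (-dy, -dx), axis=(0, 1))
    return np.abs(aligned - prev)


# Synthetic, feature-rich background and a camera shift of (3, -5) pixels.
rng = np.random.default_rng(0)
background = rng.random((64, 64))
moved = np.roll(background, (3, -5), axis=(0, 1))
shift = estimate_shift(background, moved)

# Add a small, faint object to the second frame only.
frame = moved.copy()
frame[20:24, 30:34] += 0.5
resid = residual_after_alignment(background, frame)
```

The sketch also reflects the limitations noted above: the shift estimate relies on a textured background filling most of the frame, and any error in that estimate appears directly in the residual attributed to the object.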
Another method uses an inertial sensor to measure rotation, but while the inertial sensor captures a plurality of the motion, it measures acceleration, resulting in inaccuracies and sensor drift. That method also performs image stabilization rather than moving object recognition and tracking.
It is clear, therefore, that there is a need for a single camera tracking system that can provide real time, narrow field-of-view, high resolution and on target images.