Successful interpretation and analysis of video from a surveillance camera, such as an aerially positioned camera, requires the ability to determine the geographic location of one or more objects appearing in the video.
The ability to accurately determine the geographic location of objects depends in part on knowledge and consideration of the camera's pose (i.e., the camera's geographic location, orientation, and field of view). Identifying the correct geographic location, referred to as “geo-location,” provides for a more accurate interpretation of the video by properly placing the video in context relative to other geographic information, such as, for example, maps, reference imagery, and other data obtained from real-time sensors.
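The dependence of geo-location on camera pose can be illustrated with a deliberately simplified sketch (not part of the source): given the camera's position, altitude, and line-of-sight angles, the ground point seen at the image center can be found by intersecting the line of sight with the ground. The flat-ground assumption, the local metric grid, and all names here are hypothetical simplifications for illustration only.

```python
import math

def ground_point(cam_north_m, cam_east_m, cam_alt_m, heading_rad, depression_rad):
    """Intersect the camera's line of sight with a flat ground plane.

    cam_north_m / cam_east_m: camera position in a local metric grid (meters).
    cam_alt_m: camera altitude above the ground plane (meters).
    heading_rad: direction of the line of sight, measured clockwise from north.
    depression_rad: angle of the line of sight below horizontal (must be > 0).
    Returns the (north, east) ground coordinates viewed at the image center.
    """
    if depression_rad <= 0:
        raise ValueError("line of sight must point below the horizon")
    # Horizontal distance from the camera to the viewed ground point.
    ground_range = cam_alt_m / math.tan(depression_rad)
    north = cam_north_m + ground_range * math.cos(heading_rad)
    east = cam_east_m + ground_range * math.sin(heading_rad)
    return north, east

# A camera 1000 m up, looking due north at 45 degrees below horizontal,
# sees the ground point 1000 m north of its own position. Any error in the
# reported pose angles shifts this computed location accordingly.
```

Even in this toy model, small errors in the reported heading or depression angle translate directly into geo-location error, which is why pose accuracy matters.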
Typically, video from an aerial surveillance camera is accompanied by metadata containing periodic reports of measurements from which the camera's pose may be estimated. These measurements usually include the current location and orientation of the aircraft carrying the camera, as determined by instruments such as global positioning system (GPS) receivers, inertial measurement units, compasses, tilt sensors, and the like. If the camera is mounted to the aircraft using camera positioning equipment (e.g., a gimbal) so that it can be pointed somewhat independently of the aircraft, then the metadata also usually includes periodic reports of the current angles of the gimbal's axes of rotation, as determined, for example, by angle sensors. And if the camera has a variable field of view or focal length, as provided, for example, by a zoom lens or lens turret, then the metadata usually includes periodic reports of the current field of view, focal length, and/or choice of lens. This metadata typically accompanies the video as it is transmitted and/or recorded, and is available to video analysis and display systems to aid interpretation of the video.
The metadata delivered by current aerial surveillance systems often suffers from two shortcomings. First, the metadata has insufficient accuracy. That is, errors in the metadata do not allow the video to be geo-located with the accuracy needed to support operations such as viewing the video in the context of other geographic information, fusing video information with information from other sensors, and modeling the dynamics of moving objects tracked in the video. The second problem is that measurements reported in the metadata are usually repeated or updated at a lower rate than that of video frames. For some aerial surveillance systems with which we have experience, measurements are repeated as infrequently as once every two to three seconds. However, even when measurements are repeated every other video frame, the arrival of video frames without accompanying measurements means that the information needed to geo-locate those video frames must be extrapolated or interpolated from measurements taken at other times. Because the aerial platform is usually subject to buffeting and vibration, the missing values cannot be extrapolated or interpolated with sufficient accuracy to support operations such as those listed above.
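The interpolation problem described above can be sketched as follows (an illustration, not part of the source; the timing values and field choice are hypothetical). With metadata reports arriving every two seconds but video frames arriving thirty times per second, most frames must borrow a pose estimate interpolated between the two nearest reports:

```python
def interpolate(t, t0, v0, t1, v1):
    """Linearly interpolate a scalar measurement between two report times.

    t0, v0: time and value of the earlier metadata report.
    t1, v1: time and value of the later metadata report.
    t: timestamp of the video frame needing a value.
    """
    if not t0 <= t <= t1:
        raise ValueError("t must lie between the two report times")
    alpha = (t - t0) / (t1 - t0)
    return v0 + alpha * (v1 - v0)

# Example: pan-angle reports of 10.0 and 12.0 degrees arrive at t = 0 s and
# t = 2 s; a frame at t = 0.8 s gets the straight-line estimate in between.
pan_at_frame = interpolate(0.8, 0.0, 10.0, 2.0, 12.0)   # -> 10.8 degrees
# If buffeting or vibration swings the true pan angle between the two
# reports, this straight-line estimate can be badly wrong.
```

The straight-line assumption is exactly what platform buffeting violates, which is why interpolated metadata alone cannot support the operations listed above.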
One approach to improving metadata accuracy and timeliness is to geo-register video frames, as described in U.S. Pat. No. 6,597,818, issued July 2003, and in the article titled “Adding precision to airborne video with model based registration” by John A. Van Workhum and Steven G. Blask, published in Second International Workshop on Digital and Computational Video (IEEE Computer Society, February 2001). However, since geo-registration is computationally expensive, it is generally performed on just a subset of video frames (e.g., one frame each second). In addition, geo-registration may fail on some frames for lack of suitable landmarks or features, leading to inaccurate measurements. Furthermore, conventional geo-registration techniques require the availability of appropriate reference imagery.
For some video processing applications, a Kalman Filter is used to estimate linear motion in a target scene. In other applications, an Extended Kalman Filter (EKF) modifies the conventional Kalman Filter by linearizing all nonlinear models (i.e., process and measurement models) to provide motion estimates for images and scenes that include nonlinear orientation data. The EKF is a set of mathematical equations that uses an underlying process model to make an estimate of the current state of a system and then corrects the estimate using any available sensor measurements. Unfortunately, the EKF has two important potential drawbacks. First, the derivation of the Jacobian matrices, the linear approximators to the nonlinear functions, may be complex, causing implementation difficulties. Second, these linearizations may lead to filter instability if the update intervals (i.e., timesteps) are not sufficiently small.
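The predict/correct cycle and the role of the Jacobians can be sketched with a minimal scalar example (an illustration, not the method of this document; the particular process and measurement models are hypothetical stand-ins chosen only to show where the linearizations appear):

```python
import math

def ekf_step(x, P, z, q, r, dt):
    """One predict/correct cycle of a scalar Extended Kalman Filter.

    Hypothetical models for illustration:
      process model:      x' = x + dt * sin(x)   (nonlinear)
      measurement model:  z  = x**2              (nonlinear)
    x, P: prior state estimate and its variance.
    z: new sensor measurement; q, r: process and measurement noise variances.
    """
    # Predict: propagate the state through the process model, and linearize
    # the model (its Jacobian, here a scalar derivative) to propagate P.
    x_pred = x + dt * math.sin(x)
    F = 1.0 + dt * math.cos(x)        # Jacobian of the process model
    P_pred = F * P * F + q
    # Correct: fold in the measurement via the measurement-model Jacobian.
    H = 2.0 * x_pred                  # Jacobian of the measurement model
    S = H * P_pred * H + r            # innovation variance
    K = P_pred * H / S                # Kalman gain
    x_new = x_pred + K * (z - x_pred ** 2)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

Even in this scalar case, both Jacobians (F and H) must be derived by hand from the models; in a multidimensional pose-estimation setting they become full matrices, which is the source of the implementation complexity noted above, and the linearization is only trustworthy when dt is small.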
Accordingly, there is a need for a method and system for improving the accuracy and timeliness of video metadata.