During aerial video and/or image registration to a known three-dimensional (3D) scene, an aircraft vehicle is flown over a location and a live video feed or still images of the environment are captured. By determining the pose of the camera (e.g., where the camera was pointed) at the time the images were captured, the individual frames of the video feed or the still images can then be projected onto a 3D terrain within a virtual world to depict a visualization of the ground as imaged from the perspective of the aircraft vehicle.
A camera pose typically defines the position and orientation of a camera within an environment. In some instances, the camera pose can be determined through a hardware camera tracking system in which a magnetic and/or optical tracker is placed on the camera and the camera's position and orientation in space are subsequently tracked. However, the determined pose is only an estimate of the actual position and orientation of the camera. In particular, the position and orientation data may be erroneous due to low-frequency sampling, interference, and the like. As a result, during the video and image registration, the images are projected only in the vicinity of where the actual objects (e.g., buildings, roads, cars, and the like) are located, and the projected image may not correlate with the reference 3D scene.
In other instances, the camera pose can be computed from the live video feed and/or images taken by the camera using known 3D locations of objects that the camera is imaging along with the two-dimensional (2D) locations of those objects in the captured image. Given adequate correlation points, various algorithms can compute both intrinsic properties (e.g., focal length of a lens, image size, radial distortion, and other optical properties) and extrinsic properties (e.g., position and orientation in six degrees of freedom) of the camera. Traditionally, techniques for determining the position and orientation of a camera at the time an image was captured require a minimum number of 3D-to-2D correlations to compute a complete camera pose. For example, an algorithm for determining a camera pose may typically require at least six (6) to eleven (11) correlations to derive an accurate position of the camera. Often, however, the required number of correlation points is unavailable, and the algorithm is then unable to determine the position of the camera.
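As an illustration of why such techniques need a minimum number of correlations, the following is a minimal sketch of one well-known approach, the direct linear transform (DLT), written in Python with NumPy. It is offered as an example only and is not necessarily the algorithm contemplated above. A 3x4 projection matrix has eleven degrees of freedom and each 3D-to-2D correlation contributes two linear equations, so this particular method needs at least six correlations whose 3D points are not coplanar. The function names estimate_projection_dlt and project are hypothetical, and the sketch omits coordinate normalization, lens distortion, and noise handling.

```python
import numpy as np


def estimate_projection_dlt(points_3d, points_2d):
    """Estimate a 3x4 camera projection matrix from 3D-to-2D correlations.

    Illustrative DLT sketch: assumes noise-free, non-coplanar points and
    no lens distortion. The result is defined only up to scale.
    """
    world = np.asarray(points_3d, dtype=float)
    image = np.asarray(points_2d, dtype=float)
    if world.shape[0] != image.shape[0]:
        raise ValueError("each 3D point needs a matching 2D point")
    if world.shape[0] < 6:
        # 11 unknowns, 2 equations per correlation -> at least 6 needed.
        raise ValueError("DLT needs at least six 3D-to-2D correlations")

    rows = []
    for (x, y, z), (u, v) in zip(world, image):
        h = [x, y, z, 1.0]  # homogeneous world point
        zeros = [0.0, 0.0, 0.0, 0.0]
        # Two linear constraints on the 12 entries of the matrix.
        rows.append(h + zeros + [-u * c for c in h])
        rows.append(zeros + h + [-v * c for c in h])
    system = np.array(rows)

    # The solution is the right singular vector associated with the
    # smallest singular value of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(system)
    return vt[-1].reshape(3, 4)


def project(projection, points_3d):
    """Project 3D points to pixel coordinates with a 3x4 matrix."""
    world = np.asarray(points_3d, dtype=float)
    homogeneous = np.hstack([world, np.ones((world.shape[0], 1))])
    projected = homogeneous @ projection.T
    return projected[:, :2] / projected[:, 2:3]
```

When fewer than six correlations are supplied, the sketch raises an error rather than returning a pose, which mirrors the limitation described above: with too few correlation points, a method of this kind cannot determine the camera pose.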
Therefore, it may be desirable to have a system and method that provides optimal estimation of a camera pose given limited information about how the 3D world relates to the 2D image captured by the camera.