With advances in digital processing capabilities, including increased microprocessor speeds and increased memory storage capacities, processing of relatively large amounts of video data in digital formats may be improved. Video streams are typically rich sources of information. For example, data according to the NTSC video standard consists of a stream of thirty images or frames per second, and individual frames consist of two interlaced fields, wherein one contains odd-numbered scan lines and the other contains even-numbered scan lines. A frame digitized according to a CCIR601 YUV 4:2:2 format yields 720×486×2=699,840 bytes. The digitized video stream rate is 30×699,840 bytes/second, or approximately 21M bytes/second. Because of retrace times, data flow may not be constant at this rate but is typically clocked out in line-bursts at 27M bytes/second.
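The frame-size and stream-rate arithmetic can be checked with a minimal sketch. The function names and default parameters here are illustrative assumptions, not part of any standard; the figure of 2 bytes per pixel reflects the 4:2:2 sampling quoted in the text.

```python
def frame_bytes(width=720, height=486, bytes_per_pixel=2):
    """Bytes in one digitized frame (assumed 4:2:2 sampling, 2 bytes/pixel)."""
    return width * height * bytes_per_pixel


def stream_rate(frame_size, frames_per_second=30):
    """Average bytes per second for a stream of equally sized frames."""
    return frame_size * frames_per_second


size = frame_bytes()
rate = stream_rate(size)
print(size)  # 699840
print(rate)  # 20995200, i.e. approximately 21M bytes/second
```

The result is the average rate only; the 27M bytes/second burst rate mentioned in the text is higher because no data is transferred during retrace intervals.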
Cameras panning over a static 3-D scene pose the reconstruction problem of computing a spatial CAD-type scene-model of the locations, shapes and orientations of visible surfaces in the scene. One solution has utilized stereo triangulation (e.g., as used in surveying and in the creation of topographic maps). The exemplary procedure includes identifying a feature in two images from differing viewpoints, and measuring the feature's image coordinates in the two images. The internals of the camera may be calibrated by measuring the focal length of the camera lens and geometrical characteristics of the camera's image formation. Externals of the camera may be calibrated by measuring a location and orientation of the second viewpoint relative to a coordinate frame located in the first viewpoint. The triangle consisting of the two viewpoints and the unknown feature location may be solved given the coordinate locations of the feature in the two images. This can be accomplished by constructing rays from the two viewpoints through the image-plane coordinates, and solving for a best 3-D intersection point.
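The final step, solving for a best 3-D intersection of two rays, can be sketched as follows. This is a minimal illustration assuming the midpoint method, one common choice among several: because measured rays rarely intersect exactly, it takes the midpoint of the shortest segment joining them. The function name and the representation of rays as an origin plus a direction are assumptions made for this example, and the ray directions are taken to be already expressed in a common coordinate frame (i.e., calibration has been applied).

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(p1, d1, p2, d2, eps=1e-12):
    """Estimate the 3-D point nearest to two rays p1 + s*d1 and p2 + t*d2.

    Returns (point, gap), where gap is the distance between the closest
    points on the two rays (zero when the rays truly intersect).
    """
    w0 = [a - b for a, b in zip(p1, p2)]
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w0)
    e = _dot(d2, w0)
    denom = a * c - b * b  # zero when the directions are parallel
    if denom <= eps * a * c:
        raise ValueError("rays are (nearly) parallel; no unique solution")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]  # closest point on ray 1
    q2 = [p + t * k for p, k in zip(p2, d2)]  # closest point on ray 2
    point = [(u + v) / 2 for u, v in zip(q1, q2)]
    return point, math.dist(q1, q2)


# Two viewpoints one unit apart, both looking at a feature at (0.5, 0.2, 4.0).
point, gap = triangulate_midpoint(
    (0.0, 0.0, 0.0), (0.5, 0.2, 4.0),
    (1.0, 0.0, 0.0), (-0.5, 0.2, 4.0),
)
```

The returned gap gives a simple consistency check: a large value suggests a mismatched feature or a calibration error.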
Alternative strategies for recovering 3-D descriptions of scenes from one or more images include “shape from shading”, which allows inference of the shape of diffusely reflecting (matte) surfaces by making various assumptions about the distribution and types of light sources present. Contextual assumptions may also be made, for example if it can be assumed that a scene contains only diffusely reflective polygonal objects resting on a planar surface and illuminated by a point source of light at a known position and orientation. This permits processing of an image to extract a line-drawing of surface boundary contours, application of consistency rules and extraction of 3-D locations of other visible vertices. Conventional devices implementing conventional image processing methods are relatively complex and involve significant cost.
Mosaicing can be thought of as a special case of 3-D reconstruction wherein the scene itself can be completely described in two dimensions. Two common examples of this are panning and scanning. Panning refers to an instance when a camera is panned about a single viewpoint. In this case there is no parallax between frames, as every object appears from the same viewpoint in every frame. The whole scene can be described by a single spherical image centered at the viewpoint. The acquired frames in a sequence are treated as windows into the sphere, projected onto the camera image plane. Scanning refers to an instance when the scene to be recorded is itself a 2-D surface. In this case, the camera is translated, usually but not necessarily parallel to the surface. The acquired frames in a sequence are treated as windows into the surface, projected onto the camera image plane. This situation is common in document scanning.
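The panning case can be illustrated with a minimal sketch that maps a pixel of a panned frame to a direction on the viewpoint-centered sphere. It assumes an ideal pinhole camera, a pan about a vertical axis through the viewpoint with no tilt or roll, and pixel coordinates measured from the principal point with the focal length expressed in pixels. The function name and parameters are hypothetical.

```python
import math


def pixel_to_sphere(x, y, focal_px, pan_rad):
    """Map image-plane offsets (x, y) to (longitude, latitude) in radians.

    x and y are offsets from the principal point in pixels; the sign of
    latitude follows the sign convention chosen for y.
    """
    longitude = pan_rad + math.atan2(x, focal_px)
    latitude = math.atan2(y, math.hypot(x, focal_px))
    return longitude, latitude


# The same scene direction seen in two frames with different pan angles
# lands at the same longitude on the sphere, since there is no parallax.
f = 800.0
lon_a, _ = pixel_to_sphere(f * math.tan(0.2), 0.0, f, pan_rad=0.0)
lon_b, _ = pixel_to_sphere(f * math.tan(0.1), 0.0, f, pan_rad=0.1)
```

Under these assumptions each frame is a window into one shared spherical image, so overlapping frames can be placed by their pan angle alone.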
As described below, aspects of the present invention provide improved panning, scanning and other imaging apparatuses and methodologies which enable robust stitching of video fields into mosaics in an efficient manner.