Panoramic presentations are popular for the immersive experience they offer viewers. Museums, recreational parks, and theaters have been showing documentaries and entertainment content in panoramic views for quite some time. Such content is projected onto large spherical or semi-spherical domes, where viewers experience a sensation of motion and are immersed in a virtual reality. Another application of interest for panoramic video is low-scale television production. There, a high-resolution, high-quality panoramic video is constructed as an intermediate video feed from which a region-of-interest may be cut to create the final program. Such a system for live panorama construction not only allows for efficient production of low-scale events, but also provides the flexibility to generate more than one program from a given panorama and to produce replays of unpredictable happenings in the scene that traditional television coverage practices might otherwise have missed.
Broadcast-quality cameras are limited in the field-of-view they cover, and wide-angle lenses produce spatial distortion and image blur. Therefore, it is common in the art to combine images from multiple views to form a high-quality, wide-angle panoramic view. Seamlessly tiling images captured by a single panning camera may provide a panoramic view of the scene covering up to 360 degrees. Stitching together images captured by different cameras, though, requires not only accurate pairwise image alignment, but also radiometric and spatial corrections to compensate for differences in the cameras' exposure-times and lens characteristics. Combining images from several cameras is further complicated by parallax artifacts, since physical cameras cannot share the same projection-center. Most existing systems for panoramic video stitching include cameras placed at predefined spatial locations (e.g. on a grid). This regular camera placement is typically a design constraint that limits the flexibility and efficiency of setting up the system in the field.
Prior technologies focused on stitching images captured by the same camera. The main challenge in this setting is to seamlessly align the images onto each other. Image alignment (registration) is a process known in the art that is applicable to myriad domains such as cartography, super-resolution, and biomedical imaging. It involves finding the mathematical transformation (mapping) that maps a location in one image to its corresponding location in a second image, where corresponding image locations are image-projections of the same physical point in the scene. Different transformation models may be used to spatially deform (map) one image to match the other. For example, global parametric models (e.g. affine or perspective) may be used to compensate for differences in view angles and focal lengths, whereas non-linear models may be needed to compensate for local deformations, using, for example, optical-flow-based methods.
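As an illustration of the global parametric models mentioned above, the following minimal sketch (assuming NumPy; the function name `map_points` and the example matrices are hypothetical) maps pixel locations through a 3x3 transformation expressed in homogeneous coordinates. An affine model has a last row of [0, 0, 1]; a perspective model may have a non-trivial last row, so each mapped point must be divided by its homogeneous coordinate.

```python
import numpy as np


def map_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map Nx2 pixel locations through a 3x3 transformation H (hypothetical helper)."""
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])            # (x, y) -> (x, y, 1)
    mapped = homog @ H.T                      # apply H to every point
    return mapped[:, :2] / mapped[:, 2:3]     # divide by w to return to pixels


# Affine model: 90-degree rotation plus translation (last row is [0, 0, 1]).
H_affine = np.array([[0.0, -1.0, 10.0],
                     [1.0,  0.0,  5.0],
                     [0.0,  0.0,  1.0]])

# Perspective model: a non-trivial last row makes the mapping projective.
H_persp = np.array([[1.0,   0.0, 0.0],
                    [0.0,   1.0, 0.0],
                    [0.001, 0.0, 1.0]])

pts = np.array([[0.0, 0.0], [100.0, 50.0]])
print(map_points(H_affine, pts))
print(map_points(H_persp, pts))
```

For the perspective model, the point (100, 50) is scaled by 1 / 1.1 because its homogeneous coordinate becomes 0.001 * 100 + 1 = 1.1, which is what distinguishes it from an affine mapping.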
Two main approaches to image registration are common in the art: feature-based and image-based (direct) alignment. In feature-based alignment, the transformation is resolved from corresponding pairs of features. Discriminative features based on local image characteristics are extracted from the images to be aligned; for example, scale- and orientation-invariant features such as SIFT or ASIFT are commonly used in the art. In image-based alignment, overlapping pixels from the two images to be aligned are compared directly, so the steps of extracting features and finding feature-pair correspondences are not required. However, image-based registration is limited in the range of displacements it can recover. Therefore, image-based registration is often used as a refinement step after a feature-based registration has been employed. Both feature-based and image-based registration methods require a similarity metric or a distance metric to drive the optimization process that searches for the optimal transformation. The metric can be, for example, the Euclidean distance between corresponding features, or any similarity (or distance) between image characteristics measured within a neighborhood (patch) of the corresponding feature locations.
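To illustrate image-based (direct) alignment and its limited range, the sketch below (assuming NumPy; `direct_translation` and the synthetic data are hypothetical) exhaustively tests integer translations within a search radius and keeps the one with the smallest mean squared difference over the overlapping pixels. Practical systems typically use richer transformation models and coarse-to-fine optimization; this only demonstrates the principle of comparing pixels directly.

```python
import numpy as np


def direct_translation(ref, mov, radius):
    """Find the integer shift (dy, dx) with mov[y, x] ~ ref[y + dy, x + dx].

    Overlapping pixels are compared directly (mean squared difference);
    no features are extracted. Only shifts within +/- radius are tested.
    """
    h, w = ref.shape
    best, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Overlap expressed in mov coordinates, kept inside both images.
            y0, y1 = max(0, -dy), min(h, h - dy)
            x0, x1 = max(0, -dx), min(w, w - dx)
            a = mov[y0:y1, x0:x1]
            b = ref[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
            cost = np.mean((a - b) ** 2)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


rng = np.random.default_rng(0)
ref = rng.random((64, 64))
mov = np.roll(ref, shift=(-3, 2), axis=(0, 1))   # mov[y, x] = ref[y + 3, x - 2]
print(direct_translation(ref, mov, radius=5))    # expected: (3, -2)
print(direct_translation(ref, mov, radius=2))    # true shift is outside the search range
```

With a radius of 5 the true shift is found; with a radius of 2 it lies outside the search window and cannot be recovered, which is why a coarse feature-based estimate is often used to initialize such a search.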
Constructing a panorama from multiple cameras requires preprocessing each camera's images to compensate for differences in the cameras' intrinsic parameters. To compensate for camera-specific lens characteristics, lens distortion is typically modeled as a radial distortion. In a radial distortion model, the captured image is spatially distorted as a function of the radial distance from the distortion center, and the distortion may be compensated for using a low-order polynomial model. Differences in the cameras' exposure-times may also be compensated for. Camera exposure-time affects the radiometric attributes of the captured image, and radiometric differences between two corresponding images may impair the accuracy of their alignment. Hence, methods known in the art for correcting radiometric discrepancies (color balancing) are typically applied across the images captured by multiple cameras before they are combined into a panoramic image.
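A minimal sketch of a low-order polynomial radial distortion model, assuming normalized image coordinates with the origin at the distortion center and two hypothetical coefficients `k1` and `k2`: a point at undistorted radius r is moved to x_d = x_u * (1 + k1 * r^2 + k2 * r^4). The correction shown here inverts the model by fixed-point iteration, which is one common choice for small to moderate distortion and not necessarily what any particular system uses.

```python
import numpy as np


def distort(pts, k1, k2):
    """Apply polynomial radial distortion to Nx2 normalized coordinates."""
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)      # squared radius per point
    return pts * (1.0 + k1 * r2 + k2 * r2 ** 2)


def undistort(pts_d, k1, k2, iters=20):
    """Invert the model by fixed-point iteration: x_u = x_d / (1 + k1 r_u^2 + k2 r_u^4)."""
    pts_u = pts_d.copy()                                # initial guess: no distortion
    for _ in range(iters):
        r2 = np.sum(pts_u ** 2, axis=1, keepdims=True)
        pts_u = pts_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return pts_u


k1, k2 = -0.2, 0.05          # hypothetical barrel-distortion coefficients
pts = np.array([[0.0, 0.0], [0.3, 0.4], [-0.5, 0.2]])
pts_d = distort(pts, k1, k2)
pts_back = undistort(pts_d, k1, k2)
print(np.max(np.abs(pts_back - pts)))   # round-trip error, should be tiny
```

The iteration converges quickly here because the distortion is mild over the chosen radii; strongly distorted (e.g. fisheye) lenses generally need a different model or a more robust solver.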
Another challenge in stitching images captured by multiple cameras is the inevitable difference in the cameras' projection-centers. These differences lead to parallax artifacts, known in the art, that stem from discrepancies in the image-projections of corresponding structures. When attempting to align such images, visual distortions such as blurring, ghosting, and discontinuities result. These parallax distortions may be minimized using miniature camera arrays in which the cameras are positioned close to each other. The minimal distance between a pair of cameras, though, is limited by the size of the cameras, which tends to be relatively large for high-quality broadcast cameras. Various stitching methods designed to reduce parallax artifacts are known in the art. One known approach is to hide the parallax artifacts by seam optimization via graph cuts. Seam-based methods attempt to “hide” the parallax distortions by cutting through well-aligned image regions rather than removing the parallax artifacts between the views. Though effective for some applications, finding well-aligned regions may be difficult for video with fast motion or highly cluttered content. Another approach is to warp the images to locally compensate for parallax distortions. For example, a common method is to warp the overlapping regions of the aligned images using an optical-flow-based method. This approach is limited by the quality of the estimated optical flow, which is sensitive to the temporal synchronization accuracy of the input video frames, the quality of the video images, and the distance metric (warping error) in use. A distance metric that efficiently captures the structural parallax distortions is required for effective parallax removal.
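The dependence of parallax on camera spacing can be made concrete with a simplified model (an assumption for illustration only): two cameras with parallel optical axes, separated by a baseline B and sharing a focal length f in pixels, aligned by a homography that is exact only for a fronto-parallel plane at depth Z0. A scene point at depth Z is then left misaligned by p = f * B * |1/Z - 1/Z0| pixels, so the residual grows linearly with the baseline. The function name and the numbers below are hypothetical.

```python
def residual_parallax_px(f_px, baseline_m, depth_m, plane_depth_m=float("inf")):
    """Residual misalignment (pixels) of a point at depth_m after aligning two
    parallel, purely translated cameras with a homography that is exact for a
    fronto-parallel plane at plane_depth_m (default: plane at infinity)."""
    return f_px * baseline_m * abs(1.0 / depth_m - 1.0 / plane_depth_m)


f_px = 2000.0                       # hypothetical focal length in pixels
for baseline in (0.3, 0.03):        # hypothetical broadcast rig vs. miniature array (m)
    p = residual_parallax_px(f_px, baseline, depth_m=10.0, plane_depth_m=50.0)
    print(f"baseline {baseline} m: {p:.1f} px")
```

With these hypothetical values, a 0.3 m baseline leaves about 48 px of misalignment for a point at 10 m when aligning on a plane at 50 m, versus about 4.8 px for a 0.03 m baseline. This is the reason miniature arrays reduce parallax while physically large broadcast cameras cannot.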
In addition to providing seamless image alignment, effective techniques for combining corresponding image-frames from multiple video streams need to account for the temporal coherency of the output panoramic video. Otherwise, inconsistencies between successive panoramic video frames may create perceptible distortions, resulting in a panoramic video that is not on par with broadcast-quality programming. Beyond these quality demands, the panorama construction method should be able to process multiple high-resolution image-frames in real time, allowing for live panoramic video computation. High image quality is a main concern especially in panoramic video from which a live program is cut, where zooming in on various regions-of-interest within the panoramic view is required.
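One simple way to illustrate temporal coherency is to smooth the alignment parameters estimated independently for each frame, so that estimation noise does not appear as frame-to-frame jitter in the panorama. The sketch below (assuming NumPy; `smooth_parameters`, the smoothing factor, and the synthetic data are hypothetical) applies causal exponential smoothing; it is a generic technique shown for illustration, not the method of this document.

```python
import numpy as np


def smooth_parameters(per_frame_params, alpha=0.2):
    """Causal exponential smoothing of per-frame transform parameters.

    alpha near 1 follows new estimates closely; alpha near 0 suppresses
    frame-to-frame jitter more strongly but lags behind real motion.
    """
    params = np.asarray(per_frame_params, dtype=float)
    out = np.empty_like(params)
    out[0] = params[0]
    for t in range(1, len(params)):
        out[t] = alpha * params[t] + (1.0 - alpha) * out[t - 1]
    return out


# Hypothetical per-frame estimates of a static translation (12, -3) in pixels,
# corrupted by independent estimation noise in every frame.
rng = np.random.default_rng(1)
raw = np.array([12.0, -3.0]) + rng.normal(scale=1.0, size=(200, 2))
smoothed = smooth_parameters(raw, alpha=0.2)

jitter_raw = np.std(np.diff(raw, axis=0))            # frame-to-frame change, raw
jitter_smoothed = np.std(np.diff(smoothed, axis=0))  # frame-to-frame change, smoothed
print(jitter_raw, jitter_smoothed)
```

The smoothing factor trades stability against responsiveness: a small `alpha` yields a steadier panorama but delays the response when the cameras or the scene genuinely change, and the per-frame loop is cheap enough not to threaten a real-time budget.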
Common panoramic video systems involve camera calibration and 3D reconstruction, followed by projection of the 3D scene onto a desired image plane. In practice, high-quality and temporally stable 3D reconstruction is a difficult task of high complexity. Many commercially available panoramic video systems are based on pre-calibrated, miniaturized camera arrays. The requirement for camera calibration complicates the system's initialization procedure, while constraining the cameras' physical size (to reduce parallax) limits the application domain, since professional high-end cameras are physically large. Panoramic video construction systems and methods that are tractable and effective in reducing parallax artifacts, without constraining the cameras' size and array structure, are therefore needed.
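The final step of such a 3D pipeline, projecting reconstructed scene points onto a desired image plane, can be sketched with the standard pinhole model x ~ K (R X + t), where K holds the intrinsic parameters and (R, t) the pose of the (possibly virtual) output camera. The sketch assumes NumPy; the function name and the numeric values are hypothetical.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 world points to pixel coordinates with x ~ K (R X + t)."""
    Xc = X @ R.T + t              # world -> camera coordinates
    x = Xc @ K.T                  # camera -> homogeneous pixel coordinates
    return x[:, :2] / x[:, 2:3]   # perspective division


# Hypothetical virtual camera: focal length 1000 px, principal point (960, 540),
# placed at the world origin and looking down the +Z axis.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

X = np.array([[0.0, 0.0, 10.0],    # a point on the optical axis
              [1.0, -0.5, 5.0]])   # a point off the axis, closer to the camera
print(project(K, R, t, X))
```

The quality of such a rendered view depends directly on the accuracy of the reconstructed points X and the calibrated K, R, and t, which is why calibration and temporally stable reconstruction make these systems complex.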