Operating a video imaging device, particularly a hand-held recording device, can produce displaced or distorted image data because of small movements of the operator holding the device. Such displaced and distorted image data is undesirable. Accordingly, conventional methods and devices have been developed to stabilize image data captured by video imaging devices.
For example, one conventional method employs one or more motion sensors, such as a gyroscope, to detect motion of the imaging device so that the image data can be corrected. These methods require motion sensors, which may increase cost. They may also still leave image distortions, because the motion-sensing arrangements typically employed usually detect neither rotational motion around the optical axis nor translational camera motion. Image stabilization may become even more difficult when the zoom features of the imaging device are used, or when imaging scenes with a strongly three-dimensional (3D) nature, i.e., scenes containing both nearby and distant objects.
Optical stabilization systems instead shift the optics or the image sensor to correct for image shake. The main disadvantages of such systems are their complexity and cost. Their correction range is also limited by how far the optics and/or image sensor can be shifted. Further, such systems usually cannot compensate for rotational motion.
In digital stabilization, a processor typically operates on image data, either as it arrives or after it has been stored in memory, digitally manipulating the pixels to correct for motion instabilities of various origins. Pixels are generally shifted by an amount calculated from information provided by the camera or extracted from the video data. Processed images may be shifted, combined, warped, or otherwise manipulated to compensate for video quality problems, depending on image conditions, the chosen methodology, and/or user preferences.
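As a rough illustration of pixel shifting, the sketch below assumes that the only motion between two frames is a global integer translation. It estimates that translation by exhaustive block matching (minimizing the mean absolute difference over the overlapping region) and then shifts the current frame back onto the reference frame. The functions `estimate_shift` and `shift_frame` and the parameters `max_shift` and `min_overlap` are hypothetical names chosen for this example; practical systems must also model rotation, scaling, subpixel motion, and rolling shutter effects.

```python
def shift_frame(frame, dy, dx, fill=0):
    """Return a copy of frame translated by (dy, dx): out[y][x] = frame[y - dy][x - dx]."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def estimate_shift(ref, cur, max_shift=2, min_overlap=0.25):
    """Find the integer (dy, dx) that best explains cur as ref translated by (dy, dx).

    The cost is the mean absolute difference over the overlapping pixels; candidate
    shifts whose overlap covers less than min_overlap of the frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Only visit pixels where both cur[y][x] and ref[y - dy][x - dx] exist.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(cur[y][x] - ref[y - dy][x - dx])
                    count += 1
            if count < min_overlap * h * w:
                continue
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Simulate a one-pixel jitter down and to the right, estimate it, then undo it.
ref = [[6 * y + x for x in range(6)] for y in range(6)]
cur = shift_frame(ref, 1, 1)
dy, dx = estimate_shift(ref, cur)          # -> (1, 1)
stabilized = shift_frame(cur, -dy, -dx)    # border pixels that left the frame become 0
```

Exhaustive search is chosen here only for clarity; its cost grows with the square of the search range, which is one reason real-time systems use faster motion estimators.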
Image data captured by rolling shutter sensors presents particular difficulties. The term “rolling shutter” refers generally to a method of image acquisition in which each frame is recorded not as a snapshot of a single point in time, but by scanning across the frame, either vertically or horizontally. As a result, different parts of the image are recorded at slightly different times, even though the whole frame is displayed at once during playback. Most CMOS sensors in use are rolling shutter sensors. The advantage of a rolling shutter is that the image sensor can continue to gather photons during the acquisition process, thus increasing sensitivity. The disadvantage is that distortions and artifacts can occur, particularly when imaging fast-moving objects or scenes with rapid changes in light level.
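The row-by-row timing of a rolling shutter, and the skew it produces, can be sketched as follows. The sketch assumes a constant line time (the delay between the exposure starts of successive rows) and image content moving horizontally at a constant speed during readout, so that each row's displacement is proportional to its row index. The helpers `row_capture_times`, `row_offsets`, and `deskew` are hypothetical; real rolling shutter correction must also handle vertical and rotational motion that changes during the frame.

```python
def row_capture_times(num_rows, frame_start_s, line_time_s):
    """Exposure start time of each row, with row 0 read out first."""
    return [frame_start_s + r * line_time_s for r in range(num_rows)]


def row_offsets(num_rows, line_time_s, content_px_per_s):
    """Horizontal displacement of each row's content relative to row 0.

    Assumes the image content moves horizontally at a constant speed during
    readout; a positive offset means the content appears shifted to the right.
    """
    return [content_px_per_s * r * line_time_s for r in range(num_rows)]


def deskew(frame, offsets, fill=0):
    """Shift each row back by its rounded offset to undo the rolling shutter skew."""
    out = []
    for row, off in zip(frame, offsets):
        k = round(off)
        w = len(row)
        out.append([row[x + k] if 0 <= x + k < w else fill for x in range(w)])
    return out


# Assumed example: 1080 rows, 30 microsecond line time, content moving at 1000 px/s.
offsets = row_offsets(1080, 30e-6, 1000.0)
# The last row is displaced by about 32 px relative to the first row.
```

Because the displacement grows linearly down the frame, a vertical edge is recorded as a slanted line; this is the familiar skew artifact of rolling shutter sensors.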
When multiple views of the video sequence are available, for example when imagery is recorded by several sensors as in 3D video capture, Digital Video Stabilization (DVS) can be improved. In the prior art, the multi-view video sequence is used to perform structure-from-motion (i.e., to obtain a 3D description of the scene), to estimate the 3D camera movement, and then to form a single-view output video sequence that follows a smoothed 3D camera path. However, structure-from-motion and the determination of 3D camera movement are computationally very costly for real-time systems.
Further, the prior art does not address the problems intrinsic to determining camera motion, especially for cameras with rolling shutter sensors. These problems include distinguishing local motion from global motion (i.e., objects moving within the frame versus motion of the entire frame), and distinguishing true 3D motion from rolling shutter artifacts in CMOS sensors.
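The distinction between local and global motion can be illustrated with a common heuristic, shown here only to clarify the terms and not as the method of this application: take the component-wise median of block motion vectors as the global (camera) motion, and flag blocks whose vectors deviate from it as local object motion. The function name `split_global_local` and the tolerance `tol` are hypothetical. This heuristic does not by itself separate 3D parallax from rolling shutter artifacts, since both also make motion vectors vary across the frame.

```python
from statistics import median


def split_global_local(vectors, tol=1.0):
    """Estimate global motion as the median of block motion vectors.

    vectors: list of (vx, vy) displacements, one per image block.
    Returns ((gx, gy), local_indices), where local_indices lists the blocks whose
    vector differs from the global estimate by more than tol in either component.
    """
    gx = median(vx for vx, _ in vectors)
    gy = median(vy for _, vy in vectors)
    local = [
        i for i, (vx, vy) in enumerate(vectors)
        if abs(vx - gx) > tol or abs(vy - gy) > tol
    ]
    return (gx, gy), local


# Four blocks follow the camera shake; the fifth contains an independently moving object.
blocks = [(2, 0), (2, 1), (2, 0), (3, 0), (10, -5)]
global_motion, local_blocks = split_global_local(blocks)  # -> ((2, 0), [4])
```

The median is used rather than the mean because a single fast-moving object would pull a mean toward its own motion, whereas the median is unaffected as long as most blocks follow the camera.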
Thus, there is a need in the art to improve DVS by making better use of the information available from multiple views. This application describes a solution to these difficulties.