The present invention concerns video stabilization and, in particular, apparatus and a method for adaptively merging a stabilized image with existing background information to hide blank areas in the stabilized image.
Image stabilization is desirable in many applications, including news reporting, the production of motion pictures, video surveillance, and motion-compensated image coding. In all of these applications, it is desirable to remove unwanted jitter between successive frames of source video. Producers of television news programs want to stabilize video from hand-held cameras before presenting it to their viewers. Video from surveillance cameras mounted on swaying or rotating platforms, or on moving vehicles, is desirably stabilized so that it can be analyzed by a computer before being presented to human observers.
One method for obtaining a stable image is to mount the camera on a mechanically stabilized platform. Such a platform typically employs gyroscopes to sense platform rotation, and motors to compensate for that rotation. Stabilized platforms tend to be relatively expensive and, because they are based on feedback control systems, do not compensate well for rapid movement of the camera.
Electronic stabilization with imager motion sensors can be used to compensate for camera motion that cannot be corrected by the mechanically stabilized platform. Electronic stabilization systems sense platform motion that is not corrected by the electro-mechanical feedback system. The sensed residual motion is converted to transformation parameters, which are then used to warp the current image to remove the residual motion, producing a stabilized output image. Electronic stabilization systems may also be used without an electro-mechanical stabilization platform to compensate for imager motion. In systems of this type, camera motion may be sensed by mechanical motion sensors, such as gyroscopes and accelerometers, or it may be sensed directly from the image data by analyzing and correlating predetermined components of successive image frames.
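Sensing motion directly from the image data by correlating successive frames can be illustrated with a minimal sketch. It assumes grayscale frames stored as Python lists of rows, motion that is a pure integer translation, and an exhaustive search scored by mean absolute difference. The function name `estimate_shift` and its `max_shift` search radius are hypothetical and chosen for illustration; this is not the correlation method of any particular stabilization system.

```python
def estimate_shift(prev, cur, max_shift):
    """Estimate the integer translation (dx, dy) of `cur` relative to `prev`.

    The result satisfies cur[y][x] ~= prev[y - dy][x - dx]. Every candidate
    shift within +/- max_shift is scored by the mean absolute difference over
    the pixels where the two frames overlap; the lowest score wins.
    Assumes max_shift is smaller than both frame dimensions.
    """
    h, w = len(prev), len(prev[0])
    best = None  # (score, dx, dy) of the best candidate seen so far
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = 0
            count = 0
            # Visit only (x, y) whose source pixel (x - dx, y - dy) lies in prev.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(cur[y][x] - prev[y - dy][x - dx])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Having measured a motion of (dx, dy), a stabilizer would warp the current frame by (-dx, -dy) to cancel it. Exhaustive search is used here only because it is short and transparent; practical systems typically use faster, coarse-to-fine estimators.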
Video stabilization provides many benefits for processing video that is acquired from an unstable camera. Stabilizing the video gives human viewers a much better idea of what is happening in the scene and allows detection of details that may go unnoticed if they are masked by image motion. Because many applications that need video stabilization must operate in real time, the stabilization operations performed for these applications must also operate in real time. Exemplary electronic and electro-mechanical video stabilization methods are described in U.S. Pat. No. 5,629,988, entitled SYSTEM AND METHOD FOR ELECTRONIC IMAGE STABILIZATION by Burt et al., which is incorporated herein by reference for its teaching on video stabilization techniques.
One method of electronic video stabilization uses information from previous video frames to align the current video frame with a predetermined display coordinate system. To perform this operation, a video processor desirably determines the orientation of the current image with respect to the coordinate system and a transformation of the current image which will bring it into alignment with the coordinate system. Once the correct alignment is determined, the processor applies the determined transformation to "warp" the current frame into alignment with the coordinate system, aligning objects in the current frame to objects in the previous frames. An exemplary warping process is disclosed in the above-referenced U.S. patent. The aligned frame is then displayed on a monitor for human viewers or used for further processing.
Video stabilization for human viewers can have an undesirable side effect in which blank regions appear at the edges of the display. These blank regions occur when the camera is subject to substantial motion, causing it to produce an image that is displaced by a relatively large distance from the previous image. The blank regions represent areas for which the video processor has no current information: once the current frame has been aligned to match previous video frames, it contains no data for these areas. This artifact is described with reference to FIGS. 1, 2 and 3. At time T0 the camera provides the video frame shown in FIG. 1, in which the sailboat 100 is displayed in the center of the screen. This image is aligned with the predetermined coordinate system relative to the sailboat 100. After time T0 but before time T1, the camera moves so that it provides the image frame shown in FIG. 2 at time T1. In this frame, the sailboat 100 is not at the center of the frame but has been shifted substantially to the right. To align the image received at time T1 to the coordinate system of the image received at time T0, the video processor determines that it must shift ("warp") the frame shown in FIG. 2 by some number N of pixels to the left.
Thus, when the frame shown in FIG. 2 is warped for display, an area N pixels wide on the left side of the frame is shifted out of view and is not displayed, while an N-pixel-wide area 120 on the right edge of the display is blank because the frame shown in FIG. 2 contains no information to fill that area. The area of the display with current video, centered on the sailboat 100, is stable, but there is a distracting blank area 120 on the right side of the image. Although the blank area is shown on the right side of the image, it may appear on any side or on all sides of the display and may change rapidly depending on the type, amount and direction of the motion to which the camera is subject.
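The blank area 120 can be reproduced with a minimal sketch of a pure translation warp. It assumes frames are stored as lists of rows and marks blank display pixels with `None`; the helper name `shift_frame` is hypothetical. A shift of dx = -N moves the content N pixels to the left, so the leftmost N source columns leave the display and the rightmost N display columns have no source data.

```python
BLANK = None  # marker for display pixels with no data in the current frame


def shift_frame(frame, dx, dy):
    """Translate `frame` by (dx, dy) pixels in display coordinates.

    out[y][x] = frame[y - dy][x - dx]; any display pixel whose source falls
    outside the frame has no current data and is marked BLANK.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sy, sx = y - dy, x - dx
            row.append(frame[sy][sx] if 0 <= sy < h and 0 <= sx < w else BLANK)
        out.append(row)
    return out
```

For example, shifting a 6-pixel-wide row two pixels to the left with `shift_frame(frame, -2, 0)` keeps source pixels 2 through 5 and leaves the two rightmost display pixels blank, which is the situation of FIG. 2 with N = 2.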
Depending on the type of signal being displayed, some amount of motion can be stabilized without creating blank regions if the signal includes valid video data that is not visible on normal displays. For example, in standard NTSC video, there is a significant amount of valid video data that forms a border around the visible region of the display. This data represents an overscan portion of the image. Television set designers typically incorporate this overscan into the displayed image to compensate for variations in assembly and in the local strength of the Earth's magnetic field, which tend to magnify, reduce or shift the displayed image. By designing the television receiver to display the data over an area greater than the visible area of the screen, these shifts in the image may be accommodated without displaying any artifacts that would be noticed by a viewer. This overscan data is visible when special display monitors are operated in "underscan" mode. This overscan data can be used by video processors to compensate for blank video areas such as that shown in FIG. 3. With reference to FIG. 1, if, for example, the overscan on the television receiver caused only the area indicated by the dashed line 110 to be displayed, then the image shift that occurs between time T0 and time T1 may be accommodated by shifting the image to the left, using up the overscan on the right side of the image. If there are N pixels to the right of the normally displayed region, then, when the image is shifted to the left by N pixels, the N pixels from the overscan region are shifted into the visible region of the display and there is no blank region.
There are, of course, limits to the usefulness of the overscan data. When large image shifts are necessary to compensate for large amounts of motion, data from the overscan region may reduce the size of the blank region, but it cannot compensate for motion of arbitrary magnitude. If there are P pixels of overscan data on each side of the visible image, then motion shifts of more than P pixels will still cause blank regions on the display.
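The overscan behavior and its limit can be checked with a one-dimensional sketch. It assumes a single source row holding P extra pixels on each side of the normally visible region, integer horizontal shifts, and `None` for blank display pixels; the names `display_row` and `blank_width` are hypothetical. The blank width works out to max(0, |dx| - P), which restates the rule that shifts of more than P pixels still leave blank regions.

```python
def display_row(source_row, overscan, dx):
    """Return the visible part of one source row after a horizontal shift dx.

    `source_row` holds `overscan` extra pixels on each side of the normally
    visible region. Display pixel x shows source pixel x + overscan - dx;
    display pixels whose source falls outside `source_row` are blank (None).
    """
    visible = len(source_row) - 2 * overscan
    out = []
    for x in range(visible):
        sx = x + overscan - dx
        out.append(source_row[sx] if 0 <= sx < len(source_row) else None)
    return out


def blank_width(dx, overscan):
    """Number of blank display pixels left after a shift of dx pixels."""
    return max(0, abs(dx) - overscan)
```

With P = 2, a left shift of 2 pixels (`dx = -2`) draws entirely on the right overscan and leaves no blank pixels, while a left shift of 3 pixels leaves one blank pixel at the right edge.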
In addition, because the overscan region of the video signal represents data that is not seen by the viewer, it is desirable to keep this region as small as possible. Thus, the data in the overscan region cannot be used to compensate for large image shifts.
Previously, others have tried to use electronic zoom to remove blank regions. This method artificially increases the size of the overscan region by zooming the displayed image by a small factor, so that portions of the video data that would otherwise be in the visible region of the display become part of the overscan region. This greatly increases the size of the overscan region and thus provides a much larger buffer for filling in blank regions. There are, however, serious problems with this technique. First, while larger magnitudes of camera motion can be handled without displaying blank regions, there are still limits to how much motion can be compensated. Second, zooming the video image reduces the field of view and degrades image quality. This is a major defect when top-quality video is required.
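The trade-off of electronic zoom can be quantified with a little arithmetic. Assume, for illustration, a visible width of W pixels, P pixels of native overscan per side, and a zoom factor z greater than 1 (all symbols introduced here, not taken from a particular system). After zooming, only W/z source pixels fill the display, so the spare source data on each side grows to P + (W - W/z)/2 pixels, while the displayed field of view shrinks to 1/z of its original width and each source pixel is stretched over z display pixels. The function names below are hypothetical.

```python
def zoom_margin(width, overscan, zoom):
    """Spare source pixels available on each side after an electronic zoom.

    width:    visible display width in pixels (W)
    overscan: native overscan pixels on each side (P)
    zoom:     zoom factor z >= 1
    """
    shown = width / zoom  # source pixels that now fill the visible display
    return overscan + (width - shown) / 2


def field_of_view_fraction(zoom):
    """Fraction of the original horizontal field of view still displayed."""
    return 1.0 / zoom


margin = zoom_margin(720, 8, 1.2)  # about 68 source pixels per side
```

With these hypothetical values (W = 720, P = 8, z = 1.2), the buffer for absorbing motion grows from 8 to about 68 source pixels per side, but the displayed field of view drops to about 83% of its original width, and a motion larger than the new margin still produces blank regions.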
The above-mentioned techniques all rely on manipulating the current video frame to try to remove blank regions from the display. The invention described herein is a method for using information from prior video frames to fill areas of the display that would otherwise be blank, without sacrificing image quality or field of view.
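To make the contrast with the current-frame techniques concrete, a deliberately simple sketch of filling blank pixels from prior frames follows. It assumes every frame has already been warped into the display coordinate system and that blank pixels are marked with `None`, and it merely keeps the most recent valid value at each display location. The name `fill_from_background` is hypothetical, and this plain replacement rule is not the adaptive merging with background information that the present invention concerns.

```python
BLANK = None  # marker for display pixels with no data in the current frame


def fill_from_background(warped, background):
    """Composite a stabilized frame over a background buffer.

    Both arguments are lists of rows in the same display coordinate system.
    BLANK pixels of `warped` are taken from `background`; every valid pixel
    of `warped` is also written into `background`, so the buffer holds the
    most recent data seen at each display location.
    Returns the composite frame; `background` is updated in place.
    """
    out = []
    for y, row in enumerate(warped):
        out_row = []
        for x, value in enumerate(row):
            if value is BLANK:
                out_row.append(background[y][x])
            else:
                background[y][x] = value
                out_row.append(value)
        out.append(out_row)
    return out
```

Calling this once per stabilized frame lets display regions that were seen earlier, such as the right-hand area 120 in the sailboat example, be filled with older scene content instead of being left blank. Because the buffer simply copies old pixels, moving objects in the filled area can appear stale or misaligned.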