1. Field of the Invention
This invention relates generally to video compression and, more particularly, to a method and apparatus for compressing multi-view video.
2. Description of the Related Art
Multi-view video generates a single representation of a scene from multiple views of that scene. Each view consists of a number of still images captured one after another by a camera. The views are obtained from separate cameras positioned at distinct view points. The cameras that capture the views are typically positioned close to one another. This proximity ensures a high degree of similarity between corresponding frames captured by different cameras. The term "corresponding frames" in this context identifies frames from different views whose images were captured at the same time.
Perhaps the most common example of multi-view video is stereoscopic video. To create stereoscopic video, two horizontally separated cameras capture still images of the scene. When the stereoscopic video is displayed, the information in the video from each view is directed to the appropriate eye of the viewer. This results in perceived depth, which adds realism and improves the viewer's understanding of the scene. Some implementations of multi-view video incorporate information from three, four, or more cameras to amplify these benefits.
One significant problem associated with multi-view video, relative to traditional single-view video, is the additional transmission bandwidth it requires. "Bandwidth" is the amount of data that can be transmitted in a fixed amount of time. Cameras typically capture a still image in analog form and then convert it to digital form. In the conversion, the still image in the frame is broken down into "pixels," or picture elements. Each pixel in the frame is then converted to digital form and represented by one or more bits.
The number of pixels in any particular frame will depend on the resolution of the video standard being employed. Using the common VGA standard, there might be anywhere from 307,200 pixels per frame down to 64,000 pixels per frame. The single-view video captured by an individual camera will typically comprise a series of frames, perhaps as many as sixty per second. Because this volume is multiplied by the number of cameras, a multi-view video system must handle a very large amount of data.
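The arithmetic behind this data volume can be sketched as follows. This is an illustrative calculation only: the function name and the figure of 24 bits per pixel are assumptions (the text above says only "one or more bits"), and the 640 by 480 and 320 by 200 resolutions are taken as the frame sizes that yield the 307,200 and 64,000 pixel counts.

```python
def raw_bits_per_second(width, height, bits_per_pixel, frames_per_second, num_cameras):
    """Uncompressed data rate of a multi-view system, in bits per second."""
    pixels_per_frame = width * height
    bits_per_frame = pixels_per_frame * bits_per_pixel
    return bits_per_frame * frames_per_second * num_cameras


# 640 x 480 gives the 307,200 pixels per frame cited above;
# 320 x 200 gives the 64,000 pixels per frame.
# 24 bits per pixel is an assumed value used for illustration.
single_view = raw_bits_per_second(640, 480, 24, 60, num_cameras=1)
stereo_view = raw_bits_per_second(640, 480, 24, 60, num_cameras=2)

print(f"single view: {single_view / 1e6:.1f} Mbit/s")
print(f"two views:   {stereo_view / 1e6:.1f} Mbit/s")
```

Under these assumptions the uncompressed rate scales linearly with the number of cameras, which is why the bandwidth problem grows with each added view.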
Most multi-view video systems therefore use some form of video compression. "Compression" is a technique by which a large amount of information can be conveyed by transmitting fewer bits representative of the larger amount of information. For example, there typically will be only small variations in the content between successive frames captured by the same camera. A system could therefore transmit the initial frame and then subsequently transmit only the differences between the successive frames. Corresponding frames captured by separate cameras in a multi-view video system may also be highly similar, and the digital information extracted therefrom highly redundant. Again, this redundancy can be exploited to compress the amount of data the multi-view video system must process.
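The idea of transmitting an initial frame followed by differences can be illustrated with a minimal sketch. The function names are hypothetical, each frame is modeled as a flat list of pixel intensities, and practical codecs add motion compensation, quantization, and entropy coding that are omitted here.

```python
def encode_differences(frames):
    """Return the first frame plus, for each later frame, its
    pixel-by-pixel difference from the frame before it."""
    first = list(frames[0])
    deltas = []
    previous = first
    for frame in frames[1:]:
        deltas.append([cur - old for cur, old in zip(frame, previous)])
        previous = frame
    return first, deltas


def decode_differences(first, deltas):
    """Rebuild every frame by accumulating the differences."""
    frames = [list(first)]
    for delta in deltas:
        frames.append([old + d for old, d in zip(frames[-1], delta)])
    return frames


# Three tiny 4-pixel frames that change very little over time.
frames = [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 12, 10]]
first, deltas = encode_differences(frames)
print(deltas)  # mostly zeros, which a later coding stage can represent cheaply
```

When successive frames are similar, most difference values are zero or small, so they can be represented with fewer bits than the original pixel values.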
Current compression techniques for multi-view video identify the video from one of the cameras as an independent view and a second video from a second camera as a dependent view. The independent view video is encoded. The dependent view video is then encoded using what are called "disparity estimation" and "disparity compensation" techniques based on the encoded independent view video data.
Current multi-view video compression techniques use a "block based" disparity estimation approach. Typically, the blocks are sixteen pixels by sixteen pixels, such that each of the pixels in the sixteen by sixteen pixel block is assigned the same disparity estimate. Thus, these techniques do not exploit the temporal redundancy of the disparity vector fields, yield poorer coding efficiency, and cannot accurately perform viewpoint interpolation.
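A block based disparity search of the kind described above might be sketched as follows. This is an assumption-laden illustration, not the method of any particular codec: the function names, the sum-of-absolute-differences cost, the purely horizontal search, and the search range are all hypothetical choices. Images are modeled as lists of rows of grayscale values.

```python
def block_sad(dependent, independent, top, col, block, d):
    """Sum of absolute differences between a block in the dependent view
    and the block displaced d pixels to the left in the independent view."""
    total = 0
    for r in range(top, top + block):
        for c in range(col, col + block):
            total += abs(dependent[r][c] - independent[r][c - d])
    return total


def estimate_block_disparity(dependent, independent, block=16, max_disparity=8):
    """Return one horizontal disparity per block (assumed search strategy).

    Every pixel inside a block shares that block's single value, which is
    the limitation the text attributes to block based approaches."""
    height, width = len(dependent), len(dependent[0])
    field = []
    for top in range(0, height - block + 1, block):
        row = []
        for col in range(0, width - block + 1, block):
            best_d, best_cost = 0, None
            # Do not search past the left edge of the independent view.
            for d in range(0, min(max_disparity, col) + 1):
                cost = block_sad(dependent, independent, top, col, block, d)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            row.append(best_d)
        field.append(row)
    return field


# Tiny example with 4 x 4 blocks: the dependent view is the independent
# view shifted two pixels to the right (left border filled with zeros).
independent = [[c * c + r for c in range(8)] for r in range(4)]
dependent = [[independent[r][c - 2] if c >= 2 else 0 for c in range(8)]
             for r in range(4)]
print(estimate_block_disparity(dependent, independent, block=4, max_disparity=4))
```

Because the search returns a single value per block, objects at different depths inside the same block cannot be distinguished, which illustrates why a coarse block grid limits the accuracy of the resulting disparity field.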
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.