The invention relates generally to multiview or stereoscopic video systems, and more particularly to encoding and decoding techniques for multiview video.
Multiview video, also referred to as stereoscopic video or 3D video, is becoming increasingly popular. With multiview video, multiple views, such as a left eye view and a right eye view, are captured at the same time for the same scene from different angles by, for example, different cameras. There can also be more than two views. Given the large data volumes used for 3D distribution, compression and decompression of stereoscopic video is becoming more important. Video coding standards are known, however, to have deficiencies. For example, the H.264 standard (multiview video coding (MVC) extension) does not specify how an encoder should perform, e.g., how many sets of encoding systems are used; it only specifies the syntax format to which the output generated by the encoder should conform, in order to ensure that the output data is decodable by an MVC decoder. In order to compress and decompress stereo video using an existing H.264 encoder and decoder without a system upgrade to support newer video coding standards (such as, for example, MVC, which is a new extension of the H.264/AVC specification under “Annex H Multiview video coding”), and to avoid using two sets of encoding and decoding systems for simulcasting the left and right views, it is common to transmit multiview video by decimating (downsizing) each view picture (either right eye frame or left eye frame) by half and then packing the two views into a single frame. This is known as “frame compatible coding” and is described in greater detail below. Typical packing schemes include a top and bottom packing format, a left and right packing format, or a line interleaved packing format, for example. The packed frames are then treated as normal frames in the encoding process by a conventional encoder. This decimating, packing and encoding can be done by one or more programmed processors, dedicated logic or other image processing circuitry.
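By way of illustration only, the top and bottom packing format described above can be sketched as follows. The function names `decimate_rows` and `pack_top_bottom` are hypothetical, frames are modeled as plain lists of sample rows, and decimation is done by simply dropping every other row; a practical system would typically apply a low-pass filter before decimating.

```python
def decimate_rows(view):
    """Keep every other row, halving the vertical resolution (no filtering)."""
    return [row[:] for row in view[::2]]


def pack_top_bottom(left, right):
    """Stack the half-height left view above the half-height right view."""
    same_height = len(left) == len(right)
    if not same_height or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("views must have identical dimensions")
    return decimate_rows(left) + decimate_rows(right)


# Two tiny 4x4 "views" with distinct sample values (illustrative data only).
left = [[10 * r + c for c in range(4)] for r in range(4)]
right = [[100 + 10 * r + c for c in range(4)] for r in range(4)]

# The packed frame has the same dimensions as a single full-resolution view.
packed = pack_top_bottom(left, right)
```

In this sketch the packed frame keeps the 4x4 size of one original view, with rows 0 and 2 of the left view on top and rows 0 and 2 of the right view on the bottom, so a conventional encoder can treat it as a normal frame.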
Frame compatible format coding allows stereo video to be encoded by a conventional video encoder without any requirement to upgrade the encoding system to one with higher capability. Accordingly, it has been widely deployed in stereo 3D video services. As shown in FIG. 1, the frame compatible format packs a pair of downscaled stereo views into a single frame. In one example, a left view image and a right view image are each downscaled to half the vertical resolution. The two downscaled view images are packed together to form a single frame with full resolution, which is then sent to a video encoder. The motion estimation block in the video encoder accounts for much of the overall encoding time. In a conventional motion estimation process, an exhaustive search is performed in a reference frame for each macroblock in the current frame to find the best motion vector, i.e., the one that results in the lowest rate distortion cost. To encode such packed frames, the exhaustive macroblock search can be done on every frame to encode each frame of multiview information. However, such an operation can be very time consuming.
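By way of illustration only, an exhaustive search for one block may be sketched as follows. This is a simplified sketch under stated assumptions: the sum of absolute differences (SAD) stands in for the rate distortion cost, and the names `sad` and `full_search`, the block size and the search range are all hypothetical choices made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Test every displacement in the window and return (best_mv, best_cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Illustrative 8x8 reference frame in which every sample value is unique.
ref = [[16 * y + x for x in range(8)] for y in range(8)]
# Current frame: the reference content displaced by 2 columns and 1 row.
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]

mv, cost = full_search(cur, ref, bx=2, by=2, size=4, search_range=2)
```

For a search range of R samples, up to (2R + 1)^2 candidate positions are evaluated per block, each requiring a full block comparison, which is why performing the search for every macroblock of every frame is costly.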
An alternative methodology may be to take the motion vector from the top or left half frame and use it as the motion vector for the co-located macroblock (MB) in the bottom or right half frame. This may save processing cycles in the motion estimation process of a video encoder, but taking the motion vector directly from one view and using it for the other view may introduce visual quality degradation.
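By way of illustration only, the reuse of a motion vector for a co-located block in a top and bottom packed frame may be sketched as follows. The sketch assumes that the sum of absolute differences (SAD) stands in for the matching cost, that the co-located block lies half a frame height below the block in the top half, and that the toy data is deliberately constructed so that the two halves move differently; the names `sad` and `best_vector` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def best_vector(cur, ref, bx, by, size, search_range):
    """Exhaustively pick the lowest-SAD vector (assumes the window stays inside the frame)."""
    candidates = [
        (dx, dy)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
    ]
    return min(candidates, key=lambda mv: sad(cur, ref, bx, by, mv[0], mv[1], size))


# Illustrative 8x8 packed reference frame: rows 0-3 are one view, rows 4-7 the other.
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [row[:] for row in ref]
half_height = len(cur) // 2
bx, by, size = 2, 1, 2

# Toy motion: the top-half block shifts one way, the co-located bottom-half block the other.
for y in range(size):
    for x in range(size):
        cur[by + y][bx + x] = ref[by + y][bx + x + 1]
        cur[by + half_height + y][bx + x] = ref[by + half_height + y][bx + x - 1]

# Search only in the top half, then reuse the result for the co-located block.
top_mv = best_vector(cur, ref, bx, by, size, search_range=1)
reused_cost = sad(cur, ref, bx, by + half_height, top_mv[0], top_mv[1], size)

# For comparison, the vector and cost a dedicated search would have found.
own_mv = best_vector(cur, ref, bx, by + half_height, size, search_range=1)
own_cost = sad(cur, ref, bx, by + half_height, own_mv[0], own_mv[1], size)
```

In this constructed case the reused vector avoids a second search but gives a higher matching cost for the bottom-half block than that block's own best vector, which is the kind of mismatch that may lead to visual quality degradation.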
Accordingly, it would be desirable to have an improved encoding and motion estimation operation for multiview video.