Recently there has been much interest in providing increased video possibilities, for instance 3-D images on 3-D image displays. It is believed that 3-D imaging will be, after color imaging, the next great innovation in imaging. We are now at the advent of the introduction of auto-stereoscopic displays for the consumer market.
Basically, a three dimensional impression can be created by using stereo pairs, i.e. two slightly different images directed at the two eyes of the viewer.
Whatever type of display is used, the 3-D image information has to be provided to the display device. This is usually done in the form of a video data signal comprising digital data, often comprising data for a left and a right view or for a number of views, when multiple views are generated.
Another example of increased video possibilities is providing a video data signal capable of providing high frequency video, for instance video with double the standard frame display frequency.
Yet another example is providing a video data signal of enhanced resolution.
Because of the massive amounts of data inherent in digital imaging, the processing and/or the transmission of digital image signals pose significant problems. In many circumstances the available processing power and/or transmission capacity is insufficient to process and/or transmit high quality video data signals. More particularly, each digital image frame is a still image formed from an array of pixels. This problem exists for all video but is greater for 3D video imaging, and the same increase occurs when a video data signal of double frequency or a video data signal of enhanced resolution is to be generated.
The amounts of raw digital information are usually massive, requiring large processing power and/or large transmission rates which are not always available. Various compression methods have been proposed to reduce the amount of data to be transmitted, including for instance MPEG-2, MPEG-4 and H.263.
The known compression methods have originally been set up for standard 2D images.
If, for instance, 3D information is generated at the acquisition side, this information needs to be transmitted, and in order to keep the extra overhead in terms of bit rate low, compression of the 3D information is required. Preferably the compression (or encoding) of the 3D information is performed in such a manner that it can be implemented using existing compression standards with only relatively small adjustments. The same applies when the video data signal is enhanced in the sense that it comprises information on the double frequency signal or enhanced resolution.
Furthermore the improved video signal is preferably backwards compatible, i.e. a conventional standard video apparatus should preferably be able to display a “good” video image from the improved video signal. For instance the 3D stereo signal is preferably 2D backwards compatible, i.e. a conventional 2D apparatus should preferably be able to display a “good” 2D image from the 3D signal. A high frequency 100 Hz video data signal should be able to be displayed on a standard 50 Hz video apparatus even if the apparatus is itself not capable of displaying 100 Hz signals. Likewise a video data signal of enhanced resolution (HDTV, High Definition TV) should be able to be displayed on a standard TV apparatus.
Simply compressing a stereo image as two separate images leads to a large increase in bit rate. Encoding the left (L) and right (R) views of a stereo pair separately practically doubles the bit rate compared to a mono system (one single view) if one wants to guarantee the same quality. Thus such a method, although ensuring that a 2D device can display an image, requires doubling of the bit rate.
The amount of data increases even more when use is made of a multiview system wherein more than two views are generated.
The same applies when a video data signal is enhanced by including information on higher frequency video data signals. Double the frequency would double the data. Increasing the resolution will create the same problem.
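The scaling described above can be illustrated for uncompressed data with a minimal sketch. The function name `raw_bit_rate` and the example figures (720 x 576 pixels, 24 bits per pixel, 50 frames per second) are assumptions chosen for illustration only and are not taken from the text. The sketch covers raw data; compressed bit rates need not scale exactly linearly.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second, views=1):
    """Uncompressed bit rate in bits per second for the given format."""
    return width * height * bits_per_pixel * frames_per_second * views


# Hypothetical baseline format: one view at 50 frames per second.
base = raw_bit_rate(720, 576, 24, 50)

# Adding a second view, or doubling the frame frequency, doubles the raw data.
stereo = raw_bit_rate(720, 576, 24, 50, views=2)
double_frequency = raw_bit_rate(720, 576, 24, 100)

print(stereo / base, double_frequency / base)
```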
A better method, with regard to coding efficiency, is to jointly compress the two stereo (left and right) views or more views, or to jointly compress high frequency and low frequency video data signals, or to jointly compress low resolution and high resolution video data signals. For a left and a right frame, jointly compressing the two views typically uses 50% more bandwidth than the single-view case (to be compared to ~100% more bandwidth in the case of separate view coding). This can be achieved using conventional 2D video compression encoders by interleaving Left and Right frames from each stereo view to form a “fake” 2D sequence. At the receiver side, the 2D frames are de-interleaved and each view is retrieved and displayed. For instance the 2 views (L and R) can be interleaved as frame pictures before entering a video encoder.
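The interleaving and de-interleaving of frames can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, frames are represented by placeholder strings, and the strict L, R, L, R ordering is an assumption, since the text does not fix the order. The encoder and decoder between the two steps are omitted.

```python
from itertools import chain


def interleave(left_frames, right_frames):
    """Merge two views into one "fake" 2D sequence: L0, R0, L1, R1, ..."""
    if len(left_frames) != len(right_frames):
        raise ValueError("both views must contain the same number of frames")
    return list(chain.from_iterable(zip(left_frames, right_frames)))


def deinterleave(frames):
    """Split the interleaved sequence back into the left and right views."""
    return frames[0::2], frames[1::2]


left = ["L0", "L1", "L2"]
right = ["R0", "R1", "R2"]

sequence = interleave(left, right)          # fed to a conventional 2D encoder
left_out, right_out = deinterleave(sequence)  # done at the receiver side
```

A receiver that is unaware of the interleaving would display `sequence` frame by frame, alternating between the two views.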
However, although stereo video, for instance, can be compressed more efficiently jointly using standard techniques than by compressing the separate views (about 1.5 times rather than 2 times the single-view bandwidth), and the resulting bit-stream could be displayed on a suitable 3D device, the inventors have realized that the result is one single bit-stream which cannot be displayed on a normal 2D system with good results. When the single interleaved bit-stream reaches a conventional 2D receiver (with a 2D decoder and a 2D screen), the displayed video sequence would look ugly, showing visible imperfections, as it results from the interleaving of a stereo sequence. This method is thus not backwards compatible. The same holds for multiview signals or other improved video data signals which are jointly compressed.
It is thus an object of the invention to provide a method for encoding enhanced image data at the transmission side which does offer backward compatibility while keeping the amount of encoded data within bounds. Preferably the coding efficiency is high. Also, preferably, the method is compatible with existing encoding standards.
It is a further object to provide an improved encoder for encoding a video data signal, and to provide a video data signal.