The present invention is generally directed towards a process for the compression, encoding, storage and transmission of stereoscopic images that is well suited for real time, or semi real time processing.
The efficient transmission, or recording, of stereoscopic images presents a number of problems. Whereas for 2D images a single image (or, for moving pictures, a single sequence of images) is presented to both of the viewer's eyes, for stereoscopic images a different image must be presented to each eye.
The simplest method of providing stereoscopic images is to provide two 2D images to the viewer, that is, a left and right eye image. Ideally these images should be of equal resolution and clarity; in the worst case they will require twice the storage capacity and, where the images are broadcast, twice the transmission bandwidth.
In the past, stereoscopic imaging systems have been constructed using two 2D image channels, but with the substantial overheads described above.
The majority of existing 2D image recording and transmission processes have been designed to effectively and efficiently transport 2D images. When stereoscopic or 3D images are transported using 2D mechanisms there are usually a number of compromises that need to be made. These compromises result in a stereoscopic image that is inferior to that obtained by using two 2D mechanisms.
The compromises usually result in the stereoscopic images having one or more of the following characteristics:
(i) Each image of the pair forming the stereoscopic image has a lower resolution than the equivalent 2D image capacity of the transport mechanism. For example, if the mechanism is a PAL television channel with a 2D resolution of 625 lines then each of the images forming the stereoscopic image may have a resolution of 625/2 lines.
(ii) There is a temporal delay between the left and right eye images. This causes motion artifacts when objects within the resulting stereoscopic images move.
(iii) One of the images forming the stereoscopic pair may have a lower resolution than the other. This can prove uncomfortable for some viewers.
(iv) Artifacts exist in either or both images of the resulting stereoscopic pair that were not present in the original images.
There are a number of techniques that have been attempted by the prior art to address these problems, and used to record and transmit stereoscopic images. Where video storage and transmission is required a popular technique is to transmit the left image in one field of the video frame and the right image in the other. This technique is known as Field Sequential 3D and, whilst it has the advantage of simplicity and compatibility with existing 2D video standards, the resulting stereoscopic images suffer from the compromises described in (i) and (ii) above.
An alternative technique is to split the video frame horizontally and transmit one image in the upper half of the frame and the other in the lower half. Again this technique has the advantage of simplicity and compatibility with existing 2D video standards and, although it overcomes the problem of temporal delay, it still provides images of at best half the resolution of the 2D image.
In order to overcome some of these compromises the prior art has endeavored to take advantage of some of the similarities between the left and right eye images. In this technique one eye image is recorded or transmitted at normal resolution and the difference between the left eye and right eye images is encoded.
This difference information is transferred to the viewing location along with one of the original images and is provided to the decoder where the stereo image is reconstituted. Typically, to conserve storage space and bandwidth, the difference between the left and right eye images is compressed using both spatial and temporal techniques.
Unfortunately, simply subtracting the left and right images results in information that contains substantial high frequency components and does not therefore enable high levels of compression to be obtained. Further processing is necessary to reduce the size of the difference information and to obtain very high compression levels, such that the additional information represents in the order of a few percent of the associated, compressed, 2D image. This requires complex and costly hardware, which is often unable to encode the stereoscopic images in real time.
This technique has been applied to MPEG encoded video sequences and is described as part of the Multi-Level Profile standard.
An alternative technique described in the prior art is to make use of the disparity between objects in the stereo pair. A disparity map is produced that indicates the relative distance between identical objects in simultaneously acquired left and right eye images. The disparity map is typically compressed using spatial and temporal compression techniques.
Again this is a non-trivial task since generally an autocorrelation technique is used to find matching pixels in the stereo pair. This is particularly difficult for areas in the image that have no texture, e.g. a white wall, and where objects are obscured in one eye and not in the other.
It is an objective of the present invention to provide a relatively simple technique for the compression, encoding, storage and transmission of stereoscopic images that is well suited for real time, or semi real time processing.
The technique will ideally enable 2D compatible stereoscopic images to be produced that represent an increase in information size of only a few percent above an optimally compressed 2D image.
With the above object in mind the present invention provides a method of compressing a stereoscopic image, using entropy coding, wherein a model derived from a first eye image is used to encode an image for a second eye.
It is anticipated that said model will be determined from a first eye difference image of a first frame and a second frame of said first eye image; and that a second eye difference image of a first frame and a second frame of said second eye image will be encoded.
It will be appreciated that said first frame first eye image should correspond to said first frame second eye image and, similarly, said second frame first eye image should correspond to said second frame second eye image.
In a further aspect, the present invention provides a method of compressing a stereoscopic image including the steps of:
comparing a first frame of a first eye image with a second frame of said first eye image to determine a first difference image;
determining the relative composition of said first difference image to create a first eye model;
comparing a first frame of a second eye image with a second frame of said second eye image to determine a second difference image;
using said first eye model to encode said second difference image.
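By way of illustration only, the four steps above may be sketched in Python as follows. The function names, the add-one smoothing, the flat-list frame representation and the toy sample values are assumptions made for the sketch and are not taken from the specification. The cost of an ideal entropy coder is estimated rather than producing an actual bitstream.

```python
import math
from collections import Counter

def difference(frame_a, frame_b):
    """Per-sample difference between two equally sized frames (flat lists)."""
    return [b - a for a, b in zip(frame_a, frame_b)]

def build_model(diff, alphabet, smoothing=1):
    """Probability of each symbol, estimated from the first eye difference image.
    Add-one smoothing (an assumption) keeps unseen symbols encodable."""
    counts = Counter(diff)
    total = len(diff) + smoothing * len(alphabet)
    return {s: (counts[s] + smoothing) / total for s in alphabet}

def ideal_code_length_bits(diff, model):
    """Bits an ideal entropy coder would spend coding `diff` under `model`."""
    return sum(-math.log2(model[s]) for s in diff)

# Toy 8-sample frames: the same object moves one sample to the right in each
# eye, but sits at a different horizontal position (disparity) per eye.
left_1 = [10, 10, 10, 50, 50, 10, 10, 10]
left_2 = [10, 10, 10, 10, 50, 50, 10, 10]
right_1 = [10, 10, 50, 50, 10, 10, 10, 10]
right_2 = [10, 10, 10, 50, 50, 10, 10, 10]

alphabet = range(-255, 256)                 # possible 8-bit differences
left_diff = difference(left_1, left_2)      # step 1: first difference image
model = build_model(left_diff, alphabet)    # step 2: first eye model
right_diff = difference(right_1, right_2)   # step 3: second difference image
bits_with_model = ideal_code_length_bits(right_diff, model)  # step 4
bits_uniform = len(right_diff) * math.log2(len(alphabet))    # no model at all
```

In this toy case the second eye difference image costs fewer bits under the first eye model than under a uniform code, because both difference images contain the same values in different positions.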
The relative composition may include a probability distribution of data in said first difference image. Alternatively, the relative composition may include a ranking table of data elements in said first difference image. The data may include red, green and blue values for each pixel.
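As a hedged illustration of the ranking table alternative, the sketch below orders data elements by how often they occur in the first difference image and replaces each element of the second difference image by its rank. The function names, the tie-breaking rule and the small alphabet are assumptions; the specification does not prescribe how ranks are subsequently coded.

```python
from collections import Counter

def ranking_table(diff, alphabet):
    """Symbols ordered from most to least frequent in `diff`.
    Ties are broken by symbol value so encoder and decoder build the same table."""
    counts = Counter(diff)
    return sorted(alphabet, key=lambda s: (-counts[s], s))

def to_ranks(diff, table):
    """Replace each symbol by its position in the ranking table."""
    index = {s: r for r, s in enumerate(table)}
    return [index[s] for s in diff]

def from_ranks(ranks, table):
    """Inverse of to_ranks, as a decoder holding the same table would apply it."""
    return [table[r] for r in ranks]

alphabet = range(-3, 4)                      # toy alphabet of difference values
left_diff = [0, 0, 1, 0, -1, 0, 1, 0]        # first eye difference image
right_diff = [0, 1, 0, 0, 0, -1, 0, 1]       # second eye difference image

table = ranking_table(left_diff, alphabet)
ranks = to_ranks(right_diff, table)
restored = from_ranks(ranks, table)
```

Frequent values map to small ranks, which a fixed variable-length code could then represent with short codewords.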
Conveniently, the first and second frames are consecutive frames of the film or image. The difference value may be determined by comparing the red, green and blue values for each colour pixel. Alternatively, luminance values may also be considered.
The invention is particularly applicable to stereoscopic images in which said first eye image may be compressed as per normal 2D images and said second eye image encoded using the process above. In this scenario it is preferable that the model of the data actually be based on said first eye image and not on said second eye image, to thereby alleviate the need to transmit the model.
The system may preferably be constructed such that each frame of a stereoscopic image is processed as disclosed above. In an alternative embodiment, an adaptive modeling process may be used to alleviate the need to transmit the model, with the model being initialised based on said first eye frame.
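One possible form of such an adaptive model is sketched below, purely as an illustration. The class name, the count floor of one per symbol and the choice to seed from the first eye difference image are assumptions. The encoder and decoder each build the same seeded model and apply the same update after every symbol, so no model needs to be sent.

```python
import math
from collections import Counter

class AdaptiveModel:
    """Frequency-count model, seeded from first eye data and updated per symbol."""
    def __init__(self, seed_symbols, alphabet):
        # Every symbol starts with a count of 1 (an assumption) so that
        # symbols absent from the seed data can still be coded.
        self.counts = Counter({s: 1 for s in alphabet})
        self.counts.update(seed_symbols)
        self.total = sum(self.counts.values())

    def probability(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

def adaptive_cost_bits(symbols, model):
    """Ideal code length of `symbols`, updating the model after each one."""
    bits = 0.0
    for s in symbols:
        bits += -math.log2(model.probability(s))
        model.update(s)
    return bits

alphabet = range(-3, 4)
left_diff = [0, 0, 1, 0, -1, 0, 1, 0]    # available at both encoder and decoder
right_diff = [0, 1, 0, 0, 0, -1, 0, 1]   # data to be coded

encoder_model = AdaptiveModel(left_diff, alphabet)
decoder_model = AdaptiveModel(left_diff, alphabet)
seeded_bits = adaptive_cost_bits(right_diff, encoder_model)
adaptive_cost_bits(right_diff, decoder_model)  # decoder applies the same updates
unseeded_bits = adaptive_cost_bits(right_diff, AdaptiveModel([], alphabet))
```

In this toy case the seeded model codes the second eye data in fewer bits than an unseeded adaptive model, and the encoder and decoder models remain identical throughout.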
In a further aspect the present invention provides a system incorporating the method above and further including the steps of:
transmitting at least the encoded image; and
decoding the difference data to recreate the image.
To further improve compression, said first eye difference image and said second eye difference image may be quantised. Where quantisation is used, it will be preferable when recreating the encoded image to also include the step of rebuilding a damaged pixel. This rebuilding may utilise data on the pixels in said first eye image.
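The specification does not state here how a damaged pixel is rebuilt. The sketch below shows one hypothetical approach, offered only as an assumption: the decoder knows that the true second eye value lies somewhere inside the quantisation bin, and selects the value within that bin that is closest to the corresponding first eye pixel. The uniform quantiser, the step size and all names are illustrative.

```python
def quantise(diff, step):
    """Uniform quantiser: index of the bin containing `diff`."""
    return (diff + step // 2) // step

def bin_bounds(index, step):
    """Smallest and largest difference values that map to `index`."""
    lo = index * step - step // 2
    return lo, lo + step - 1

def decode_pixel(prev_pixel, index, step):
    """Plain reconstruction using the nominal value of the bin."""
    return prev_pixel + index * step

def rebuild_pixel(prev_pixel, index, step, first_eye_pixel):
    """Hypothetical repair: clamp the first eye pixel into the range of
    values that the quantised difference allows for the second eye pixel."""
    lo, hi = bin_bounds(index, step)
    return min(max(first_eye_pixel, prev_pixel + lo), prev_pixel + hi)

step = 8
prev_right, true_right = 100, 113   # second eye pixel in frames 1 and 2
left_pixel = 112                    # corresponding first eye pixel in frame 2

index = quantise(true_right - prev_right, step)
plain = decode_pixel(prev_right, index, step)
rebuilt = rebuild_pixel(prev_right, index, step, left_pixel)
```

The rebuilt value always remains consistent with the transmitted index. Whether it is closer to the original than the plain reconstruction depends on how well the first eye pixel matches the second eye pixel; in this toy case it is.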
In yet a further aspect the present invention provides a hardware encoder to encode stereoscopic images including:
an analysis means to determine the value of pixels in frames of a first eye and a second eye image;
a delay means to enable a first frame of said first eye image and said second eye image to be compared with respective second frame of said first eye image and said second eye image;
a difference means to determine a first eye difference image and a second eye difference image between respective said first and second frames; and
an encoder to encode said second eye difference image using a model derived from said first eye difference image.
In the preferred embodiment the analysis means may include a principal component analysis transform. Further, the delay means may include a frame delay, and the difference means may include a subtractor.
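The data flow through the delay means and the difference means may be modelled in software as follows. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, frames are flat lists of sample values, and the analysis means (for example a principal component analysis transform) and the encoder are omitted.

```python
class FrameDelay:
    """One-frame delay: returns the frame stored on the previous call."""
    def __init__(self):
        self._held = None

    def push(self, frame):
        previous, self._held = self._held, frame
        return previous

def subtract(current, previous):
    """Subtractor: per-sample difference between two frames."""
    return [c - p for c, p in zip(current, previous)]

def difference_stream(frames):
    """Feed frames through a frame delay and a subtractor in sequence."""
    delay = FrameDelay()
    for frame in frames:
        previous = delay.push(frame)
        if previous is not None:      # nothing to subtract for the first frame
            yield subtract(frame, previous)

left_frames = [[1, 2, 3], [1, 3, 3], [2, 3, 3]]
right_frames = [[2, 3, 1], [3, 3, 1], [3, 3, 2]]

left_diffs = list(difference_stream(left_frames))    # would feed the model
right_diffs = list(difference_stream(right_frames))  # would be encoded
```

Each eye has its own delay and subtractor, so the two difference images for a given frame pair become available at the same time.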
The present invention provides an alternative method of stereoscopic image encoding. It is observed that the sequence of left eye images shares great statistical similarity with the sequence of right eye images. Further, it is observed that this similarity is invariant to the spatial disparities between the left and right eye sequences. This similarity represents a redundancy, which the invention exploits to achieve high compression.
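The invariance to spatial disparity can be illustrated with a toy example. In the sketch below, which is an idealisation, the second eye difference data is simply a circularly shifted copy of the first eye data; real stereo pairs also contain occlusions, so the similarity would be approximate rather than exact. The names and sample values are assumptions.

```python
from collections import Counter

def shift(row, disparity):
    """Circularly shift a row of samples to the right by `disparity`."""
    d = disparity % len(row)
    return row[-d:] + row[:-d] if d else list(row)

left_diff = [0, 0, 5, -5, 0, 0, 0, 0]   # first eye difference data
right_diff = shift(left_diff, 2)        # same content, displaced by 2 samples

# Sample-by-sample subtraction of the two eyes leaves a residual with more
# non-zero values than either input, because the content is misaligned.
residual = [r - l for l, r in zip(left_diff, right_diff)]

# The value statistics, by contrast, are unaffected by the displacement.
same_histogram = Counter(left_diff) == Counter(right_diff)
```

Because the histograms match regardless of the shift, a model built from one eye remains applicable to the other without any disparity estimation.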
To exploit the stereo redundancy, one image of the original stereo pair is recorded or broadcast at normal resolution, and may be compressed using any suitable image compression technique including, though not limited to, the MPEG family, Sorenson, etc. The invention compresses the other stereo channel at a very high rate of compression by removing the need to express information that is redundant between the two images of the stereo pair. At or before the time of viewing, the conventionally transmitted half of the stereo pair is decoded. The invention, having this information available, may then decode the highly compressed half of the stereo pair.
The invention may also achieve even greater compression performance by only partially describing the image stream; for example, the data may be quantised. This type of lossy compression is expected to cause a degradation in the image quality. However, the invention presents a technique whereby the conventionally transmitted half of the stereo pair is used as a template to repair any degradation inflicted on the other, highly compressed, image sequence. In this scenario the system allows the quantisation level to be higher than would otherwise be visually acceptable, resulting in a far higher compression level.
This process as disclosed provides a 2D compatible stereoscopic image system.