Visual information media have traditionally developed around two-dimensional terminals such as the television set. Black-and-white images have given way to color images, and the standard definition television set has developed into the high definition television set (HDTV). As a result, the amount of visual information data tends to increase.
Since real-world visual information is three-dimensional rather than two-dimensional, a technology related to three-dimensional visual information needs to be developed in order to transmit realistic and natural multimedia information.
More specifically, FIG. 1 shows an encoder and a decoder of a multi-view profile, which are implemented by applying the time scalability of the Moving Picture Experts Group 2 (MPEG-2) standard.
The scalability provided by the MPEG-2 allows images having different resolutions and types to be decoded simultaneously by one image-processing device. Among the scalabilities supported by the MPEG-2, the time scalability is a technique for improving visual quality by increasing the frame rate. The multi-view profile applies the time scalability to stereo sequences.
In fact, the structure of a stereo encoder and decoder for the stereo sequences is similar to the structure that applies the time scalability, as shown in FIG. 1. The left view sequence of the stereo sequences is input into a base view encoder 110, and the right view sequence is input into a temporal auxiliary view encoder 100.
The temporal auxiliary view encoder 100 for the time scalability is an interlayer encoder that interleaves images between the images of a base layer.
Therefore, if only the left view is encoded and decoded, a two-dimensional moving picture can be obtained from this system. If the left view and the right view are encoded and decoded together, stereo sequences can be obtained. Here, in order to transmit or store the moving picture, a system multiplexer 120 and a system demultiplexer 130 are needed for combining or separating the sequences of the two images.
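The combining and separating roles of the system multiplexer 120 and the system demultiplexer 130 can be illustrated with a minimal sketch. This is only a toy model: the packet layout, the layer tags "base" and "aux", and the function names are assumptions made for illustration and do not reproduce the MPEG-2 systems layer.

```python
from typing import List, Tuple

# Hypothetical packet: (layer tag, frame index, encoded payload).
Packet = Tuple[str, int, bytes]

def multiplex(base: List[bytes], aux: List[bytes]) -> List[Packet]:
    """Combine the left (base) and right (auxiliary) streams into one stream."""
    if len(base) != len(aux):
        raise ValueError("both views must contain the same number of frames")
    stream: List[Packet] = []
    for index, (left, right) in enumerate(zip(base, aux)):
        stream.append(("base", index, left))
        stream.append(("aux", index, right))
    return stream

def demultiplex(stream: List[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Separate a combined stream back into the base and auxiliary streams."""
    base = [payload for tag, _, payload in stream if tag == "base"]
    aux = [payload for tag, _, payload in stream if tag == "aux"]
    return base, aux

if __name__ == "__main__":
    combined = multiplex([b"L0", b"L1"], [b"R0", b"R1"])
    left_only, _ = demultiplex(combined)
    # A two-dimensional receiver would use only the base (left) stream.
    print(left_only)
```

A receiver that keeps only the base stream corresponds to the two-dimensional case described above, while a receiver that keeps both streams corresponds to the stereo case.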
FIG. 2 illustrates a stereo moving picture encoder/decoder using an MPEG-2 multi-view profile (MVP).
The image of the base layer is encoded by using motion compensation and a discrete cosine transform (DCT), and the encoded image is decoded through the inverse process. The temporal auxiliary view encoder 100 serves as a temporal interlayer encoder that performs prediction based on the decoded image of the base layer.
Generally, either two disparity compensated predictions, or one disparity compensated prediction and one motion compensated prediction, may be used in this case. Like the encoder and decoder of the base layer, the temporal auxiliary view encoder includes a disparity and motion compensated DCT encoder and decoder.
Further, just as a motion prediction/compensation encoding process needs a motion predictor and a compensator, a disparity compensated encoding process needs a disparity predictor and a compensator. In addition to the block-based motion/disparity prediction and compensation, the encoding process includes the DCT of a residual image, i.e., the differential value between a predicted image and an original image, a quantization of the DCT coefficients, and a variable length encoding. Conversely, the decoding process consists of a variable length decoding, an inverse quantization and an inverse DCT.
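The residual coding chain described above (residual, DCT, quantization, and the inverse steps at the decoder) can be sketched as follows. This is a simplified illustration under stated assumptions: a single uniform quantizer step `qstep` stands in for the MPEG-2 quantization matrices, the variable length encoding stage is omitted, the block size is arbitrary, and all function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def encode_block(original, predicted, qstep):
    """Residual -> 2-D DCT -> uniform quantization (entropy coding omitted)."""
    n = len(original)
    c = dct_matrix(n)
    residual = [[original[y][x] - predicted[y][x] for x in range(n)]
                for y in range(n)]
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(value / qstep) for value in row] for row in coeffs]

def decode_block(levels, predicted, qstep):
    """Inverse quantization -> inverse DCT -> add the prediction back."""
    n = len(levels)
    c = dct_matrix(n)
    coeffs = [[level * qstep for level in row] for row in levels]
    residual = matmul(matmul(transpose(c), coeffs), c)
    return [[round(predicted[y][x] + residual[y][x]) for x in range(n)]
            for y in range(n)]

if __name__ == "__main__":
    original = [[52, 55, 61, 66], [70, 61, 64, 73],
                [63, 59, 55, 90], [67, 61, 68, 104]]
    predicted = [[50, 50, 60, 60], [70, 60, 60, 70],
                 [60, 60, 60, 90], [70, 60, 70, 100]]
    levels = encode_block(original, predicted, qstep=2)
    print(decode_block(levels, predicted, qstep=2))
```

The same chain applies whether the predicted block comes from the motion predictor or from the disparity predictor; only the source of `predicted` differs.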
The MPEG-2 encoding is a very effective compression method because it uses a bidirectional motion prediction for bidirectionally motion-compensated pictures (B-pictures). Also, since the MPEG-2 encoding has very effective time scalability, highly efficient compression can be obtained by employing B-pictures with bidirectional prediction to encode the right view.
FIG. 3 describes a predictive encoding considering only disparity, in which two disparity predictions are used for the bidirectional prediction. A left image in the left view is encoded by using a non-scalable MPEG-2 encoder, and a right image in the right view is encoded by using an MPEG-2 temporal auxiliary view encoder based on the decoded left image.
In other words, the right image is encoded into a B-picture by using predictions obtained from two reference images, here left images. In this case, one of the two reference images is the left image to be displayed simultaneously with the right image, and the other is the left image that temporally follows it.
As in motion estimation/compensation, the two predictions give three prediction modes: a forward, a backward and an interpolated mode. In the forward mode, a disparity prediction from the isochronal left image is used, and in the backward mode, a disparity prediction from the immediately following left image is used. In this case, the right image is predicted through disparity vectors relative to the two left images, and this kind of prediction method is called a predictive encoding considering only disparity vectors. Therefore, the encoder estimates two disparity vectors on each frame of the right image, and the decoder decodes the right image from the left images by using the two disparity vectors.
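The choice among the forward, backward and interpolated modes can be illustrated with a short sketch. It assumes that the two disparity compensated prediction blocks have already been formed, that the interpolated prediction is the rounded average of the two, and that the mode is chosen by the smallest sum of absolute differences (SAD). The selection criterion and the function names are assumptions for illustration; an actual encoder is free to use another cost measure.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def select_mode(target, forward_pred, backward_pred):
    """Pick the prediction mode with the lowest SAD for one right-view block.

    forward_pred:  block predicted from the isochronal left image
    backward_pred: block predicted from the immediately following left image
    """
    interpolated = [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
                    for row_f, row_b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "interpolated": interpolated,
    }
    # On a tie, the first listed mode wins because min() keeps the first minimum.
    best = min(candidates, key=lambda mode: sad(target, candidates[mode]))
    return best, candidates[best]

if __name__ == "__main__":
    target = [[10, 20], [30, 40]]
    forward = [[8, 18], [28, 38]]
    backward = [[12, 22], [32, 42]]
    print(select_mode(target, forward, backward)[0])
```

In this example the average of the two left-image predictions matches the right-view block exactly, so the interpolated mode is selected.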
In FIG. 4, a B-picture is obtained based on the bidirectional prediction scheme illustrated in FIG. 3, but using one disparity estimation and one motion estimation. That is, this scheme uses the disparity prediction from the isochronal left image and the motion prediction from the previous right image in the right view.
Further, as in the predictive encoding considering only disparity, this bidirectional prediction also includes three prediction modes: a forward, a backward and an interpolated mode. Here, the forward mode means a motion prediction from the decoded right image, and the backward mode means a disparity prediction from the decoded left image.
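Both the motion vector and the disparity vector in this scheme can be estimated by the same block-based search; only the reference image differs. The following is a minimal full-search sketch. Restricting the disparity search to the horizontal direction assumes horizontally aligned cameras, which the text does not state, and the function names and search ranges are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_at(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def block_match(target_block, reference, top, left, search_x, search_y):
    """Full search for the displacement (dx, dy) with the lowest SAD.

    Disparity estimation: reference is the decoded left image, and
    search_y = 0 under the aligned-camera assumption.
    Motion estimation: reference is the previous decoded right image.
    """
    size = len(target_block)
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, None
    for dy in range(-search_y, search_y + 1):
        for dx in range(-search_x, search_x + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the reference
            cost = sad(target_block, block_at(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector, best_cost

if __name__ == "__main__":
    left_image = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(4)]
    # Synthetic right image: the left image shifted by two samples.
    right_image = [[row[x + 2] if x + 2 < 8 else 0 for x in range(8)]
                   for row in left_image]
    target = block_at(right_image, 0, 2, 2)
    print(block_match(target, left_image, 0, 2, search_x=3, search_y=0))
```

The block found at the returned displacement serves as the backward (disparity) prediction, and the same routine run against the previous right image yields the forward (motion) prediction.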
As described above, the MPEG-2 multi-view profile (MVP) itself is designed for the stereo moving picture and does not consider an encoder structure for the multi-view moving picture. Therefore, an encoder for providing a multi-view moving picture is needed in order to provide a three-dimensional effect and realism to many people simultaneously.
Further, the MPEG-2 provides a standard for encoding and decoding a moving picture. That is, as illustrated in FIG. 5, the picture types specified by the MPEG-2 are categorized into three: an intra coded picture (I picture), a predictive coded picture (P picture) and a bidirectionally predictive coded picture (B picture). The I picture is encoded by performing the DCT without using the motion estimation/compensation process. The P picture is encoded by performing the DCT on difference data after performing the motion estimation/compensation by referring to the I picture or another P picture. The B picture uses the motion compensation as the P picture does, but performs the motion estimation/compensation from two frames on the time axis.
The picture sequence of the MPEG-2 has a structure such as B, B, I, B, B, P, . . . , and a set of pictures from an I picture to the next I picture is called a group of pictures (GOP). The number of pictures in the GOP is designated as N, and the number of pictures between two neighboring I and P pictures or between two neighboring P pictures is designated as M.
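The GOP structure can be made concrete with a small sketch that generates the picture types of one GOP. It assumes that M is taken as the spacing from one I or P picture to the next, so that M = 3 yields the pattern B, B, I, B, B, P given above, and that N is a multiple of M. These conventions and the function names are assumptions for illustration. The second function shows the usual reordering in which each I or P picture is placed ahead of the B pictures that are predicted from it.

```python
def gop_display_order(n, m):
    """Picture types of one GOP in display order for GOP length n and spacing m."""
    if n <= 0 or m <= 0 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    types = []
    for index in range(n):
        if index == m - 1:
            types.append("I")          # first anchor picture of the GOP
        elif (index + 1) % m == 0:
            types.append("P")          # later anchor pictures
        else:
            types.append("B")          # pictures between anchors
    return types

def to_coded_order(display):
    """Move each I or P picture ahead of the B pictures displayed before it."""
    coded, pending_b = [], []
    for picture in display:
        if picture == "B":
            pending_b.append(picture)
        else:
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)
    return coded

if __name__ == "__main__":
    display = gop_display_order(12, 3)
    print("".join(display))
    print("".join(to_coded_order(display)))
```

With N = 12 and M = 3, the GOP contains one I picture, three P pictures and eight B pictures.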
Since the MPEG-2 is a standard for encoding and decoding a single-view moving picture, it does not define an encoder for a multi-view moving picture. Further, although the MPEG-2 provides the MVP for extending a single-view moving picture into a stereo moving picture, it still does not support an encoder for extending a single-view or stereo moving picture into a multi-view moving picture.