Image data, and moving image data in particular, generally has a large data volume. Accordingly, high-efficiency encoding is performed when the data is transmitted from a transmission device to a reception device, when it is stored in a storage device, or the like. “High-efficiency encoding” as used here means encoding processing that transforms a certain data string into another data string so as to compress its data volume.
The intra-screen prediction encoding method and the inter-screen prediction encoding method are known as high-efficiency encoding methods for moving image data.
The intra-screen prediction encoding method takes advantage of the fact that moving image data is highly correlated in the spatial direction. That is to say, the intra-screen prediction encoding method encodes/decodes a frame image to be encoded using only the information of that frame image, without using another frame image. The intra-screen prediction encoding method may also be referred to as “intra-frame prediction encoding method”.
The inter-screen prediction encoding method, in turn, takes advantage of the fact that moving image data is highly correlated in the temporal direction. With moving image data, in general, frame images that are temporally close to each other are highly similar, and accordingly, redundancy may be removed by encoding a frame image to be encoded with reference to a decoded image decoded from an already encoded frame image. The inter-screen prediction encoding method may also be referred to as “inter-frame prediction encoding method”.
With the inter-screen prediction encoding method, it is common to divide an image to be encoded into blocks, and to perform encoding in increments of the divided blocks. First, a decoded image is generated for reference by decoding another, already encoded frame image, and an area whose image is similar to the image of the block to be encoded is selected from the decoded image. Next, the difference between the image of the selected area and the image of the block to be encoded is obtained as a prediction error, whereby redundancy is removed. Motion vector information indicating the spatial displacement to the similar area, and the prediction error from which the redundancy has been removed, are then encoded, thereby realizing a high compression ratio. Note that the prediction error is also referred to as a prediction error image.
On the other hand, the reception device which has received the encoded data decodes the received motion vector information and difference information to reproduce the image.
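The block-based procedure described above may be illustrated by the following minimal sketch. It assumes grayscale frames held as lists of pixel rows, a full search over a small window, and the sum of absolute differences (SAD) as the similarity measure. The function names, the search strategy, and the SAD criterion are assumptions made for illustration only and are not prescribed by any particular coding standard.

```python
def sad(current, cx, cy, reference, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(current[cy + dy][cx + dx] - reference[ry + dy][rx + dx])
    return total


def find_motion_vector(current, reference, bx, by, size, search_range):
    """Full search in the decoded reference frame for the area most similar
    to the block of `current` whose top-left corner is (bx, by).

    Returns (motion_vector, prediction_error), where prediction_error is the
    per-pixel difference between the block and the selected area.
    """
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = (0, 0), None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidate areas that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, bx, by, reference, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    mvx, mvy = best_mv
    error = [
        [current[by + dy][bx + dx] - reference[by + mvy + dy][bx + mvx + dx]
         for dx in range(size)]
        for dy in range(size)
    ]
    return best_mv, error


def reconstruct_block(reference, bx, by, size, motion_vector, error):
    """Decoder side: selected reference area plus the prediction error."""
    mvx, mvy = motion_vector
    return [
        [reference[by + mvy + dy][bx + mvx + dx] + error[dy][dx]
         for dx in range(size)]
        for dy in range(size)
    ]
```

In an actual encoder the prediction error would additionally be transformed, quantized, and entropy coded before transmission; the sketch omits those stages and shows only the search, the difference, and the corresponding reconstruction on the reception side.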
Examples of a typical moving image coding method include ISO/IEC MPEG-2/MPEG-4 (hereafter, MPEG-2, MPEG-4).
With the current moving image coding standards represented by MPEG-2 Video or H.264/MPEG-4 AVC, an arrangement for encoding and decoding stereo 3-dimensional video is provided. The 3-dimensional video handled here includes two images having different viewpoints. Hereafter, these two different viewpoint images will be referred to as an image for the left eye and an image for the right eye.
In the event of encoding a stereoscopic image, the image for the left eye and the image for the right eye making up the video are thinned so as to reduce the number of pixels of each image as illustrated in FIG. 23, are arrayed and packed into one image, and are then encoded using a conventional coding method. Various methods of thinning and packing exist. For example, the interleave method, side-by-side method, top-and-bottom method, and so forth are known. The interleave method disposes the image for the right eye and the image for the left eye alternately for each scanning line. The side-by-side method disposes the image for the left eye and the image for the right eye horizontally adjacent to each other. The top-and-bottom method disposes the image for the left eye and the image for the right eye vertically adjacent to each other. FIG. 23 illustrates a conceptual diagram of the side-by-side method.
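The three packing arrangements described above may be sketched as follows. The sketch assumes that each image is a list of pixel rows, that thinning is plain decimation keeping every other column or row, and, for the interleave method, that even-numbered scanning lines come from the image for the left eye. These choices and the function names are assumptions made for illustration; an actual system may apply a filter before thinning and may assign the lines differently.

```python
def pack_side_by_side(left, right):
    """Thin each image to half width and place left | right in one frame."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def pack_top_and_bottom(left, right):
    """Thin each image to half height and place left above right."""
    return [list(row) for row in left[::2]] + [list(row) for row in right[::2]]


def pack_interleave(left, right):
    """Alternate scanning lines: even lines from left, odd lines from right
    (this assignment is an assumption)."""
    packed = []
    for y in range(len(left)):
        source = left if y % 2 == 0 else right
        packed.append(list(source[y]))
    return packed
```

In each case the packed frame has the same number of pixels as one of the original images, so that it may be passed unchanged to a conventional encoder.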
When the encoded data of a stereoscopic image is transmitted from the encoding device to the decoding device, information indicating that the data to be transmitted is a bit stream encoded from a stereoscopic image is conveyed by being included in the header information area of the transmission data. For example, with MPEG-2, as disclosed in ISO/IEC 13818, Generic coding of moving pictures and associated audio information, the area of user_data is employed, and with H.264/MPEG-4 AVC, as disclosed in ISO/IEC 14496-10, MPEG-4 part 10 advanced video coding, the area of Frame packing arrangement SEI is employed. FIG. 24 illustrates the syntax of header information to be used for Frame packing arrangement SEI of H.264/MPEG-4 AVC.
Note that in the event of encoding the above-mentioned stereoscopic image packed by the side-by-side method or top-and-bottom method using inter-screen prediction, the image of a different viewpoint may be referenced when a block similar to the block to be encoded is searched for in a decoded image (see FIG. 25). In this way, a portion having high similarity is referenced with the images of different viewpoints also taken into consideration, whereby encoding efficiency may be improved.
As another technology for improving encoding efficiency, Japanese Laid-open Patent Publication No. 2005-159824, for example, discloses technology wherein global motion information (a motion vector) between frame images is searched for, and image information outside of a frame image to be referenced is created from the other image based on this global motion information. Note that the global motion information is motion information of one entire image, and may be obtained from a statistical value of the motion vectors of the multiple areas making up the image.
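How global motion information may be derived from the motion vectors of the individual areas can be illustrated by the following minimal sketch. The text above states only that a statistical value is used; the component-wise median chosen here, and the function name, are assumptions made for illustration and are not taken from the cited publication.

```python
from statistics import median


def global_motion(block_vectors):
    """Estimate one motion vector for the entire image from the per-area
    motion vectors, using the component-wise median (an assumed statistic)."""
    xs = [mvx for mvx, _ in block_vectors]
    ys = [mvy for _, mvy in block_vectors]
    return (median(xs), median(ys))
```

A median is used in this sketch because it is little affected by a small number of areas whose motion differs from that of the image as a whole, such as an independently moving object; a mean or a mode would be equally valid readings of "statistical value".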