(1) Field of the Invention
The present invention relates to a multiview video decoding apparatus, a multiview video decoding method, a multiview video decoding program, and a multiview video decoding integrated circuit for decoding a plurality of multiview coded video streams having a reference relationship.
(2) Description of the Related Art
There have been proposed three-dimensional image coding methods for enabling three-dimensional visual recognition by humans.
Examples of such methods include a method of preparing two kinds of videos including at least one video for the left eye and one video for the right eye captured in mutually different directions, coding the videos such that the two kinds of videos have a mutual reference relationship, and multiplexing the coded video streams to generate a multiview video.
This method has recently been standardized as Multiview Video Coding (hereinafter referred to as MVC) in Non-Patent Reference 1 (International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation “H.264”, issued in March 2009).
Each of FIGS. 14A and 14B is a diagram illustrating MVC according to the conventional art.
MVC defines coded video streams in which two kinds of videos from a plurality of viewpoints have a mutual reference relationship.
MVC defines two kinds of multiview coded video streams: at least one coded video stream (hereinafter referred to as a coded video stream at the base view side) that does not refer to another coded stream and thus can be decoded independently, and a plurality of streams (hereinafter referred to as coded video streams at the dependent view side) each of which has a mutual reference relationship with the base view.
The following exemplary descriptions are given assuming that the coded video stream at the base view side coded based on the left-eye viewpoint is a coded video stream for L, and that the coded video streams at the dependent view sides coded based on the right-eye viewpoints are coded video streams for R.
FIG. 14A is an illustration related to MVC, and shows an exemplary picture structure of an image for L (a decoded image resulting from the coded video stream at the base view side) and an exemplary picture structure of an image for R (a decoded image resulting from the coded video stream at the dependent view side).
In the diagram, images for L and images for R are shown in decoding order. I denotes a picture (hereinafter referred to as an I-picture) composed of intra coded images. P denotes a picture (hereinafter referred to as a P-picture) including inter coded images. B denotes a picture (hereinafter referred to as a B-picture) including bi-directional coded images.
As for the I-picture among the pictures, it is possible to reconstruct the original image data of the I-picture based only on the decoded data of the I-picture itself. However, the remaining P-picture and B-picture require a reference image and reference images, respectively, in addition to the result of decoding the pictures themselves so that the original image data of the pictures themselves can be reconstructed. Here, smaller numbers show images output earlier. In addition, (L) and (R) denote images for L and R, respectively.
FIG. 14B is an illustration related to MVC, and is a diagram showing the pictures in FIG. 14A in display order.
Each of the arrows in the diagram denotes a reference relationship, and more specifically, shows that, in inter coding, the image at the source of the arrow is referred to in order to reconstruct the image indicated by the arrow. For example, the diagram shows that, in order to decode B-picture B0 (L), I-picture I2 (L) is required as a reference image in addition to the result of decoding the B-picture B0 (L) itself. The same applies to the other images for L, namely pictures B1 (L), B4 (L), and P5 (L).
The diagram shows that, among the images for R, each of images (pictures) B0 (R), B1 (R), B3 (R), B4 (R), and P5 (R) refers to a corresponding one of the images for R in the same manner as in the case of the images for L, and that these images (pictures) and image (picture) P2 (R) can also refer to the image for L that is displayed at substantially the same display time as each of the pictures for R. In other words, the diagram shows that the images for R have reference relationships with images for L. In this case, the coded video stream for R is a coded video stream at the dependent view side.
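The distinction between decoding order and display order described above can be illustrated with a minimal Python sketch. The decoding order listed below is hypothetical and is not taken from FIG. 14A; only the reference from B0 (L) to I2 (L) is stated in the text. The function names are also assumptions introduced for illustration.

```python
import re

# Hypothetical decoding order for the images for L (not taken from FIG. 14A).
decode_order = ["I2", "B0", "B1", "P5", "B3", "B4"]

# Only the B0 -> I2 reference is stated in the text. Pictures missing from
# this map are simply not modeled here; they may still have references.
references = {"B0": ["I2"]}


def display_index(name):
    """Return the number in a picture name; smaller numbers are output earlier."""
    return int(re.search(r"\d+", name).group())


def to_display_order(pictures):
    """Reorder pictures from decoding order into display order."""
    return sorted(pictures, key=display_index)


def references_precede(order, refs):
    """Check that every reference image comes before the picture that uses it."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[r] < position[p] for p, rs in refs.items() for r in rs)


display_order = to_display_order(decode_order)
print(display_order)
print(references_precede(decode_order, references))
```

In this sketch, the reference image I2 precedes B0 in the assumed decoding order although it follows B0 in display order, which is why the two orders differ.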
FIG. 15 is a diagram illustrating coded video streams in the conventional art coded according to the MVC Standard. In other words, the diagram shows an example of MVC-based reference relationships between the images for L and the images for R.
In the diagram, the upper sequence represents output images for the left eye, and the lower sequence represents output images for the right eye. In other words, L-1 to L-6 denote output images generated by decoding a coded video stream for the left eye, and R-1 to R-6 denote output images generated by decoding a coded video stream for the right eye. Here, smaller numbers show images output earlier.
In addition, the image name L-5 underlined denotes an output image generated by decoding an intra coded image, and the other image names denote output images generated by decoding inter coded images.
Each of the arrows shows that, in inter coding, the image that is the source of the arrow is referred to in order to decode the image indicated by the arrow. For example, the diagram shows that images R-2 and L-3 are referred to in order to decode image R-3.
In the cases where multiview coded video streams generated in this way are input from an optical disc or a hard disc, are transmitted using wireless communication, or are subjected to streaming distribution, errors such as bit inversion, bit loss, and bit insertion may occur before the multiview video streams reach the decoder side. This causes a problem of degrading the image quality of the resulting decoded images.
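Because inter coded images are reconstructed from their reference images, damage to one image can also affect images that refer to it. The following Python sketch illustrates this under stated assumptions: the reference from R-3 to R-2 and L-3 is the one described for FIG. 15, while the reference from R-4 to R-3 is hypothetical and added only to show a second step of propagation. The function name is likewise an assumption.

```python
# Reference map: picture -> pictures it refers to.
references = {
    "R-3": ["R-2", "L-3"],  # stated in the description of FIG. 15
    "R-4": ["R-3"],         # hypothetical, for illustration only
}


def affected_pictures(damaged, refs):
    """Return the damaged pictures plus every picture that refers to them,
    directly or through other affected pictures."""
    affected = set(damaged)
    changed = True
    while changed:
        changed = False
        for picture, sources in refs.items():
            if picture not in affected and any(s in affected for s in sources):
                affected.add(picture)
                changed = True
    return affected


print(sorted(affected_pictures({"L-3"}, references)))
```

In this sketch, an error in the image L-3 of the base view also reaches images of the dependent view, whereas an error in R-4 stays confined to R-4.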
Patent Reference 1 (Japanese Laid-open Patent Application Publication No. 2003-319419) discloses a multiview video decoding apparatus which solves this problem.
The decoding apparatus disclosed in Patent Reference 1 is an apparatus which generates reproduced images by reading out and decoding coded video streams from a recording medium having recorded thereon coded video streams generated by imaging a subject in a plurality of imaging directions.
This multiview video decoding apparatus includes: a recording medium reading circuit which reads out coded video streams from a recording medium having recorded thereon image data generated in the respective imaging directions when the subject is imaged in the respective imaging directions; and a decoding circuit which saves the coded video streams read out by the recording medium reading circuit and then decodes the coded video streams. The multiview video decoding apparatus further includes: a decoding error detecting circuit which detects whether or not any one of the coded video streams includes an error; a decoded image buffer in which the decoded images generated by the decoding circuit are stored for the respective imaging directions; and an error image concealing circuit which performs, in the case where one of the coded video streams includes an error, error concealment by replacing, in units of an image, decoded error images with a decoded image of another channel, and outputs the replacement images as the output images.
The multiview video decoding apparatus disclosed in Patent Reference 1 has a structure as shown in FIG. 16. In other words, FIG. 16 is a block diagram of a structure of a multiview video decoding apparatus 1000 according to the conventional art. It is to be noted that the following descriptions are given assuming that the multi-channel coded streams are two-channel coded video streams that are an L-channel coded stream for the left eye and an R-channel coded stream for the right eye.
As shown in the diagram, the input multi-channel coded video streams are first decoded into decoded images by a decoding unit 1010. In the case where no error is found up to this process, the L-channel decoded image is sent to a decoded image buffer L1041 for the L channel and becomes an output image L, that is, the final L-channel output image. Likewise, the R-channel decoded image is sent to a decoded image buffer R1042 for the R channel and becomes an output image R, that is, the final R-channel output image.
These output images are sent to a display device for multiview video (for three-dimensional video) devised such that L-channel output images can be displayed for the left eye of a user and R-channel output images can be displayed for the right eye of the user. This allows the user to recognize the images as a three-dimensional video.
Here, if an error detecting unit 1020 detects an error in one of the channels, for example, in the R channel, and judges that accurate decoding is impossible, only the output image for the right eye may be lost, or an image including the error may be displayed.
In order to prevent this, the error detecting unit 1020 outputs, to an output image determining unit 1030, an instruction for causing the L-channel decoded image having substantially the same display time to be transferred also to the error image concealing unit R1060 for the R channel. The output image determining unit 1030 receives this instruction, and causes the L-channel decoded image to be output as an output image R from the decoded image buffer L1041, via the error image concealing unit R1060.
As a result, the output image for the right eye becomes identical to the output image for the left eye, but it is possible to prevent a situation in which an image is lost or an image including an error is displayed.
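The concealment behavior described above for the R channel can be sketched in Python as follows. This is a simplified illustration and not the circuit of Patent Reference 1: images are represented by plain strings, and the function name and its parameters are assumptions.

```python
def select_output_images(decoded_l, decoded_r, error_in_r):
    """Return (output image L, output image R) for one display time.

    When an error is detected in the R channel, the L-channel decoded image
    having the same display time is also output as the R-channel image.
    """
    output_l = decoded_l
    output_r = decoded_l if error_in_r else decoded_r
    return output_l, output_r


# No error: each channel outputs its own decoded image.
print(select_output_images("L-3", "R-3", error_in_r=False))
# Error in the R channel: the L-channel image is output for both eyes.
print(select_output_images("L-3", "R-3", error_in_r=True))
```

The second call shows the trade-off noted above: both eyes receive the same image, so no image is lost, but the stereoscopic effect is not available for that display time.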
Furthermore, Patent Reference 2 (Japanese Laid-open Patent Application Publication No. 7-322302) discloses error concealment by copying a previous image of a channel in which an error is found.
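Concealment by copying a previous image of the same channel can be sketched in Python as follows. This is only an illustration of the general idea, not the method of Patent Reference 2 itself: images are represented by strings, the function name is an assumption, and an erroneous first image with no previous image is passed through unchanged as a further assumption.

```python
def conceal_with_previous(frames, has_error):
    """Replace each erroneous image with the most recent output image
    of the same channel."""
    output = []
    previous = None
    for frame, bad in zip(frames, has_error):
        if bad and previous is not None:
            frame = previous  # repeat the previous output image
        output.append(frame)
        previous = frame
    return output


print(conceal_with_previous(["R-1", "R-2", "R-3"], [False, True, False]))
```

With consecutive errors, the same earlier image is repeated until an error-free image arrives.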
In FIG. 16, the arrow from the buffer 1040 (the decoded image buffer L1041 and the decoded image buffer R1042) to the decoding unit 1010 shows that decoded images are referred to in inter decoding of a coded video stream.