Recently, with the arrival of the multimedia age, in which audio, video, and pixel values are handled in an integrated manner, existing information media such as newspapers, journals, television, radio, and telephones, as well as other means of conveying information to people, have come within the scope of multimedia. In general, multimedia refers to a representation in which not only characters but also graphic symbols, audio, and especially pictures are related to one another. However, to include the aforementioned existing information media in the scope of multimedia, such information must first be represented in digital form.
However, when the amount of information carried by each of these media is estimated in digital form, a character requires only 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality) and a moving picture requires more than 100 Mbits per second (present television reception quality). It is therefore not realistic for these media to handle such vast amounts of information directly in digital form. For example, videophones have already been put into practical use over the Integrated Services Digital Network (ISDN), which offers transmission rates of 64 Kbits/sec to 1.5 Mbits/sec, but pictures captured by a TV camera cannot be transmitted over ISDN as they are.
Information compression techniques are therefore required. For instance, videophones employ video compression techniques compliant with the H.261 and H.263 standards recommended by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). With information compression techniques compliant with the MPEG-1 standard, picture information as well as audio information can be stored on an ordinary music CD (Compact Disc).
Here, the MPEG (Moving Picture Experts Group) standards are international standards for compressing moving picture signals. MPEG-1 compresses video signals down to 1.5 Mbits/sec, that is, it reduces the information contained in TV signals to approximately one hundredth. Because MPEG-1 targeted medium quality at a transmission rate of about 1.5 Mbits/sec, MPEG-2 was standardized to meet demands for even higher picture quality; it achieves TV broadcast quality when transmitting moving picture signals at 2 to 15 Mbits/sec.
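The rates quoted above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the 720x480 resolution, 30 frames/s, and 16 bits per pixel (8-bit 4:2:2 sampling) are assumptions chosen to represent standard-definition television, not figures taken from the text, and the helper names are hypothetical.

```python
# Worked bit-rate arithmetic. Assumed source format (not from the text):
# 720x480 pixels, 30 frames/s, 16 bits/pixel (8-bit 4:2:2 sampling).

def raw_video_bitrate(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

def compression_ratio(raw_bps: float, target_bps: float) -> float:
    """Factor by which the raw rate must be reduced to reach the target rate."""
    return raw_bps / target_bps

raw = raw_video_bitrate(720, 480, 30, 16)
print(f"raw SD video: {raw / 1e6:.1f} Mbit/s")                     # 165.9 Mbit/s
print(f"to MPEG-1 (1.5 Mbit/s): {compression_ratio(raw, 1.5e6):.0f}x")  # 111x
print(f"to ISDN (64 kbit/s): {compression_ratio(raw, 64e3):.0f}x")      # 2592x
```

Under these assumptions the raw rate is roughly 166 Mbit/s, consistent with "more than 100 Mbits per second", and reaching 1.5 Mbit/s requires roughly a hundredfold reduction.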
Subsequently, the working group (ISO/IEC JTC1/SC29/WG11) that standardized MPEG-1 and MPEG-2 went on to standardize MPEG-4, which achieves a higher compression rate than MPEG-1 and MPEG-2, allows coding and decoding on a per-object basis, and realizes new functions required in the multimedia age. The standardization of MPEG-4 initially aimed at low bit rate coding, but its scope has since been extended to more versatile coding, including high bit rate coding for interlaced pictures. Moreover, ISO/IEC and ITU-T have jointly standardized MPEG-4 Advanced Video Coding (AVC), a next-generation image coding method with an even higher compression rate, and the Society of Motion Picture and Television Engineers (SMPTE) is currently standardizing VC-1 (Proposed SMPTE Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, Final Committee Draft 1 Revision 6, 2005.7.13). VC-1 aims to extend the coding tools and other features of the MPEG-2 and MPEG-4 standards, and it is expected to be used in next-generation optical disk peripheral devices such as Blu-ray Disc (BD) and High Definition (HD) DVD.
In general, moving picture coding compresses the information volume by eliminating redundancy in both the spatial and temporal directions. Inter-picture prediction coding, which aims to reduce temporal redundancy, estimates motion and generates a predicted picture on a block-by-block basis with reference to preceding and/or subsequent pictures, and then codes the differential between the predicted picture and the current picture to be coded. Here, “picture” denotes a single screen: for a progressive picture it denotes a frame, whereas for an interlaced picture it denotes a frame or a field. An interlaced picture is one in which a single frame consists of two fields captured at different times. An interlaced picture can be coded and decoded in three ways: each frame is processed as a frame, as two fields, or with a frame or field structure selected for each block in the frame.
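The frame/field distinction and the differential coded by inter-picture prediction can be illustrated with a toy sketch. It assumes frames are lists of rows of integer samples and that the top field consists of the even-numbered rows (counting from 0); the function names are hypothetical, and a real coder would transform and quantize the differential rather than keep it exactly.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of integer samples (assumed representation)

def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split an interlaced frame into its top (even rows) and bottom (odd rows) fields."""
    return frame[0::2], frame[1::2]

def residual(current: Frame, predicted: Frame) -> Frame:
    """Differential coded by inter-picture prediction: current minus prediction."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(current, predicted)]

def reconstruct(predicted: Frame, diff: Frame) -> Frame:
    """Decoder side: prediction plus the differential (kept exact in this toy)."""
    return [[p + d for p, d in zip(prow, drow)] for prow, drow in zip(predicted, diff)]

frame = [[10, 11], [20, 21], [30, 31], [40, 41]]
top, bottom = split_fields(frame)
print(top, bottom)  # [[10, 11], [30, 31]] [[20, 21], [40, 41]]

current, predicted = [[10, 11], [20, 21]], [[9, 11], [20, 20]]
diff = residual(current, predicted)
print(diff)  # [[1, 0], [0, 1]]
```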
A picture coded by intra-picture prediction without any reference picture is referred to as an “I-picture”. A picture coded by inter-picture prediction with reference to a single picture is referred to as a “P-picture”. A picture coded by inter-picture prediction with simultaneous reference to two pictures is referred to as a “B-picture”. A B-picture can refer to any combination of two pictures whose display times precede or follow that of the current picture to be coded. In every case, a picture can be coded or decoded only after its reference pictures have already been coded or decoded.
FIGS. 1A and 1B are diagrams showing the structure of a conventional MPEG-2 stream. As shown in FIG. 1B, an MPEG-2 stream has a layered structure. The stream is made up of a plurality of Groups of Pictures (GOPs). Because the GOP serves as the basic unit of coding, a moving picture can be edited and randomly accessed in units of GOPs; that is, the first picture of each GOP is a random access point. A GOP consists of a plurality of pictures, each of which is an I-picture, a P-picture, or a B-picture. The stream, each GOP, and each picture include a synchronous signal (sync) indicating the boundary between units and a header carrying data common to the unit.
FIGS. 2A and 2B show examples of the prediction structure of pictures according to the MPEG-2 standard. The shaded pictures in FIG. 2A are reference pictures, which are referred to in predicting other pictures. As shown in FIG. 2A, in the MPEG-2 standard, a P-picture (picture P0, P6, P9, P12, or P15) can be predicted from the one I-picture or P-picture whose display time immediately precedes that of the P-picture. A B-picture (picture B1, B2, B4, B5, B7, B8, B10, B11, B13, B14, B16, B17, B19, or B20) can be predicted from the picture whose display time immediately precedes it and/or the picture whose display time immediately follows it, each of which is an I-picture or a P-picture. In the stream, B-pictures are placed immediately after an I-picture or a P-picture. Therefore, when random access is performed by starting decoding from an I-picture, all pictures following that I-picture can be decoded and displayed. As for the GOP structure, the pictures from I3 to B14 can be regarded as one GOP, as shown in FIG. 2B, for example.
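The reordering implied by this prediction structure can be sketched as follows. Each B-picture needs the I- or P-picture that follows it in display order, so that anchor has to be sent first. The picture labels mirror FIG. 2A and the function name is hypothetical; this illustrates the reordering rule only, not an MPEG-2 multiplexer.

```python
from typing import List

def display_to_decoding_order(display: List[str]) -> List[str]:
    """Reorder picture labels (e.g. 'I3', 'P6', 'B4') from display order into the
    order an MPEG-2-style stream carries them: each B-picture is sent after the
    later of its two reference pictures (the next I- or P-picture)."""
    decoding: List[str] = []
    pending_b: List[str] = []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)       # must wait for the next anchor picture
        else:
            decoding.append(pic)        # anchor (I or P) is sent first...
            decoding.extend(pending_b)  # ...followed by the B-pictures it completes
            pending_b.clear()
    decoding.extend(pending_b)          # only non-empty if the input is cut short
    return decoding

print(display_to_decoding_order(["P0", "B1", "B2", "I3", "B4", "B5", "P6"]))
# ['P0', 'I3', 'B1', 'B2', 'P6', 'B4', 'B5']
```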
FIG. 3 is a diagram showing the structure of a stream according to VC-1. A VC-1 stream has the same structure as described for the MPEG-2 standard. However, a random access point is referred to as an “entry point”, to which an entry point header (Entry Point HDR) is added. The data from one entry point to the next is a random access unit (RAU), which is equivalent to one GOP in the MPEG-2 standard. Hereafter, the random access unit according to VC-1 is referred to as an “RAU”. Note that an RAU can store user data regarding the pictures in the RAU (user data at the entry-point level), and this user data is arranged immediately after the entry point header.
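How a decoder could locate RAUs can be sketched by cutting the sequence of syntax units at each entry point header. The unit kinds ("SEQ_HDR", "ENTRY_HDR", "ENTRY_USER_DATA", "PIC") and the function names are hypothetical labels, not VC-1 start-code values.

```python
from typing import List, Optional, Tuple

Unit = Tuple[str, str]  # (kind, payload); the kind labels are hypothetical

def split_into_raus(units: List[Unit]) -> List[List[Unit]]:
    """Cut a VC-1-like unit sequence into RAUs, each starting at an entry point header."""
    raus: List[List[Unit]] = []
    for unit in units:
        if unit[0] == "ENTRY_HDR":
            raus.append([unit])    # a new random access point starts here
        elif raus:
            raus[-1].append(unit)  # belongs to the current RAU
        # units before the first entry point (e.g. a sequence header) belong to no RAU here
    return raus

def entry_user_data(rau: List[Unit]) -> Optional[str]:
    """Entry-point-level user data, if present, directly follows the entry point header."""
    if len(rau) > 1 and rau[1][0] == "ENTRY_USER_DATA":
        return rau[1][1]
    return None

stream = [("SEQ_HDR", ""), ("ENTRY_HDR", ""), ("ENTRY_USER_DATA", "ud"),
          ("PIC", "I0"), ("PIC", "P1"), ("ENTRY_HDR", ""), ("PIC", "I0")]
raus = split_into_raus(stream)
print(len(raus), entry_user_data(raus[0]), entry_user_data(raus[1]))  # 2 ud None
```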
Here, the picture types according to VC-1 are described. VC-1 also defines the I-picture, P-picture, and B-picture, which have the same prediction structure as described for the MPEG-2 standard. In addition to these three picture types, VC-1 defines two more: the Skipped-picture and the BI-picture. A Skipped-picture contains no pixel data and is treated as a P-picture having the same pixel data as the preceding reference picture in decoding order. For example, in examples (1) and (2), picture S5 is regarded as the same picture as picture P3, so that the stream is decoded in the same way in both (1) and (2).
(1) Display order: Picture I0, Picture B2, Picture P1, Picture B4, Picture P3, Picture B6, Picture S5 (A symbol containing I denotes an I-picture, P a P-picture, B a B-picture, and S a Skipped-picture; for example, picture S5 is a Skipped-picture. The numeral attached to each symbol represents the decoding order.)
(2) Display order: Picture I0, Picture B2, Picture P1, Picture B4, Picture P3, Picture B6, Picture P5 (P5 has the same pixel data as P3.)
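The rule that a Skipped-picture repeats the preceding reference picture in decoding order can be sketched as a small resolver. Labels follow the notation of examples (1) and (2). Because a Skipped-picture carries no new pixel data, this sketch resolves a run of Skipped-pictures to the same I- or P-picture; that is an interpretation of the text above, not a quotation from VC-1, and the function name is hypothetical.

```python
from typing import Dict, List, Optional

def resolve_skipped(decoding_order: List[str]) -> Dict[str, str]:
    """Map each Skipped-picture (label starting with 'S') to the I- or P-picture
    whose pixel data it repeats: the closest preceding one in decoding order."""
    resolved: Dict[str, str] = {}
    last_ref: Optional[str] = None
    for pic in decoding_order:
        if pic.startswith("S"):
            if last_ref is None:
                raise ValueError(f"{pic} has no preceding reference picture")
            resolved[pic] = last_ref  # consecutive Skipped-pictures share one source
        elif pic.startswith(("I", "P")):
            last_ref = pic
        # B- and BI-pictures are not reference pictures, so they are ignored
    return resolved

display = ["I0", "B2", "P1", "B4", "P3", "B6", "S5"]   # example (1)
decoding = sorted(display, key=lambda p: int(p[1:]))  # numerals give decoding order
print(resolve_skipped(decoding))                       # {'S5': 'P3'}
```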
The Skipped-picture is especially useful for still images. For example, when the images become still partway through an RAU, Skipped-pictures can be used for the still portion, as in picture I0, picture P1, picture P2, picture P3, picture S4, picture S5, picture S6, . . . , thereby reducing the amount of data to be coded.
Furthermore, a BI-picture has characteristics of both the B-picture and the I-picture. Like a B-picture, its decoding order differs from its display order, so it must be re-ordered relative to an I- or P-picture (for example, a BI-picture precedes the starting intra-coded picture of an RAU in display order but follows it in decoding order), and it is not used as a reference picture by other pictures. Like an I-picture, all of its macroblocks are intra-coded, and it is not predicted from any other picture.
Next, a method for distinguishing I-pictures, P-pictures, B-pictures, Skipped-pictures, and BI-pictures is described. Basically, the picture types can be distinguished based on the picture type included in the picture layer of a stream. However, the picture types that the picture layer can indicate depend on the profile, as follows.
For example, in the simple profile, the picture layer indicates the I-picture and the P-picture. In the main profile, it indicates the I-picture, the P-picture, and the B- or BI-picture. In the advanced profile, it indicates the I-picture, P-picture, B-picture, BI-picture, and Skipped-picture.
Because neither the simple profile nor the main profile can indicate a Skipped-picture through the picture type in the picture layer, a picture whose size is one byte or less is defined as a Skipped-picture in these profiles. Furthermore, in the main profile, a single picture type indicates that a picture is either a B-picture or a BI-picture, so B-pictures cannot be distinguished from BI-pictures based on the picture type.
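The profile-dependent rules above can be summarized as a classifier. This is a sketch under stated assumptions: coded_type is an abstract string standing in for the picture-type code in the picture layer (it is not the actual VC-1 bitstream syntax), "B_OR_BI" stands for the single main-profile code shared by B- and BI-pictures, and all names are hypothetical.

```python
SIGNALLED_TYPES = {
    "simple": {"I", "P"},
    "main": {"I", "P", "B_OR_BI"},
    "advanced": {"I", "P", "B", "BI", "S"},
}

def classify_picture(profile: str, coded_type: str, size_bytes: int) -> str:
    """Return what a decoder can tell about one picture from its picture layer."""
    if profile not in SIGNALLED_TYPES:
        raise ValueError(f"unknown profile: {profile}")
    # Simple/main profiles cannot signal a Skipped-picture, so it is
    # recognized by size alone (one byte or less), before reading any type.
    if profile != "advanced" and size_bytes <= 1:
        return "S"
    if coded_type not in SIGNALLED_TYPES[profile]:
        raise ValueError(f"{coded_type!r} cannot be signalled in the {profile} profile")
    # Main profile: one code covers both B and BI, so they stay indistinguishable.
    if coded_type == "B_OR_BI":
        return "B or BI"
    return coded_type

print(classify_picture("simple", "P", 1))        # S
print(classify_picture("main", "B_OR_BI", 420))  # B or BI
print(classify_picture("advanced", "BI", 420))   # BI
```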
FIG. 4 is a block diagram showing a picture coding apparatus for realizing the conventional image coding method.
A picture coding apparatus 800 performs compression coding, variable length coding, and the like on an input picture signal Vin, thereby transforming the picture signal Vin into a bitstream (stream) Str to be outputted. The picture coding apparatus 800 includes a motion estimation unit 801, a motion compensation unit 802, a subtractor 803, an orthogonal transformation unit 804, a quantization unit 805, an inverse quantization unit 806, an inverse orthogonal transformation unit 807, an adder 808, a picture memory 809, a switch 810, a variable length coding unit 811, and a prediction structure determination unit 812.
The picture signal Vin is inputted into the subtractor 803 and the motion estimation unit 801. The subtractor 803 calculates the differential between the input picture signal Vin and a predicted picture, and outputs the differential to the orthogonal transformation unit 804. The orthogonal transformation unit 804 transforms the differential into a frequency coefficient and outputs the frequency coefficient to the quantization unit 805. The quantization unit 805 quantizes the input frequency coefficient and outputs the resulting quantization value Qc to the variable length coding unit 811.
The inverse quantization unit 806 inversely quantizes the quantization value Qc to restore the frequency coefficient, and outputs the resulting frequency coefficient to the inverse orthogonal transformation unit 807. The inverse orthogonal transformation unit 807 performs an inverse frequency transformation on the frequency coefficient to obtain a pixel differential, and outputs the pixel differential to the adder 808. The adder 808 adds the pixel differential to the predicted picture outputted from the motion compensation unit 802 to generate a decoded picture. The switch 810 is turned on when storage of the decoded picture is instructed, and the decoded picture is then stored in the picture memory 809.
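The forward path and the local decoding loop of units 803 to 808 can be sketched on a single row of samples. The sketch substitutes a floating-point 1-D orthonormal DCT for the orthogonal transformation and a uniform scalar quantizer with step qstep for the quantization unit; both are simplifying assumptions, and the function names are hypothetical. What it shows is that the coder reconstructs from the quantized values Qc, exactly as a decoder will, so both sides hold identical reference pictures.

```python
import math
from typing import List, Tuple

def _basis(k: int, i: int, n: int) -> float:
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))

def dct(x: List[float]) -> List[float]:
    """Stand-in for the orthogonal transformation unit 804."""
    n = len(x)
    return [sum(x[i] * _basis(k, i, n) for i in range(n)) for k in range(n)]

def idct(c: List[float]) -> List[float]:
    """Stand-in for the inverse orthogonal transformation unit 807 (transpose of dct)."""
    n = len(c)
    return [sum(c[k] * _basis(k, i, n) for k in range(n)) for i in range(n)]

def encode_block(current: List[int], predicted: List[int],
                 qstep: float) -> Tuple[List[int], List[float]]:
    """Return (Qc, locally decoded samples) for one row of samples."""
    diff = [c - p for c, p in zip(current, predicted)]        # subtractor 803
    qc = [round(f / qstep) for f in dct(diff)]                # units 804 and 805
    recon_diff = idct([q * qstep for q in qc])                # units 806 and 807
    decoded = [p + d for p, d in zip(predicted, recon_diff)]  # adder 808
    return qc, decoded

current = [52, 55, 61, 66, 70, 61, 64, 73]
predicted = [50, 54, 60, 60, 68, 60, 62, 70]
qc, decoded = encode_block(current, predicted, qstep=4)
```

Because encode_block reconstructs from qc rather than from the unquantized differential, a decoder applying idct to the same qc obtains the same decoded samples, so quantization error does not accumulate as drift between coder and decoder.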
Meanwhile, the motion estimation unit 801, which receives the picture signal Vin in units of macroblocks, searches the decoded pictures (reference pictures) stored in the picture memory 809, detects the image area most similar to the macroblock indicated by the picture signal Vin, and determines a motion vector MV indicating the location of that area.
The motion compensation unit 802 uses the determined motion vector and the like to retrieve the image best suited as a predicted picture from the decoded pictures stored in the picture memory 809.
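The work of units 801 and 802 can be illustrated with exhaustive block matching. The sketch assumes the sum of absolute differences (SAD) as the similarity measure and a square search window of ±search samples; these choices and the function names are assumptions, since the text specifies neither the matching criterion nor the search range.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]

def sad(cur: Picture, ref: Picture, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def motion_estimate(cur: Picture, ref: Picture, bx: int, by: int,
                    n: int, search: int) -> Tuple[int, int]:
    """Full search over a +/-search window; returns the best motion vector (dx, dy)."""
    h, w = len(ref), len(ref[0])
    best_cost: Optional[int] = None
    best_mv = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block would lie outside the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv

def motion_compensate(ref: Picture, bx: int, by: int, n: int,
                      mv: Tuple[int, int]) -> Picture:
    """Fetch the predicted block that the motion vector points to."""
    dx, dy = mv
    return [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]

ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]  # content shifted left by 1
mv = motion_estimate(cur, ref, bx=2, by=2, n=2, search=2)
print(mv)                                   # (1, 0)
print(motion_compensate(ref, 2, 2, 2, mv))  # [[19, 20], [27, 28]]
```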
The prediction structure determination unit 812 determines, based on an RAU start picture Uin, that the picture to be coded is at the start of an RAU, instructs the motion estimation unit 801 and the motion compensation unit 802, by means of a picture type Pt, to code the picture as a special randomly-accessible picture (intra-picture coding), and further instructs the variable length coding unit 811 to code the picture type Pt.
The variable length coding unit 811 performs variable length coding on the quantization value Qc, the picture type Pt, and the motion vector MV in order to generate a stream Str.
FIG. 5 is a block diagram showing a picture decoding apparatus 900 for realizing the conventional image decoding method. Units identical to those in FIG. 4 are assigned the same reference numerals in FIG. 5; since they operate in the same manner as described for the picture coding apparatus of FIG. 4, their details are not described below.
The variable length decoding unit 901 decodes the stream Str and outputs the quantization value Qc, reference picture specification information Ind, the picture type Pt, the motion vector MV, and the like. The picture memory 809 obtains the motion vector MV; the motion compensation unit 802 obtains the picture type Pt, the motion vector MV, and the reference picture specification information Ind; and the inverse quantization unit 806 obtains the quantization value Qc. Decoding is then performed by the picture memory 809, the motion compensation unit 802, the inverse quantization unit 806, the inverse orthogonal transformation unit 807, and the adder 808, which operate as already described with reference to the block diagram of FIG. 4 showing the picture coding apparatus 800 for realizing the conventional coding method.
A buffer memory 902 stores the decoded picture Vout outputted from the adder 808, and a display unit 903 obtains the decoded picture Vout from the buffer memory 902 and displays it. Note that the picture memory 809 and the buffer memory 902 may share the same memory.
FIG. 6 is a flowchart showing decoding performed by the conventional picture decoding apparatus 900 during special play-back such as high-speed play-back. First, at Step S1001, the picture decoding apparatus 900 detects, from the stream Str, the header of a picture to be decoded. At Step S1002, the picture decoding apparatus 900 examines, based on the picture type in the header of the picture layer, whether or not the picture needs to be decoded. At Step S1003, the picture decoding apparatus 900 determines whether the picture was judged at Step S1002 to need decoding; if so, the processing proceeds to Step S1004, where the picture is decoded, and otherwise it proceeds to Step S1005. Finally, at Step S1005, the picture decoding apparatus 900 determines whether the last picture to be played back, such as the last picture in an RAU or in the stream, has been processed. If pictures remain to be processed, Steps S1001 to S1005 are repeated; once the last picture has been processed, the processing ends.
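The loop of FIG. 6 can be sketched as follows. The picture records, the needs_decoding predicate (here, an illustrative rule that skips Skipped-pictures), and the function name are hypothetical; the sketch only counts how many headers are parsed and which pictures are decoded, with the actual decoding of Step S1004 left as a placeholder.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def special_playback(pictures: Iterable[Dict[str, str]],
                     needs_decoding: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Conventional special play-back loop; returns (decoded picture names, headers parsed)."""
    decoded: List[str] = []
    headers_parsed = 0
    for pic in pictures:                 # S1005: repeat until the last picture
        headers_parsed += 1              # S1001: locate this picture's header
        pic_type = pic["type"]           # S1002: read the type from the picture layer
        if needs_decoding(pic_type):     # S1003: branch on the examination result
            decoded.append(pic["name"])  # S1004: decode (placeholder)
    return decoded, headers_parsed

pics = [{"name": "I0", "type": "I"}, {"name": "P1", "type": "P"}, {"name": "S2", "type": "S"}]
print(special_playback(pics, lambda t: t != "S"))  # (['I0', 'P1'], 3)
```

Every picture's header is parsed even when the picture is then skipped, which is the cost the following paragraphs discuss.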
However, the conventional picture coding apparatus 800 and picture decoding apparatus 900 impose a large processing load when decoding a stream Str that includes Skipped-pictures, especially during special play-back such as high-speed play-back. Likewise, decoding a stream Str that includes BI-pictures imposes a large processing load, especially during special play-back that starts from the middle of the data (hereinafter referred to as jumping play-back).
FIG. 7 is a diagram showing the problem of the above-described conventional picture coding apparatus 800 and picture decoding apparatus 900.
Part (a) of FIG. 7 shows the structure of a conventional RAU that includes Skipped-pictures. The RAU includes twenty-four pictures, and because the image is still from the fourth picture onward in decoding order, the fifth and subsequent pictures are all Skipped-pictures. When this RAU is played back at triple speed, the conventional picture decoding apparatus 900 attempts to decode the 1st, 4th, 7th, 10th, 13th, 16th, 19th, and 22nd pictures in sequence for play-back. However, as shown in (c) of FIG. 7, the only pictures that actually need to be decoded are the first picture (an I-picture) and the fourth picture (a P-picture).
This is because, in an RAU of the conventional stream Str, the picture type of each picture is contained only in that picture's own picture layer, so the picture decoding apparatus 900 cannot determine whether a picture needs to be decoded without locating the header (picture layer) of each picture to obtain its picture type. Therefore, as shown in (b) of FIG. 7, the picture decoding apparatus 900 has to analyze the 7th, 10th, 13th, 16th, 19th, and 22nd pictures, which are Skipped-pictures, to obtain their picture types.
As described above, during high-speed play-back of a conventional RAU, the conventional picture coding apparatus and picture decoding apparatus must analyze even pictures that do not need to be decoded, which results in a large processing load for decoding.
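The arithmetic of FIG. 7 can be reproduced with a short sketch. It takes the 24-picture RAU described above in decoding order. The types of the 2nd and 3rd pictures are not stated in the text and are assumed here to be P-pictures, which does not affect the result because neither is selected at triple speed. The function name is hypothetical.

```python
from typing import List, Tuple

def triple_speed_targets(types: List[str], speed: int = 3) -> Tuple[List[int], List[int]]:
    """For every speed-th picture in decoding order (1-based numbering as in FIG. 7),
    split the targets into pictures with new pixel data and Skipped-pictures whose
    headers a conventional decoder parses only to learn their type."""
    to_decode: List[int] = []
    parsed_in_vain: List[int] = []
    for idx in range(0, len(types), speed):
        pos = idx + 1
        if types[idx] == "S":
            parsed_in_vain.append(pos)
        else:
            to_decode.append(pos)
    return to_decode, parsed_in_vain

rau = ["I", "P", "P", "P"] + ["S"] * 20  # 24 pictures; 2nd and 3rd types assumed
print(triple_speed_targets(rau))
# ([1, 4], [7, 10, 13, 16, 19, 22])
```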
Furthermore, when jumping play-back is performed from an RAU including BI-pictures, the above conventional picture coding apparatus 800 and picture decoding apparatus 900 require a large amount of decoding processing.
That is, when the conventional picture coding apparatus 800 generates an open GOP type RAU, a picture that must be re-ordered (hereinafter referred to as a re-ordered picture) may be coded as either a B-picture or a BI-picture. A re-ordered picture precedes, in display order, the starting I-picture that is the first picture of the RAU in decoding order, but follows that I-picture in decoding order. When jumping play-back is performed from an open GOP type RAU, a re-ordered picture that is a B-picture may be impossible to decode or display, whereas a re-ordered picture that is a BI-picture can be decoded and displayed.
Therefore, the conventional picture decoding apparatus 900 analyzes each re-ordered picture included in the RAU in the stream Str, thereby determining whether the re-ordered picture is a B-picture or a BI-picture. If the re-ordered picture is a B-picture, then the picture decoding apparatus 900 does not decode the re-ordered picture. On the other hand, if the re-ordered picture is a BI-picture, then the picture decoding apparatus 900 decodes the re-ordered picture.
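The per-picture decision described above can be sketched as follows. The Pic record, its display field, and the function name are hypothetical; the RAU is assumed to be given in decoding order with its starting I-picture first. Note that the sketch must inspect the kind of every re-ordered picture, which is exactly the analysis identified as costly.

```python
from typing import Dict, List, NamedTuple

class Pic(NamedTuple):
    name: str
    kind: str     # "I", "P", "B", "BI" or "S"
    display: int  # position in display order

def jumping_playback_decisions(rau: List[Pic]) -> Dict[str, bool]:
    """For each re-ordered picture of an RAU (given in decoding order, starting
    I-picture first), decide whether to decode it when play-back jumps in here."""
    start = rau[0]
    decisions: Dict[str, bool] = {}
    for pic in rau[1:]:
        if pic.display < start.display:  # shown before the starting I-picture
            # BI: intra-coded, decodable on its own.
            # B: may reference a picture of the previous RAU, which was not decoded.
            decisions[pic.name] = pic.kind == "BI"
    return decisions

rau = [Pic("I0", "I", 2), Pic("BI1", "BI", 0), Pic("B2", "B", 1),
       Pic("P3", "P", 5), Pic("B4", "B", 3), Pic("B5", "B", 4)]
print(jumping_playback_decisions(rau))  # {'BI1': True, 'B2': False}
```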
However, determining whether a re-ordered picture is a B-picture or a BI-picture requires a large amount of processing, which sometimes results in processing delays.
Consequently, when jumping play-back is performed from an open GOP type RAU, the conventional picture decoding apparatus 900 neither decodes nor displays any re-ordered picture, regardless of whether it is a B-picture or a BI-picture, even though skipping a BI-picture is unnecessary. As a result, a re-ordered picture that is a BI-picture could not be played back effectively.
Thus, the present invention has been conceived to address the above problems, and an object of the present invention is to provide a picture coding apparatus and a picture decoding apparatus that can reduce the decoding load.