(1) Field of the Invention
The present invention relates to a picture coding apparatus which codes a moving picture, a stream which is generated by an image coding method using the picture coding apparatus, and a picture decoding apparatus which decodes the stream.
(2) Description of the Related Art
Recently, with the arrival of the age of multimedia, which integrally handles audio, video and pixel values, existing information media such as newspapers, journals, television, radio and telephone, through which information is conveyed to people, have come under the scope of multimedia. In general, multimedia refers to a representation in which not only characters but also graphic symbols, audio and especially pictures and the like are related to each other. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite to represent such information in digital form.
However, when estimating the amount of information contained in each of the aforementioned information media in digital form, a character requires 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality), and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle this vast amount of information directly in digital form via the information media mentioned above. For example, a videophone has already been put into practical use via Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbits/sec to 1.5 Mbits/sec; however, it is impossible to transmit a picture captured by a TV camera directly through ISDN.
This therefore requires information compression techniques, and for instance, in the case of a videophone, video compression techniques compliant with H.261 and H.263 Standards recommended by International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) are employed. According to the information compression techniques compliant with the MPEG-1 standard, picture information as well as audio information can be stored in an ordinary music CD (Compact Disc).
Here, Moving Picture Experts Group (MPEG) is an international standard for compression of moving picture signals, and the MPEG-1 is a standard that compresses video signals down to 1.5 Mbits/sec, namely, compresses the information included in TV signals approximately down to a hundredth. The quality targeted by the MPEG-1 standard was medium quality so as to realize a transmission rate of primarily about 1.5 Mbits/sec. Therefore, MPEG-2, standardized with a view to meeting the requirements of even higher picture quality, realizes TV broadcast quality for transmitting moving picture signals at a transmission rate of 2 to 15 Mbits/sec.
In the present circumstances, a working group (ISO/IEC JTC1/SC29/WG11) previously in charge of the standardization of the MPEG-1 and the MPEG-2 has further standardized MPEG-4, which achieves a compression rate superior to the one achieved by the MPEG-1 and the MPEG-2, allows coding/decoding operations on a per-object basis, and realizes new functions required by the age of multimedia. At first, in the process of the standardization of the MPEG-4, the aim was to standardize low bit rate coding; however, the aim has since been extended to a more versatile coding including high bit rate coding for interlaced pictures and others. Moreover, the ISO/IEC and the ITU-T have jointly developed, as a next-generation image coding method, MPEG-4 Advanced Video Coding (AVC) with a higher compression rate, and currently the Society of Motion Picture and Television Engineers (SMPTE) is attempting to standardize a VC-1 (Proposed SMPTE Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, Final Committee Draft 1 Revision 6, 2005 Jul. 13). A target of the VC-1 is to extend coding tools and the like, based on the methods of the MPEG-2 and MPEG-4 standards. The VC-1 is expected to be used for next-generation optical disk peripheral devices, such as a Blu-ray disc (BD) and a High Definition (HD) DVD.
In general, in coding of a moving picture, compression of information volume is performed by eliminating redundancy both in spatial and temporal directions. Therefore, an inter-picture prediction coding, which aims at reducing the temporal redundancy, estimates a motion and generates a predicted picture on a block-by-block basis with reference to prior and subsequent pictures, and then codes a differential value between the obtained predicted picture and a current picture to be coded. Here, “picture” is a term to represent a single screen and it represents a frame when used for a progressive picture whereas it represents a frame or fields when used for an interlaced picture. The interlaced picture here is a picture in which a single frame consists of two fields respectively having different time. For coding and decoding an interlaced picture, three ways are possible: processing a single frame either as a frame, as two fields or as a frame/field structure depending on a block in the frame.
A picture to which an intra-picture prediction coding is performed without reference pictures is referred to as an “I-picture”. A picture to which the inter-picture prediction coding is performed with reference to a single picture is referred to as a “P-picture”. A picture to which the inter-picture prediction coding is performed by referring simultaneously to two pictures is referred to as a “B-picture”. The B-picture can refer to two pictures, arbitrarily selected from the pictures whose display time is either forward or backward to that of a current picture to be coded, as an arbitrary combination. However, the reference pictures need to be already coded or decoded as a condition to code or decode these I-picture, P-picture, and B-picture.
FIGS. 1A and 1B are diagrams showing a structure of the conventional MPEG-2 stream. As shown in FIG. 1B, the stream according to the MPEG-2 standard has a layered structure. The stream is made up of a plurality of Groups of Pictures (GOPs). It is possible to edit a moving picture and to perform random access on it by using the GOP as a basic unit in coding processing. This means that a starting picture in the GOP is a random access point. The GOP consists of a plurality of pictures, each being an I-picture, a P-picture or a B-picture. The stream, GOP and picture each include a synchronous signal (sync) indicating a boundary between respective units and a header that is data commonly included in the respective units.
FIGS. 2A and 2B are examples of a prediction structure of pictures according to the MPEG-2 standard. Shaded pictures in FIG. 2A are reference pictures which are referred to in order to predict other pictures. As shown in FIG. 2A, in the MPEG-2 standard, a P-picture (picture P0, P6, P9, P12, or P15) can be predicted from one picture, either an I-picture or a P-picture, whose display time immediately precedes that of the P-picture. A B-picture (picture B1, B2, B4, B5, B7, B8, B10, B11, B13, B14, B16, B17, B19, or B20) can be predicted from one picture whose display time immediately precedes the B-picture or one picture whose display time immediately follows the B-picture, both of which can be either an I-picture or a P-picture. In the stream, the B-pictures are arranged immediately subsequent to either an I-picture or a P-picture. Therefore, at the time of performing random access, all the pictures subsequent to an I-picture can be decoded and displayed when decoding starts from that I-picture. Regarding a structure of the GOP, the pictures from I3 to B14 can be considered as one GOP, as shown in FIG. 2B for example.
FIG. 3 is a diagram showing a structure of a stream according to the VC-1. The stream according to the VC-1 also has the same structure as described for the MPEG-2 standard. However, a random access point is referred to as an “entry point”, to which an entry point header (Entry Point HDR) is added. Data from the entry point to a next entry point is a random access unit (RAU), which is equivalent to one GOP according to the MPEG-2 standard. Hereafter, the random access unit according to the VC-1 is referred to as a “random access unit (RAU)”. Note that the RAU can store user data regarding pictures in the RAU (user data at Entry-point level), and the user data is arranged immediately subsequent to the entry point header.
Here, types of pictures according to the VC-1 are described. In the VC-1, the I-picture, P-picture, and B-picture are also defined. These I-picture, P-picture, and B-picture have the same prediction structure as described for the MPEG-2 standard. In the VC-1, in addition to the above three types of picture, there are two more defined types, which are Skipped picture and BI-picture. The Skipped picture is a picture which does not include any pixel data, and is treated as a P-picture having the same pixel data as a prior reference picture in decoding order. For example, in examples (1) and (2), a picture S5 is regarded as the same picture as a picture P3, so that the same operation of decoding the stream is performed in both (1) and (2).
(1) Display order: Picture I0, Picture B2, Picture P1, Picture B4, Picture P3, Picture B6, Picture S5 (Note that the picture represented by a symbol including I is an I-picture, the picture represented by a symbol including P is a P-picture, the picture represented by a symbol including B is a B-picture, and the picture represented by a symbol including S is a Skipped picture. For example, the picture S5 is a Skipped picture. The numerals attached to the symbols of the pictures represent decoding order.)
(2) Display order: Picture I0, Picture B2, Picture P1, Picture B4, Picture P3, Picture B6, Picture P5 (P5 has the same pixel data as P3.)
The Skipped picture is especially useful when pictures are still. For example, in a case where the pictures become still in the middle of the RAU, Skipped pictures are used for the still portion, as in picture I0, picture P1, picture P2, picture P3, picture S4, picture S5, picture S6 . . . , in order to reduce the amount of data to be coded.
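The treatment of the Skipped picture described above can be illustrated with a minimal Python sketch. The function name `decode_sequence`, the tuple layout, and the string labels standing in for pixel data are all hypothetical and chosen only for illustration; the sketch assumes that I-pictures, P-pictures and Skipped pictures act as reference pictures and that B-pictures do not.

```python
def decode_sequence(pictures_in_decoding_order):
    """Map each picture name to the pixel data it yields after decoding.

    Each entry is (name, pixel_data). A Skipped picture carries no pixel
    data (None) and is treated as a P-picture that repeats the prior
    reference picture in decoding order.
    """
    last_reference = None
    decoded = {}
    for name, pixel_data in pictures_in_decoding_order:
        kind = name[0]  # "I", "P", "B" or "S" (hypothetical naming scheme)
        if kind == "S":
            if last_reference is None:
                raise ValueError("Skipped picture without a prior reference")
            pixel_data = last_reference
        decoded[name] = pixel_data
        if kind in ("I", "P", "S"):  # assumption: B-pictures are not references
            last_reference = pixel_data
    return decoded


# Example (1), rearranged into decoding order, with S5 carrying no pixel data.
stream_1 = [("I0", "pix_I0"), ("P1", "pix_P1"), ("B2", "pix_B2"),
            ("P3", "pix_P3"), ("B4", "pix_B4"), ("S5", None), ("B6", "pix_B6")]
# Example (2): P5 explicitly carries the same pixel data as P3.
stream_2 = [("I0", "pix_I0"), ("P1", "pix_P1"), ("B2", "pix_B2"),
            ("P3", "pix_P3"), ("B4", "pix_B4"), ("P5", "pix_P3"), ("B6", "pix_B6")]
```

Under these assumptions, decoding `stream_1` yields for S5 the same pixel data that decoding `stream_2` yields for P5, which is the equivalence stated for examples (1) and (2).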
Furthermore, the BI-picture is a picture having characteristics of both the B-picture and the I-picture. More specifically, the BI-picture has the B-picture characteristics in which decoding order is different from display order, and the picture is not a reference picture for other pictures. In addition, the BI-picture has the I-picture characteristics in which intra-picture coding is applied to all macroblocks and the picture is not predicted from any other pictures.
Next, a method for distinguishing the I-picture, P-picture, B-picture, Skipped picture, and BI-picture is described. Basically, the types of pictures can be distinguished based on the picture types included in a picture layer in a stream. However, the picture types indicated by the picture layer are defined as follows, depending on profiles.
For example, in a simple profile, picture types are indicated as I-picture and P-picture. In a main profile, picture types are indicated as I-picture, P-picture, and B- or BI-picture. In an advanced profile, picture types are indicated as I-picture, P-picture, B-picture, BI-picture, and Skipped picture.
Here, in both the simple profile and the main profile, it is impossible to distinguish the Skipped picture by using the picture types in the picture layer, so that, in a case where an arbitrary picture has a size of one byte or less, the picture is defined as the Skipped picture. Furthermore, in the main profile, one picture type is defined to represent both B-picture and BI-picture, so that it is impossible to distinguish the B-picture from the BI-picture based on the picture type.
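The profile-dependent rules above can be summarized in a short Python sketch. The names `PROFILE_PICTURE_TYPES` and `classify_picture`, and the label `"B_or_BI"` for the combined main-profile type, are hypothetical; the sketch does not reproduce the actual bitstream syntax of the VC-1 and only mirrors the rules as stated in this description.

```python
# Picture types that the picture layer can indicate in each profile,
# as listed in the description (labels are illustrative only).
PROFILE_PICTURE_TYPES = {
    "simple": {"I", "P"},
    "main": {"I", "P", "B_or_BI"},
    "advanced": {"I", "P", "B", "BI", "Skipped"},
}


def classify_picture(profile, signalled_type, picture_size_bytes):
    """Return the picture type a decoder can infer for one picture."""
    if profile not in PROFILE_PICTURE_TYPES:
        raise ValueError("unknown profile: " + profile)
    # Simple and main profiles cannot signal a Skipped picture, so a
    # picture of one byte or less is defined as a Skipped picture.
    if profile in ("simple", "main") and picture_size_bytes <= 1:
        return "Skipped"
    if signalled_type not in PROFILE_PICTURE_TYPES[profile]:
        raise ValueError("type not available in this profile")
    return signalled_type
```

For instance, `classify_picture("main", "B_or_BI", 500)` can only report the combined type, which reflects the statement that the B-picture cannot be distinguished from the BI-picture in the main profile.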
FIG. 4 is a block diagram showing a picture coding apparatus for realizing the conventional image coding method.
A picture coding apparatus 800 performs compressed coding, variable length coding, and the like, for an inputted picture signal Vin, thereby transforming the picture signal Vin into a bitstream (stream) Str to be outputted. The picture coding apparatus 800 is comprised of a motion estimation unit 801, a motion compensation unit 802, a subtractor 803, an orthogonal transformation unit 804, a quantization unit 805, an inverse quantization unit 806, an inverse orthogonal transformation unit 807, an adder 808, a picture memory 809, a switch 810, a variable length coding unit 811, and a prediction structure determination unit 812.
The picture signal Vin is inputted into the subtractor 803 and the motion estimation unit 801. The subtractor 803 calculates a differential between the inputted picture signal Vin and a predicted picture, and outputs the differential to the orthogonal transformation unit 804. The orthogonal transformation unit 804 transforms the differential into a frequency coefficient, and outputs the frequency coefficient into the quantization unit 805. The quantization unit 805 quantizes the inputted frequency coefficient, and outputs the resulting quantization value Qc into the variable length coding unit 811.
The inverse quantization unit 806 inversely quantizes the quantization value Qc in order to restore the original frequency coefficient, and outputs the resulting frequency coefficient to the inverse orthogonal transformation unit 807. The inverse orthogonal transformation unit 807 performs inverse-frequency transformation on the frequency coefficient to be transformed into a pixel differential, and outputs the pixel differential to the adder 808. The adder 808 adds the pixel differential with a predicted picture which is outputted from the motion compensation unit 802, and generates a decoded picture. The switch 810 is On when the decoded picture is instructed to be stored, and the decoded picture is stored into the picture memory 809.
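The path from the subtractor 803 through the adder 808 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the apparatus itself: a one-dimensional orthonormal DCT on eight samples stands in for the orthogonal transformation, a single uniform step size `qstep` stands in for the quantization, and all function names are hypothetical.

```python
import math

N = 8  # samples per block; a 1-D stand-in for a 2-D block (assumption)


def _scale(k):
    # Normalization factors that make the DCT orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def forward_transform(samples):
    """1-D DCT-II: role of the orthogonal transformation unit 804."""
    return [_scale(k) * sum(samples[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def inverse_transform(coeffs):
    """Inverse of forward_transform: role of unit 807."""
    return [sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def code_block(current, predicted, qstep):
    """Subtractor 803, transformation 804 and quantization 805."""
    differential = [c - p for c, p in zip(current, predicted)]
    coeffs = forward_transform(differential)
    return [round(c / qstep) for c in coeffs]  # quantization values Qc


def local_decode(qc, predicted, qstep):
    """Inverse quantization 806, inverse transformation 807 and adder 808."""
    coeffs = [q * qstep for q in qc]
    pixel_differential = inverse_transform(coeffs)
    return [p + d for p, d in zip(predicted, pixel_differential)]
```

Because the quantization discards precision, the locally decoded block approximates rather than equals the input block; storing this decoded picture, and not the input, keeps the reference pictures of the coding side identical to those available on the decoding side.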
On the other hand, the motion estimation unit 801, into which the picture signal Vin is inputted in units of macroblocks, searches the decoded pictures (reference pictures) which are stored in the picture memory 809, detects the image region most similar to the macroblock indicated by the picture signal Vin, and determines a motion vector MV indicating the location of that image region.
The motion compensation unit 802, by using the determined motion vector and the like, retrieves the most suitable image for a predicted picture, from the decoded picture stored in the picture memory 809.
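One common way to realize such a search is full-search block matching with the sum of absolute differences (SAD) as the similarity measure. The description does not state which search method or measure the motion estimation unit 801 uses, so the following Python sketch is an assumption made for illustration, and all function names are hypothetical.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion(current, reference, top, left, size, search_range):
    """Return ((dy, dx), cost) of the best match inside the search window."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the picture
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def compensate(reference, top, left, size, motion_vector):
    """Fetch the predicted block that the motion vector points to."""
    dy, dx = motion_vector
    return get_block(reference, top + dy, left + dx, size)
```

Here `estimate_motion` plays the role of the motion estimation unit 801 and `compensate` that of the motion compensation unit 802; the block returned by `compensate` is what the subtractor and the adder would use as the predicted picture.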
A prediction structure determination unit 812 determines, based on a RAU start picture Uin, that a picture to be coded is at a RAU start location, then instructs, using a picture type Pt, the motion estimation unit 801 and the motion compensation unit 802 to code (intra-picture coding) the picture as a special randomly-accessible picture, and further instructs the variable length coding unit 811 to code the picture type Pt.
The variable length coding unit 811 performs variable length coding on the quantization value Qc, the picture type Pt, and the motion vector MV in order to generate a stream Str.
FIG. 5 is a block diagram showing a picture decoding apparatus 900 for realizing the conventional image decoding method. The reference numerals in FIG. 4 are assigned to identical units in FIG. 5, and those units operate in the same manner as described for the picture coding apparatus for realizing the conventional image coding method in FIG. 4, so that the details of those units are not described herein below.
The variable length decoding unit 901 decodes the stream Str, and outputs the quantization value Qc, reference picture specification information Ind, the picture type Pt, the motion vector MV, and the like. The picture memory 809 obtains the motion vector MV, the motion compensation unit 802 obtains the picture type Pt, the motion vector MV, and the reference picture specification information Ind, and the inverse quantization unit 806 obtains the quantization value Qc. The decoding is performed by the picture memory 809, the motion compensation unit 802, the inverse quantization unit 806, the inverse orthogonal transformation unit 807, and the adder 808. The operation of the decoding has been described with reference to the block diagram of FIG. 4 showing the picture coding apparatus 800 for realizing the conventional coding method.
A buffer memory 902 is a memory for storing a decoded picture Vout which is outputted from the adder 808, and a display unit 903 obtains the decoded picture Vout from the buffer memory 902 and displays a picture according to the decoded picture Vout. Note that the picture memory 809 and the buffer memory 902 can share the same memory.
FIG. 6 is a flowchart showing decoding during special play-back, such as high-speed play-back, performed by the conventional picture decoding apparatus 900. Firstly, at Step S1001, the conventional picture decoding apparatus 900 detects, from the stream Str, a header of a picture to be decoded. Then, at Step S1002, the conventional picture decoding apparatus 900 examines, based on a picture type in the header included in the picture layer, whether or not the picture needs to be decoded. At Step S1003, the conventional picture decoding apparatus 900 branches on the result of the examination at Step S1002: if the picture needs to be decoded, the processing proceeds to Step S1004, where the picture is decoded, while if the picture does not need to be decoded, the processing proceeds to Step S1005. Finally, at Step S1005, the conventional picture decoding apparatus 900 determines whether or not the processing has been completed for the last picture to be played back, such as the last picture in a RAU or a stream; if there are still pictures to be processed, the processing repeats Steps S1001 to S1005, and if the last picture has been processed, the processing completes.
However, in the above conventional picture coding apparatus 800 and picture decoding apparatus 900, there is a problem of a large processing load during decoding of the stream Str which includes Skipped pictures, and especially during the special play-back such as high-speed play-back.
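The flow of FIG. 6 can be sketched in Python as a simple loop. The dictionary layout of a picture and the callbacks `needs_decoding` and `decode_picture` are hypothetical placeholders introduced for illustration; the step numbers in the comments refer to the steps described above.

```python
def special_playback(stream, needs_decoding, decode_picture):
    """Walk every picture in the stream and decode only those required.

    stream          -- iterable of pictures, each a dict with a "header"
    needs_decoding  -- callback deciding from the picture type alone
    decode_picture  -- callback performing the actual decoding
    """
    decoded = []
    for picture in stream:                      # repeated until S1005 ends it
        header = picture["header"]              # S1001: detect picture header
        wanted = needs_decoding(header["picture_type"])  # S1002: examine type
        if wanted:                              # S1003: branch on the result
            decoded.append(decode_picture(picture))      # S1004: decode
        # S1005: the loop ends after the last picture has been processed
    return decoded
```

Note that the header of every picture is visited at Step S1001 regardless of whether the picture is eventually decoded, because the picture type is only available inside the picture layer.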
FIG. 7 is an explanatory diagram showing the problem in the above conventional picture coding apparatus 800 and picture decoding apparatus 900.
In (a) of FIG. 7, a structure of the conventional RAU including the Skipped pictures is shown. The RAU is comprised of twenty-four pictures in which the images are still in the fourth and following pictures in decoding order, so that the fifth and later pictures are all Skipped pictures. When such a RAU is played back at triple speed, the conventional picture decoding apparatus 900 attempts to decode and play back the 1st, 4th, 7th, 10th, 13th, 16th, 19th and 22nd pictures in sequence. However, the only pictures that actually need to be decoded are the first I-picture and the fourth P-picture, as shown in (c) of FIG. 7.
This means that, in a RAU in the conventional stream Str, the picture decoding apparatus 900 cannot determine whether or not the pictures are to be decoded unless the header of each picture (picture layer) is searched to obtain a picture type, since the picture type of each picture is included in its picture layer. Therefore, as shown in (b) of FIG. 7, the picture decoding apparatus 900 needs to analyze the 7th, 10th, 13th, 16th, 19th and 22nd Skipped pictures to obtain the picture types.
As described above, for the high-speed play-back of the conventional RAU, the conventional picture decoding apparatus needs to analyze even pictures which do not need to be decoded, which eventually results in a large processing load for decoding.
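The workload in this example can be counted with a short Python sketch. The function names are hypothetical, and the types of the second and third pictures are an assumption (taken as P-pictures here), since the description only fixes the first picture as an I-picture, the fourth as a P-picture, and the fifth and later pictures as Skipped pictures.

```python
def build_example_rau():
    """RAU of twenty-four pictures in decoding order.

    Assumption: pictures 2 and 3 are P-pictures; the description does
    not state their types.
    """
    return ["I"] + ["P"] * 3 + ["Skipped"] * 20


def triple_speed_workload(rau, step=3):
    """Return (examined, decoded) as 1-based picture positions."""
    candidates = range(0, len(rau), step)
    examined = [i + 1 for i in candidates]  # headers that must be analyzed
    decoded = [i + 1 for i in candidates if rau[i] != "Skipped"]
    return examined, decoded


examined, decoded = triple_speed_workload(build_example_rau())
wasted = [p for p in examined if p not in decoded]
```

Under these assumptions, `examined` holds the eight positions 1, 4, 7, 10, 13, 16, 19 and 22, `decoded` holds only positions 1 and 4, and `wasted` holds the six Skipped pictures whose headers are analyzed solely to learn that nothing needs to be decoded.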
Thus, the present invention addresses the above problems and an object of the present invention is to provide a picture coding apparatus and a picture decoding apparatus which can reduce load in decoding.