Recently, with the arrival of the age of multimedia, which integrally handles audio, video and other pixel values, existing information media, i.e., newspapers, journals, TVs, radios, telephones and other means through which information is conveyed to people, have come under the scope of multimedia.
Generally speaking, multimedia refers to a representation in which not only characters but also graphics, audio and especially pictures and the like are related to each other. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite that such information be represented in digital form.
However, when the amount of information contained in each of the aforementioned information media is calculated as an amount of digital information, a character requires 1-2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality) and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle such a vast amount of information directly in digital form via the information media mentioned above. For example, a videophone has already been put into practical use via the Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbit/s-1.5 Mbit/s; however, it is impossible to transmit video captured on the TV screen or shot by a TV camera directly via the ISDN.
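The gap between these figures can be checked with simple arithmetic. The following is a minimal sketch that uses only the rates quoted above, taken at their stated lower bounds; the function and variable names are illustrative and not part of any standard.

```python
# Rates quoted in the text, in bits per second (lower bounds).
VIDEO_BPS = 100_000_000     # moving picture, "more than 100 Mbits per second"
ISDN_MIN_BPS = 64_000       # slowest ISDN rate mentioned
ISDN_MAX_BPS = 1_500_000    # fastest ISDN rate mentioned


def required_compression(source_bps: float, channel_bps: float) -> float:
    """Return how many times the source must be reduced to fit the channel."""
    return source_bps / channel_bps


ratio_fast = required_compression(VIDEO_BPS, ISDN_MAX_BPS)
ratio_slow = required_compression(VIDEO_BPS, ISDN_MIN_BPS)
print(f"over a 1.5 Mbit/s channel: roughly {ratio_fast:.1f} to 1")
print(f"over a 64 Kbit/s channel: roughly {ratio_slow:.1f} to 1")
```

Even on the fastest ISDN channel mentioned, the video would have to be reduced by a factor of more than sixty, which is why uncompressed transmission is not practical.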
This therefore requires information compression techniques. For instance, in the case of the videophone, video compression techniques compliant with the H.261 and H.263 standards, internationally standardized by the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T), are employed. According to the information compression techniques compliant with the MPEG-1 standard, picture information as well as audio information can be stored on an ordinary music CD (Compact Disc).
The Moving Picture Experts Group (MPEG) is an international standard for compression of moving picture signals, and MPEG-1 is a standard that compresses video signals down to 1.5 Mbit/s, namely, compresses the information included in TV signals approximately down to a hundredth. Because the MPEG-1 standard targeted medium quality so as to realize a transmission rate of primarily about 1.5 Mbit/s, MPEG-2, which was standardized with a view to meeting the requirements of high-quality pictures, realizes TV broadcast quality by transmitting moving picture signals at a transmission rate of 2-15 Mbit/s. At present, the working group (ISO/IEC JTC1/SC29/WG11) previously in charge of the standardization of MPEG-1 and MPEG-2 has standardized MPEG-4, which achieves a compression rate superior to that of MPEG-1 and MPEG-2, allows coding/decoding operations on a per-object basis and realizes new functions required by the era of multimedia. In the process of standardizing MPEG-4, the initial aim was a coding method for low bit rates; however, the aim has since been extended to a more versatile coding that includes coding of moving pictures at a high bit rate and coding of interlaced pictures. Moreover, the standardization of MPEG-4 AVC and ITU H.264 as a next generation coding method with a higher compression rate is in process, jointly worked on by the ITU-T and the ISO/IEC. As of August 2002, the next generation coding method has been published under the name of Committee Draft (CD).
In coding of a moving picture, the amount of information is usually compressed by eliminating redundancy in both the spatial and temporal directions. Inter-picture prediction coding, which aims at reducing the temporal redundancy, therefore estimates motion and generates a predictive picture on a block-by-block basis with reference to forward and backward pictures, and then codes the differential value between the obtained predictive picture and the current picture to be coded. Here, “picture” is a term that represents a single screen: it represents a frame when used for a progressive picture, whereas it represents a frame or a field when used for an interlaced picture. The interlaced picture here is a picture in which a single frame consists of two fields having different times. For coding and decoding an interlaced picture, three ways are possible: handling a single frame as a frame, handling it as two fields, or handling it with a frame structure or a field structure selected for each block in the frame.
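The relation between a frame and its two fields can be sketched as follows. This is an illustration only: it assumes that one field holds the even-numbered rows and the other the odd-numbered rows, which the text above does not state, and the function names are hypothetical.

```python
def split_into_fields(frame):
    """Split a frame (a list of pixel rows) into two fields.

    Assumption: one field takes rows 0, 2, 4, ... and the other takes
    rows 1, 3, 5, ...
    """
    first_field = frame[0::2]
    second_field = frame[1::2]
    return first_field, second_field


def merge_fields(first_field, second_field):
    """Interleave two fields row by row to rebuild the frame."""
    frame = []
    for first_row, second_row in zip(first_field, second_field):
        frame.append(first_row)
        frame.append(second_row)
    return frame


# A 4-row frame whose even and odd rows come from different time instants.
frame = [[10, 10], [20, 20], [11, 11], [21, 21]]
first, second = split_into_fields(frame)
rebuilt = merge_fields(first, second)
```

Handling the picture "as a frame" corresponds to coding `frame` directly, while handling it "as two fields" corresponds to coding `first` and `second` separately.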
A picture on which intra-picture prediction coding is performed without reference pictures is called an I-picture. A picture on which inter-picture prediction coding is performed with reference to a single picture is called a P-picture. A picture on which inter-picture prediction coding is performed by referring to two pictures simultaneously is called a B-picture. For coding a B-picture, two pictures whose display time is either forward or backward of that of the current picture to be coded can be selected arbitrarily as references. The reference pictures can be specified for each block, which is the basic unit for coding and decoding; the reference picture that is described first in the coded bit stream is classified as the first reference picture and the one that is described later as the second reference picture. However, as a condition for coding or decoding these I, P and B pictures, the reference pictures need to be already coded or decoded.
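The condition that reference pictures must already be coded or decoded means that the coding order can differ from the display order. The sketch below checks that condition for a hypothetical four-picture sequence; the picture names and the chosen references are assumptions made for illustration.

```python
# Each picture: name -> (picture type, names of the pictures it refers to).
PICTURES = {
    "I0": ("I", []),            # intra-coded, no reference
    "B1": ("B", ["I0", "P3"]),  # refers to two pictures
    "B2": ("B", ["I0", "P3"]),
    "P3": ("P", ["I0"]),        # refers to a single picture
}


def is_valid_coding_order(order, pictures):
    """Return True if every picture's references appear earlier in `order`."""
    already_coded = set()
    for name in order:
        _picture_type, references = pictures[name]
        if any(ref not in already_coded for ref in references):
            return False
        already_coded.add(name)
    return True


display_order = ["I0", "B1", "B2", "P3"]
coding_order = ["I0", "P3", "B1", "B2"]

display_ok = is_valid_coding_order(display_order, PICTURES)
coding_ok = is_valid_coding_order(coding_order, PICTURES)
```

In this example the display order is not a valid coding order, because B1 refers to P3, which is displayed later; moving P3 ahead of the B-pictures satisfies the condition.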
Motion compensation inter-picture prediction coding is employed for coding P-pictures and B-pictures. It is a coding method that applies motion compensation to inter-picture prediction coding. Motion compensation is not a method that simply predicts from the pixel values of the reference pictures; rather, it estimates the motion (to be referred to as “motion vector” hereinafter) of each part within a picture and performs prediction that takes the motion vector into consideration, thereby improving prediction accuracy and reducing the data amount. For example, the amount of data is reduced by estimating the motion vector for a current picture to be coded and coding the prediction error between the current picture and a predictive value obtained by shifting the reference by the amount equivalent to the motion vector. When this method is used, information on the motion vectors is required at the time of decoding; therefore, the motion vectors are coded and then recorded or transmitted.
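The effect of shifting the prediction by the motion vector can be seen in a small numerical sketch. The pictures, block size and motion vector below are made up for illustration, and the motion vector is given directly rather than estimated.

```python
def predict_block(reference, top, left, size, mv):
    """Copy a size x size block from `reference`, displaced by mv = (dy, dx)."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def prediction_error(current_block, predicted_block):
    """Per-pixel difference that is coded instead of the raw pixel values."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current_block, predicted_block)]


def total_magnitude(block):
    """Sum of absolute values, a rough measure of how much data remains."""
    return sum(abs(value) for row in block for value in row)


# Reference picture: a bright 2x2 object at rows 1-2, columns 1-2.
reference = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
# In the current picture the object has moved one pixel to the right,
# so the current block at (top=1, left=2) contains the whole object.
current_block = [[9, 9], [9, 9]]

without_mc = prediction_error(
    current_block, predict_block(reference, 1, 2, 2, (0, 0)))
with_mc = prediction_error(
    current_block, predict_block(reference, 1, 2, 2, (0, -1)))
```

With the zero vector the prediction error still contains part of the object, whereas with the vector (0, -1) the error is zero everywhere, leaving only the motion vector itself to be coded.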
A motion vector is estimated on a block-by-block basis. More precisely, a motion vector is estimated by fixing a block in the current picture, shifting a block in the reference picture within a search range, and finding the location of the reference block that most resembles the fixed block.
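The block-by-block search described above can be sketched as an exhaustive search. The text does not say how resemblance is measured; the sketch assumes the sum of absolute differences (SAD), a common choice, and all function names and sample pictures are hypothetical.

```python
def get_block(picture, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences (assumed resemblance measure)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion_vector(current, reference, top, left, size, search_range):
    """Return ((dy, dx), cost) of the best-matching reference block."""
    target = get_block(current, top, left, size)  # the fixed block
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate lies outside the reference picture
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


reference = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 4, 0],
    [0, 0, 0, 0, 0],
]
mv, cost = estimate_motion_vector(current, reference, top=1, left=2,
                                  size=2, search_range=1)
```

Practical encoders usually replace the exhaustive loop with faster search patterns, but the principle of comparing the fixed block with shifted candidates is the same.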
FIG. 1 is a block diagram showing the structure of the conventional picture coding apparatus.
A picture coding apparatus 900 outputs a coded image signal (to be referred to as “bit stream”) Str9 which is a bit stream obtained by coding an image signal Vin on a picture-by-picture basis, and includes a motion estimation unit 903, a motion compensation unit 905, a subtractor 906, an orthogonal transformation unit 907, a quantization unit 908, an inverse quantization unit 910, an inverse orthogonal transformation unit 911, an adder 912, a picture memory 904, a switch 913, a variable length coding unit 909 and an access point determination unit 902. Each component such as the motion estimation unit 903 executes the following processing per block or per macroblock that constitutes a picture.
The subtractor 906 calculates a differential value between the image signal Vin and a predictive image Pre and outputs the differential value to the orthogonal transformation unit 907. The orthogonal transformation unit 907 transforms the differential value into frequency coefficients and outputs them to the quantization unit 908. The quantization unit 908 quantizes the frequency coefficients and outputs the quantized values to the variable length coding unit 909. The inverse quantization unit 910 restores the frequency coefficients by inversely quantizing the quantized values and outputs the frequency coefficients to the inverse orthogonal transformation unit 911.
The inverse orthogonal transformation unit 911 performs inverse frequency transformation on the frequency coefficients outputted from the inverse quantization unit 910 to obtain pixel differential values and outputs them to the adder 912. The adder 912 adds the pixel differential values outputted from the inverse orthogonal transformation unit 911 and the predictive image Pre outputted from the motion compensation unit 905, and generates a decoded image. The switch 913 connects the adder 912 and the picture memory 904 so that the picture memory 904 stores the decoded image generated by the adder 912. The decoded image stored in the picture memory 904 is simply referred to as “picture” hereinafter.
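The path from the subtractor 906 to the adder 912 can be sketched for a one-dimensional block of four pixels. The text only speaks of an orthogonal transformation and of quantization; the sketch assumes an orthonormal DCT and a uniform quantization step, and the sample values are made up.

```python
import math


def _scale(k, n):
    """Normalization factor that makes the DCT orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(values):
    """Orthonormal DCT-II (assumed form of the orthogonal transformation)."""
    n = len(values)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]


def code_block(vin, pre, step):
    """Run one block through the loop; return (quantized values, decoded block)."""
    diff = [v - p for v, p in zip(vin, pre)]        # subtractor 906
    coeffs = dct(diff)                              # orthogonal transformation unit 907
    levels = [round(c / step) for c in coeffs]      # quantization unit 908
    restored = [q * step for q in levels]           # inverse quantization unit 910
    diff_rec = idct(restored)                       # inverse orthogonal transformation unit 911
    decoded = [p + d for p, d in zip(pre, diff_rec)]  # adder 912
    return levels, decoded


vin = [52, 55, 61, 66]   # input pixels (illustrative)
pre = [50, 54, 60, 70]   # predictive image Pre (illustrative)
levels, decoded = code_block(vin, pre, step=2)
```

The decoded block generally differs slightly from `vin` because quantization discards information. Storing the decoded block rather than the input keeps the reference used for prediction identical to what a decoder can reconstruct.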
The motion estimation unit 903 refers to the picture stored in the picture memory 904 as a reference picture and specifies an image area that resembles the image signal Vin the most among the reference pictures. Then, the motion estimation unit 903 estimates a motion vector MV indicating a position of the image area.
The motion estimation unit 903 also identifies a reference picture that resembles the image signal Vin out of the plural reference pictures using identification numbers (relative index Idx) for identifying the reference picture.
The motion compensation unit 905 extracts an image area that is the most applicable to the predictive image Pre from among the pictures stored in the picture memory 904 using the motion vector MV and the relative index Idx. The motion compensation unit 905 then generates a predictive image Pre from the extracted image area.
The access point determination unit 902 instructs the motion estimation unit 903 and the motion compensation unit 905 to code (intra-picture code) a predetermined picture as a special picture for each predetermined unit (random access unit). The special picture here means a picture from which the decoding can be started in the stream Str9. Furthermore, the access point determination unit 902 outputs an access point identifier rapp indicating that a picture is the special picture to the variable length coding unit 909.
The variable length coding unit 909 codes a parameter set PS obtained from outside resources, the motion vector MV, the quantized values, the relative index Idx and the access point identifier rapp, generates a stream Str9 in which the coded parameter set is placed only at the head side, and outputs the stream Str9.
FIG. 2 is a structural diagram showing the structure of the stream Str9 outputted by the conventional picture coding apparatus 900.
The stream Str9 includes, sequentially from the head, a synchronous signal sync, a parameter set PS and plural random access units RAU9. Such a stream Str9 complies with the JVT (H.264/MPEG-4 AVC) standard, which is presently in the process of standardization, jointly worked on by the ITU-T and the ISO/IEC.
The parameter set PS is common data equivalent to a header and includes a picture parameter set PPS equivalent to a header of the picture and a sequence parameter set SPS equivalent to a header of a unit at a level superior to a random access unit RAU9. The sequence parameter set SPS includes a maximum possible number of reference pictures, a picture size, or the like, whereas the picture parameter set PPS includes a type of variable length coding (a switching between Huffman coding and arithmetic coding), an initial value of the quantization step, the number of reference pictures, or the like.
The random access unit RAU9 includes sequentially from the head a synchronous signal sync and a plurality of coded pictures pic. The random access unit RAU9 as such is a single unit including the plural pictures in the stream Str9 and includes the special picture as mentioned above which can be decoded without depending on other pictures. Namely, the random access unit RAU9 is obtained by dividing the stream Str9 into a group of plural pictures including a special picture.
The picture pic includes, sequentially from the head, a synchronous signal sync, a parameter set identifier PSID and plural pieces of pixel data pix.
The parameter set identifier PSID indicates the sequence parameter set SPS and the picture parameter set PPS, which are included in the parameter set PS, to be referred to by the picture pic.
The synchronous signal sync included in the head of the stream Str9, in the head of the random access unit RAU9 and in the head of the picture pic indicates respectively a section distinguishing the units such as the stream Str9, the random access unit RAU9 and the picture pic.
Namely, in the picture coding method in which the conventional picture coding apparatus 900 generates a stream Str9 by coding the image signal Vin, the stream Str9 is generated in such a way that the parameter set PS is coded together and placed at the head side of the stream Str9, and plural random access units RAU9, none of which includes a picture parameter set PPS or a sequence parameter set SPS, follow the parameter set PS.
When decoding such a stream Str9, the picture decoding apparatus refers to the sequence parameter set SPS and the picture parameter set PPS that are included in the parameter set PS and indicated by the parameter set identifier PSID in the picture pic so as to decode the picture pic.
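The layout of the stream Str9 and the lookup performed at decoding time can be modelled with a few record types. This is a sketch under assumptions: the class names, the representation of the identifier PSID as a pair of keys, and all field values are hypothetical and do not reproduce the actual bit stream syntax.

```python
from dataclasses import dataclass


@dataclass
class ParameterSet:
    sps: dict  # sequence parameter sets, keyed by identifier
    pps: dict  # picture parameter sets, keyed by identifier


@dataclass
class Picture:
    psid: tuple  # (sps_id, pps_id); modelling PSID as a pair is an assumption
    pix: list    # pieces of pixel data


@dataclass
class RandomAccessUnit:
    pictures: list


@dataclass
class Stream:
    ps: ParameterSet  # placed only at the head side of the stream
    raus: list        # random access units that follow the parameter set


def resolve_parameters(stream, picture):
    """Look up the SPS and PPS that the picture's PSID refers to."""
    sps_id, pps_id = picture.psid
    return stream.ps.sps[sps_id], stream.ps.pps[pps_id]


stream = Stream(
    ps=ParameterSet(
        sps={0: {"max_reference_pictures": 4, "picture_size": (720, 480)}},
        pps={0: {"variable_length_coding": "arithmetic",
                 "initial_quantization_step": 26}},
    ),
    raus=[RandomAccessUnit(pictures=[Picture(psid=(0, 0), pix=[])])],
)
sps, pps = resolve_parameters(stream, stream.raus[0].pictures[0])
```

Note that `resolve_parameters` needs `stream.ps`: a picture carries only the identifier, so the parameter values themselves are available only to a decoder that has received the head of the stream.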
A conventional stream according to MPEG-2 has a structure different from the stream Str9.
FIG. 3 is a structural diagram showing the structure of the conventional stream according to the MPEG-2.
A stream Str8 according to the MPEG-2 includes sequentially from the head a synchronous signal sync, a header hed that is common data in the stream Str8 and a plurality of groups of pictures GOP.
The group of pictures GOP includes, sequentially from the head, a synchronous signal sync, a header hed that is common data for the group of pictures GOP and plural coded pictures pic.
The group of pictures GOP as such is a basic unit for coding and is used for editing a moving picture and performing random access. Each picture pic included in the group of pictures GOP is an I-picture, a P-picture or a B-picture.
The picture pic includes, sequentially from the head, a synchronous signal sync, a header hed that is data common to the picture pic and plural pieces of pixel data pix.
Namely, in the conventional picture coding method according to the MPEG-2 for generating a stream Str8 by coding the image signal Vin, the stream Str8 is generated in such a way that the header hed necessary for decoding the picture pic is included in the head of the stream Str8, in the head of each group of pictures GOP and in the head of each picture pic.
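The resulting layout of the stream Str8 can be sketched by flattening it into a list of labels. The labels and the builder function are illustrative only, and each picture's pixel data is reduced to a single placeholder string.

```python
def build_str8(gops):
    """Flatten groups of pictures into an Str8-like sequence of labels.

    `gops` is a list of groups, each a list of pixel-data placeholders.
    """
    stream = ["sync", "hed:stream"]               # head of the stream
    for g, pictures in enumerate(gops):
        stream += ["sync", f"hed:gop{g}"]         # head of each group of pictures
        for p, pix in enumerate(pictures):
            stream += ["sync", f"hed:gop{g}.pic{p}", pix]  # head of each picture
    return stream


layout = build_str8([["pixI", "pixP"], ["pixI"]])
header_count = sum(1 for item in layout if item.startswith("hed:"))
```

Every group of pictures and every picture starts with its own synchronous signal followed by its own header, so the headers needed for decoding recur throughout the stream rather than appearing once at the head.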
However, the conventional picture coding method employed by the picture coding apparatus 900 as described above has a problem. Because the parameter set PS is placed in only one place, at the head side of the stream Str9, the picture decoding apparatus cannot obtain the parameter set PS in the case where, for example, the stream Str9 is read out from the middle. In such a case, the picture decoding apparatus cannot start decoding from a random access point, which is the head of a random access unit RAU9 in the stream Str9 (i.e., it cannot perform random access). Namely, the picture decoding apparatus cannot decode the picture pic properly because the corresponding picture parameter set PPS and sequence parameter set SPS are not found.
More precisely, under the circumstance where the stream is incessantly transmitted, as in the case of broadcasting or delivery, the stream Str9 cannot be decoded from the middle when the picture decoding apparatus has started reading the stream Str9 from the middle.
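The failure can be reproduced with a toy decoder that only tracks which parameter sets it has received. The unit labels, picture names and parameter values below are hypothetical, and actual picture decoding is replaced by recording the picture name.

```python
class MissingParameterSet(Exception):
    """Raised when a picture refers to a parameter set never received."""


def decode(units):
    """Process (kind, payload) units in order; return decoded picture names."""
    parameter_sets = {}
    decoded = []
    for kind, payload in units:
        if kind == "PS":
            parameter_sets.update(payload)  # payload: {psid: parameters}
        elif kind == "PIC":
            name, psid = payload
            if psid not in parameter_sets:
                raise MissingParameterSet(f"{name} needs parameter set {psid}")
            decoded.append(name)
    return decoded


# Str9-like stream: the parameter set appears once, at the head only.
str9 = [
    ("PS", {0: {"picture_size": (720, 480)}}),
    ("PIC", ("rau0_pic0", 0)),
    ("PIC", ("rau0_pic1", 0)),
    ("PIC", ("rau1_pic0", 0)),  # random access point of the second unit
    ("PIC", ("rau1_pic1", 0)),
]

from_head = decode(str9)
try:
    decode(str9[3:])            # receiver joins at the second random access unit
    joined_mid_stream = True
except MissingParameterSet:
    joined_mid_stream = False
```

Decoding from the head succeeds, while joining at the second random access unit fails on its first picture even though that picture is coded so that decoding can start from it.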
In the case where the stream Str9 is recorded on a recording medium such as a tape or a disk, the picture decoding apparatus, in an attempt to perform random access to the stream Str9, first has to read the parameter set PS placed at the head of the stream Str9 on the recording medium and then start reading the stream Str9 from the random access point. That is to say, the picture decoding apparatus has to shift the reading position from the head of the stream Str9 to the random access point, and random access cannot be performed promptly since the shifting time becomes a waiting time for the random access.
In the case where the recording medium is a tape, it is apparent that the waiting time is very long, and even in the case where the recording medium is a disk capable of high-speed reading, the waiting time may be prolonged to several seconds, which cannot be ignored.
On the stream Str8 generated using the picture coding method according to the MPEG-2, the picture decoding apparatus can perform random access for each group of pictures GOP by using the header hed in the group of pictures GOP and the header hed in each picture pic.
With the picture coding method for generating such a stream Str8, however, the compression rate of the stream Str8 is low since each of the pictures pic included in the group of pictures GOP has a header hed and many of such headers hed have the same value as other headers hed. That is to say, the picture coding method for generating a stream Str8 allows generation of a stream to which random access can be performed on the one hand, but decreases the coding efficiency on the other.
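The cost of repeating headers can be estimated with a rough count. The byte sizes and picture counts below are invented purely for illustration and are not taken from MPEG-2 or any other standard; the second function models the alternative in which common data is sent once and each picture carries only a short identifier.

```python
def repeated_header_bytes(num_gops, pictures_per_gop, header_bytes):
    """Header bytes when the stream, every GOP and every picture carry a header."""
    num_pictures = num_gops * pictures_per_gop
    return header_bytes * (1 + num_gops + num_pictures)


def shared_header_bytes(num_gops, pictures_per_gop, header_bytes, id_bytes):
    """Header bytes when one shared header is sent once and each picture
    carries only a short identifier referring to it."""
    num_pictures = num_gops * pictures_per_gop
    return header_bytes + id_bytes * num_pictures


# Hypothetical figures: 10 GOPs of 15 pictures, 50-byte headers, 1-byte identifiers.
repeated = repeated_header_bytes(10, 15, 50)
shared = shared_header_bytes(10, 15, 50, 1)
```

Under these assumed figures the repeated headers cost far more than the shared arrangement, which illustrates the trade-off between random accessibility and coding efficiency described above.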