Recently, with the arrival of the age of multimedia, which handles audio, video and other pixel values integrally, existing information media, i.e., newspapers, journals, TVs, radios, telephones and other means through which information is conveyed to people, have come under the scope of multimedia. Generally speaking, multimedia refers to something that is represented by associating not only characters but also graphics, audio and especially pictures and the like together. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite to represent such information in digital form.
However, when the amount of information contained in each of the aforementioned information media is calculated as an amount of digital information, a character requires 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality) and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle this vast amount of information directly in digital format via the information media mentioned above. For example, a videophone has already been put into practical use via Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbit/s to 1.5 Mbit/s. However, it is not practical to transmit video captured on the TV screen or shot by a TV camera directly at such a rate. Information compression techniques are therefore required; for instance, in the case of the videophone, video compression techniques compliant with the H.261 and H.263 standards internationally standardized by ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) are employed. According to information compression techniques compliant with the MPEG-1 standard, picture information as well as music information can be stored on an ordinary music CD (Compact Disc).
Here, MPEG (Moving Picture Experts Group) is an international standard for compression of moving picture signals, and MPEG-1 is a standard that compresses video signals down to 1.5 Mbit/s, that is, compresses the information of TV signals to approximately a hundredth of its original size. Since the transmission rate within the scope of the MPEG-1 standard is limited primarily to about 1.5 Mbit/s, MPEG-2, which was standardized with a view to meeting the requirements of high-quality pictures, allows data transmission of moving picture signals at a rate of 2 to 15 Mbit/s. At present, the working group (ISO/IEC JTC1/SC29/WG11) in charge of the standardization of MPEG-1 and MPEG-2 has standardized MPEG-4, which achieves a compression rate beyond that of MPEG-1 and MPEG-2, realizes encoding/decoding operations on a per-object basis and realizes new functions required by the era of multimedia. The standardization of MPEG-4 initially aimed at an encoding method for low bit rates. However, the aim has since been extended to a more versatile encoding of moving pictures at high bit rates, including interlaced pictures.
Recently, a new picture encoding method called JVT, a next-generation encoding following MPEG-4, is in the process of standardization and is being jointly worked on by the ITU-T and the ISO/IEC.
FIG. 24 is a diagram showing a prediction structure, a decoding order and a display order of pictures. “Picture” is a term indicating either a frame or a field and the term “picture” here is used instead of frame or field in the present specification. The hatched pictures in FIG. 24 represent the pictures to be stored in the memory for reference when other pictures are encoded/decoded.
I0 is an intra coded picture and P3, P6 and P9 are predictive coded pictures (P-pictures). The predictive encoding in the scheme of the JVT standard differs from that of the conventional MPEG-1/2/4: an arbitrary picture is selected out of a plurality of encoded pictures as a reference picture and a predictive image can be generated from the reference picture. For example, picture P9 may select an arbitrary picture out of the three pictures I0, P3 and P6 and generate a predictive image using the selected picture. Consequently, this raises the possibility of selecting a more suitable predictive image than in the conventional case of MPEG-1/2/4 and thereby improves the compression rate. B1, B2, B4, B5, B7 and B8 are bi-directionally predictive coded pictures (B-pictures), for which, unlike the inter-picture prediction described above, a plurality of pictures (two pictures) are selected and a predictive image is generated using the selected pictures before encoding. It is known in particular that performing interpolation prediction, which uses the average value of two pictures, one temporally previous and one temporally subsequent, to generate a predictive image greatly improves the accuracy of the predictive image and hence the compression rate. The marks “I” for an intra coded picture, “P” for a predictive coded picture and “B” for a bi-directionally predictive coded picture are used to differentiate the encoding method of each picture.
In order to refer to the temporally previous and subsequent pictures for the B-pictures, the temporally subsequent pictures must be coded/decoded first. This is called reordering of pictures and often takes place in the conventional MPEG-1/2/4. Therefore, the order in which the decoded pictures are displayed (Display Order) differs from the encoding order (Stream Order), as shown in FIG. 24. The B-pictures in the example of FIG. 24 are displayed at the moment they are decoded from the stream, so there is no need to store them when they are not referred to by other pictures. However, I-pictures and P-pictures have to be stored in a memory, since each is displayed only after it has been decoded and the decoding of the following B-pictures has finished.
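The reordering described above can be illustrated with a minimal sketch. It assumes, as in conventional MPEG-1/2, that each B-picture refers only to the nearest I- or P-picture on either side in display order; the function name `display_to_stream_order` is hypothetical, and the resulting order is an assumption, not a reproduction of FIG. 24.

```python
def display_to_stream_order(display_order):
    """Place each I/P picture ahead of the B-pictures that precede it in display order.

    Assumption: a B-picture refers only to the nearest I/P picture before and
    after it, so the subsequent I/P picture must be coded/decoded first.
    """
    stream = []
    pending_b = []  # B-pictures waiting for their temporally subsequent reference
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:
            stream.append(pic)        # the reference picture is coded first
            stream.extend(pending_b)  # then the B-pictures that refer to it
            pending_b = []
    stream.extend(pending_b)  # trailing B-pictures with no later reference
    return stream


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "P9"]
print(display_to_stream_order(display))
```

Under this assumption, P3 is placed before B1 and B2 in the stream even though it is displayed after them, which is why I-pictures and P-pictures must remain in memory until their display time.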
FIG. 25 is a diagram showing a prediction structure, a decoding order and a display order of pictures. The terms and the meanings of the hatched pictures are the same as those used in FIG. 24.
FIG. 26 is a block diagram showing a picture encoding apparatus for realizing a conventional picture encoding method. The following illustrates an operation of the picture encoding apparatus for realizing the conventional picture encoding method in FIG. 26.
A picture structure determination unit PicStruct determines an encoding type (I-picture, P-picture or B-picture) for each picture, notifies a reference picture control unit RefPicCtrl of the encoding type and of the pictures that can be referred to in the encoding, and also informs a reordering unit ReOrder of the encoding order of the pictures. The reordering unit ReOrder reorders the input pictures PicIn into the encoding order and outputs the reordered pictures to a motion estimation unit ME and a subtraction unit Sub. The motion estimation unit ME refers to the reference pictures stored in a picture memory PicMem1, determines a suitable reference picture, detects a motion vector MV indicating a pixel position in the reference picture, and sends them to a variable length coding unit VLC, the picture memory PicMem1 and a motion compensation unit MC. The picture memory PicMem1 outputs the pixels of the reference picture to the motion compensation unit MC according to the motion vector MV, and the motion compensation unit MC generates a predictive image using the pixels of the reference picture obtained from the picture memory PicMem1 and the motion vector MV.
The subtraction unit Sub calculates a difference between the picture reordered by the reordering unit ReOrder and the predictive image. The difference is converted to frequency coefficients by an orthogonal transformation unit T and then the frequency coefficients are quantized by the quantization unit Q and outputted as quantized values Coef.
An inverse quantization unit IQ inverse quantizes the quantized values Coef and restores them as frequency coefficients. The inverse orthogonal transformation unit IT performs inverse frequency conversion for the frequency coefficients to be outputted as pixel differential values. An addition unit Add adds the predictive image to the pixel differential values and obtains a decoded picture.
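The path through the units Sub, T, Q, IQ, IT and Add can be sketched as follows. This is a simplified illustration only: it assumes a one-dimensional orthonormal DCT as the orthogonal transformation and a uniform quantization step `qstep`, neither of which is specified in the text, and all function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II (stands in for the orthogonal transformation unit T)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() (stands in for the inverse orthogonal transformation unit IT)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def encode_block(source, prediction, qstep):
    """Sub -> T -> Q: returns the quantized values Coef."""
    difference = [s - p for s, p in zip(source, prediction)]
    return [round(c / qstep) for c in dct(difference)]


def reconstruct_block(coef, prediction, qstep):
    """IQ -> IT -> Add: returns the decoded pixels."""
    frequency = [q * qstep for q in coef]
    pixel_diff = idct(frequency)
    return [p + d for p, d in zip(prediction, pixel_diff)]


source = [52, 55, 61, 66]      # hypothetical input pixels
prediction = [50, 54, 60, 68]  # hypothetical predictive image
qstep = 2.0                    # assumed uniform quantization step
coef = encode_block(source, prediction, qstep)
decoded = reconstruct_block(coef, prediction, qstep)
print(coef, decoded)
```

Because the encoder reconstructs the picture from the quantized values Coef in the same way a decoder would, the reference pictures held on both sides stay identical despite the quantization loss.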
The reference picture control unit RefPicCtrl, according to the encoding type of the picture, judges whether or not the decoded picture is to be stored in the picture memory PicMem1 to be referred to as a reference picture and whether or not the decoded picture is to be removed from the picture memory PicMem1 (no longer referred to as a reference picture) and notifies of the operation using a memory control command MMCO.
A switch SW is turned ON when the memory control command MMCO orders a storage, and the decoded picture is thereby stored in the picture memory PicMem1 as a reference picture. When the memory control command MMCO instructs that a decoded picture shall be removed from the picture memory PicMem1, the picture memory PicMem1 releases the area where that decoded picture is stored so that other decoded pictures can be stored.
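The store and release behavior of the picture memory can be sketched as below. The class name `PictureMemory`, the `capacity` parameter and the use of picture numbers as keys are assumptions made for illustration and are not taken from FIG. 26.

```python
class PictureMemory:
    """Toy model of PicMem1: holds decoded pictures kept as reference pictures."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed number of picture areas
        self.pictures = {}        # picture number -> decoded picture

    def store(self, number, picture):
        """Corresponds to the switch SW being ON: keep the decoded picture."""
        if number not in self.pictures and len(self.pictures) >= self.capacity:
            raise MemoryError("no free picture area; a picture must be released first")
        self.pictures[number] = picture

    def release(self, number):
        """Free the area of one picture so another decoded picture can be stored."""
        self.pictures.pop(number, None)


mem = PictureMemory(capacity=2)
mem.store(0, "decoded I0")
mem.store(3, "decoded P3")
mem.release(0)              # I0 is no longer referred to
mem.store(6, "decoded P6")  # reuses the released area
print(sorted(mem.pictures))
```

In this sketch a store fails when every area is occupied, which reflects why a release command has to accompany a store command once the memory is full.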
The variable length coding unit VLC encodes the quantized values Coef, the motion vector MV and the memory control command MMCO and outputs an encoded stream Str.
The case in which the encoding includes the frequency conversion and the quantization is shown. However, the encoding may be one without them, such as DPCM, ADPCM, and linear predictive encoding. The encoding may be one in which the frequency conversion and the quantization are integrated or one that is not accompanied by the quantization after the frequency conversion, as in bit-plane encoding.
FIG. 27 shows bit streams of the memory control command MMCO. The variable length coding unit VLC encodes “000”, which means a release of the whole memory area, so that the picture memory is initialized at the beginning of the encoding/decoding or at the head of a GOP (Group Of Pictures). The variable length coding unit VLC encodes “01” when the decoded picture is to be stored in the picture memory. When a picture stored in the picture memory is to be released at the same time, the variable length coding unit VLC encodes “001” followed by a picture number, since the picture to be released has to be indicated. When a plurality of pictures are released, the command to release a picture needs to be encoded a plurality of times; the command to store a picture is therefore encoded as a separate command in addition to the command to release a picture. The variable length coding unit VLC sequentially encodes a plurality of memory control commands MMCO and lastly encodes “1”, indicating that the memory control command MMCO is complete. In this way, the memory control command MMCO is encoded into the encoded stream Str.
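The four codes above form a prefix code, so a sequence of commands can be parsed bit by bit. The following is a minimal sketch of such an encoder and parser; the text does not state how the picture number is coded, so a fixed 8-bit field (`PIC_NUM_BITS`) is assumed here, and the command tuples and function names are hypothetical.

```python
PIC_NUM_BITS = 8  # assumption: the text does not specify the picture number coding


def encode_mmco(commands):
    """Encode a list of commands and append the terminating "1"."""
    bits = ""
    for cmd in commands:
        if cmd == ("release_all",):
            bits += "000"
        elif cmd == ("store",):
            bits += "01"
        elif cmd[0] == "release":
            bits += "001" + format(cmd[1], "0{}b".format(PIC_NUM_BITS))
        else:
            raise ValueError("unknown command: {!r}".format(cmd))
    return bits + "1"


def decode_mmco(bits):
    """Parse commands until the terminating "1"; return (commands, bits consumed)."""
    commands = []
    pos = 0
    while True:
        if bits[pos] == "1":        # "1": end of the memory control command
            return commands, pos + 1
        if bits[pos + 1] == "1":    # "01": store the decoded picture
            commands.append(("store",))
            pos += 2
        elif bits[pos + 2] == "1":  # "001": release the picture whose number follows
            start = pos + 3
            number = int(bits[start:start + PIC_NUM_BITS], 2)
            commands.append(("release", number))
            pos = start + PIC_NUM_BITS
        else:                       # "000": release the whole memory area
            commands.append(("release_all",))
            pos += 3


commands = [("release", 3), ("store",)]
bits = encode_mmco(commands)
print(bits, decode_mmco(bits))
```

Releasing several pictures simply repeats the “001” command with a different picture number each time before the final “1”.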
FIG. 28 is a block diagram showing a picture decoding apparatus for realizing a conventional picture decoding method. The same numbers are used for the devices that operate in the same manner as the picture encoding apparatus for realizing the conventional picture encoding method shown in FIG. 26.
A variable length decoding unit VLD decodes an encoded stream Str and outputs a memory control command MMCO, a motion vector MV and quantized values Coef. A picture time Time is inputted from outside and is a signal for specifying a picture to be displayed. When the picture to be displayed is the decoded picture, the output from the addition unit Add is selected at a selector Sel and sent to a display unit Disp. When the picture to be displayed is a picture stored in the picture memory PicMem1, it is read out from the picture memory PicMem1, selected at the selector Sel and outputted to the display unit Disp.
As described above, the picture memory PicMem1 outputs, to the motion compensation unit MC, pixels according to the motion vector MV whereas the motion compensation unit MC generates a predictive image according to the pixels obtained from the picture memory PicMem1 together with the motion vector MV.
The inverse quantization unit IQ inverse quantizes the quantized values Coef and restores them as frequency coefficients. Furthermore, the inverse orthogonal transformation unit IT performs inverse frequency conversion on the frequency coefficients and outputs them as pixel differential values. The addition unit Add adds the predictive image to the pixel differential values to generate a decoded picture.
When the memory control command MMCO instructs that a decoded picture shall be removed, the picture memory PicMem1 releases the area in which that decoded picture is stored so that another decoded picture can be stored.
The example of the decoding including the inverse frequency conversion and the inverse quantization is described above. However, the decoding may be one without them, such as DPCM, ADPCM and a linear predictive encoding. The decoding may be one in which the inverse frequency conversion and the inverse quantization are integrated or one that is not accompanied by the inverse quantization after the frequency conversion as in a bit-plane encoding.
With the use of the picture decoding apparatus for realizing the conventional picture decoding method shown in FIG. 28, it is obvious that the combination of the conventional picture encoding types shown in FIGS. 24 and 25 allows for a correct decoding of the encoded stream Str encoded by the picture encoding apparatus for realizing the conventional picture encoding method shown in FIG. 26.
A more flexible combination of picture encoding types is considered here.
FIG. 1 is a diagram showing a prediction structure, a decoding order and a display order of pictures that do not exist in the related art. In FIG. 1, the prediction structure with respect to the B-pictures differs in the vicinity of Picture 4. Namely, Picture 2, which is a B-picture, is stored in the picture memory to be referred to in generating the predictive images of Picture 1 and Picture 3. Consequently, the encoding order and the display order of each picture are as shown in FIG. 1.
Pictures B5 and B6 are B-pictures that are not stored since they are not referred to in predictive coding. However, unlike in FIG. 24, the display time for the pictures B5 and B6 has not yet come at the time when they are decoded, since that is the time for another picture to be displayed. That is, at the time of decoding the picture B5, the picture P4 shall be displayed, and at the time of decoding the picture B6, the picture B5 shall be displayed. Since the pictures B5 and B6 are not stored, they cannot be taken out from the picture memory at their display time. Therefore, because the pictures which are not referred to for predictive encoding are not stored in the picture memory, the pictures B5 and B6 cannot be displayed after being decoded with the use of the conventional encoding/decoding method. Namely, in the case of not storing the pictures that are not referred to in predictive encoding, only Pictures 1, 2, 4 and 7 can be displayed in the example shown in FIG. 1.
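The condition described above can be stated as a simple rule: under the conventional method, a picture can be displayed only if it is stored as a reference picture or if its display time coincides with the time at which it is decoded. The sketch below applies this rule to the pictures P4, B5 and B6. The slot numbers are hypothetical and follow only the two timing relations stated in the text; the decode slot given to P4 is an arbitrary assumption, and FIG. 1 is not reproduced.

```python
def displayable(pictures):
    """Return the names of pictures that the conventional method can display.

    pictures: list of (name, is_reference, decode_slot, display_slot).
    A picture is displayable if it is kept in the picture memory (reference
    picture) or if it is displayed at the moment it is decoded.
    """
    shown = []
    for name, is_reference, decode_slot, display_slot in pictures:
        stored = is_reference  # only reference pictures are stored
        shown_on_decode = decode_slot == display_slot
        if stored or shown_on_decode:
            shown.append(name)
    return shown


pictures = [
    ("P4", True, 1, 4),   # reference picture; decode slot 1 is an arbitrary assumption
    ("B5", False, 4, 5),  # decoded while P4 is being displayed
    ("B6", False, 5, 6),  # decoded while B5 should be displayed
]
print(displayable(pictures))
```

With these slots only P4 is reported as displayable: B5 and B6 are neither stored nor displayed at their decoding time.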
Thus, when a more flexible combination of picture encoding types is considered, the problem arises that some pictures cannot be displayed after being decoded. It is conceivable to add another picture memory for display and to store in it the pictures that are not stored in the picture memory PicMem1 so that they can be displayed; however, the weak point is that this picture memory for display requires a huge amount of memory.
Furthermore, a new problem arises in the reproduction of a picture in the middle of the stream even if another picture memory for display is introduced. FIG. 2 is a diagram showing a prediction structure, a decoding order and a display order of pictures. The difference from FIG. 25 is that the prediction structure in the vicinity of Picture 7 becomes completely independent. The pictures following the picture I7 are not referred to when the pictures with display time preceding the picture I7 are encoded/decoded. Therefore, the pictures following the picture I7 can be decoded correctly if the decoding starts from the picture I7, and the picture I7 can be reproduced independently. An I-picture is often inserted in the middle of a stream in this way. This system for reproducing a picture from the middle of the stream, which complies with MPEG-2, is called GOP (Group Of Pictures).
When reproduction starts from a picture in the middle of the stream, the reproduced picture of the picture decoding apparatus has to be guaranteed to correspond to that of the picture encoding apparatus, and the simplest method is to initialize the whole area of the picture memory. However, when Picture 7 is decoded, Picture 6 has not yet been displayed and is still stored in the picture memory. Picture 6 therefore cannot be read out from the picture memory at its display time if the entire picture memory is initialized before Picture 6 is displayed.
The object of the present invention is therefore to allow the display of pictures that otherwise cannot be displayed after being decoded, while taking into consideration the memory amount necessary for encoding/decoding of the pictures.