In recent years, devices compliant with MPEG (Moving Picture Experts Group) and similar standards have become widespread, both for information distribution by broadcast stations and for information reception in general homes. These devices handle image information digitally to transmit and store it efficiently. They compress it by exploiting redundancy specific to image information, using motion compensation and an orthogonal transformation such as a discrete cosine transform.
In particular, MPEG-2 (ISO/IEC 13818-2) is defined as a general-purpose image compression scheme. It covers both interlaced and progressive scan images, and both standard-resolution and high-definition images. MPEG-2 is widely used in a broad range of professional and consumer applications, for example in the DVD (Digital Versatile Disk) standard.
With the MPEG-2 compression scheme, high compression ratios can be achieved while keeping favorable image quality. For example, a bit rate of 4 to 8 Mbps can be assigned to a standard-resolution interlaced scan image of 720×480 pixels, and a bit rate of 18 to 22 Mbps to a high-definition interlaced scan image of 1920×1088 pixels.
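As a rough check on these figures, the compression ratio can be estimated by dividing the uncompressed bit rate by the coded bit rate. The sketch below rests on three assumptions that the text does not state: 8-bit samples, 4:2:0 chroma subsampling (1.5 samples per pixel), and 30 frames per second. The function names are illustrative only.

```python
def raw_bit_rate_bps(width, height, frame_rate, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate in bit/s; samples_per_pixel=1.5 assumes 4:2:0 chroma."""
    return width * height * samples_per_pixel * bits_per_sample * frame_rate


def compression_ratio(width, height, frame_rate, coded_bps):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return raw_bit_rate_bps(width, height, frame_rate) / coded_bps


# Frame sizes and bit rates from the text; 30 frames/s is an assumption
for (w, h), rates_mbps in [((720, 480), (4, 8)), ((1920, 1088), (18, 22))]:
    for mbps in rates_mbps:
        ratio = compression_ratio(w, h, 30, mbps * 1e6)
        print(f"{w}x{h} at {mbps} Mbps: about {ratio:.1f}:1")
```

Under these assumptions, the printed ratios are about 31.1:1 and 15.6:1 for the standard-resolution case, and about 41.8:1 and 34.2:1 for the high-definition case.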
MPEG-2 was mainly intended for high-quality coding suitable for broadcasting. It did not support coding at lower bit rates, that is, with higher compression ratios. To meet this need, the MPEG-4 coding scheme was standardized, and its image coding part was approved as the international standard ISO/IEC 14496-2 in December 1998.
In addition, in recent years the ITU-T (International Telecommunication Union Telecommunication Standardization Sector) has been standardizing what is called H.26L (ITU-T Q6/16 VCEG), initially aimed at image coding for videoconferencing. H.26L requires more computation for encoding and decoding than the MPEG-2 and MPEG-4 coding schemes, but it is known to achieve higher coding efficiency.
Further, as part of MPEG-4 activities, the standardization of a coding technique for realizing higher coding efficiency based on H.26L is currently underway by JVT (Joint Video Team) in cooperation with ITU-T.
Now, a description is given of image compression using motion compensation and orthogonal transformation, such as a discrete cosine transform or Karhunen-Loeve transform. FIG. 1 is a diagram showing the configuration of one example of a conventional image-information encoding device.
In the image-information encoding device 10 shown in FIG. 1, an A/D converter 12 converts image information input as an analog signal from an input terminal 11 into a digital signal. A screen rearranging buffer 13 then rearranges the frames of the image information supplied from the A/D converter 12 according to a GOP (Group of Pictures) structure.
Here, for an image on which intra (intra-image) encoding is performed, the screen rearranging buffer 13 supplies the image information of the entire frame to an orthogonal transformation unit 15. The orthogonal transformation unit 15 performs a discrete cosine transform or Karhunen-Loeve transform on the image information and supplies the resulting transform coefficients to a quantization unit 16. The quantization unit 16 then performs quantization processing on the transform coefficients supplied from the orthogonal transformation unit 15.
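The intra path just described, a block transform followed by quantization, can be sketched as below. The sketch assumes 8×8 blocks (the block size used by MPEG-2), an orthonormal 2-D DCT-II written out directly for clarity rather than speed, and a single uniform quantization step. Real encoders add refinements such as weighting matrices, which are omitted here, and the function names are illustrative.

```python
import math

N = 8  # assumed block size (8x8, as in MPEG-2)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, qscale):
    """Uniform scalar quantization with one step size (no weighting matrix)."""
    return [[int(round(value / qscale)) for value in row] for row in coeffs]


# A flat block has energy only in the DC coefficient: 8 * 128 = 1024, 1024 / 16 = 64
flat = [[128] * N for _ in range(N)]
q = quantize(dct_2d(flat), qscale=16)
print(q[0][0])
```

Every AC coefficient of the flat block quantizes to zero, which is why smooth image regions compress so well after transformation.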
A reversible encoding unit 17 determines an encoding mode from the quantization scale and the quantized transform coefficients supplied by the quantization unit 16. It then applies reversible (lossless) encoding, such as variable-length encoding or arithmetic encoding, to the encoding mode to create information to be inserted into the header portion of an image encoding unit. The reversible encoding unit 17 supplies the encoded encoding mode to a storage buffer 18 for storage, and the encoded encoding mode is output from an output terminal 19 as compressed image information.
The reversible encoding unit 17 also applies reversible encoding, such as variable-length encoding or arithmetic encoding, to the quantized transform coefficients and supplies the encoded coefficients to the storage buffer 18 for storage. The encoded transform coefficients are output from the output terminal 19 as compressed image information.
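The text refers to variable-length encoding but does not give any code tables. As an illustrative assumption, the sketch below uses the unsigned Exp-Golomb code. This is a simple universal variable-length code that H.264/AVC later adopted for many header syntax elements. It is not the MPEG-2 VLC table, and the function names are hypothetical.

```python
def exp_golomb_encode(n: int) -> str:
    """Unsigned Exp-Golomb codeword for n >= 0, returned as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    info = bin(n + 1)[2:]                 # binary representation of n + 1
    return "0" * (len(info) - 1) + info   # prefix of len(info) - 1 zeros


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":       # count the leading-zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1             # prefix + marker bit + suffix
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end


stream = "".join(exp_golomb_encode(n) for n in (0, 1, 3))
print(stream)   # 101000100
pos, values = 0, []
while pos < len(stream):
    v, pos = exp_golomb_decode(stream, pos)
    values.append(v)
print(values)   # [0, 1, 3]
```

Small values get short codewords, so frequent, small symbols cost few bits. The code is prefix-free, so the decoder can parse consecutive codewords without separators.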
The rate controller 20 controls the behavior of the quantization unit 16 according to the amount of transform-coefficient data stored in the storage buffer 18. The quantization unit 16 also supplies the quantized transform coefficients to a dequantization unit 21, which dequantizes them. An inverse orthogonal transformation unit 22 performs inverse orthogonal transformation processing on the dequantized transform coefficients to generate decoded image information and supplies the information to a frame memory 23 for storage.
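The text does not say how the rate controller 20 maps buffer occupancy to a quantization scale. One widely cited rule comes from the MPEG-2 Test Model 5 (TM5) reference encoder and is sketched below as an assumption. Under this rule, the quantizer scale grows in proportion to the fullness of a virtual buffer, with a reaction parameter r = 2 × bit_rate / picture_rate. The function name is hypothetical, and TM5's per-picture-type buffers and activity masking are omitted.

```python
def tm5_mquant(buffer_fullness_bits: float, bit_rate: float, picture_rate: float) -> int:
    """TM5-style quantizer scale from virtual buffer fullness (simplified).

    r = 2 * bit_rate / picture_rate   (reaction parameter)
    Q = 31 * d / r, clipped to the MPEG-2 quantizer_scale range 1..31
    """
    r = 2.0 * bit_rate / picture_rate
    q = round(31.0 * buffer_fullness_bits / r)
    return max(1, min(31, q))


# 6 Mbps at 30 pictures/s gives r = 400000 bits
print(tm5_mquant(100_000, 6e6, 30))   # 8  (31 * 0.25 = 7.75, rounded)
print(tm5_mquant(800_000, 6e6, 30))   # 31 (clipped at the maximum)
```

A fuller buffer produces a coarser quantizer and therefore fewer bits, which is the feedback loop the rate controller 20 provides.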
For an image on which inter (inter-image) encoding is performed, the screen rearranging buffer 13 supplies the image information to a motion prediction/compensation unit 24. The motion prediction/compensation unit 24 retrieves the image information to be referred to from the frame memory 23 and performs motion prediction/compensation processing on it to generate reference image information. It supplies this reference image information to an adder 14. The adder 14 computes the difference between the corresponding input image information and the reference image information. At the same time, the motion prediction/compensation unit 24 supplies motion vector information to the reversible encoding unit 17.
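The text does not specify how the motion prediction/compensation unit 24 searches for a match. This sketch assumes a common baseline: full-search block matching that minimizes the sum of absolute differences (SAD) over a square search window. Frames are plain lists of rows of luma samples, and all names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching; returns ((dx, dy), best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# A bright 2x2 square at (2, 2) in the reference moves to (3, 3) in the current frame
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 4):
    for x in range(2, 4):
        ref[y][x] = 255
        cur[y + 1][x + 1] = 255
print(full_search(cur, ref, 3, 3, 2, 2))   # ((-1, -1), 0)
```

The vector (-1, -1) points from the current block back to where its content sat in the reference frame. Because the match is exact, the difference the adder 14 would pass on is all zeros.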
The reversible encoding unit 17 determines an encoding mode from the quantization scale, the quantized transform coefficients supplied by the quantization unit 16, and the motion vector information supplied from the motion prediction/compensation unit 24. It applies reversible encoding, such as variable-length encoding or arithmetic encoding, to the determined encoding mode to generate information to be inserted into the header portion of an image encoding unit. It then supplies the encoded encoding mode to the storage buffer 18 for storage. The encoded encoding mode is output as compressed image information.
The reversible encoding unit 17 also applies reversible encoding, such as variable-length encoding or arithmetic encoding, to the motion vector information to generate information to be inserted into the header portion of an image encoding unit.
Unlike intra encoding, in inter encoding the image information input to the orthogonal transformation unit 15 is the difference signal provided by the adder 14. The remaining processing is analogous to that for intra encoding, so its description is omitted.
Next, the configuration of one example of an image-information decoding device corresponding to the above-described image-information encoding device 10 will be described with reference to FIG. 2. In the image-information decoding device 40 shown in FIG. 2, compressed image information input from an input terminal 41 is temporarily stored in a storage buffer 42 and is then transferred to a reversible decoding unit 43.
The reversible decoding unit 43 performs processing such as variable-length decoding or arithmetic decoding on the compressed image information, according to a predetermined compressed-image-information format. The reversible decoding unit 43 then obtains the encoding mode information stored in the header portion and supplies it to a dequantization unit 44. Similarly, the reversible decoding unit 43 obtains the quantized transform coefficients and supplies them to the dequantization unit 44. When the frame to be decoded has been subjected to inter encoding, the reversible decoding unit 43 also decodes the motion vector information stored in the header portion of the compressed image information and supplies it to a motion prediction/compensation unit 51.
The dequantization unit 44 dequantizes the quantized transform coefficients supplied from the reversible decoding unit 43 and supplies the resulting transform coefficients to an inverse orthogonal transformation unit 45. The inverse orthogonal transformation unit 45 then performs an inverse orthogonal transformation, such as an inverse discrete cosine transform or an inverse Karhunen-Loeve transform, on the transform coefficients, according to the predetermined compressed-image-information format.
Here, when a frame of interest has been subjected to intra encoding, image information subjected to the inverse orthogonal transformation processing is stored in a screen rearranging buffer 47. After the image information is subjected to D/A conversion processing by a D/A converter 48, the resulting information is output from an output terminal 49.
When a frame of interest has been subjected to inter encoding, the motion prediction/compensation unit 51 generates a reference image from the reversibly decoded motion vector information and the image information stored in a frame memory 50. It supplies the reference image to an adder 46, which adds the reference image to the output of the inverse orthogonal transformation unit 45. The remaining processing is analogous to that for an intra-encoded frame, so its description is omitted.
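The work of the adder 46 can be sketched as a sample-wise addition of the decoded difference signal to the motion-compensated reference image. The sketch also clips the sum to the valid sample range. This clipping is standard practice, but it is not described in the text, and the 8-bit default and the function name are assumptions.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the decoded residual to the motion-compensated prediction and
    clip each sample to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


print(reconstruct([[250, 10], [128, 0]], [[10, -20], [-28, 5]]))
# [[255, 0], [100, 5]]
```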
For the coding scheme being standardized by the above-mentioned Joint Video Team (hereinafter referred to as the “JVT Codec”), various techniques have been considered to improve coding efficiency over MPEG-2, MPEG-4, and so on. For example, an integer-coefficient transform with a 4×4 block size is used in place of a discrete cosine transform. Further, the block size for motion compensation is variable, so that motion compensation can come closer to optimal. The basic scheme, however, is the same as the encoding scheme performed in the image-information encoding device 10 shown in FIG. 1.
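To show what an integer-coefficient 4×4 transform looks like, the code below applies the forward core transform that was eventually adopted in H.264/AVC: Y = C X C^T, where C has small integer entries. Treating this particular matrix as the one meant by the text is an assumption. It needs only integer additions, subtractions, and multiplications by 2, so encoder and decoder can compute identical results without floating-point mismatch. The normalization that makes the transform DCT-like is folded into quantization in H.264/AVC and is not shown.

```python
# Forward core transform matrix of H.264/AVC (assumed here as the example)
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def matmul(a, b):
    """Integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Y = C X C^T on a 4x4 block, using integer arithmetic only."""
    return matmul(matmul(C, x), transpose(C))


flat = [[10] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])   # 160: a flat block yields only a DC term
```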
Likewise, the JVT Codec can be decoded using essentially the same scheme as the decoding scheme performed in the image-information decoding device 40 shown in FIG. 2.
Meanwhile, the MPEG and ITU-T standards use a buffer model to maintain compatibility between different decoding devices (decoders) and to prevent buffer overflow or underflow. A virtual decoder buffer model is standardized, and an encoding device (encoder) performs encoding so that the virtual decoder buffer does not fail. This prevents buffer overflow and underflow at the decoder side and maintains compatibility.
A virtual buffer model according to MPEG will be described with reference to FIG. 3. In the following description, R indicates the input bit rate of the decoder buffer, and B indicates the size of the decoder buffer. F indicates the buffer occupancy at the time the decoder extracts the first frame from the buffer, and D indicates the corresponding delay time.
The bit amounts of the frames at times $t_0, t_1, t_2, \ldots$ are denoted $b_0, b_1, b_2, \ldots$.
When the frame rate is $M$, the following expression is satisfied:
$$t_{i+1} - t_i = \frac{1}{M}$$
When $B_i$ indicates the buffer occupancy immediately before the bits $b_i$ of the frame at time $t_i$ are extracted, expression (1) below is satisfied:
$$B_0 = F, \qquad B_{i+1} = \min\bigl(B,\; B_i - b_i + R\,(t_{i+1} - t_i)\bigr) \tag{1}$$
In this case, for a fixed bit rate encoding scheme for MPEG-2, the encoder must perform encoding so as to satisfy condition (2) below:
$$B_i \le B, \qquad B_i - b_i \ge 0 \tag{2}$$
As long as this condition is satisfied, the encoder does not produce a bitstream that causes overflow or underflow of the decoder buffer.
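Expression (1) and condition (2) can be checked by simulating the decoder buffer frame by frame, as sketched below. The min() in expression (1) makes $B_i \le B$ hold trivially, so this sketch makes an assumption about the intended meaning for fixed bit rate operation. It tracks the occupancy without clipping and reports any value above B as overflow. The function name and argument order are illustrative.

```python
def check_cbr(bits, R, B, F, M):
    """Test condition (2) for frame sizes b_0, b_1, ... (bits).

    R: input bit rate (bit/s), B: decoder buffer size (bits),
    F: occupancy at the first extraction, M: frame rate (t_{i+1} - t_i = 1/M).
    Returns (ok, reason).
    """
    occ = F                                   # B_0 = F
    for i, b in enumerate(bits):
        if occ > B:
            return False, f"overflow before frame {i}"
        if occ - b < 0:
            return False, f"underflow at frame {i}"
        # next occupancy, left unclipped so that overflow is reported
        # instead of being hidden by the min() in expression (1)
        occ = occ - b + R / M
    return True, "ok"


stream = [150, 50, 100, 100]
print(check_cbr(stream, 1000, 300, 150, 10))   # (True, 'ok')
print(check_cbr(stream, 1000, 600, 150, 10))   # (True, 'ok'): a larger buffer also works
print(check_cbr(stream, 1000, 120, 120, 10))   # (False, 'underflow at frame 0')
```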
Further, for a variable bit rate encoding scheme for MPEG-2, the input bit rate is the maximum bit rate $R_{max}$ defined by the profile and level, and the initial occupancy is $F = B$. Thus, expression (1) can be rewritten as expression (3):
$$B_0 = B, \qquad B_{i+1} = \min\bigl(B,\; B_i - b_i + R_{max}\,(t_{i+1} - t_i)\bigr) \tag{3}$$
In this case, the encoder must perform encoding so as to satisfy expression (4) below:
$$B_i - b_i \ge 0 \tag{4}$$
When this condition is satisfied, the encoder produces a bitstream that does not cause buffer underflow at the decoder side. A full decoder buffer corresponds to an empty encoder buffer, that is, to a period in which no encoded bitstream is generated. Thus, the encoder does not need to monitor the decoder buffer for overflow.
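The variable bit rate model of expressions (3) and (4) differs from the fixed-rate model in two ways: the buffer starts full, and the min() genuinely clips. As a result, only underflow needs to be tested. The sketch below simulates this model. The function name is illustrative.

```python
def check_vbr(bits, r_max, B, M):
    """Simulate expression (3) and test condition (4).

    Input arrives at up to r_max bit/s and pauses while the buffer is full,
    so occupancy never exceeds B. Returns (ok, index_of_underflowing_frame).
    """
    occ = B                                   # B_0 = B, since F = B
    for i, b in enumerate(bits):
        if occ - b < 0:
            return False, i
        occ = min(B, occ - b + r_max / M)
    return True, None


print(check_vbr([350, 100, 100], 1000, 400, 10))      # (True, None)
print(check_vbr([0, 0, 350, 350], 1000, 400, 10))     # (False, 3)
```

In the second call, the idle frames cannot bank extra bits because the buffer is already full. Two large frames in a row therefore exhaust it, which is the underflow that condition (4) rules out.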
In MPEG, encoding is performed so as to comply with the above-described buffer restrictions for the buffer size and bit rate defined by each profile and level. Thus, a decoder that conforms to a given profile and level can decode a conforming bitstream without buffer failure.
In practice, however, a bitstream can in some cases be decoded even with a buffer size and a bit rate other than those defined by the profile and level.
For example, a bitstream encoded with a bit rate R, a buffer size B, and an initial buffer occupancy F, i.e., (R, B, F), can be decoded by a decoder having a larger buffer size B′ (B′ > B). The bitstream can also be decoded at a higher bit rate R′ (R′ > R).
Conversely, even when the decoding bit rate of a decoder is lower than the encoding bit rate, a decoder with a sufficiently large buffer can perform decoding.
In this manner, for a given bitstream, there exists at each bit rate a minimum buffer size $B_{min}$ needed to decode the bitstream. This relationship is shown in FIG. 4.
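One way to compute such a minimum buffer size for a given rate is an encoder-side leaky bucket that starts empty and drains at R. This formulation is an assumption and is not taken from the text. A bucket of this size corresponds to a decoder buffer of the same size receiving bits at up to R. The drain pausing at zero mirrors the decoder input pausing when its buffer is full, as in expression (3). The function name is hypothetical.

```python
def min_buffer_size(bits, R, M):
    """Minimum buffer size for frame sizes `bits` at rate R (bit/s) and
    frame rate M, using a leaky bucket with zero initial fullness."""
    fullness = 0.0
    b_min = 0.0
    for b in bits:
        fullness += b                              # frame i enters at t_i
        b_min = max(b_min, fullness)               # bucket must hold this peak
        fullness = max(0.0, fullness - R / M)      # drains at R until t_{i+1}
    return b_min


frames = [100, 300, 300, 100]
for R in (1000, 2000, 3000):
    print(R, min_buffer_size(frames, R, 10))
# 1000 500.0
# 2000 400.0
# 3000 300.0
```

For this example, the minimum buffer size falls as the rate rises. This is consistent with the observation below that a decoder with a high decoding bit rate can use a smaller buffer.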
The JVT Codec is being standardized so that a bitstream can be decoded not only with the fixed bit rate and buffer size defined by each profile and level, but also by any decoder whose bit rate and buffer size satisfy the relationship shown in FIG. 4. The objective is to allow decoding even when the bit rate and buffer size assumed by the encoder differ from those of the decoder. Achieving this objective allows, for example, a decoder with a high decoding bit rate to use a smaller buffer.
However, this characteristic of a bitstream varies with time. Because the restrictions on decoder compatibility have been relaxed, a bitstream that can be decoded under one condition may not be decodable under another. For example, when the (R, B) characteristic varies with time, decoding may be possible at one time but impossible at another.
Further, decoding is not always possible when switching to another scene or another channel, for example through random access. There is also a problem in that decodability cannot be guaranteed when bitstream-level editing, such as splicing, is performed.