Recently, with the arrival of the age of multimedia, which integrally handles audio, video, and pixel values, existing information media, i.e., newspapers, journals, television, radio, telephones, and other means through which information is conveyed to people, have come under the scope of multimedia. In general, multimedia refers to a representation in which not only characters but also graphic symbols, audio, and especially pictures and the like are related to each other. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite to represent such information in digital form.
However, when the amount of information contained in each of the aforementioned information media is estimated in digital form, the information amount per character is 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality), and a moving picture requires more than 100 Mbits per second (present television reception quality). It is therefore not realistic for the information media mentioned above to handle such an enormous amount of information as it is in digital form. For example, videophones have already been put into practical use via the Integrated Services Digital Network (ISDN), with transmission rates of 64 Kbits/s to 1.5 Mbits/s; however, it is impossible to transmit pictures displayed on a TV screen or shot by a TV camera directly through the ISDN.
This therefore requires information compression techniques. For instance, in the case of videophones, video compression techniques compliant with the H.261 and H.263 standards, internationally standardized by the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T), are employed. According to the information compression techniques compliant with the MPEG-1 standard, picture information as well as audio information can be stored on an ordinary music Compact Disc (CD).
Here, MPEG (Moving Picture Experts Group) denotes a family of international standards for the compression of moving picture signals, and MPEG-1 is a standard that compresses video signals down to 1.5 Mbits/s, namely, compresses the information included in TV signals approximately down to a hundredth. Since the quality targeted by the MPEG-1 standard was a medium one, aimed primarily at realizing a transmission rate of about 1.5 Mbits/s, MPEG-2 was standardized with a view to meeting the requirements of even higher picture quality, and realizes TV broadcast quality by transmitting a moving picture signal at a transmission rate of 2 to 15 Mbits/s.
Under the present circumstances, the working group (ISO/IEC JTC1/SC29/WG11) previously in charge of the standardization of MPEG-1/MPEG-2 has further standardized MPEG-4, which achieves a compression rate superior to that achieved by MPEG-1/MPEG-2, allows coding/decoding operations on a per-object basis, and realizes the new functions required by the age of multimedia. At first, in the process of the standardization of MPEG-4, the aim was to standardize a low bit rate coding; however, the aim has since been extended to a more versatile coding, including a high bit rate coding for interlaced pictures and others. Moreover, the standardization of MPEG-4 AVC/ITU-T H.264, a next-generation coding method with a still higher compression rate, is in progress, jointly worked on by the ITU-T and the ISO/IEC. The next-generation coding method was published under the name of Committee Draft (CD) as of August 2002.
In the coding of a moving picture, compression of the information volume is usually performed by eliminating redundancy in both the spatial and temporal directions. Inter-picture prediction coding, which aims at reducing the temporal redundancy, estimates motion and generates a predictive picture on a block-by-block basis with reference to forward and/or backward picture(s), and then codes the differential value between the obtained predictive picture and the current picture to be coded. Here, “picture” is a term that signifies a single image on a screen: it represents a frame when used for a progressive picture, whereas it represents a frame or a field when used for an interlaced picture. An interlaced picture is a picture in which a single frame consists of two fields, each having a different capture time. For coding and decoding an interlaced picture, three ways of processing a single frame are possible: as a frame; as two fields; or as a frame/field structure that switches depending on the block in the frame.
A picture to which intra-picture prediction coding is applied without reference pictures is called an “I-picture”. A picture to which inter-picture prediction coding is applied with reference to a single picture is called a “P-picture”. A picture to which inter-picture prediction coding is applied by simultaneously referring to two pictures is called a “B-picture”. A B-picture can refer to two pictures arbitrarily selected from the forward or backward pictures in display order. The reference images (i.e., reference pictures) can be specified for each block serving as a basic coding/decoding unit. A distinction shall be made between such reference pictures by calling the reference picture described earlier in the coded bitstream the first reference picture, and the reference picture described later in the bitstream the second reference picture. Note that pictures used for reference need to be already coded and decoded, as a condition for coding and decoding these types of pictures.
Motion compensated inter-picture prediction coding is employed for coding P-pictures and B-pictures. Coding by use of motion compensated inter-picture prediction is a coding method that employs motion compensation in inter-picture prediction coding. Unlike a method that performs prediction simply based on the pixel values in a reference picture, motion compensation is a technique capable of improving prediction accuracy as well as reducing the amount of data by estimating the amount of motion (hereinafter referred to as a “motion vector”) of each part within a picture and further performing prediction in consideration of such amounts of motion. For example, it is possible to reduce the amount of data through motion compensation by estimating the motion vectors of the current picture to be coded and then coding the prediction residuals between the current picture and the prediction values obtained by shifting the reference blocks by the respective motion vectors. In this technique, the motion vectors are also recorded or transmitted in coded form, since the motion vector information is required at the time of decoding.
Motion vectors are estimated on a per-macroblock basis. More specifically, a macroblock shall be previously fixed in the current picture to be coded, and motion vectors are estimated by finding, within the search area in a reference picture, the position of the reference block most similar to such fixed macroblock.
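The motion estimation described above can be sketched as a full-search block matching over a small search area. The sketch below is illustrative only (function name, block size, and the use of the sum of absolute differences as the similarity measure are assumptions, not requirements of any standard):

```python
import numpy as np

def estimate_motion_vector(cur, ref, block_pos, block_size=4, search_range=2):
    """Full-search block matching: find the motion vector (dy, dx) that
    minimizes the sum of absolute differences (SAD) between a fixed block
    in the current picture and candidate blocks in the reference picture."""
    by, bx = block_pos
    block = cur[by:by + block_size, bx:bx + block_size]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates that fall outside the reference picture.
            if ry < 0 or rx < 0 or ry + block_size > ref.shape[0] or rx + block_size > ref.shape[1]:
                continue
            cand = ref[ry:ry + block_size, rx:rx + block_size]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The residual actually coded would then be the difference between the current block and the reference block shifted by the returned motion vector; real coders replace the exhaustive search with faster strategies.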
FIG. 1 is a diagram illustrating an example data structure of a bitstream. As FIG. 1 shows, the bitstream has a hierarchical structure as follows. The bitstream (Stream) is made up of more than one group of pictures (GOP). By using GOPs as basic coding units, it becomes possible to edit a moving picture as well as to perform random access. Each GOP is made up of plural pictures, each being an I-picture, a P-picture, or a B-picture. Each picture is further made up of plural slices. Each slice, which is a strip-shaped area within a picture, is made up of plural macroblocks. Moreover, each stream, GOP, picture, and slice includes a synchronization signal (sync) for indicating an end point of each unit and a header (header), which is a piece of data common to the whole unit.
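The hierarchy of FIG. 1 can be mirrored by nested container types. The type and field names below are assumptions chosen for the illustration, not identifiers taken from any standard, and the sync/header fields are omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Slice:
    # A strip-shaped area of a picture, made up of macroblocks
    # (represented here simply as raw bytes).
    macroblocks: List[bytes] = field(default_factory=list)

@dataclass
class Picture:
    picture_type: str = "I"   # "I", "P", or "B"
    slices: List[Slice] = field(default_factory=list)

@dataclass
class GOP:
    # Group of pictures: the basic unit for editing and random access.
    pictures: List[Picture] = field(default_factory=list)

@dataclass
class Stream:
    gops: List[GOP] = field(default_factory=list)
```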
In the case of transmitting data not as a bitstream, i.e., a continuous sequence of streams, but in packets, i.e., units of piecemeal data, the header and the data portion, which is the part excluding the header, may be transferred separately. In such a case, the header and the data portion are not incorporated into the same bitstream, as shown in FIG. 1. With packets, however, even though a header and the corresponding data portion are not transmitted sequentially, they are simply transferred in different packets. Therefore, even in the case where the header and the data portion are not incorporated into the same bitstream, the concept of the bitstream described with reference to FIG. 1 can still be applied.
Generally speaking, the human visual system is more sensitive to the low frequency components of a picture than to the high frequency components. Furthermore, since the energy of the low frequency components of a picture signal is greater than that of the high frequency components, picture coding is performed in order from the low frequency components to the high frequency components. As a result, the number of bits required for coding the low frequency components is larger than that required for the high frequency components.
In view of the above points, existing coding methods use larger quantization steps for the high frequency components than for the low frequency components when quantizing the transformation coefficients, obtained by orthogonal transformation, of the respective frequencies. This technique has made it possible for the conventional coding methods to achieve a large increase in compression ratio with negligible degradation in the subjective quality of pictures.
Since the appropriate sizes of the quantization steps of the high frequency components relative to those of the low frequency components depend on the picture signal, a technique for changing the sizes of the quantization steps of the respective frequency components on a picture-by-picture basis has been conventionally employed. A quantization matrix (also referred to as a “weighting matrix”) is used to derive the quantization steps of the respective frequency components. FIG. 2 shows an example of such a quantization matrix. In this drawing, the upper left component is the direct current component, components further to the right are higher horizontal frequency components, and components further down are higher vertical frequency components. The quantization matrix in FIG. 2 also indicates that the quantization steps become larger as the values become greater. Usually, it is possible to use a different quantization matrix for each picture. The value indicating the size of the quantization step of each frequency component is fixed-length-coded. Note that each component of a quantization matrix and the value of the corresponding quantization step are usually approximately proportional to each other, but it is not necessary to stick to such a relationship as long as the correspondence between them is clearly defined.
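As a small numerical illustration of the proportional relationship described above, the sketch below derives per-frequency quantization steps from a weighting matrix. The matrix values, the base step, and the normalization by 16 are all invented for this example (they are not the values of FIG. 2 or of any standard):

```python
import numpy as np

# A hypothetical 4x4 weighting matrix: the upper-left entry weights the
# DC component, and the values grow toward the high-frequency
# (lower-right) corner, as in the matrix of FIG. 2.
W = np.array([[ 8, 10, 12, 14],
              [10, 12, 14, 16],
              [12, 14, 16, 18],
              [14, 16, 18, 20]])

BASE_STEP = 2.0  # assumed base quantization step for some quantization parameter

# Per-frequency quantization step, assuming steps proportional to the
# weights and an (assumed) normalization factor of 16.
qstep = BASE_STEP * W / 16.0
```

With these assumed values the DC component is quantized with a step of 1.0 while the highest frequency component gets a step of 2.5, i.e., the high frequencies are quantized more coarsely.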
FIG. 3 is a flowchart showing inverse quantization performed by the conventional picture coding apparatus or picture decoding apparatus as presented in the MPEG-2 and the MPEG-4.
As shown in the diagram, the conventional picture coding apparatus or picture decoding apparatus obtains a weighting matrix Wi,j and a quantization parameter QP (S11 and S12), calculates a quantization step QStep (S13), and obtains a quantized value (i.e., a quantized frequency coefficient) fi,j (S14). Then, the apparatus derives an inverse quantized value by calculating fi,j×QStep×Wi,j (S15-S17).
In the quantization processing performed by the picture coding apparatus, the frequency coefficients obtained as a result of the orthogonal transformation are multiplied by the reciprocal of the value resulting from the calculation of QStep×Wi,j.
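The quantization and inverse quantization arithmetic described above can be sketched as the following pair of operations. This is a simplified illustration (rounding behavior, clipping, and the exact formulas differ between MPEG-2, MPEG-4, and other standards), directly following the fi,j×QStep×Wi,j form of FIG. 3:

```python
import numpy as np

def quantize(F, qstep, W):
    """Quantize frequency coefficients F: divide each coefficient by
    QStep * W[i,j] (i.e., multiply by its reciprocal, as done in the
    coding apparatus) and round to the nearest integer."""
    return np.round(F / (qstep * W)).astype(int)

def inverse_quantize(f, qstep, W):
    """Inverse quantization as in FIG. 3: f[i,j] * QStep * W[i,j]."""
    return f * qstep * W
```

For coefficients that happen to be exact multiples of QStep×Wi,j the round trip is lossless; in general the rounding step is where the (irreversible) compression loss of quantization occurs.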
However, the conventional quantization and inverse quantization processing is problematic in that it imposes a heavy computational load, since a large number of divisions and multiplications must be executed in the processing.