Recently, with the arrival of the age of multimedia, which handles audio, pictures and other content in an integrated manner, it has become possible to obtain or transmit, using a single terminal, the information conveyed by existing information media, i.e., newspapers, journals, TVs, radios, telephones and other means. Generally speaking, multimedia refers to something that is represented by associating not only characters but also graphics, audio and especially pictures and the like together. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite to represent such information in digital form.
However, when the amount of information contained in each of the aforementioned information media is estimated as an amount of digital information, a character requires 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality) and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle this vast amount of information directly in digital form via the information media mentioned above. For example, a videophone has already been put into practical use via the Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbps to 1.5 Mbps; however, it is not practical to transmit directly via the ISDN the moving picture captured on the TV screen or shot by a TV camera.
This therefore requires information compression techniques, and for instance, moving picture compression techniques compliant with H.261 and H.263 standards internationally standardized by ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) are used in the case of the videophone. According to information compression techniques compliant with the MPEG-1 standard, picture information as well as music information can be stored in an ordinary music CD (Compact Disc).
MPEG (Moving Picture Experts Group) is an international standard for the compression of moving picture signals, and MPEG-1 is a standard that compresses moving picture signals down to 1.5 Mbps, that is, it compresses the information of TV signals to approximately one hundredth. Since the transmission rate within the scope of the MPEG-1 standard is limited primarily to about 1.5 Mbps, MPEG-2, which was standardized with a view to meeting the requirements of high-quality pictures, allows data transmission of moving picture signals at a rate of 2 to 15 Mbps. Under the present circumstances, the working group (ISO/IEC JTC1/SC29/WG11) in charge of the standardization of MPEG-1 and MPEG-2 has standardized MPEG-4, which achieves a compression rate beyond that achieved by MPEG-1 and MPEG-2 and realizes coding/decoding operations on a per-object basis as well as a new function required by the age of multimedia (see, for instance, the specifications of MPEG-1, MPEG-2 and MPEG-4 produced by the ISO). MPEG-4 not only realizes a highly efficient coding method for low bit rates but also introduces powerful error resistance techniques that can minimize the degradation of picture quality even when an error occurs on a transmission line. Also, the ISO/IEC and the ITU are working together on the standardization of MPEG-4 AVC/ITU H.264 as a next-generation picture coding method.
Coding of moving pictures, in general, compresses the information volume by reducing redundancy in both the temporal and spatial directions. Inter-picture prediction coding, which aims at reducing the temporal redundancy, therefore estimates motion and generates a predictive picture on a block-by-block basis with reference to pictures previous and subsequent to the current picture to be coded, and then codes the differential value between the obtained predictive picture and the current picture. Here, the term “picture” represents a single screen: it represents a frame in the context of a progressive picture, and a frame or a field in the context of an interlaced picture. An interlaced picture here is a picture in which a single frame consists of two fields at different times. In the process of coding and decoding an interlaced picture, three ways of handling a single frame are possible: as a frame, as two fields, or as a frame structure or a field structure depending on the block in the frame.
FIG. 1 is a diagram showing an example of types of pictures and how the pictures refer to each other. The hatched pictures in FIG. 1 are pictures to be stored in a memory since they are referred to by other pictures. As for the arrows used in FIG. 1, each arrow departs from a picture that refers to a reference picture, and its head points at that reference picture. Here, the pictures are shown in display order.
I0 (Picture 0) is an intra-coded picture (I-picture), which is coded independently of other pictures (namely, without referring to other pictures). P4 (Picture 4) and P7 (Picture 7) are forward prediction coded pictures (P-pictures) that are predictively coded with reference to I-pictures or other P-pictures located temporally previous to the current picture. B1 to B3 (Pictures 1 to 3), B5 (Picture 5) and B6 (Picture 6) are bi-directional prediction coded pictures (B-pictures) that are predictively coded with reference to other pictures both temporally previous and subsequent to the current picture.
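The reference relationships of FIG. 1 can be illustrated with a small table. The individual arrows are not listed in the text, so the mapping below is an assumption consistent with the description: each B-picture refers to the nearest I- or P-picture on either side, and each P-picture refers to the preceding I- or P-picture. The names REFERENCES and pictures_to_store are hypothetical and serve only this sketch.

```python
# Assumed reference table for FIG. 1 (display order: I0 B1 B2 B3 P4 B5 B6 P7).
# Key: a picture; value: the pictures it refers to (assumption, see lead-in).
REFERENCES = {
    "I0": [],            # intra-coded, refers to no other picture
    "B1": ["I0", "P4"],
    "B2": ["I0", "P4"],
    "B3": ["I0", "P4"],
    "P4": ["I0"],
    "B5": ["P4", "P7"],
    "B6": ["P4", "P7"],
    "P7": ["P4"],
}


def pictures_to_store(references):
    """Return the pictures referred to by at least one other picture.

    These correspond to the pictures that must be kept in memory.
    """
    stored = set()
    for refs in references.values():
        stored.update(refs)
    # Sort by the picture number that follows the type letter.
    return sorted(stored, key=lambda name: int(name[1:]))


print(pictures_to_store(REFERENCES))
```

Under this assumed table, only the I- and P-pictures are referred to by other pictures, so only they need to be stored.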
FIG. 2 is a diagram showing another example of the types of pictures and how the pictures refer to each other. The difference between FIG. 2 and FIG. 1 is that the temporal position of the pictures referred to by a B-picture is not limited to one picture located temporally previous and one located temporally subsequent to the B-picture. For example, B5 can refer to two arbitrary pictures out of I0 (Picture 0), P3 (Picture 3) and P6 (Picture 6). Namely, I0 and P3, both located temporally previous to B5, can be used as reference pictures. Such a reference method is already acknowledged in the specification of MPEG-4 AVC/H.264 as of September 2001. Thus, the range for selecting an optimal predictive picture is widened and thereby the compression rate can be improved.
FIG. 3 is a diagram showing an example of a stream structure of picture data. As shown in FIG. 3, the stream includes a common information area such as a header or the like and a GOP (Group Of Pictures) area. The GOP area includes a common information area such as a header or the like and a plurality of picture areas. The picture area includes a common information area such as a header or the like and a plurality of slice data areas. The slice data area includes a common information area such as a header or the like and a plurality of macroblock data areas.
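The nesting of areas described for FIG. 3 can be sketched as a set of nested records. This is only an illustration of the hierarchy: the class names, the use of a dictionary for each common information area and the helper count_macroblocks are assumptions of this sketch, not part of any standardized syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MacroblockData:
    payload: bytes = b""  # coded macroblock data (contents not modeled here)


@dataclass
class SliceData:
    header: Dict[str, int] = field(default_factory=dict)  # common information area
    macroblocks: List[MacroblockData] = field(default_factory=list)


@dataclass
class PictureArea:
    header: Dict[str, int] = field(default_factory=dict)  # common information area
    slices: List[SliceData] = field(default_factory=list)


@dataclass
class GopArea:
    header: Dict[str, int] = field(default_factory=dict)  # common information area
    pictures: List[PictureArea] = field(default_factory=list)


@dataclass
class Stream:
    header: Dict[str, int] = field(default_factory=dict)  # common information area
    gops: List[GopArea] = field(default_factory=list)


def count_macroblocks(stream: Stream) -> int:
    """Walk stream -> GOP -> picture -> slice and count macroblock data areas."""
    return sum(
        len(slice_data.macroblocks)
        for gop in stream.gops
        for picture in gop.pictures
        for slice_data in picture.slices
    )
```

For example, a stream holding one GOP with two pictures, each having two slices of three macroblocks, yields a count of twelve.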
In the picture common information area, the weighting factors necessary for performing the weighted prediction, which will be mentioned later, are described for each reference picture.
When data is transmitted not as a continuous bit stream but in packets, each of which is a unit consisting of pieces of data, the header part and the data part, which excludes the header part, can be transmitted separately. In this case, the header part and the data part are not included in a single bit stream. In the case of using packets, however, even when the header part and the data part are not transmitted in sequence, they are simply transmitted in different packets. Although they are not transmitted in a single bit stream, the concept is the same as in the case of using the bit stream described with reference to FIG. 3.
The following describes weighted prediction processing carried out by the conventional picture coding method.
FIGS. 4A and 4B are pattern diagrams showing cases of performing weighted prediction on a frame-by-frame basis.
When referring to a single frame, as shown in FIG. 4A, a pixel value Q in a predictive picture with respect to a current block to be coded can be calculated using the weighted prediction equation (1) below, where a pixel value within a reference block in the i-th reference frame, Frame i, is represented as P0. When referring to two frames, as shown in FIG. 4B, the pixel value Q in the predictive picture can be calculated using the weighted prediction equation (2) below, where the respective pixel values within the reference blocks in the i-th and j-th reference frames, Frame i and Frame j, are represented as P0 and P1.
Q = (P0 × W0 + D) / W2  (1)
Q = (P0 × W0 + P1 × W1 + D) / W2  (2)
Here, W0 and W1 represent weighting factors, whereas W2 represents a normalization factor and D represents a bias component (DC component).
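Equations (1) and (2) can be written out directly as follows. This is a minimal sketch: the function names are hypothetical, and true division is used because the text does not specify how the result is rounded or clipped.

```python
def weighted_prediction_one_ref(p0, w0, d, w2):
    """Equation (1): Q = (P0 x W0 + D) / W2, for a single reference frame."""
    return (p0 * w0 + d) / w2


def weighted_prediction_two_refs(p0, p1, w0, w1, d, w2):
    """Equation (2): Q = (P0 x W0 + P1 x W1 + D) / W2, for two reference frames."""
    return (p0 * w0 + p1 * w1 + d) / w2


# With W0 = W1 = 1, D = 0 and W2 = 2, equation (2) reduces to a plain average.
print(weighted_prediction_two_refs(100, 120, 1, 1, 0, 2))  # 110.0
print(weighted_prediction_one_ref(100, 3, 4, 4))           # 76.0
```

The first call shows that ordinary bi-directional averaging is the special case of equation (2) in which both weighting factors are equal and the bias component is zero.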
FIGS. 5A and 5B are pattern diagrams showing cases of performing weighted prediction processing on a field-by-field basis.
When referring to a single frame (namely, two fields), as shown in FIG. 5A, pixel values Qa and Qb in the predictive pictures with respect to a current block can be calculated using the weighted prediction equations (3) and (4) below, where the pixel values within the respective reference blocks in fields 2×i+1 and 2×i, which compose the i-th reference frame (Frame i), are represented as P0a and P0b. When referring to two frames, as shown in FIG. 5B, the pixel values Qa and Qb can be calculated using the weighted prediction equations (5) and (6) below, where the pixel values within the respective reference blocks in fields 2×i+1, 2×i, 2×j+1 and 2×j, which compose the i-th and j-th frames (Frame i and Frame j), are represented respectively as P0a, P0b, P1a and P1b.
Qa = (P0a × W0a + Da) / W2a  (3)
Qb = (P0b × W0b + Db) / W2b  (4)
Qa = (P0a × W0a + P1a × W1a + Da) / W2a  (5)
Qb = (P0b × W0b + P1b × W1b + Db) / W2b  (6)
Here, W0a, W0b, W1a and W1b represent weighting factors, whereas W2a and W2b represent normalization factors and Da and Db represent bias components.
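The field-based equations (3) to (6) apply the same form separately to each of the two fields, each with its own set of factors. The sketch below groups the factors of one field into a hypothetical FieldWeights record; the names and the use of None to mark the single-frame case are assumptions of this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FieldWeights:
    w0: float  # weighting factor for the first reference (W0a or W0b)
    w1: float  # weighting factor for the second reference (W1a or W1b)
    d: float   # bias component (Da or Db)
    w2: float  # normalization factor (W2a or W2b)


def predict_field(p0: float, p1: Optional[float], weights: FieldWeights) -> float:
    """Equations (3)/(4) when p1 is None, equations (5)/(6) otherwise."""
    total = p0 * weights.w0 + weights.d
    if p1 is not None:
        total += p1 * weights.w1
    return total / weights.w2


def predict_both_fields(
    p0a: float, p0b: float,
    p1a: Optional[float], p1b: Optional[float],
    weights_a: FieldWeights, weights_b: FieldWeights,
) -> Tuple[float, float]:
    """Return (Qa, Qb), one predicted pixel value per field."""
    return (predict_field(p0a, p1a, weights_a),
            predict_field(p0b, p1b, weights_b))


qa, qb = predict_both_fields(100, 50, 120, 70,
                             FieldWeights(1, 1, 0, 2), FieldWeights(3, 1, 4, 4))
print(qa, qb)  # 110.0 56.0
```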
FIG. 6 is a block diagram showing a functional structure of a conventional picture coding apparatus 100. The picture coding apparatus 100 performs compression coding (for example, variable length coding) on an inputted image signal Vin and outputs a coded image signal Str, which is a bit stream produced by the compression coding. It includes a motion estimation unit ME, a motion compensation unit MC, a subtraction unit Sub, an orthogonal transformation unit T, a quantization unit Q, an inverse quantization unit IQ, an inverse orthogonal transformation unit IT, an addition unit Add, a picture memory PicMem, a switch SW and a variable length coding unit VLC.
The image signal Vin is inputted to the subtraction unit Sub and the motion estimation unit ME. The subtraction unit Sub calculates a differential value between the inputted image signal Vin and the predictive image and outputs the result to the orthogonal transformation unit T. The orthogonal transformation unit T transforms the differential value into a frequency coefficient and then outputs it to the quantization unit Q. The quantization unit Q quantizes the inputted frequency coefficient and outputs a quantized value to the variable length coding unit VLC.
The inverse quantization unit IQ reconstructs the quantized value as a frequency coefficient by inverse-quantizing it and outputs it to the inverse orthogonal transformation unit IT. The inverse orthogonal transformation unit IT performs inverse frequency conversion on the frequency coefficient in order to obtain a pixel differential value and outputs it to the addition unit Add. The addition unit Add adds the pixel differential value to the predictive image outputted from the motion compensation unit MC and obtains a decoded image. The switch SW is ON when it is instructed to store the decoded image, and the decoded image is then stored in the picture memory PicMem.
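The signal path through the units Sub, T, Q, IQ, IT and Add can be sketched for one row of pixels as follows. The text does not name the orthogonal transformation or the quantizer, so this sketch assumes an orthonormal one-dimensional DCT and a uniform quantizer with a hypothetical step size; all function names are illustrative only.

```python
import math


def dct(samples):
    """Orthonormal DCT-II of a 1-D list (assumed form of the unit T)."""
    n = len(samples)
    coefs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(samples))
        coefs.append(scale * acc)
    return coefs


def idct(coefs):
    """Inverse of dct() above (assumed form of the unit IT)."""
    n = len(coefs)
    samples = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coefs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            acc += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        samples.append(acc)
    return samples


def code_block(vin, predicted, step):
    """Run one row of pixels through Sub, T, Q, IQ, IT and Add.

    Returns the quantized values (sent on to the VLC) and the decoded
    pixels (stored in the picture memory when the switch SW is ON).
    """
    residual = [v - p for v, p in zip(vin, predicted)]            # Sub
    coefs = dct(residual)                                         # T
    levels = [round(c / step) for c in coefs]                     # Q
    recon_coefs = [level * step for level in levels]              # IQ
    recon_residual = idct(recon_coefs)                            # IT
    decoded = [p + r for p, r in zip(predicted, recon_residual)]  # Add
    return levels, decoded


levels, decoded = code_block([52, 55, 61, 66], [50, 50, 60, 60], step=1.0)
print(levels, [round(x, 2) for x in decoded])
```

The decoded pixels differ from the input only by the quantization error, which is why the coder stores the decoded image rather than the original as the reference for later prediction: the decoder can reproduce only the former.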
The motion estimation unit ME, to which the image signal Vin is inputted on a macroblock-by-macroblock basis, searches the decoded pictures stored in the picture memory PicMem and, by finding the image area that is closest to the inputted image signal, determines a motion vector MV that indicates that area. The estimation of the motion vector is performed using a block, which is a unit made by further dividing a macroblock. Since multiple pictures can be used as reference pictures, an identification number (picture number Index) for identifying the picture used for reference is required for each block. The reference pictures can thus be identified by associating the picture number Index with the picture numbers assigned to the pictures in the picture memory PicMem.
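The search performed by the motion estimation unit ME can be illustrated by exhaustive block matching. The text says only that the closest image area is found; the sum of absolute differences (SAD) as the closeness measure, the full search over a square window and all names below are assumptions of this sketch.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the candidate reference block at (rx, ry)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return total


def estimate_motion(cur, pic_mem, bx, by, size, search):
    """Search every picture in pic_mem (picture number Index -> 2-D pixel list).

    Returns (index, (dx, dy), cost) for the closest area found, or None
    if no candidate position lies inside any reference picture.
    """
    best = None
    for index, ref in pic_mem.items():
        height, width = len(ref), len(ref[0])
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                rx, ry = bx + dx, by + dy
                # Skip candidates that fall outside the reference picture.
                if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                    continue
                cost = sad(cur, ref, bx, by, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (index, (dx, dy), cost)
    return best


# Reference picture 1 holds distinct values; the current picture is the same
# content displaced by 2 pixels horizontally and 1 pixel vertically.
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
flat = [[0] * 8 for _ in range(8)]
print(estimate_motion(cur, {0: flat, 1: ref}, bx=2, by=2, size=2, search=2))
```

The result carries both the motion vector and the picture number Index of the picture in which the best match was found, which is why an index is needed for each block when multiple reference pictures are available.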
The motion compensation unit MC takes out the image area necessary for generating a predictive image from a decoded picture stored in the picture memory PicMem using the picture number Index. The motion compensation unit MC then determines a final predictive image by performing, on the pixel values in the obtained image area, pixel value conversion processing such as the interpolation processing in the weighted prediction, using the weighting factors associated with the picture number Index.
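The lookup performed by the motion compensation unit MC can be sketched for the single-reference case as follows. The dictionaries keyed by the picture number Index, the tuple layout (W0, D, W2) and the function name are assumptions of this illustration; sub-pixel interpolation and boundary handling are omitted.

```python
def motion_compensate(pic_mem, weights, index, mv, bx, by, size):
    """Fetch the reference block selected by the picture number Index and
    the motion vector, then apply equation (1) to every pixel.

    pic_mem: picture number Index -> decoded picture (2-D pixel list)
    weights: picture number Index -> (W0, D, W2) for that reference picture
    """
    ref = pic_mem[index]
    w0, d, w2 = weights[index]
    dx, dy = mv
    return [
        [(ref[by + dy + y][bx + dx + x] * w0 + d) / w2 for x in range(size)]
        for y in range(size)
    ]


ref = [[x + 8 * y for x in range(8)] for y in range(8)]
predicted = motion_compensate({1: ref}, {1: (2, 4, 2)}, index=1,
                              mv=(2, 1), bx=2, by=2, size=2)
print(predicted)  # [[30.0, 31.0], [38.0, 39.0]]
```

Because the weighting factors are looked up through the same picture number Index as the reference picture itself, each reference picture carries exactly one set of factors in this conventional arrangement.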
FIG. 7 is a block diagram showing a sketch of a functional structure of the variable length coding unit VLC in the conventional picture coding apparatus 100 shown in FIG. 6. The variable length coding unit VLC includes an MV coding unit 101, a quantized value coding unit 102, a weighting factor coding unit 103, an index coding unit 104, an AFF (Adaptive Field Frame) identifying information coding unit 105 and a multiplexing unit 106.
The MV coding unit 101 codes a motion vector whereas the quantized value coding unit 102 codes a quantized value Qcoef. The weighting factor coding unit 103 codes a weighting factor Weight whereas the index coding unit 104 codes a picture number Index. The AFF identifying information coding unit 105 codes an AFF identification signal AFF (the AFF identification signal AFF will be mentioned later on). The multiplexing unit 106 multiplexes each of the coded signals outputted from the MV coding unit 101, the quantized value coding unit 102, the weighting factor coding unit 103, the index coding unit 104 and the AFF identifying information coding unit 105 and then outputs a coded image signal Str.
FIG. 8 is a block diagram showing a functional structure of a conventional picture decoding apparatus 200.
The picture decoding apparatus 200 for decoding the coded image signal Str coded by the picture coding apparatus 100 described above includes a variable length decoding unit VLD, a motion compensation unit MC, an addition unit Add, a picture memory PicMem, an inverse quantization unit IQ and an inverse orthogonal transformation unit IT.
When the coded image signal Str is inputted, the variable length decoding unit VLD demultiplexes the inputted coded image signal Str into a coded motion differential vector MV, an index indicating a picture number and a weighting factor Weight, and outputs them to the motion compensation unit MC. The variable length decoding unit VLD then decodes the coded quantized value Qcoef included in the inputted coded image signal Str and outputs it to the inverse quantization unit IQ.
The motion compensation unit MC takes out the image area necessary for generating a predictive image from a decoded picture stored in the picture memory PicMem using the motion vector and the picture number Index which are outputted from the variable length decoding unit VLD. The motion compensation unit MC then generates a predictive image by performing, on the obtained image area, pixel value conversion processing such as the interpolation processing in the weighted prediction, using the weighting factor Weight.
The inverse quantization unit IQ inverse-quantizes the quantized value, reconstructs it as a frequency coefficient and outputs it to the inverse orthogonal transformation unit IT. The inverse orthogonal transformation unit IT performs inverse frequency conversion on the frequency coefficient in order to obtain a pixel differential value and outputs it to the addition unit Add. The addition unit Add adds the pixel differential value to the predictive image outputted from the motion compensation unit MC and obtains a decoded picture. The decoded picture is stored in the picture memory PicMem to be used for reference in the inter-picture prediction, and is also outputted as a decoded picture signal Vout.
FIG. 9 is a block diagram showing a sketch of a functional structure of a variable length decoding unit VLD in the conventional picture decoding apparatus 200 shown in FIG. 8.
The variable length decoding unit VLD includes a demultiplexing unit 201, an MV decoding unit 202, a quantized value decoding unit 203, a weighting factor decoding unit 204, an index decoding unit 205 and an AFF identification signal decoding unit 206.
When the coded image signal Str is inputted to the variable length decoding unit VLD, the demultiplexing unit 201 demultiplexes the inputted coded image signal Str and outputs respectively as follows: the coded motion differential vector MV to the MV decoding unit 202; the coded quantized value Qcoef to the quantized value decoding unit 203; the coded weighting factor Weight to the weighting factor decoding unit 204; the coded picture number to the index decoding unit 205 and the coded AFF identification signal AFF (abbreviated as “AFF” in the following description) to the AFF identification signal decoding unit 206.
The MV decoding unit 202 decodes the coded differential vector and outputs a motion vector MV.
Similarly, the quantized value decoding unit 203 decodes the quantized value, the weighting factor decoding unit 204 decodes the weighting factor Weight, the index decoding unit 205 decodes the picture number Index and the AFF identification signal decoding unit 206 decodes the AFF respectively and then outputs them.
The conventional coding using weighted prediction, however, is performed on a picture-by-picture basis on the assumption that every block in a picture is coded/decoded in the same picture structure (a frame or one of the two fields). Therefore, only one set of weighting factors can be coded/decoded for the picture.
Therefore, although the conventional picture coding apparatus has the potential to improve efficiency in motion estimation, only a single weighting factor can be transmitted for each block. Consequently, prediction efficiency remains low even when switching between field and frame takes place on a block-by-block basis, and the compression rate cannot be improved.