1. Field of the Invention
The present invention relates to video coding apparatus and video decoding apparatus, and more particularly, to a video coding apparatus that performs predictive coding of digital video signals and a video decoding apparatus that reproduces the original motion images from the predictive-coded video signal produced by the video coding apparatus.
2. Description of the Related Art
The ITU-T standard H.261 and the ISO standards MPEG-1 and MPEG-2, for example, are well-acknowledged international standards for motion picture coding techniques. Those standards use hybrid coding algorithms, where the coding process will proceed as follows: (1) a source picture is divided into blocks of pixels, (2) orthogonal transformation (e.g., discrete cosine transform) and motion compensation are applied independently on each block, and (3) quantized video data is compressed by entropy coding.
When a motion of considerable magnitude or a full scene transition occurs in the middle of a sequence of video frames, the above-described hybrid video coding techniques may produce an amount of coded frame data that exceeds a certain standard level allowed for each frame. In this case, the coder forcibly reduces the amount of coded data in an attempt to hold it at the standard level. This causes extreme degradation in image quality and coarse frame subsampling (that is, a drop in the frame update rate), resulting in unacceptably poor pictures when reconstructed at the receiving end.
A video coding system aimed at avoiding the above problem is proposed, for instance, in Japanese Patent Application No. 8-75605 (1996), filed by the same applicant as the present invention. In this proposed system, the video coding apparatus reduces the resolution of input frame signals to regulate the amount of coded frame data when a full scene transition or a massive motion has occurred in the middle of a sequence of video frames.
FIG. 14 is a block diagram of this video coding apparatus proposed in the Japanese Patent Application No. 8-75605. The apparatus of FIG. 14 supports two kinds of picture resolutions: Common Intermediate Format (CIF, 352×288 pixels) and quarter-CIF (QCIF, 176×144 pixels). A CIF/QCIF selection controller 125 determines which picture resolution should be used to encode source pictures, considering the amount of coded frame data produced in a predictive coding, quantizer step size, and some other parameters. For example, the CIF/QCIF selection controller 125 normally chooses the high resolution CIF, while it chooses the low resolution QCIF when a large amount of data has been produced as a result of the coding.
A frame memory 122 is used to store reconstructed (or decoded) pictures of the previous frames. Comparing the source picture of the current frame with a decoded picture that is retrieved from the frame memory 122 as the reference picture, a prediction parameter calculation unit 112 computes motion vectors of the current frame. Here, a picture is partitioned into a plurality of blocks and the comparison of frame data is performed on a block-by-block basis. Each source frame picture is subjected to either an intraframe coding or an interframe coding. The prediction parameter calculation unit 112 also determines which coding scheme should be applied to the source frame picture. When the interframe coding is activated, a prediction picture generation unit 113 produces a prediction picture of the current frame based on the decoded image of the previous frame and the motion vectors calculated by the prediction parameter calculation unit 112.
A prediction error signal generation unit 114 produces a prediction error signal by calculating differences between the source picture and the prediction picture on a block-by-block basis. A CIF/QCIF converter 131 changes the resolution of this prediction error signal, which is originally CIF, to what is chosen by the CIF/QCIF selection controller 125. More specifically, the CIF/QCIF converter 131 outputs the prediction error signal as it is when the CIF resolution is selected by the CIF/QCIF selection controller 125, and it in turn converts the resolution to QCIF when the QCIF resolution is selected.
A coding controller 124 receives information regarding the amount of the resultant coded data from an entropy coder 117 (described later), and obtains information on buffer occupancy from a coded data buffer 118 (described later). Based on such information, the coding controller 124 determines the quantizer step size and distributes it to a quantizer 116, a dequantizer 119, the CIF/QCIF selection controller 125, and the entropy coder 117.
A DCT processor 115 applies an orthogonal transform, namely a discrete cosine transform (DCT), to the output of the CIF/QCIF converter 131, and the quantizer 116 quantizes the obtained DCT coefficients in accordance with the quantizer step size specified by the coding controller 124.
The entropy coder 117 receives the quantized DCT coefficients from the quantizer 116, the picture resolution from the CIF/QCIF selection controller 125, and the motion vectors and coding scheme information from the prediction parameter calculation unit 112. Entropy coding is a data compression process that assigns shorter code words to frequent events and longer code words to less frequent events. Out of a predefined code word table, the entropy coder 117 retrieves code words relevant to each combination of the above received data, thereby outputting the coded frame data.
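The principle of assigning shorter code words to frequent events can be illustrated with a small sketch. The text above only says that the entropy coder 117 looks up a predefined code word table; the Huffman construction below is a swapped-in, commonly used way of deriving such a table, and the function name `huffman_code_lengths` and the sample event string are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(events):
    """Return {event: code word length in bits} for a Huffman code.

    Illustrative assumption only: the apparatus described in the text
    uses a predefined table, not a table built from the data.
    """
    counts = Counter(events)
    if len(counts) == 1:
        # A single distinct event still needs a one-bit code word.
        return {next(iter(counts)): 1}

    # Heap entries: (total count, unique tie-breaker, {event: depth so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)

    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every event in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1

    return heap[0][2]
```

For the event sequence "aaaaaaaabbbc", the most frequent event "a" receives a one-bit code word, while the rarer events "b" and "c" each receive two bits.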
The quantized DCT coefficients produced by the quantizer 116 are also supplied to the dequantizer 119 for inverse quantization, or dequantization. The resultant output signals are then subjected to an inverse discrete cosine transform (IDCT) process that is executed by an IDCT processor 120 to reproduce the original prediction error signal. When the reproduced prediction error signal has the QCIF format as a result of the resolution reduction by the CIF/QCIF converter 131, a QCIF/CIF converter 132 reconverts it to regain the original CIF resolution. A decoded picture generator 121 reconstructs a picture by adding the prediction error signal outputted by the QCIF/CIF converter 132 to the prediction picture produced by the prediction picture generation unit 113. This fully decoded picture is then transferred to the frame memory 122 for storage.
As described above, the proposed video coding apparatus monitors the amount of coded frame data and the like, and if any significant increase is expected in the amount of coded frame data, the apparatus will reduce the resolution of the prediction error signal from CIF to QCIF.
The CIF/QCIF converter 131 performs such resolution reduction through a downsampling process as exemplified in FIG. 15. More specifically, white dots in FIG. 15 represent CIF pixels and lower-case alphabetic characters placed in them indicate their respective prediction error signal values. Black dots represent QCIF pixels, and upper-case letters beside them signify their respective prediction error signal values. The downsampling process calculates the QCIF prediction error signal values A, B, C, and D by averaging four values of the CIF pixels surrounding each of the QCIF pixels. For example, the pixel value A is obtained as

$$A = \frac{a + b + e + f}{4}. \tag{1}$$
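The averaging of equation (1) can be sketched as follows. This is a minimal illustration, assuming the prediction error signal is held as a list of rows with even dimensions and that each QCIF pixel sits at the center of a non-overlapping 2×2 group of CIF pixels; the function name `downsample_2x2` is hypothetical.

```python
def downsample_2x2(cif):
    """Average each non-overlapping 2x2 group of CIF values into one QCIF value.

    Assumes `cif` is a list of equal-length rows with an even number of
    rows and columns (true for 352x288 CIF pictures).
    """
    rows, cols = len(cif), len(cif[0])
    return [
        [
            (cif[r][c] + cif[r][c + 1] + cif[r + 1][c] + cif[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]
```

With a 4×4 input whose top-left 2×2 group holds the values 1, 2, 5, and 6, the first output value is (1 + 2 + 5 + 6)/4 = 3.5, which plays the role of A in equation (1).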
In contrast to that, the QCIF/CIF converter 132 performs a QCIF-to-CIF resolution conversion through an upsampling process as shown in FIG. 16. More specifically, black dots represent QCIF pixels, and upper-case letters beside them indicate their respective prediction error signal values, while white dots represent CIF pixels and lower-case letters in them indicate their respective prediction error signal values. To obtain the CIF prediction error signal values a, b, c, and so on, the upsampling process calculates a weighted average value of four QCIF pixels surrounding each CIF pixel. For example, the pixel value f is obtained as

$$f = \frac{9A + 3B + 3C + D}{16}, \tag{2}$$
where four QCIF values are summed up with weighting coefficients determined in accordance with their respective distances from the pixel 134 of interest.
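The weighted average of equation (2) can be sketched as follows. This is an illustration under stated assumptions: each CIF pixel takes weight 9 from the nearest QCIF pixel, 3 from each of the horizontal and vertical neighbors, and 1 from the diagonal neighbor. The treatment of picture borders is not described in the text, so replicating the edge value is an assumption, and the function name `upsample_2x` is hypothetical.

```python
def upsample_2x(qcif):
    """Upsample a QCIF-sized grid to CIF size with 9/3/3/1 weights over 16.

    Border handling (clamping to the nearest valid QCIF pixel) is an
    assumption made for this sketch.
    """
    rows, cols = len(qcif), len(qcif[0])

    def clamp(index, size):
        return max(0, min(size - 1, index))

    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(2 * rows):
        near_r = r // 2
        # Odd CIF rows lean toward the QCIF row below, even rows toward the one above.
        far_r = clamp(near_r + (1 if r % 2 else -1), rows)
        for c in range(2 * cols):
            near_c = c // 2
            far_c = clamp(near_c + (1 if c % 2 else -1), cols)
            out[r][c] = (
                9 * qcif[near_r][near_c]
                + 3 * qcif[near_r][far_c]
                + 3 * qcif[far_r][near_c]
                + qcif[far_r][far_c]
            ) / 16
    return out
```

Taking A = 16, B = 32, C = 48, and D = 64 as sample values, `upsample_2x([[16, 32], [48, 64]])[1][1]` evaluates to (144 + 96 + 144 + 64)/16 = 28.0, which corresponds to the pixel value f.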
It should be noted here that the above-described conventional video coding apparatus is constructed on the assumption that all blocks in a frame are encoded by using a consistent coding scheme. More specifically, it is assumed that every block in a given frame is subjected to either an intraframe coding or an interframe coding, but this coding scheme cannot be switched in the middle of the frame.
In reality, however, the two different coding schemes can sometimes be applied to different blocks in the same frame. If this is the case, some adjacent blocks within the same frame will be coded in different ways. Take the pixel map illustrated in FIG. 16 for example. Here, a dashed line 133 represents a block boundary where the applied coding scheme changes from interframe coding to intraframe coding or vice versa. To calculate a prediction error signal value f at a CIF pixel 134, the QCIF/CIF converter 132 uses the equation

$$f = \frac{9A + 3B + 3C + D}{16}, \tag{3}$$
where the QCIF pixel values C and D subject to the different coding scheme will affect the result f. Referring now to FIG. 17, problems caused by this mixed pixel reference will be discussed below.
FIG. 17 schematically shows a process of the predictive coding and decoding of a source picture. FIG. 17 consists of six graphs, (a) to (f), each of which represents how the pixel values vary when scanning across adjacent blocks. In other words, these graphs show the profiles of pixel values in the neighborhood of a certain block boundary. More specifically, the left half of each profile (labeled "Intra") is a block subject to the intraframe coding, while the right half (labeled "Inter") is a block subject to the interframe coding, where the vertical dashed line indicates the block boundary. The upper-left graph (a) shows the profile of a source picture, in which the pixel values are flat in both blocks. Since the left block is subjected to the intraframe coding and thus has no reference frame for prediction, its pixel values in the prediction picture profile (b) will be zeros. Accordingly, the resultant prediction error signal (c) exhibits large difference values in the left block, while showing small values in the right block that is subject to the interframe coding. Incidentally, in FIG. 17 (and also in later figures), the big "-" and "+" signs denote subtraction and addition of pictures, respectively.
In such a situation where two adjacent blocks are coded with different schemes (i.e., intraframe and interframe), the upsampling operations executed by the QCIF/CIF converter 132 as noted earlier will introduce a mixture of differently coded pixel values in the vicinity of the block boundary. That is, the reproduced prediction error signal will be deformed as illustrated in a profile (d) of FIG. 17, as a result of the upsampling operations by the QCIF/CIF converter 132. This reproduced prediction error signal (d) is then added to a prediction picture (e), which is provided from the prediction picture generation unit 113 and equals the prediction picture (b), to yield a decoded picture (f). As illustrated in FIG. 17, the resultant decoded picture (f) contains some distortion in the vicinity of the block boundary. Fidelity of decoded pictures to the original pictures is one of the important design considerations in video decoders. Contrary to this requirement, the picture (f) reconstructed by the conventional video coding apparatus differs from the original source picture (a).
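The distortion described for profiles (a) to (f) can be reproduced numerically with a one-dimensional sketch. Everything here is a simplifying assumption made for illustration: the block width of 8 pixels, the source level 100, and the interframe prediction level 98 are made-up values; the DCT, quantization, and entropy coding stages are omitted; and the 2-D weights 9/3/3/1 over 16 are replaced by their one-dimensional factors 3/4 and 1/4. All names are hypothetical.

```python
def downsample_1d(x):
    """Average neighboring pairs (1-D counterpart of the 2x2 averaging)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample_1d(y):
    """Interpolate with weights 3/4 (nearest) and 1/4 (other neighbor).

    Edge samples are replicated at the ends, which is an assumption.
    """
    n = len(y)
    out = []
    for i in range(2 * n):
        near = i // 2
        far = max(0, min(n - 1, near + (1 if i % 2 else -1)))
        out.append((3 * y[near] + y[far]) / 4)
    return out


BLOCK = 8  # assumed block width in pixels

# (a) Flat source picture across both blocks.
source = [100.0] * (2 * BLOCK)
# (b) Prediction picture: zeros in the intra block, near-source in the inter block.
prediction = [0.0] * BLOCK + [98.0] * BLOCK
# (c) Prediction error: large in the intra block, small in the inter block.
error = [s - p for s, p in zip(source, prediction)]
# (d) Error after resolution reduction and restoration.
reproduced_error = upsample_1d(downsample_1d(error))
# (f) Decoded picture = prediction (e) + reproduced error (d).
decoded = [p + e for p, e in zip(prediction, reproduced_error)]
# Deviation of the decoded picture from the source.
distortion = [d - s for d, s in zip(decoded, source)]
```

Under these assumptions the decoded picture matches the source everywhere except at the two pixels adjacent to the block boundary, where it deviates by -24.5 on the intra side and +24.5 on the inter side.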
This kind of problem is not limited to the particular situation in which two different coding schemes are applied to adjacent blocks in a frame. It can potentially occur in any video frame that contains a sharp edge, that is, a large difference or discontinuity in pixel values at a certain block boundary within the frame. Such discontinuous transitions of pixel values may also be observed in video frames where two neighboring blocks have quite different motion vectors. When decoded, the picture will suffer from similar noise, or artifacts, produced in the vicinity of the boundary of those neighboring blocks.