1. Field of the Invention
The present invention relates to an adaptive encoding apparatus for high-efficiency encoding of a video signal.
2. Description of the Prior Art
As a result of advances in digital signal technology, corresponding advances have been made in the highly efficient encoding of video signals such as television signals, for transmitting a video signal over a digital communication line, or for storing the video signal data on a digital recording medium such as a CD ROM or CD-I. Recently, international standards have been proposed for the encoding of video signals representing static pictures, and for adaptive encoding of video signals representing moving pictures, for data transmission or recording purposes. Most of these proposals for high-efficiency encoding of video signals are based on the use of the discrete cosine transform, using an algorithm whereby difference values between corresponding picture element values of successive frames of a video signal are converted to respective transform coefficients, which are quantized and encoded. An example of a video signal encoding apparatus which utilizes the discrete cosine transform is described in the CCITT Recommendation H.261 entitled "Codec for Audiovisual Services at n×64 kbit/s" (Melbourne, 1988). To provide a clear understanding of the prior art problem that is to be overcome by the present invention, the basic features of such a prior art encoding apparatus will be described in detail in the following, referring to the drawings.
FIG. 1 shows the general configuration of that prior art video encoding apparatus. Although the apparatus is intended for encoding both audio and video information, only the video information aspect will be described. In FIG. 1, numeral 111 denotes a video signal input terminal, 100 denotes a block conversion section for local conversion of each frame of the input video signal into a set of blocks of picture element values. Numeral 101 denotes a prediction function judgement section, for judging whether a prediction function that is applied to a block that is currently being encoded shall be obtained by intra-frame prediction or be obtained on the basis of motion-compensated inter-frame prediction. Numeral 114 denotes a subtractor, for subtracting a prediction signal from the input video signal to obtain a prediction error signal, and 102 denotes an orthogonal transform section which executes 2-dimensional orthogonal transform processing of the input video signal or of the prediction error signal, in units of blocks. Numeral 103 denotes a quantizer section which quantizes the successive transform coefficients that are derived by the orthogonal transform section 102, with a quantization step size that is determined by an output signal from a quantization step size calculation section 104, to thereby obtain respective quantization index values. Numeral 104 denotes a quantization step size calculation section, for calculating respective values of quantization step size for each block, in accordance with the amount of encoded data that currently remain in a buffer memory section 110 (i.e. data that are still to be transmitted by the apparatus). Numeral 105 denotes a dequantizer section which derives respective quantized values from the quantization index values, by executing the inverse procedure to that of the quantizer section 103. 
An inverse orthogonal transform section 106 receives these quantized values, and executes inverse transform processing to that executed by the orthogonal transform section 102, to thereby reproduce the input signal supplied to the orthogonal transform section 102, i.e. to obtain a reproduced video signal or a reproduced prediction error signal. That reproduced output signal from the inverse orthogonal transform section 106 is added to a prediction signal (which is zero when intra-frame prediction is selected) in an adder 124, to thereby obtain a reproduced video signal. A motion compensated prediction section 107 executes motion compensation of the reproduced video signal of the preceding frame, to obtain a motion compensated prediction signal. A loop filter 108 executes 2-dimensional low-pass filter processing of the motion compensated prediction signal. An encoding section 109 encodes the quantization index values for the respective transform coefficients, and also encodes the motion vector, prediction function judgement result, quantization step size, and loop filter in-circuit/out-of-circuit indication signals, and multiplexes the resultant code data into a specific format, referred to as the transmission frame. A buffer memory section 110 serves to temporarily hold the output data from the encoding section 109, before the data are transferred to an output terminal 139.
The operation of the above prior art apparatus is as follows. An input video signal 112 supplied to the input terminal 111 consists of successive picture element luminance and chrominance values, and represents a picture having a ratio of the number of picture elements along the horizontal direction to the number of picture elements along the vertical direction that is equal to 2:1. In the block conversion section 100 the input video signal is locally divided into units of blocks, to obtain an output video signal 113. The basic block unit is referred to as a macroblock, which consists of four (8×8) blocks of picture element luminance values corresponding to a picture area of (16×16) picture elements, and a pair of (8×8) blocks of picture element chrominance values (i.e. a U block and a V block). Each of these (8×8) blocks of chrominance values corresponds to the aforementioned picture area of (16×16) picture elements. A sequence of six of these (8×8) blocks is thus generated for each macroblock, from the block conversion section 100, as the video signal 113.
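The macroblock structure described above can be sketched as follows. This is a minimal illustration only: the function name `split_macroblock` and the raster ordering of the four luminance blocks are assumptions, since the text does not specify the order in which the blocks are emitted.

```python
def split_macroblock(luma, u_block, v_block):
    """Return the six 8x8 blocks of one macroblock: four luminance, then U, then V.

    luma is a 16x16 list of rows; u_block and v_block are 8x8 lists of rows.
    The raster order of the luminance blocks is an assumption.
    """
    if len(luma) != 16 or any(len(row) != 16 for row in luma):
        raise ValueError("luma must be 16x16")
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            # Cut one 8x8 luminance block out of the 16x16 area.
            blocks.append([row[left:left + 8] for row in luma[top:top + 8]])
    blocks.append([list(row) for row in u_block])
    blocks.append([list(row) for row in v_block])
    return blocks


# Example: a 16x16 luminance ramp and two flat chrominance blocks.
luma = [[16 * r + c for c in range(16)] for r in range(16)]
u = [[128] * 8 for _ in range(8)]
v = [[128] * 8 for _ in range(8)]
blocks = split_macroblock(luma, u, v)
```

Each chrominance block covers the same 16×16 picture area as the four luminance blocks together, which is why only one U block and one V block appear per macroblock.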
The motion compensated prediction section 107 compares the video signal 113 with the reproduced video signal of the preceding frame (which has been stored in the motion compensated prediction section 107) in units of macroblocks, to obtain the position within the macroblock for which the least amount of inter-block error occurs, and outputs that amount of displacement as the motion vector, to thereby produce the motion vector signal 132. If, when the inter-block error is compared with that for the case of zero displacement, it is found that motion compensation will not result in a reduction of the error, then it is possible to forcibly set the value of motion vector to zero. The motion compensated prediction section 107 also derives (for each macroblock) the reproduced video signal of the preceding frame, shifted in position by an amount equal to the motion vector that has been obtained for that macroblock, and outputs the result. The output values thus obtained constitute the motion compensated prediction signal 126.
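The motion vector search described above can be sketched as an exhaustive block-matching search. This is a sketch under stated assumptions, not the method of the cited apparatus: the error measure (sum of absolute differences), the search range, and the names `sad` and `find_motion_vector` are all hypothetical. The vector is expressed as a (vertical, horizontal) displacement into the preceding frame.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
    return total


def find_motion_vector(cur, ref, top, left, size=16, search=7):
    """Return ((dy, dx), error) for the block of cur whose corner is (top, left).

    The zero vector is kept unless some displacement gives a strictly
    smaller error, which models forcing the vector to zero when motion
    compensation does not reduce the error.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_error = sad(cur, ref, top, left, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # displaced block would fall outside the frame
            error = sad(cur, ref, top, left, dy, dx, size)
            if error < best_error:
                best, best_error = (dy, dx), error
    return best, best_error


# Small example: one bright picture element that moved between frames.
ref_frame = [[0] * 8 for _ in range(8)]
cur_frame = [[0] * 8 for _ in range(8)]
ref_frame[4][5] = 100
cur_frame[3][3] = 100
vector, error = find_motion_vector(cur_frame, ref_frame, 2, 2, size=4, search=2)
```

In this example the search finds that the block matches the preceding frame exactly when displaced by one row and two columns.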
The motion compensated prediction signal 126 is inputted to the loop filter section 108, to be either subjected to 2-dimensional low-pass filtering, to thereby reduce the block distortion which arises between adjacent blocks, or to be transferred through the loop filter section 108 without being filtered. The result is outputted as the inter-frame prediction signal 127. The decision as to whether or not the 2-dimensional low-pass filtering is to be applied is based upon whether the size of the motion vector is zero or other than zero. A filter processing signal 133 is outputted from the loop filter section 108 to provide an indication as to whether or not the filter processing is actually applied.
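The loop filter decision can be illustrated with the following sketch. Two points are assumptions rather than statements of the text: the filter is applied when the motion vector is non-zero and bypassed when it is zero, and the filter itself is a separable (1/4, 1/2, 1/4) low-pass filter that leaves the samples on the block edges unchanged. The function names are hypothetical.

```python
def low_pass_1d(values):
    """Apply (1/4, 1/2, 1/4) taps to interior samples; edge samples are kept."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + 2 * values[i] + values[i + 1]) / 4
    return out


def loop_filter(block, motion_vector):
    """Return (prediction block, filter flag) for one block.

    The flag plays the role of the filter processing signal: it is True
    only when the low-pass filtering was actually applied.
    """
    if motion_vector == (0, 0):
        return [list(row) for row in block], False
    rows = [low_pass_1d(row) for row in block]            # horizontal pass
    cols = [low_pass_1d(col) for col in zip(*rows)]       # vertical pass
    return [list(row) for row in zip(*cols)], True


# A block containing a vertical edge.
edge_block = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(8)]
filtered, flag_on = loop_filter(edge_block, (1, 0))
bypassed, flag_off = loop_filter(edge_block, (0, 0))
```

With these taps the abrupt step from 0 to 100 is softened to 0, 25, 75, 100 along each row, while the bypassed block is passed through unchanged.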
The subtractor 114 subtracts the inter-frame prediction signal 127 from the input video signal 113, to obtain as output the inter-frame prediction error signal 116. The prediction function judgement section 101 executes, for each block of the input video signal 113, intra-frame prediction using the average value of the block as a prediction value, and inter-frame prediction using the inter-frame prediction signal 127 as a prediction value, and compares the respective values of prediction error that result from these two prediction functions, to determine which prediction function (i.e. that which results in the smallest amount of error) is to be selected for use with that block. A prediction function selection signal 134 is outputted from the prediction function judgement section 101 in accordance with the result of that determination.
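The comparison made by the prediction function judgement section can be sketched as follows. The error measure (sum of squared differences), the tie-breaking rule in favour of inter-frame prediction, and the name `choose_prediction` are assumptions; the text states only that the prediction function with the smaller error is selected.

```python
def choose_prediction(block, inter_prediction):
    """Return (mode, intra_error, inter_error) for one block.

    Intra-frame prediction uses the block average as the prediction value;
    inter-frame prediction uses the supplied inter-frame prediction block.
    """
    pixels = [p for row in block for p in row]
    predicted = [p for row in inter_prediction for p in row]
    mean = sum(pixels) / len(pixels)
    intra_error = sum((p - mean) ** 2 for p in pixels)
    inter_error = sum((p - q) ** 2 for p, q in zip(pixels, predicted))
    mode = "inter" if inter_error <= intra_error else "intra"
    return mode, intra_error, inter_error


# A textured block that the preceding frame predicts perfectly.
textured = [[8 * r + c for c in range(8)] for r in range(8)]
mode_a, _, _ = choose_prediction(textured, textured)

# A flat block for which the inter-frame prediction is poor.
flat = [[50] * 8 for _ in range(8)]
black = [[0] * 8 for _ in range(8)]
mode_b, intra_b, inter_b = choose_prediction(flat, black)
```

The first block is assigned inter-frame prediction, the second intra-frame prediction, since a flat block is predicted exactly by its own average.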
The switch 118 is controlled by the prediction function selection signal 134 such that, when intra-frame prediction is to be selected, the input terminal 115 of switch 118 is set so that the input video signal 113 is selected to be supplied to the orthogonal transform section 102 as signal 119. Conversely when the prediction function selection signal 134 indicates that inter-frame prediction is to be selected, the input terminal 117 of switch 118 is set, so that the inter-frame prediction error signal 116 is selected to be the orthogonal transform section input signal 119. The switch 130 is similarly controlled in accordance with the state of the prediction function selection signal 134, such that when intra-frame prediction is to be selected, the input terminal 129 of switch 130 is set so that a value of zero is outputted as the prediction signal 131. Conversely when the prediction function selection signal 134 indicates that inter-frame prediction is to be selected, the input terminal 128 of switch 130 is set, so that the inter-frame prediction signal 127 is selected to be the prediction signal 131.
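The action of the two switches can be sketched as a single routing function. The name `route_signals` and the string values used for the selection signal are hypothetical; the sketch only shows which signal reaches the orthogonal transform section and which prediction signal reaches the adder in each mode.

```python
def route_signals(mode, input_block, inter_prediction):
    """Return (transform_input, prediction_signal) for one block."""
    if mode == "intra":
        # The input signal itself is transformed; the prediction is zero.
        transform_input = [list(row) for row in input_block]
        prediction = [[0] * len(row) for row in input_block]
    elif mode == "inter":
        # The prediction error is transformed; the prediction is passed on.
        transform_input = [[p - q for p, q in zip(in_row, pred_row)]
                           for in_row, pred_row in zip(input_block, inter_prediction)]
        prediction = [list(row) for row in inter_prediction]
    else:
        raise ValueError("mode must be 'intra' or 'inter'")
    return transform_input, prediction


def add_blocks(a, b):
    """Element-wise sum of two blocks, as performed by an adder."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


block = [[10 * r + c for c in range(8)] for r in range(8)]
pred = [[10 * r for _ in range(8)] for r in range(8)]
intra_in, intra_pred = route_signals("intra", block, pred)
inter_in, inter_pred = route_signals("inter", block, pred)
```

In both modes the sum of the two routed signals equals the input block, so that (ignoring quantization error) adding the prediction signal to the decoded transform input reproduces the picture.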
In the orthogonal transform section 102, the orthogonal transform section input signal 119 is subjected to 2-dimensional orthogonal transform processing to obtain resultant transform coefficients 120. A specific example of an orthogonal transform method is the Discrete Cosine Transform, which enables highly efficient encoding to be achieved using practical hardware. The 2-dimensional Discrete Cosine Transform is expressed by equation (1) below. In the prior art encoding apparatus example being described here, N has a value of 8.
$$F(u,v)=\frac{2}{N}\,C(u)\,C(v)\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} f(j,k)\,\cos\frac{(2j+1)u\pi}{2N}\,\cos\frac{(2k+1)v\pi}{2N} \qquad (1)$$
In equation (1), j and k are spatial coordinates in the picture element space, u and v are coordinates in the transform space, f(j, k) is the input video signal or the inter-frame prediction error signal, F(u, v) is the transform coefficient, and C(w) is a value that is given by equation (2) below.
$$C(w)=\begin{cases}\dfrac{1}{\sqrt{2}} & (w=0)\\[4pt] 1 & (w=1,2,\ldots,N-1)\end{cases} \qquad (2)$$
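Equations (1) and (2) can be sketched directly in code. The sketch assumes the common form of the 2-dimensional Discrete Cosine Transform, with a normalization factor of 2/N and with C(0) equal to 1/√2 and C(w) equal to 1 otherwise; the names `c_factor` and `dct_2d` are hypothetical. It is written for clarity, not speed.

```python
import math


def c_factor(w):
    """C(w) of equation (2), under the assumption stated in the lead-in."""
    return 1 / math.sqrt(2) if w == 0 else 1.0


def dct_2d(f):
    """Direct evaluation of the 2-dimensional DCT of an N x N block f."""
    n = len(f)
    coefficients = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for j in range(n):
                for k in range(n):
                    total += (f[j][k]
                              * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * k + 1) * v * math.pi / (2 * n)))
            coefficients[u][v] = (2 / n) * c_factor(u) * c_factor(v) * total
    return coefficients


# A flat 8x8 block: all of its energy falls in the coefficient F(0, 0).
flat_block = [[10] * 8 for _ in range(8)]
flat_coefficients = dct_2d(flat_block)
```

For a flat block of value 10 with N = 8, only F(0, 0) is non-zero, and it equals N times the block value, i.e. 80. This concentration of energy into a few coefficients is what makes the transform useful for encoding.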
In the quantization step size calculation section 104, the value of the quantization step size is calculated by using equation (3) below, in accordance with the residual code quantity 137 within the buffer memory section 110.
$$\mathrm{Qstep}=2\cdot\mathrm{INT}\!\left[\frac{\mathrm{Buf}}{200\cdot q}\right]+2 \qquad (3)$$
In equation (3), Qstep denotes the quantization step size, INT[ ] denotes a function which derives an integer value of the quantity within [ ], Buf denotes the residual code quantity within the buffer memory section 110, and q denotes an encoding speed parameter which is related to the encoding speed V by equation (4) below.
$$V=q\times 64\ \text{kbit/s} \qquad (4)$$
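Equations (3) and (4) can be worked through numerically as follows. The function names are hypothetical, and INT[ ] is taken to mean truncation to an integer, which for a non-negative buffer content is the same as integer division.

```python
def quantization_step(buffer_bits, q):
    """Equation (3): Qstep = 2 * INT[Buf / (200 * q)] + 2."""
    if buffer_bits < 0 or q <= 0:
        raise ValueError("buffer_bits must be >= 0 and q must be > 0")
    return 2 * (buffer_bits // (200 * q)) + 2


def encoding_speed_kbit_per_s(q):
    """Equation (4): V = q * 64 kbit/s."""
    return q * 64


# The step size grows as the buffer fills, and grows more slowly
# when the encoding speed parameter q is larger.
step_empty = quantization_step(0, 1)        # empty buffer
step_q1 = quantization_step(1000, 1)        # 1000 bits waiting, q = 1
step_q2 = quantization_step(1000, 2)        # 1000 bits waiting, q = 2
```

With an empty buffer the step size takes its minimum value of 2. With 1000 bits in the buffer it is 12 for q = 1 and 6 for q = 2, so the step size is always an even number that rises as the buffer fills.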
In the quantizer section 103, each orthogonal transform coefficient 120 is quantized using the quantization step size that is calculated by the quantization step size calculation section 104, to derive the resultant quantization index value 121. The quantization step size is selectively varied as described hereinafter, but a fixed step size is used within each block. In the dequantizer section 105, the quantized value 122 corresponding to each transform coefficient is calculated, based on the quantization index value 121 and the quantization step size 135.
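The relationship between a transform coefficient, its quantization index value and its quantized value can be sketched with a simple uniform quantizer. The rounding rule used here (nearest index, halves rounded away from zero) is an assumption, since the text does not state the quantizer characteristic, and the function names are hypothetical.

```python
def quantize(coefficient, step):
    """Return the quantization index value for one transform coefficient."""
    magnitude = int(abs(coefficient) / step + 0.5)
    return magnitude if coefficient >= 0 else -magnitude


def dequantize(index, step):
    """Return the quantized value corresponding to an index value."""
    return index * step


def quantize_block(coefficients, step):
    """Quantize every coefficient of a block with one fixed step size."""
    return [[quantize(value, step) for value in row] for row in coefficients]


index = quantize(45.0, 12)
reproduced = dequantize(index, 12)
small_index = quantize(-5.0, 12)
```

With a step size of 12, a coefficient of 45.0 gives the index value 4 and is reproduced as 48, while a coefficient of -5.0 gives the index value 0 and is therefore lost. The larger the step size, the more small coefficients are eliminated in this way.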
In the inverse orthogonal transform section 106, a transform operation is executed which is the inverse of the transform operation executed by the orthogonal transform section 102, to thereby obtain the inverse orthogonal transform output 123. In the case of the 2-dimensional Discrete Cosine Transform being used, the inverse transform operation executed by the inverse orthogonal transform section 106 is expressed by the following equation (5) for the 2-dimensional inverse Discrete Cosine Transform:
$$f(j,k)=\frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2j+1)u\pi}{2N}\,\cos\frac{(2k+1)v\pi}{2N} \qquad (5)$$
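The inverse transform of equation (5) can be sketched in the same manner as the forward transform. The sketch again assumes a normalization factor of 2/N, with C(0) equal to 1/√2 and C(w) equal to 1 otherwise; the names `c_factor` and `idct_2d` are hypothetical.

```python
import math


def c_factor(w):
    """C(w): 1/sqrt(2) for w = 0, otherwise 1 (assumed form)."""
    return 1 / math.sqrt(2) if w == 0 else 1.0


def idct_2d(coefficients):
    """Direct evaluation of the 2-dimensional inverse DCT of an N x N block."""
    n = len(coefficients)
    f = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (c_factor(u) * c_factor(v) * coefficients[u][v]
                              * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * k + 1) * v * math.pi / (2 * n)))
            f[j][k] = (2 / n) * total
    return f


# A block whose only non-zero coefficient is F(0, 0) = 80.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 80.0
reproduced_block = idct_2d(dc_only)
```

A block containing only the coefficient F(0, 0) = 80 is reproduced as a flat block in which every picture element has the value 10.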
The adder 124 adds together the inverse orthogonal transform output 123 and the prediction signal 131, to obtain the reproduced video signal 125, which is then temporarily stored in the memory within the motion compensated prediction section 107. In the encoding section 109, the quantization index values produced from the quantizer section 103 are successively processed in units of blocks. The values within a block are successively examined from beginning to end in a particular sequence, with that operation being referred to as zig-zag scanning, to thereby efficiently detect respective sequences of zero values within the block. Each such "run" of zeros, in combination with the non-zero value which ends that "run" is then converted into a variable-length code value or a combination of a fixed-length code and variable-length code value, by 2-dimensional encoding using a reference code table.
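The zig-zag scanning and the detection of runs of zeros can be sketched as follows. The particular scanning order generated here (starting at the top-left coefficient and moving along successive anti-diagonals in alternating directions) is the commonly used one and is an assumption, as are the function names. The reference code table that maps each pair to a variable-length code is not reproduced.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run downwards, even diagonals run upwards.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_level_pairs(block):
    """Return (run of zeros, non-zero value) pairs in zig-zag order.

    Zeros after the last non-zero value produce no pair; an end-of-block
    code is assumed to cover them.
    """
    pairs = []
    run = 0
    for row, col in zigzag_order(len(block)):
        value = block[row][col]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


# Quantization index values of a typical block: a few low-frequency values.
indices = [[0] * 8 for _ in range(8)]
indices[0][0] = 5
indices[1][0] = -2
indices[0][2] = 1
pairs = run_level_pairs(indices)
```

Because the non-zero index values cluster near the top-left corner, the scan yields a few short pairs followed by one long run of zeros, which is what makes the subsequent variable-length encoding efficient.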
The motion vector signal 132 is encoded as follows. For each macroblock, the respective differences between the horizontal and vertical components of the motion vector of that macroblock and those of the motion vector of the immediately preceding macroblock are calculated, and these values are then converted to variable-length code by using a reference code table.
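The differential treatment of the motion vectors can be sketched as follows. The use of a zero vector as the predictor for the first macroblock and the function name are assumptions; the conversion of the differences to variable-length code is not shown.

```python
def motion_vector_differences(vectors):
    """Return per-macroblock (horizontal, vertical) motion vector differences.

    Each vector is compared with that of the immediately preceding
    macroblock; the first macroblock is compared with (0, 0) (an assumption).
    """
    previous = (0, 0)
    differences = []
    for horizontal, vertical in vectors:
        differences.append((horizontal - previous[0], vertical - previous[1]))
        previous = (horizontal, vertical)
    return differences


# Neighbouring macroblocks usually move together, so the differences are small.
vectors = [(2, 1), (3, 1), (3, -1)]
differences = motion_vector_differences(vectors)
```

Since adjacent macroblocks tend to have similar motion, the differences are concentrated around zero and can be given short codes.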
The quantization step size 135, filter processing discrimination signal 133, the prediction function selection signal 134, and any other data which will be necessary at the time of decoding, are encoded in a similar manner to that described above.
The various code data thus obtained can be combined in transmission frame format in accordance with CCITT Recommendation H.221 (entitled "Frame Structure for a 64 kbit/s Channel in Audiovisual Teleservices", Melbourne, 1988), as the bit stream 136. The transmission frames, produced at irregular timings, are temporarily held in the buffer memory section 110 before being outputted to the transmission path via the output terminal 139 in order to output the data at a fixed bit rate, as the bit stream 138. The number of bits which currently are being held in the buffer memory section 110 is outputted therefrom as the residual code quantity 137.
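The smoothing action of the buffer memory can be illustrated with a simple model in which code is written in irregular amounts and read out at a fixed rate. The model, its time intervals and the function name are assumptions made purely for illustration.

```python
def residual_code_quantity(bits_produced, bits_sent_per_interval):
    """Return the buffer content (in bits) at the end of each interval.

    bits_produced lists the code bits written during successive intervals;
    a fixed number of bits is read out in every interval. The content is
    never allowed to fall below zero.
    """
    level = 0
    levels = []
    for produced in bits_produced:
        level = max(0, level + produced - bits_sent_per_interval)
        levels.append(level)
    return levels


# Irregular production, constant output of 1000 bits per interval.
levels = residual_code_quantity([3000, 500, 0, 4000], 1000)
```

In this example the content at the end of the four intervals is 2000, 1500, 500 and 3500 bits. It is this quantity which is fed back to determine the quantization step size.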
A prior art apparatus of the type described hereinabove has the following disadvantage. The quantization step size is determined only by the current value of the residual code quantity 137 in the output buffer, i.e. the step size is varied, from block to block, so as to ensure that a constant bit rate will be obtained for the encoded data stream. In addition, in order to make the bit rate of the encoded data as low as possible, the average value of the step size must be made as large as possible, consistent with acceptable picture quality being obtained from the resultant decoded video signal. However, it is found that a specific type of noise occurs in the displayed picture that is obtained from such a system, when the step size of a specific region is made large. Within each region of the picture which is visually smooth in texture (i.e. each region within which the luminance and chrominance values of the video signal are relatively constant) and which adjoins a region within which abrupt variations in the picture occur, or which adjoins a region which differs greatly from it in luminance and/or chrominance, a specific type of noise (known as "mosquito noise" and appearing as spurious patterns in the displayed picture) is produced at the boundary between the regions. The noise pattern is clearly visible within the region of substantially smooth texture, and so is highly conspicuous. Such a problem occurs due to the fact that the Discrete Cosine Transform operation, being a transform into the frequency domain, produces transform coefficients which are low in amplitude for the case of high-frequency components of the signal that is operated on, so that these high-frequency components may be eliminated if the quantization step size is large. That results in overshoot or undershoot effects occurring in those parts of the resultant decoded video signal which correspond to the aforementioned boundaries.
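The effect described above can be reproduced in one dimension with a short numerical experiment. The sketch encodes a row of eight picture elements containing an abrupt edge, using a 1-dimensional Discrete Cosine Transform and a uniform quantizer, and then decodes it. The 1-dimensional simplification, the rounding rule, the two step sizes and all names are assumptions chosen for illustration.

```python
import math


def dct_1d(x):
    """Orthonormal 1-dimensional DCT of a list of samples."""
    n = len(x)
    return [math.sqrt(2 / n) * (1 / math.sqrt(2) if u == 0 else 1.0)
            * sum(x[j] * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                  for j in range(n))
            for u in range(n)]


def idct_1d(coefficients):
    """Inverse of dct_1d."""
    n = len(coefficients)
    return [math.sqrt(2 / n)
            * sum((1 / math.sqrt(2) if u == 0 else 1.0) * coefficients[u]
                  * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                  for u in range(n))
            for j in range(n)]


def encode_and_decode(x, step):
    """Transform, quantize with a fixed step size, dequantize and invert."""
    indices = [round(value / step) for value in dct_1d(x)]
    return idct_1d([index * step for index in indices])


# Four picture elements of a smooth dark region next to a bright region.
edge = [0, 0, 0, 0, 100, 100, 100, 100]
fine = encode_and_decode(edge, 2)
coarse = encode_and_decode(edge, 64)

# Largest error inside the smooth region (the first four picture elements).
fine_error = max(abs(a - b) for a, b in zip(fine[:4], edge[:4]))
coarse_error = max(abs(a - b) for a, b in zip(coarse[:4], edge[:4]))
```

With the small step size the smooth region is reproduced almost exactly. With the large step size the higher-frequency coefficients are quantized to zero, and the decoded values in the smooth region swing above and below their true value of zero, which is the overshoot and undershoot referred to above.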
The problem is basically due to the method of setting the quantization step size in such a prior art encoding apparatus, whereby the step size is controlled only so as to stabilize the bit rate of the outputted code data, with no consideration given to the visual effects which may occur in the aforementioned blocks which are situated on boundaries between different regions.