1. Field of the Invention
The present invention relates to an image signal encoding method and an image signal encoding apparatus, an image signal decoding method and an image signal decoding apparatus, an image signal transmission method and an image signal transmission apparatus, and a recording medium on which information is recorded in a form that can be decoded by the image signal decoding apparatus. These are suitable for use in systems for recording an input image signal on a recording medium such as a magneto-optical disk or a magnetic tape and reproducing the image signal from the recording medium, thereby displaying the reproduced image on a display device; in systems, such as a video conference system, a video telephone system, broadcasting equipment, or a multimedia database retrieving system, for transmitting a moving image signal via a transmission line from a transmitting end to a receiving end so that the transmitted moving image is displayed on a display device at the receiving end; and in systems for editing and recording a moving image signal.
2. Description of the Related Art
In the art of moving-image transmission systems such as video conference systems or video telephone systems, it is known to convert an image signal into a compressed code on the basis of line-to-line and/or frame-to-frame correlation of the image signal so as to use a transmission line in a highly efficient fashion. The encoding technique according to the MPEG (Moving Picture Experts Group) standard can provide a high compression efficiency and is widely used. This standard was proposed after discussion in the ISO (International Organization for Standardization).
The MPEG technique is a hybrid technique of motion prediction encoding and DCT (discrete cosine transform) encoding techniques. One specific technique based on the MPEG standard has been developed by the inventors of the present invention and is disclosed in U.S. Pat. No. 5,155,593 (Date of Patent: Oct. 13, 1992).
In the MPEG standard, several profiles at various levels are defined so that the standard can be applied to a wide variety of applications. Of these, the most basic one is the main profile at main level (MP@ML). Referring to FIG. 1, an example of an encoder according to the MP@ML of the MPEG standard will be described below.
An input image signal is supplied to a frame memory 1, and then encoded in the predetermined order as described below. The image data to be encoded is applied, in units of macroblocks, to a motion vector extraction circuit 2. The motion vector extraction circuit 2 processes the image data for each frame as an I-picture, a P-picture, or a B-picture according to a predetermined procedure. In the above procedure, the processing mode is predefined for each frame of the image sequence, and each frame is processed as an I-picture, a P-picture, or a B-picture corresponding to the predefined processing mode (for example, frames are processed in the order of I, B, P, B, P, . . . , B, P).
The motion vector extraction circuit 2 extracts a motion vector, used in the motion compensation process, from each frame of the image with reference to a reference frame. The motion compensation (interframe prediction) is performed in one of three modes: forward, backward, and forward-and-backward prediction modes. The prediction for a P-picture is performed only in the forward prediction mode, while the prediction for a B-picture is performed in one of the above-described three modes. The motion vector extraction circuit 2 selects the prediction mode that leads to the minimum prediction error, and generates a predicted vector in the selected prediction mode. The prediction error is compared, for example, with the dispersion of the given macroblock to be encoded. If the dispersion of the macroblock is smaller than the prediction error, prediction compensation encoding is not performed on that macroblock but, instead, intraframe encoding is performed. In this case, the prediction mode is referred to as an intraframe encoding mode. The obtained motion vector and the information indicating the prediction mode employed are supplied to a variable-length encoder 6 and a motion compensation circuit 12.
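The mode decision described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the prediction error is taken as a sum of absolute differences, the dispersion as a sum of absolute deviations from the block mean, and the forward-and-backward prediction as the average of the two predictions. None of these measures, nor the function names, are specified in the text.

```python
def sum_abs_diff(block, prediction):
    # Assumed prediction-error measure: sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(block, prediction))


def dispersion(block):
    # Assumed dispersion measure: sum of absolute deviations from the mean.
    mean = sum(block) / len(block)
    return sum(abs(v - mean) for v in block)


def select_mode(block, forward=None, backward=None):
    """Return (mode, prediction) for one macroblock given as a flat list.

    A P-picture macroblock passes only `forward`; a B-picture macroblock
    may pass both `forward` and `backward`.
    """
    candidates = {}
    if forward is not None:
        candidates["forward"] = forward
    if backward is not None:
        candidates["backward"] = backward
    if forward is not None and backward is not None:
        # Assumed: the bidirectional prediction is the average of both.
        candidates["forward-and-backward"] = [
            (f + b) / 2 for f, b in zip(forward, backward)
        ]
    if not candidates:
        return "intra", None
    # Pick the candidate mode with the minimum prediction error.
    best = min(candidates, key=lambda m: sum_abs_diff(block, candidates[m]))
    error = sum_abs_diff(block, candidates[best])
    # Fall back to intraframe encoding when the dispersion is smaller.
    if dispersion(block) < error:
        return "intra", None
    return best, candidates[best]
```

For instance, a block that closely matches its forward prediction is encoded in the forward mode, while a flat block with a poor prediction falls back to the intraframe encoding mode.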
The motion compensation circuit 12 generates a predicted reference image on the basis of the motion vector supplied from the motion vector extraction circuit 2. The result is applied as a predicted reference image signal to an arithmetic operation circuit 3. The arithmetic operation circuit 3 calculates the difference between the value of the given macroblock to be encoded and the value of the predicted reference image. The result is supplied as a predicted residual signal to a DCT circuit 4. In the case of an intramacroblock, the arithmetic operation circuit 3 directly transfers the value of the given macroblock to be encoded to the DCT circuit 4 without performing any operation. That is, in this case, the input image signal is directly supplied to the DCT circuit 4.
The DCT circuit 4 performs a DCT (discrete cosine transform) operation on the predicted residual signal for each given macroblock (or the input image signal) thereby generating DCT coefficients. The resultant DCT coefficients for each given macroblock are supplied to a quantization circuit 5. The quantization circuit 5 quantizes the DCT coefficients in accordance with a quantization scale depending on the amount of data stored in a transmission buffer 7. The quantized data is then supplied to the variable-length encoder 6.
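As a rough illustration of the transform and quantization steps, the sketch below applies a two-dimensional DCT to one block and divides the coefficients by a quantization scale. The 8x8 block size, the orthonormal DCT-II normalization, and the plain uniform quantizer (no weighting matrix) are assumptions made for this example and are not taken from the text.

```python
import math

N = 8  # assumed block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N rows)."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = (2.0 / N) * c(u) * c(v) * total
    return coeffs


def quantize(coeffs, scale):
    # Assumed uniform quantizer: a larger scale yields smaller levels
    # and therefore fewer bits after variable-length encoding.
    return [[round(value / scale) for value in row] for row in coeffs]
```

A flat block concentrates all of its energy in the DC coefficient, so after quantization only one nonzero level remains.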
The variable-length encoder 6 converts the quantized data supplied from the quantization circuit 5 into a variable-length code using for example the Huffman encoding technique, in accordance with the quantization scale supplied from the quantization circuit 5. The obtained variable-length code is supplied to a transmission buffer 7. The variable-length encoder 6 also receives the quantization scale from the quantization circuit 5 and the motion vector as well as the information indicating the prediction mode (that is, the information indicating in which mode of the intraframe prediction mode, the forward prediction mode, the backward prediction mode, or forward-and-backward prediction mode the prediction has been performed) from the motion vector extraction circuit 2, and converts these received data into variable-length codes.
The transmission buffer 7 stores the received data temporarily. The information representing the amount of data stored in the transmission buffer 7 is fed back to the quantization circuit 5. If the amount of residual data stored in the transmission buffer 7 reaches an upper allowable limit, the transmission buffer 7 generates a quantization control signal (buffer feedback signal) to the quantization circuit 5 so that the following quantization operation is performed using an increased quantization scale thereby decreasing the amount of quantized data. Conversely, if the amount of residual data decreases to a lower allowable limit, the transmission buffer 7 generates a quantization control signal (buffer feedback signal) to the quantization circuit 5 so that the following quantization operation is performed using a decreased quantization scale thereby increasing the amount of quantized data. In this way, an overflow or underflow in the transmission buffer 7 is prevented. The encoded data stored in the transmission buffer 7 is read out at a specified time and output in the form of an encoded bit stream over a transmission line (not shown) or recorded on a recording medium (not shown).
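The buffer feedback can be pictured with the minimal sketch below. The step size of 1, the scale range of 1 to 31, and the function and parameter names are assumptions for illustration; the text only states that the scale is increased at the upper limit and decreased at the lower limit.

```python
def next_quantization_scale(scale, buffer_fill, upper, lower,
                            min_scale=1, max_scale=31):
    """Return the quantization scale to use for the following operation."""
    if buffer_fill >= upper:
        # Buffer nearly full: coarser quantization, less data produced.
        return min(scale + 1, max_scale)
    if buffer_fill <= lower:
        # Buffer nearly empty: finer quantization, more data produced.
        return max(scale - 1, min_scale)
    return scale
```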
The quantized data output by the quantization circuit 5 is also supplied to an inverse quantization circuit 8. The inverse quantization circuit 8 performs inverse quantization on the received data in accordance with the quantization scale given by the quantization circuit 5 thereby generating DCT coefficients corresponding to the output signal of the DCT circuit 4. The DCT coefficients generated by the inverse quantization circuit 8 are supplied to an IDCT (inverse DCT) circuit 9 which in turn performs an inverse DCT operation on the received data and generates a predicted residual signal corresponding to the output signal of the arithmetic operation circuit 3 (an input image signal is generated in the case of the intraframe prediction encoding mode). The predicted residual signal is then supplied to an arithmetic operation circuit 10. The arithmetic operation circuit 10 adds together the predicted reference image signal supplied from the motion compensation circuit 12 and the predicted residual signal. The resultant data is stored as a predicted image signal in a frame memory 11.
With reference to FIG. 2, an example of a decoder for performing a decoding operation according to the MP@ML standard of the MPEG will be described below. A coded bit stream is transmitted over the transmission line and is supplied to a receiving circuit (not shown) or is reproduced by a reproducing apparatus. The coded bit stream is stored in a receiving buffer 21 temporarily and then supplied to a variable-length decoder 22. The variable-length decoder 22 performs an inverse variable-length encoding operation on the data supplied from the receiving buffer 21 and generates a motion vector. The obtained motion vector and the information indicating the associated prediction mode are supplied to a motion compensation circuit 27. The variable-length decoder 22 also supplies a quantization scale to the inverse quantization circuit 23. Furthermore, the quantized data obtained by the above inverse variable-length encoding operation, which corresponds to the data output by the quantization circuit 5 in the encoding apparatus, is supplied from the variable-length decoder 22 to the inverse quantization circuit 23.
The inverse quantization circuit 23 performs an inverse quantization operation on the quantized data supplied from the variable-length decoder 22 using the quantization scale supplied from the variable-length decoder 22 thereby generating DCT coefficients corresponding to the output signal of the DCT circuit 4 in the encoding apparatus. The resultant DCT coefficients are supplied to an IDCT circuit 24. The IDCT circuit 24 performs an inverse DCT operation on the received DCT coefficients thereby generating an image signal (in the case of the intraframe prediction encoding mode) corresponding to the output signal of the arithmetic operation circuit 3 of the encoding apparatus. The resultant image signal is supplied to an arithmetic operation circuit 25. The arithmetic operation circuit 25 directly outputs the input image signal as a decoded image signal without performing any operation on it. When the reproduced image signal output by the arithmetic operation circuit 25 is I-picture data, the reproduced image signal is stored in a frame memory 26 so that a predicted reference signal can be produced later from this image signal, for use in processing an image signal (a P- or B-picture image signal). The above decoded image signal is output to the outside as a reproduced image signal.
In the case where the input encoded bit stream is a P- or B-picture signal and the signal has been processed in the intraframe prediction encoding mode, the output signal of the IDCT circuit 24 corresponds to the input image signal of the arithmetic operation circuit 3 of the encoding apparatus. In this case, the image signal output by the IDCT circuit 24 is supplied to the arithmetic operation circuit 25 and the arithmetic operation circuit 25 directly outputs the input image signal as a reproduced image signal without processing it. On the other hand, if the IDCT circuit 24 outputs a predicted residual signal corresponding to the predicted residual signal output by the arithmetic operation circuit 3 of the encoding apparatus (this is the case when the signal has been processed in the interframe prediction encoding mode), the above predicted residual signal output by the IDCT circuit 24 is supplied to the arithmetic operation circuit 25, and the arithmetic operation circuit 25 generates a reproduced image signal by adding together the predicted residual signal and a predicted reference image signal supplied from a motion compensation circuit 27. In the above operation, the motion compensation circuit 27 generates the predicted reference image signal by processing the motion vector supplied from the variable-length decoder 22 in the prediction mode specified by the data also supplied from the variable-length decoder 22. In the case where the given data is a P-picture, the decoded image signal output by the arithmetic operation circuit 25 is stored in the frame memory 26 so that it can be used as a reference image signal in processing a subsequent image signal to be decoded.
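The reconstruction step performed by the arithmetic operation circuit in the decoder can be sketched as below. The function names, the list-based frame memory, and the clipping of samples to the 8-bit range 0 to 255 are assumptions added for this illustration.

```python
def reconstruct(idct_output, mode, prediction=None):
    """Rebuild one macroblock (flat list of samples) in the decoder."""
    if mode == "intra":
        # Intraframe mode: the IDCT output is already the image signal.
        samples = list(idct_output)
    else:
        # Interframe mode: add the residual to the predicted reference.
        samples = [r + p for r, p in zip(idct_output, prediction)]
    # Assumed: samples are clipped to the 8-bit range.
    return [max(0, min(255, s)) for s in samples]


def decode_picture(picture_type, idct_output, mode, prediction, frame_memory):
    decoded = reconstruct(idct_output, mode, prediction)
    # I- and P-pictures are kept as references for later pictures.
    if picture_type in ("I", "P"):
        frame_memory.append(decoded)
    return decoded
```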
In the MPEG standard, various profiles at various levels are also defined, and various tools are available. For example, scalability is available as one of these tools. The scalability of the MPEG encoding technique makes it possible to encode various image signals having different image sizes at various frame rates. For example, when only a lower-layer bit stream is decoded in a spatially-scalable fashion, an image signal having a small image size may be decoded, while an image signal having a large image size may be decoded if both lower-layer and higher-layer bit streams are decoded.
With reference to FIG. 3, an example of an encoder having the spatial scalability will be described below. In the spatial scaling, an image signal having a small image size is given as a lower layer signal, while an image signal having a large image size is given as a higher layer signal. The input image in the lower layer is first stored in a frame memory 101, and then is encoded in a manner similar to the MP@ML signal described above except that the output signal (the predicted image signal) provided by an arithmetic operation circuit 110 is supplied not only to a frame memory 111 so that it is used as a predicted reference image signal in the lower layer, but also to an image signal expansion circuit 113. The image signal expansion circuit 113 expands the predicted reference image signal supplied from the arithmetic operation circuit 110 up to an image size equal to the image size in the higher layer so that it is used as a predicted reference image signal in the higher layer. The other parts are the same as those of the encoder shown in FIG. 1, and will not be described here in further detail.
On the other hand, the input image signal in the higher layer is first stored in a frame memory 115. A motion vector extraction circuit 116 extracts a higher-layer motion vector and determines a prediction mode, in a manner similar to the operation according to the MP@ML. A motion compensation circuit 126 generates a predicted reference image signal using the higher-layer motion vector in the prediction mode determined by the motion vector extraction circuit 116. The resultant signal is supplied as a predicted higher-layer reference image signal to a weighting circuit 127. The weighting circuit 127 multiplies the predicted higher-layer reference image signal by a weighting factor W. The weighted predicted higher-layer reference image signal is then supplied to an arithmetic operation circuit 128.
On the other hand, as described above, the predicted image signal from the arithmetic operation circuit 110 in the lower layer is supplied to the frame memory 111 and the image signal expansion circuit 113. The image signal expansion circuit 113 expands the predicted image signal generated by the arithmetic operation circuit 110 up to a size equal to that of the image in the higher layer. The expanded image signal is supplied as a predicted lower-layer reference image signal to a weighting circuit 114. The weighting circuit 114 multiplies the predicted lower-layer reference image signal by a weighting factor (1-W). The weighted value of the predicted lower-layer reference image signal is then supplied to the arithmetic operation circuit 128. The arithmetic operation circuit 128 generates a predicted reference image signal by adding together the weighted value of the predicted higher-layer reference image signal and the weighted value of the predicted lower-layer reference image signal. The obtained signal is supplied to an arithmetic operation circuit 117 so that it is used as a predicted reference frame for the image signal in the higher layer. The arithmetic operation circuit 117 calculates the difference between the image signal to be encoded and the predicted reference image signal supplied from the arithmetic operation circuit 128, and outputs the result as a predicted residual signal. However, in the case where the macroblock is to be processed in the intraframe prediction encoding mode, the arithmetic operation circuit 117 directly supplies the image signal to be encoded to a DCT circuit 118 without performing any operation.
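The formation of the higher-layer predicted reference image can be sketched as follows, assuming the higher-layer prediction is weighted by W and the expanded lower-layer prediction by (1-W), as in the decoder description. The expansion by pixel repetition with a factor of 2 and the function names are assumptions; the text does not specify the interpolation method.

```python
def expand(image, factor=2):
    """Enlarge a lower-layer image by pixel repetition (assumed method)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def combine(higher_pred, lower_pred_expanded, w):
    """Weighted sum W * higher + (1 - W) * lower, sample by sample."""
    return [
        [w * h + (1 - w) * l for h, l in zip(h_row, l_row)]
        for h_row, l_row in zip(higher_pred, lower_pred_expanded)
    ]
```

With W equal to 1 the prediction uses only the higher layer, and with W equal to 0 it uses only the expanded lower-layer image.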
The DCT circuit 118 performs a DCT (discrete cosine transform) operation on the output signal of the arithmetic operation circuit 117 in units of macroblocks thereby generating DCT coefficients (wherein the output signal of the arithmetic operation circuit 117 is given as a predicted residual signal when the signal is to be processed in the interframe prediction encoding mode while it is given as an input image signal when the signal is to be processed in the intraframe prediction encoding mode). The generated DCT coefficients are supplied to a quantization circuit 119. The quantization circuit 119 quantizes the DCT coefficients, as in the operation for the MP@ML data, using a quantization scale determined in accordance with the amount of data stored in a transmission buffer 121. The quantized DCT coefficients are supplied to a variable-length encoder 120. The variable-length encoder 120 performs a variable-length encoding operation on the quantized DCT coefficients, and outputs the resultant encoded data as a higher-layer bit stream via the transmission buffer 121.
The quantized DCT coefficients from the quantization circuit 119 are supplied to an inverse quantization circuit 122. The inverse quantization circuit 122 performs an inverse quantization operation on the received signal using the same quantization scale as that employed by the quantization circuit 119 thereby generating DCT coefficients corresponding to the output signal of the DCT circuit 118. The generated DCT coefficients are supplied to an inverse DCT circuit 123. The inverse DCT circuit 123 performs an inverse DCT operation on the DCT coefficients thereby generating an image signal (in the case of the intraframe prediction encoding mode) or a predicted residual signal (in the case of the interframe prediction encoding mode) corresponding to the output signal of the arithmetic operation circuit 117. The resultant signal is supplied to an arithmetic operation circuit 124. The arithmetic operation circuit 124 generates a predicted image signal by adding together the predicted reference image signal supplied from the arithmetic operation circuit 128 and the predicted residual signal supplied from the inverse DCT circuit 123. The resultant predicted image signal is stored in a frame memory 125. In the case where the signal supplied from the inverse DCT circuit 123 is an image signal (that is, in the case where the signal is to be processed in the intraframe prediction encoding mode), the image signal from the inverse DCT circuit 123 is directly supplied as a predicted image signal via the arithmetic operation circuit 124 to the frame memory 125.
The variable-length encoder 120 also receives the higher-layer motion vector extracted by the motion vector extraction circuit 116 and the information indicating the associated prediction mode, the quantization scale employed by the quantization circuit 119, and the weighting factor W used by the weighting circuits 114 and 127. These data are encoded into variable-length codes, and output as encoded data from the variable-length encoder 120.
Now referring to FIG. 4, an example of a decoder having the capability of spatial scaling will be described below. The lower-layer bit stream input to a reception buffer 201 is decoded in a similar manner to the MP@ML signal described above except that the output signal of an arithmetic operation circuit 205, that is a decoded image signal in the lower layer, is supplied not only to the outside and to a frame memory 206 so that the signal stored in the frame memory 206 can be used as a predicted reference image signal in processing a subsequent image signal to be decoded (for an I- or P-picture signal), but also to an image signal expansion circuit 208.
On the other hand, the higher-layer bit stream is stored in a reception buffer 209, and then supplied as encoded data to a variable-length decoder 210. The variable-length decoder 210 performs a variable-length decoding operation on the received data thereby generating quantized DCT coefficients, a quantization scale, a higher-layer motion vector, prediction mode data, and a weighting factor W. The quantized DCT coefficients and the quantization scale decoded by the variable-length decoder 210 are supplied to an inverse quantization circuit 211. The inverse quantization circuit 211 performs an inverse quantization operation on the quantized DCT coefficients using the quantization scale thereby generating DCT coefficients corresponding to the output signal of the DCT circuit 118 of the encoder. The resultant DCT coefficients are supplied to an inverse DCT circuit 212. The inverse DCT circuit 212 performs an inverse DCT operation on the received DCT coefficients thereby generating an image signal (in the case of the intraframe prediction encoding mode) or a predicted residual signal (in the case of the interframe prediction encoding mode). The generated signal is supplied to an arithmetic operation circuit 213.
The higher-layer motion vector and the associated prediction mode data decoded by the variable-length decoder 210 are supplied to a motion compensation circuit 215. The motion compensation circuit 215 compensates for the motion of the predicted image signal stored in the frame memory 214 using the higher-layer motion vector in the specified prediction mode thereby generating a predicted reference image signal. The resultant predicted higher-layer reference image signal is supplied to a weighting circuit 216. The weighting circuit 216 also receives the weighting factor W decoded by the variable-length decoder 210. The weighting circuit 216 multiplies the predicted higher-layer reference image signal by the weighting factor W. The weighted value of the predicted higher-layer reference image signal is supplied to an arithmetic operation circuit 217.
The decoded image signal output by the arithmetic operation circuit 205 is output as a reproduced lower-layer image signal and also supplied to the frame memory 206. Furthermore, the decoded image signal is also supplied to the image signal expansion circuit 218. The image signal expansion circuit 218 expands the decoded image signal supplied from the lower-layer circuit to the same size as that of the higher-layer image signal. The expanded image signal is then supplied to the weighting circuit 208 so that the expanded image signal can be used as a predicted reference image signal in the higher layer. The weighting circuit 208 also receives the weighting factor W decoded by the variable-length decoder 210. The weighting circuit 208 multiplies the expanded image signal supplied from the image signal expansion circuit 218 by the weighting factor (1-W). The result is supplied as a weighted value of the predicted lower-layer reference image signal to an arithmetic operation circuit 217.
The arithmetic operation circuit 217 generates a predicted reference image signal by adding together the weighted value of the predicted higher-layer reference image signal and the weighted value of the predicted lower-layer reference image signal. The obtained signal is supplied to an arithmetic operation circuit 213. The arithmetic operation circuit 213 adds the predicted residual signal supplied from the inverse DCT circuit 212 and the predicted reference image signal supplied from the arithmetic operation circuit 217, thereby generating a reproduced higher-layer image signal, which is supplied not only to the outside but also to a frame memory 214. The reproduced higher-layer image signal stored in the frame memory 214 is used as a predicted reference image signal in a later process to decode a subsequent image signal. If the output signal of the inverse DCT circuit 212 is an image signal (this is the case when the signal is an intraframe-prediction-encoded signal), the image signal is directly output from the arithmetic operation circuit 213 as a reproduced higher-layer image signal.
Although in the above description the operation of dealing with a luminance signal is described, the operation for a color difference signal is also performed in a similar manner except that the motion vector used for the luminance signal is reduced to half in both vertical and horizontal directions.
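The halving of the motion vector for the color difference signal can be illustrated with the short sketch below. Representing the vector as an integer pair and truncating toward zero are assumptions of this example; the text only states that the vector is reduced to half in both directions.

```python
def chroma_motion_vector(luma_mv):
    """Halve a luminance motion vector (dx, dy) for the color difference signal."""
    dx, dy = luma_mv
    # Assumed rounding: truncation toward zero.
    return (int(dx / 2), int(dy / 2))
```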
In addition to the MPEG standard, there are various standards for converting a moving image signal into a compressed code in a highly efficient manner. For example, the H.261 and H.263 standards established by the ITU-T are employed in the encoding process, especially for communication. Although there are some differences in the details associated with, for example, header information, the H.261 and H.263 standards are also based on the combination of motion compensation prediction encoding and DCT encoding, and thus an encoder and a decoder can be implemented in a similar manner to those described above.
It is also known in the art to compose an image by combining a plurality of images. An example of a conventional image composing system will be described below. In this technique, an image of an object (for example a human figure) is taken in front of a background having a particular uniform color such as blue. Areas having colors other than blue (image area of the human figure) are extracted from the image, and the extracted image of the human figure is combined with another image (for example another background image). In the above process, the signal representing the extracted areas is referred to as the key signal.
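The extraction of the key signal can be sketched as below. The RGB pixel representation, the pure-blue background value, the per-channel tolerance, and the binary (0 or 1) key are all assumptions made for illustration.

```python
def extract_key(image, background=(0, 0, 255), tolerance=30):
    """Return a binary key signal: 1 for the object, 0 for the background.

    `image` is a list of rows of (r, g, b) tuples.
    """
    def is_background(pixel):
        # A pixel counts as background when every channel is close
        # to the assumed background color.
        return all(abs(c - b) <= tolerance for c, b in zip(pixel, background))

    return [[0 if is_background(p) else 1 for p in row] for row in image]
```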
FIG. 5 illustrates the principle of a technique of encoding a composite image signal generated in the above-described manner. In FIG. 5, a background image F1 and a foreground image F2 are combined into a single image. The foreground image F2 is obtained by taking a picture of an object in front of a background having a particular color, and then extracting the areas having colors different from the background color. The extracted areas are represented by a key signal K1. A composite image F3 is obtained by combining the foreground image F2 and the background image F1 using the key signal K1. Then the composite image F3 is encoded according to an appropriate encoding technique such as the MPEG encoding technique. At the stage of the encoding of the composite image, the information of the key signal has already been lost. Therefore, when the decoded composite image is edited, as is the case where only the background image F1 is changed while maintaining the foreground image F2 unchanged, it is required to extract the foreground image from the composite image using a chromakey, and then combine the extracted foreground image with another background image. However, the above recomposition and the associated editing process are generally difficult. Furthermore, if a composite image signal is encoded in a scaled fashion, similar difficulty will be encountered when the composite image signal is decoded and edited.
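The composition of F3 from F1, F2, and K1 can be sketched as follows, assuming single-component samples and a key signal with values between 0 and 1; the function name is hypothetical. Because only F3 is encoded, the key values used here are not available to the decoder, which is the difficulty noted above.

```python
def compose(background, foreground, key):
    """Combine F1 (background) and F2 (foreground) using the key signal K1."""
    return [
        [k * f + (1 - k) * b for b, f, k in zip(b_row, f_row, k_row)]
        for b_row, f_row, k_row in zip(background, foreground, key)
    ]
```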