1. Field of the Invention
The present invention relates to image processing apparatus and method, as well as a storage medium. More particularly, the invention relates to image processing apparatus and method for encoding plural objects of arbitrary shapes included in a moving image, as well as a storage medium used therein.
2. Related Background Art
Attention has recently been paid to processing that separates and combines image data on an object-by-object basis. In particular, as a moving image encoding method, the MPEG-4 encoding method is being standardized. According to the MPEG-4 encoding method, it is possible to effect encoding and decoding on an object basis. Hence, various applications that have so far been difficult, such as distribution of data adapted to transmission lines, re-processing of images, and improvement of encoding efficiency, are expected.
Data of an object handled by the MPEG-4 method is constituted not only by image data proper, such as luminance (Y) data and color difference (chroma) data, but also by shape data representing the shape of the object and α data representing the transparency of the object. However, the α data are omitted if the object has no translucent state, and explanation of the α data will therefore be omitted in the following description.
The basics of object encoding will be described below with reference to FIGS. 1 to 8.
Such an image as shown in FIG. 1 is here assumed. The image is composed of three objects: a background, a person, and a rocket. As shown in FIGS. 2A to 2C, if the image shown in FIG. 1 is divided on the object basis, there are obtained three objects: a background (FIG. 2A), a person (FIG. 2B), and a rocket (FIG. 2C). Each of these objects is encoded independently, and the encoded results are then multiplexed.
FIG. 3 is a block diagram showing a schematic configuration of a conventional image encoding apparatus which performs encoding object by object.
A conventional image encoding apparatus includes, for each object, an encoding circuit and a generated-code-amount controlling circuit. In the example shown in FIG. 3, encoding circuits 110a to 110c, buffers 112a to 112c for temporarily storing the output codes of the encoding circuits 110a to 110c, and code amount controlling circuits 114a to 114c, which control the output code amounts of the encoding circuits 110a to 110c in accordance with the stored code amounts in the buffers 112a to 112c, are provided for the respective objects. Further, a multiplexing circuit 116 multiplexes the outputs of the buffers 112a to 112c and outputs the multiplexed data.
Next, a detailed description will be given below about the configuration of encoding circuits for a background image and objects of arbitrary shapes.
Reference will first be made to an encoding process for the background image. Image data of the background image is inputted to the encoding circuit 110a. Encoding of the background image can be executed as a special case of arbitrarily-shaped object encoding. Because the background image is a frame-size image, this processing is the same as conventional frame processing. Therefore, only image data need be inputted to the encoding circuit 110a, and shape data may be omitted.
First, a screen is divided into macroblocks and an encoding process is executed for each macroblock. FIG. 4 shows how the background image is divided into macroblocks. As shown in FIG. 5, each macroblock comprises six blocks: four luminance (Y) blocks and one block for each of the two color difference (Cb, Cr) components.
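The division of a frame into macroblocks can be sketched as follows. This is a minimal illustration in Python; the function and constant names are illustrative, not taken from any MPEG-4 reference software.

```python
# Sketch: dividing a frame into 16x16 macroblocks (names are illustrative).

MB_SIZE = 16  # a macroblock covers 16x16 luminance samples

def macroblock_grid(width, height):
    """Return (cols, rows) of the macroblock grid covering a frame.

    Dimensions that are not multiples of 16 are rounded up, i.e. the
    frame is assumed to be padded out to the macroblock boundary.
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows

def macroblocks(width, height):
    """Yield the top-left (x, y) of each macroblock in raster order."""
    cols, rows = macroblock_grid(width, height)
    for my in range(rows):
        for mx in range(cols):
            yield mx * MB_SIZE, my * MB_SIZE

# In the 4:2:0 format, each macroblock then carries six 8x8 blocks:
# four luminance (Y) blocks and one block each for Cb and Cr.
```

For example, a CIF-size frame of 352×288 samples yields a grid of 22×18 macroblocks.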
FIG. 6 is a block diagram showing the configuration of an encoding circuit which executes an encoding process for each macroblock.
In the same figure, a subtracter 120 outputs the inputted present image data (luminance and color difference data) as it is to a discrete cosine transform (DCT) circuit 122 in the case of intra-frame encoding, while in the case of inter-frame predictive encoding, the subtracter 120 subtracts a predictive value from the inputted present image data and outputs the result to the DCT circuit 122.
The DCT circuit 122 performs a discrete cosine transform for the image data (or image difference data) from the subtracter 120 on a macroblock basis. A quantization circuit 124 quantizes a DCT coefficient outputted from the DCT circuit 122 and supplies the thus-quantized image data to both an inverse quantization circuit 126 and a variable length encoding circuit 140.
The inverse quantization circuit 126 inverse-quantizes the output of the quantization circuit 124, and an inverse DCT circuit 128 performs an inverse discrete cosine transform on the output of the inverse quantization circuit 126. An adder 130 sends the output data of the inverse DCT circuit 128 as it is to a memory 132 if that output is image data of an intra-frame-encoded frame; if it is image difference data of an inter-frame-encoded frame, the adder 130 adds a predictive value thereto and outputs the result to the memory 132. The memory 132 stores image data of one or more frames serving as predictive frames in inter-frame encoding.
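The transform, quantization, and local-decoding loop described above can be illustrated with a one-dimensional sketch. This is a plain-Python 8-point DCT for illustration only; the standard applies a two-dimensional transform to 8×8 blocks, and the names here are not from any reference implementation.

```python
import math

N = 8  # transform length used in this sketch

def dct_1d(x):
    """Orthonormal 1-D DCT-II of an 8-sample vector."""
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def idct_1d(X):
    """Inverse of dct_1d (DCT-III with matching normalization)."""
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            s += c * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out

def quantize(X, step):
    return [round(v / step) for v in X]

def dequantize(q, step):
    return [v * step for v in q]

# Encoder side: transform and quantize. Local decoder side (circuits 126
# and 128 in the figure): dequantize and inverse-transform, so the encoder
# predicts from the same reconstruction the decoder will produce.
samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(samples)
q = quantize(coeffs, step=8)
recon = idct_1d(dequantize(q, step=8))
```

The reconstruction `recon` differs from `samples` only by quantization error, which is why the encoder keeps this local decoding loop rather than predicting from the original samples.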
A motion detection circuit 134 detects motion on a macroblock basis from the inputted image data of a series of frames. As motion detecting modes there are a mode (P frame) in which prediction is made only from the image preceding the image to be encoded, and a mode (B frame) in which prediction is made from both the images preceding and succeeding the image to be encoded. Usually, for the color difference (Cb, Cr) data, the motion vector obtained from the luminance (Y) data is used.
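A common way to detect such motion is block matching, in which the current block is compared against displaced candidate blocks in a reference frame. The following is a minimal full-search sketch using the sum of absolute differences (SAD) as the matching cost; the block size, search range, and function names are illustrative assumptions, not taken from the apparatus described above.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_motion_vector(cur, ref, bx, by, bsize=4, search=2):
    """Full search over +/- search pixels; returns the (dx, dy)
    displacement with minimal SAD for the block at (bx, by)."""
    def block(img, x, y):
        return [row[x:x + bsize] for row in img[y:y + bsize]]

    target = block(cur, bx, by)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            # skip candidates that fall outside the reference frame
            if x < 0 or y < 0 or y + bsize > len(ref) or x + bsize > len(ref[0]):
                continue
            cost = sad(target, block(ref, x, y))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]
```

If a bright patch moves one sample to the right between the reference and the current frame, the search recovers the displacement (-1, 0) pointing back to its position in the reference.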
A motion compensation circuit 136 compensates the predictive frame image data from the memory 132 in accordance with the motion vector provided from the motion detection circuit 134 and provides the compensated data to both the subtracter 120 and the adder 130 as a predictive value in inter-frame predictive encoding. A motion vector prediction circuit 138 predicts the motion vector detected by the motion detection circuit 134 and sends a predictive value of the motion vector to the variable length encoding circuit 140.
The variable length encoding circuit 140 performs a variable length encoding for the output data of both quantization circuit 124 and motion vector prediction circuit 138 and outputs the thus-encoded data.
Referring back to FIG. 3, the buffers 112a to 112c temporarily store the encoded image and motion vector data outputted from the encoding circuits 110a to 110c. The code amount controlling circuits 114a to 114c check the occupancies (or residual capacities) of the buffers 112a to 112c and control the quantization step size of the quantization circuit 124 in each of the encoding circuits 110a to 110c so that the generated code amount stays within the target code amount.
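The buffer-feedback control described above can be sketched with a very simple rule: quantize more coarsely when the buffer is filling up and more finely when it is draining. This is a minimal illustration of the principle, not the control law of the apparatus; the thresholds and step increments are illustrative assumptions.

```python
def next_quant_step(step, buffer_bits, buffer_size, low=0.25, high=0.75):
    """Buffer-feedback rate control sketch: adjust the quantization
    step size from the current buffer fullness.

    MPEG-style quantizer scales run from 1 to 31, so the step is
    clamped to that range.
    """
    fullness = buffer_bits / buffer_size
    if fullness > high:
        step = min(step + 1, 31)  # buffer filling: coarser quantization
    elif fullness < low:
        step = max(step - 1, 1)   # buffer draining: finer quantization
    return step
```

A fuller buffer thus pushes the quantizer toward larger steps (fewer bits per macroblock), and a nearly empty buffer does the opposite, which is how the generated code amount is held near the target.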
Both image data and shape data are needed for the person (FIG. 2B) and the rocket (FIG. 2C). FIGS. 7A and 7B show the shape data of the person and the rocket, respectively. Each shape data is defined over a rectangular area, called a bounding box, which includes the object concerned. Rectangles 142 and 144 shown in FIGS. 8A and 8B, respectively, are bounding boxes. Also when handling an image of an arbitrary shape, the encoding process is executed in units of macroblocks, and the width and height of the bounding box are therefore integer multiples of the macroblock size.
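Deriving such a bounding box amounts to taking the tight rectangle around the object's mask and rounding its width and height up to the next multiple of the macroblock size. A minimal sketch, with illustrative names:

```python
MB = 16  # macroblock size

def bounding_box(mask):
    """Tight bounding box of a binary mask, with its width and height
    expanded to integer multiples of the macroblock size.

    Returns (x0, y0, width, height); assumes the mask contains at
    least one object pixel.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    x0, y0 = min(xs), min(ys)
    w = max(xs) - x0 + 1
    h = max(ys) - y0 + 1
    # Round width and height up to macroblock multiples.
    w = ((w + MB - 1) // MB) * MB
    h = ((h + MB - 1) // MB) * MB
    return x0, y0, w, h
```

An object whose tight extent is 10×8 pixels therefore still receives a 16×16 bounding box, i.e. a single macroblock.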
FIGS. 9A and 9B show how the interior of the bounding box is divided into macroblocks. The shape data in the bounding box is binary data indicating whether each pixel lies inside or outside the object concerned.
Image data, like shape data, is also encoded at the bounding box size. Rectangles 146 and 148 shown in FIGS. 10A and 10B represent the bounding boxes of the image data of the person and the rocket, respectively. Since the image data is multi-value data of 8 bits per sample, a processing called padding is applied to the exterior of the object concerned. Padding prevents a lowering of encoding efficiency caused by discontinuity at the object boundary.
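The idea of padding can be shown with a one-row sketch that extends interior samples outward into the exterior region. This is a simplified variant for illustration: the MPEG-4 padding process additionally averages between two boundary samples and applies vertical and extended padding passes, which are omitted here.

```python
def pad_row(pixels, mask):
    """Horizontal repetitive padding sketch for one row.

    Each sample outside the object (mask value 0) is replaced by a
    nearby interior sample, so the boundary no longer introduces a
    sharp discontinuity. Simplified relative to the standard process.
    """
    if not any(mask):
        return list(pixels)  # no interior sample in this row
    out = [p if m else None for p, m in zip(pixels, mask)]
    # Pass 1: extend each interior sample rightward over exterior samples.
    last = None
    for i in range(len(out)):
        if out[i] is not None:
            last = out[i]
        elif last is not None:
            out[i] = last
    # Pass 2: fill exterior samples before the first interior sample.
    last = None
    for i in range(len(out) - 1, -1, -1):
        if mask[i]:
            last = pixels[i]
        elif out[i] is None and last is not None:
            out[i] = last
    return out
```

For a row whose interior samples are 7 and 9, the exterior on each side is filled with the nearest interior value, yielding a smooth, continuous row for the DCT.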
FIGS. 11A and 11B show an example of dividing image data of the person and the rocket respectively into macroblocks.
FIG. 12 is a block diagram showing a schematic configuration of the encoding circuits 110a to 110c. Processing for image data is the same as in FIG. 6 and components of the same functions as in FIG. 6 are identified by the same reference numerals as in FIG. 6.
Intra-frame encoding is called I-VOP (Intra-Video Object Plane) encoding, forward prediction processing in inter-frame encoding is called P-VOP encoding, and bidirectional prediction processing in inter-frame encoding is called B-VOP encoding.
A shape encoding circuit 150 performs a predictive encoding for shape data. An output code of the shape encoding circuit 150 is fed to both a memory 152 and a variable length encoding circuit 158. The memory 152, which functions as delay means, provides stored data to a motion compensation circuit 156.
A motion detection circuit 154 detects motion from both image data and shape data and sends the result of the detection to the motion compensation circuit 136, the motion vector prediction circuit 138, and a motion compensation circuit 156. In accordance with the motion vector provided from the motion detection circuit 154, the motion compensation circuit 156 performs motion compensation on the data provided from the memory 152 and sends the compensated data to the shape encoding circuit 150, which in turn performs predictive encoding of the inputted shape data in accordance with the motion compensation predictive value provided from the motion compensation circuit 156.
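The principle of motion-compensated shape prediction can be sketched as follows: displace the previously decoded binary shape by the motion vector, and encode only where the prediction disagrees with the current shape. This is a conceptual illustration only; MPEG-4 actually encodes shape with context-based arithmetic encoding (CAE), and the names below are illustrative assumptions.

```python
def mc_predict(prev_shape, dx, dy):
    """Motion-compensate a binary shape map by (dx, dy).

    Samples displaced in from outside the map are taken as 0 (exterior).
    """
    h, w = len(prev_shape), len(prev_shape[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = prev_shape[sy][sx]
    return out

def shape_residual(cur_shape, predicted):
    """Prediction error map: 1 wherever the prediction is wrong."""
    return [[c ^ p for c, p in zip(cr, pr)]
            for cr, pr in zip(cur_shape, predicted)]
```

When the object merely translates between frames, the residual map is almost entirely zero, which is what makes the predictive encoding of shape data efficient.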
The variable length encoding circuit 158 performs a variable length encoding for the encoded image data provided from the quantization circuit 124, motion vector information from the motion vector prediction circuit 138, and encoded shape data from the shape encoding circuit 150.
Turning back to FIG. 3, the data encoded in the encoding circuits 110a to 110c are temporarily stored in the buffers 112a to 112c, respectively. Since the generated code amount varies with the lapse of time, it is necessary to establish a certain period and keep the code amount constant within that period. The code amount controlling circuits 114a to 114c check the residual capacities (or stored data volumes) of the buffers 112a to 112c, respectively, and control the quantization step size of the quantization circuit 124 so that those values approach predetermined values. In this way the generated code amount is controlled so as to converge to the target code amount.
The multiplexing circuit 116 multiplexes data provided from the buffers 112a to 112c and outputs them together as a single stream. Although only video data is illustrated in FIG. 3, the multiplexing operation also covers audio data and scene description data of a combined image.
In the prior art it is necessary to pre-set a target code amount for each object, and it is difficult or impossible to set an optimum code amount for each object relative to the target code amount of the entire system. For example, if the target code amount of the person is set low while those of the rocket and the background are set high, only the image of the person will blur. Conversely, if many codes are allocated to the person and the rocket, the image quality of the background will deteriorate. These points must be taken into account to achieve a well-balanced setting of the target code amount of each object, but this has so far been very difficult.
Besides, the size of each object changes with the lapse of time, but according to the prior art it has been next to impossible to properly follow such changes over time while controlling the total code amount.