1. Field of the Invention
The present invention relates to an image processing apparatus and method of efficiently encoding image data and decoding the encoded data.
2. Related Background Art
H.261, MPEG-1, and MPEG-2 are conventionally known moving image coding schemes, internationally standardized by the ITU (International Telecommunication Union) or ISO (International Organization for Standardization). H.261, MPEG-1, and MPEG-2 are documented as the H.261 recommendation, ISO 11172, and ISO 13818, respectively. Motion JPEG (Joint Photographic Experts Group) coding, which encodes each frame by applying still image coding (e.g., JPEG coding) to the frame, is also known.
An encoding system which encodes a video signal by MPEG-1 will be described below with reference to FIG. 1.
Referring to FIG. 1, a video signal supplied from a TV camera 1001 is input from an input terminal 1003 to a moving image encoding apparatus 1002.
An A/D converter 1004 converts the input video signal from the input terminal 1003 into a digital signal and inputs the signal to a block forming unit 1005.
The block forming unit 1005 forms macro blocks, each composed of 16×16 pixels, sequentially from the upper left of the image to the lower right.
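The block forming step can be illustrated with a minimal sketch. It assumes a frame held as a list of pixel rows whose dimensions are multiples of 16; the function name `form_macroblocks` is hypothetical and does not come from the apparatus described here.

```python
def form_macroblocks(frame, size=16):
    """Split a frame (list of pixel rows) into size x size blocks in raster order."""
    height, width = len(frame), len(frame[0])
    if height % size or width % size:
        # Assumption: no padding is applied, so dimensions must divide evenly.
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):          # top to bottom
        for left in range(0, width, size):      # left to right within a block row
            blocks.append([row[left:left + size] for row in frame[top:top + size]])
    return blocks


# A 32x32 test frame whose pixel value encodes its position (y * 32 + x).
frame = [[y * 32 + x for x in range(32)] for y in range(32)]
blocks = form_macroblocks(frame)
```

The 32×32 test frame yields four macro blocks, ordered upper left, upper right, lower left, lower right.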
MPEG-1 can encode image data by three encoding modes: an I-frame mode (to be referred to as an I-frame hereinafter) for performing intra-frame encoding, a P-frame mode (to be referred to as a P-frame hereinafter) for performing inter-frame encoding from past frames, and a B-frame mode (to be referred to as a B-frame hereinafter) for performing inter-frame encoding from past and future frames.
A frame mode unit 1017 selects one of these three frame modes. A frame mode is determined by taking account of the bit rate of encoding, prevention of image quality deterioration caused by accumulation of operation errors in DCT (Discrete Cosine Transform), image editing, and scene changes.
A process of encoding an I-frame will be described first.
For an I-frame, a motion compensator 1006 does not operate and outputs “0”. A subtracter 1007 subtracts the output from the motion compensator 1006 from the output from the block forming unit 1005 and supplies the difference to a DCT unit 1008.
The DCT unit 1008 performs DCT for the difference data supplied from the subtracter 1007 in units of blocks of 8×8 pixels and supplies the transformed data to a quantizer 1009.
The quantizer 1009 quantizes the transformed data from the DCT unit 1008 and supplies the quantized data to an encoder 1010.
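The transform and quantization steps can be sketched as follows. This is an illustration only: it uses a direct (slow) orthonormal 8×8 DCT-II and, as a simplifying assumption, a single uniform quantizer step rather than the quantization matrices used in MPEG-1. The names `dct_8x8` and `quantize` are hypothetical.

```python
import math

N = 8


def dct_8x8(block):
    """Direct orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization; the single step size is an assumption of this sketch."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A flat block has energy only in the DC coefficient.
flat = [[128] * N for _ in range(N)]
levels = quantize(dct_8x8(flat))
```

For the flat block, only the DC level `levels[0][0]` is non-zero, which is why smooth image regions produce few codes after quantization.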
The encoder 1010 one-dimensionally rearranges the quantized data from the quantizer 1009, determines codes by the 0-run length and value, and supplies the encoded data to an output terminal 1011.
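The one-dimensional rearrangement and zero-run coding can be sketched as below. The sketch assumes the zigzag scan order commonly used in MPEG-1 and emits (zero run, value) pairs; the assignment of actual variable-length codes and the separate handling of the DC coefficient are omitted. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run downward, even ones upward.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length_pairs(quantized):
    """Convert a quantized block into (zero_run, value) pairs along the zigzag scan."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(quantized)):
        value = quantized[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    # Trailing zeros are dropped; a real bitstream would signal end of block.
    return pairs


quantized = [[0] * 8 for _ in range(8)]
quantized[0][0], quantized[0][1], quantized[2][0] = 64, 3, -1
pairs = run_length_pairs(quantized)
```

Here the coefficient at row 2, column 0 is preceded by one zero in scan order, so it is emitted with a run of 1.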
The quantized data from the quantizer 1009 is also supplied to an inverse quantizer 1012. The inverse quantizer 1012 inversely quantizes the supplied quantized data and supplies the inversely quantized data to an inverse DCT unit 1013. The inverse DCT unit 1013 performs inverse DCT for the inversely quantized data and supplies the inversely transformed data to an adder 1014. The adder 1014 adds the output “0” from the motion compensator 1006 and the output from the inverse DCT unit 1013 and stores the sum in a frame memory 1015 or 1016.
A process of encoding a P-frame will be described next.
For a P-frame, the motion compensator 1006 operates, and an output from the block forming unit 1005 is input to the motion compensator 1006. An image of a temporally immediately preceding frame is also input to the motion compensator 1006 from the frame memory 1015 or 1016. The motion compensator 1006 performs motion compensation by using the input image data and outputs a motion vector and a predictive macro block.
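As an illustration of how a motion vector and a predictive macro block might be obtained, the sketch below performs an exhaustive block-matching search using the sum of absolute differences (SAD). The search strategy, the ±4 pixel search range, and all function names are assumptions made for this example; the text above does not specify how the motion compensator 1006 searches.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract(frame, top, left, size):
    """Cut a size x size block out of a frame at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_motion_vector(current, reference, top, left, size=16, search=4):
    """Full search for the reference block that best matches the current block."""
    target = extract(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference frame
            candidate = extract(reference, y, x, size)
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, vector, predicted = best
    return vector, predicted


# The current frame shows the reference content displaced by (1, 2) pixels.
reference = [[y * 32 + x for x in range(32)] for y in range(32)]
current = [[reference[min(y + 1, 31)][min(x + 2, 31)] for x in range(32)]
           for y in range(32)]
vector, predicted = find_motion_vector(current, reference, top=8, left=8)
```

In this example the vector is expressed as a (row, column) offset into the reference frame, and the returned predictive macro block matches the current block exactly.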
The subtracter 1007 calculates the difference between the output from the block forming unit 1005 and the predictive macro block. This difference is subjected to DCT and quantization. The encoder 1010 determines codes on the basis of the quantized data and the motion vector and outputs the codes from the output terminal 1011.
The quantized data from the quantizer 1009 is also supplied to the inverse quantizer 1012. The inverse quantizer 1012 inversely quantizes the supplied quantized data and supplies the inversely quantized data to the inverse DCT unit 1013. The inverse DCT unit 1013 performs inverse DCT for the inversely quantized data and supplies the inversely transformed data to the adder 1014. The adder 1014 adds the output from the inverse DCT unit 1013 and the predictive macro block data output from the motion compensator 1006 and stores the sum in the frame memory 1015 or 1016.
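The local decoding loop described above can be sketched in simplified form. To keep the example short, the DCT and inverse DCT are omitted and the residual is quantized directly, which is an assumption of this sketch and not how the apparatus operates; the names are hypothetical. The point illustrated is that the frame memory receives the reconstructed data, that is, the prediction plus the inversely quantized residual, and not the original input.

```python
def quantize(values, step):
    """Uniform quantization of a flat list of residual values."""
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


def encode_block(current, prediction, step=8):
    """Return the quantized residual and the block to store in the frame memory."""
    residual = [c - p for c, p in zip(current, prediction)]       # subtracter
    levels = quantize(residual, step)                             # quantizer
    restored = dequantize(levels, step)                           # inverse quantizer
    reconstructed = [p + r for p, r in zip(prediction, restored)] # adder
    return levels, reconstructed


# P-frame case: a non-zero prediction from the motion compensator.
levels, reconstructed = encode_block([101, 104, 97, 120], [96, 96, 96, 96])

# I-frame case: the motion compensator outputs 0, so the prediction is all zeros.
i_levels, i_reconstructed = encode_block([101, 104, 97, 120], [0, 0, 0, 0])
```

Because the reconstructed block differs slightly from the input, storing it keeps the encoder's reference identical to what a decoder would rebuild from the same codes.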
A process of encoding a B-frame is as follows.
Although motion compensation is performed for this B-frame as for a P-frame, the motion compensator 1006 performs this motion compensation by using data from both the frame memories 1015 and 1016, and forms and encodes a predictive macro block.
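A minimal sketch of bidirectional prediction follows. It assumes that the past and future predictive blocks have already been found by motion search, forms an interpolated block by averaging them with rounding, and selects whichever of the three candidates has the smallest SAD. The selection rule and the function names are assumptions made for this illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def choose_b_prediction(target, past, future):
    """Pick forward, backward, or interpolated prediction by lowest SAD."""
    interpolated = [[(p + f + 1) // 2 for p, f in zip(row_p, row_f)]
                    for row_p, row_f in zip(past, future)]
    candidates = {"forward": past, "backward": future, "interpolated": interpolated}
    mode = min(candidates, key=lambda name: sad(target, candidates[name]))
    return mode, candidates[mode]


# The target lies midway between the past and future blocks.
target = [[10, 20], [30, 40]]
past = [[8, 18], [28, 38]]
future = [[12, 22], [32, 42]]
mode, prediction = choose_b_prediction(target, past, future)
```

In this example the interpolated block reproduces the target exactly, so it is selected over the forward and backward candidates.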
In the methods described above, in which an entire image is encoded, however, a background image with no motion must be transmitted repeatedly, which wastes code. For example, in images of a video telephone system or video conference, the only objects actually moving are persons, and the background remains stationary. In an I-frame, which is transmitted at fixed intervals, the motionless background image is also transmitted, producing useless codes (code data of the background image).
FIG. 2 shows an image in a video conference or the like.
Referring to FIG. 2, a person 1050 faces a television camera in a video conference room. This person 1050 and a background 1051 are encoded in the same frame by the same encoding method.
Since the background 1051 remains still, almost no codes are produced if motion compensation is performed, but a large number of codes are produced in an I-frame.
Consequently, even for a portion with no motion, a large amount of encoded data is repeatedly and needlessly transmitted. Also, if the motion of the person 1050 is large and a large number of codes are generated by encoding, a sufficient code amount cannot be secured for an I-frame encoding process performed after that. In this case, the quantization coefficient must be set for coarse quantization, which undesirably degrades the image quality of even the motionless background. Note that a moving object like the person 1050 described above will be called a subject hereinafter.
The present invention has been made in consideration of the above situation, and has as its object to provide an image processing apparatus and method of efficiently encoding input image data and decoding the encoded data.
To achieve the above object, according to one preferred aspect of the present invention, in an image processing apparatus and method, a plurality of objects are separated from input moving image data, a separated first object is encoded by a first encoding method, a separated second object is encoded by a second encoding method, and the encoding process for the second object is controlled in accordance with encoded data of the first object.
According to another preferred aspect of the present invention, in an image processing apparatus and method, a plurality of objects are separated from input moving image data, a separated first object is encoded by a first encoding method, a separated second object is encoded by a second encoding method, and the encoding process for the second object is controlled in accordance with a recording capacity of a recording medium for recording encoded data of the first object and encoded data of the second object.
According to still another preferred aspect of the present invention, in an image processing apparatus and method, a plurality of objects are separated from input moving image data, a separated first object is encoded by a first encoding method, a separated second object is encoded by a second encoding method, and the encoding process for the second object is controlled in accordance with a communication data rate at which encoded data of the first object and encoded data of the second object are communicated to an external apparatus.
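One way such control might work is sketched below, purely as an illustration. It assumes that the first object is the subject and the second object is the background, that the bits left in one frame period after coding the subject form the background budget, and that the background code amount is roughly inversely proportional to the quantizer step. None of these details, nor the function names or numeric values, are taken from the aspects described above.

```python
def background_bit_budget(channel_bits_per_second, frame_rate, subject_bits):
    """Bits left for the second object (background) in one frame period."""
    per_frame = channel_bits_per_second // frame_rate
    return max(per_frame - subject_bits, 0)


def pick_quantizer_step(budget_bits, estimated_bits_at_step_1,
                        steps=(2, 4, 8, 16, 32)):
    """Pick the finest step whose estimated code amount fits the budget.

    Assumption: code amount scales as 1 / step, a crude model used only here.
    Returns None when even the coarsest step does not fit, meaning the
    background would not be encoded for this frame.
    """
    for step in steps:
        if estimated_bits_at_step_1 // step <= budget_bits:
            return step
    return None


# Hypothetical figures: a 384 kbit/s channel at 30 frames per second.
budget = background_bit_budget(384000, 30, subject_bits=9000)
step = pick_quantizer_step(budget, estimated_bits_at_step_1=40000)
```

The same budget calculation could be driven by the remaining capacity of a recording medium in place of the channel rate.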
According to still another preferred aspect of the present invention, in an image decoding apparatus and method of decoding synthetic encoded data obtained by separating first and second objects from input moving image data, encoding the first object by a first encoding method, encoding the second object by a second encoding method while a code amount of encoded data of the second object is controlled in accordance with encoded data of the first object, synthesizing the encoded data of the first object and the encoded data of the second object, and transmitting synthetic data, the synthetic encoded data is separated into the encoded data of the first object and the encoded data of the second object, the encoded data of the separated first object is decoded, and the encoded data of the separated second object is decoded.
According to still another preferred aspect of the present invention, in an image decoding apparatus and method of decoding synthetic encoded data obtained by separating first and second objects from input moving image data, encoding the first object by a first encoding method, encoding the second object by a second encoding method while a code amount of encoded data of the second object is controlled in accordance with a communication rate, synthesizing the encoded data of the first object and the encoded data of the second object, and communicating synthetic data, the synthetic encoded data is separated into the encoded data of the first object and the encoded data of the second object, the encoded data of the separated first object is decoded, and the encoded data of the separated second object is decoded.
According to still another preferred aspect of the present invention, in an image decoding apparatus and method of decoding synthetic encoded data obtained by separating first and second objects from input moving image data, encoding the first object by a first encoding method, encoding the second object by a second encoding method while a code amount of encoded data of the second object is controlled in accordance with a frame rate of the moving image data, synthesizing encoded data of the first object and the encoded data of the second object, and communicating synthetic data, the synthetic encoded data is separated into the encoded data of the first object and the encoded data of the second object, the encoded data of the separated first object is decoded, and the encoded data of the separated second object is decoded.
Other objects, features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings.