The present invention relates to an image processing apparatus and method for synthesizing a plurality of images, and a computer-readable memory.
As conventional moving image encoding schemes, H.261, MPEG-1, MPEG-2, and the like are known. These encoding schemes are internationally standardized by the ITU and ISO, and are documented in the H.261 recommendation and in ISO 11172 and ISO 13818. Also known is Motion JPEG encoding, which encodes a moving image by applying still image encoding (e.g., JPEG encoding) to the respective frames.
An encoding system that encodes a moving image based on a video signal by MPEG-1 will be explained below with reference to FIG. 27.
FIG. 27 shows the arrangement of a conventional encoding system.
A TV camera 1001 inputs a video signal to an input terminal 1003 of a moving image encoding apparatus 1002, and the video signal is passed to an A/D converter 1004. The video signal, converted into a digital signal by the A/D converter 1004, is input to a block former 1005, which forms macroblocks of 16×16 pixels in order from the upper left corner to the lower right corner of the image based on the video signal. An MPEG-1 stream includes I-frames for intra-frame encoding, P-frames for inter-frame encoding using past frames, and B-frames for inter-frame encoding using past and future frames. A frame mode unit 1017 determines the mode of each frame. The frame mode is determined in consideration of the encoding bit rate, prevention of image quality deterioration due to accumulated DCT computation errors, image editing, and scene changes.
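The block-forming step described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function name `form_macroblocks` is hypothetical, the image is modeled as a list of pixel rows, and the image dimensions are assumed to be multiples of 16.

```python
def form_macroblocks(image, size=16):
    """Yield (top, left, block) for each size x size macroblock.

    Blocks are produced in raster order, from the upper left corner
    to the lower right corner. Assumes (for this sketch) that the
    image height and width are multiples of `size`.
    """
    height = len(image)
    width = len(image[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, block


# 32 x 48 test image whose pixel value encodes its position.
image = [[y * 48 + x for x in range(48)] for y in range(32)]
blocks = list(form_macroblocks(image))
```

A 32×48 image yields six macroblocks, three per row of blocks, with the first at (0, 0) and the last at (16, 32).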
In an I-frame, a motion compensator 1006 is inoperative, and outputs zero. A subtractor 1007 subtracts the output from the motion compensator 1006 from the output from the block former 1005, and inputs the difference to a DCT transformer 1008. The DCT transformer 1008 DCT-transforms the input signal in units of 8×8 blocks, and the DCT-transformed signal is quantized by a quantizer 1009. The quantized signal is converted into a linear sequence by an encoder 1010, and codes are determined based on the zero run length and value of the signal. The encoded signal is output from a terminal 1011, and is recorded on a storage medium or is transmitted via a network, line, or the like. The output from the quantizer 1009 is dequantized by a dequantizer 1012, is inversely DCT-transformed by an inverse DCT transformer 1013, and is then added to the output from the motion compensator 1006 by an adder 1014. The sum signal is stored in a frame memory 1015 or 1016.
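The forward path of the I-frame processing above (8×8 DCT, quantization, conversion into a linear sequence, and zero-run/value pairing) can be sketched as follows. This is a simplified illustration under stated assumptions: a single uniform quantization step is used instead of a quantization matrix, the DC term is run-length coded like any other coefficient, the final variable-length code assignment is omitted, and all function names are hypothetical.

```python
import math

N = 8  # block size used by the DCT transformer


def dct_2d(block):
    """Naive orthonormal 8x8 forward DCT (type II)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantizer (an assumption; real encoders use a matrix)."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


def zigzag_order(n=N):
    """Index order that turns an n x n block into a linear sequence."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals are walked bottom-left to top-right,
        # odd ones top-right to bottom-left.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


def run_length(seq):
    """Return (zero_run, value) pairs for nonzero entries, then 'EOB'."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by end-of-block
    return pairs


def encode_block(block, step=16):
    q = quantize(dct_2d(block), step)
    return run_length([q[u][v] for u, v in zigzag_order()])


# A flat block: only the DC term survives quantization.
flat = [[128] * N for _ in range(N)]
flat_pairs = encode_block(flat)
```

For a flat block, all energy lands in the DC coefficient, so the linear sequence collapses to one pair followed by the end-of-block marker. This is why smooth, motionless regions are cheap per block yet still cost bits every time they are re-sent.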
In a P-frame, the motion compensator 1006 is operative. The output from the block former 1005 is input to the motion compensator 1006, which performs motion compensation on the basis of the contents of the frame memory 1015 or 1016 that stores the image of the immediately preceding frame, and outputs a motion vector and predicted macroblocks. The subtractor 1007 calculates the difference between the input from the block former 1005 and the predicted macroblocks, and inputs the difference to the DCT transformer 1008. The DCT transformer 1008 DCT-transforms the input signal, and the DCT-transformed signal is quantized by the quantizer 1009. A code of the quantized signal is determined by the encoder 1010 on the basis of the motion vector, and is output from the terminal 1011. The output from the quantizer 1009 is dequantized by the dequantizer 1012, is inversely DCT-transformed by the inverse DCT transformer 1013, and is then added to the output from the motion compensator 1006 by the adder 1014. The sum signal is stored in the frame memory 1015 or 1016.
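The motion compensation and subtraction steps for a P-frame can be sketched as follows. The text does not say how the motion vector is found, so this sketch assumes an exhaustive block-matching search that minimizes the sum of absolute differences (SAD); the function names and the search radius are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def motion_search(cur, ref, top, left, size=16, radius=4):
    """Exhaustive search; returns the (dy, dx) with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, top, left, top, left, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = top + dy, left + dx
            # Only consider candidates that lie fully inside the reference.
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(cur, ref, top, left, ry, rx, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


def residual(cur, ref, top, left, mv, size=16):
    """Difference between the current block and its predicted block."""
    dy, dx = mv
    return [[cur[top + j][left + i] - ref[top + dy + j][left + dx + i]
             for i in range(size)] for j in range(size)]


# Reference frame with a non-repeating texture, and a current frame
# that is the reference shifted two pixels to the right.
ref = [[x * x + 3 * y * y for x in range(32)] for y in range(32)]
cur = [[ref[y][max(x - 2, 0)] for x in range(32)] for y in range(32)]

mv = motion_search(cur, ref, top=8, left=8)
diff = residual(cur, ref, 8, 8, mv)
```

Because the current frame is a pure shift of the reference, the search finds the vector (0, -2) and the residual passed on to the DCT is all zeros, which is the case where inter-frame encoding generates almost no code.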
In a B-frame, motion compensation is performed as in a P-frame, except that the motion compensator 1006 executes motion compensation based on the contents of both frame memories 1015 and 1016 to generate predicted macroblocks, and the signal is then encoded.
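As a rough sketch of prediction from both frame memories, the following assumes that each macroblock is predicted from the past frame, from the future frame, or from the average of the two, and that the candidate with the smallest SAD is chosen. The selection rule and all names are illustrative assumptions; the text above only states that both memories are used.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def average(a, b):
    """Pixel-wise integer average with rounding."""
    return [[(p + q + 1) // 2 for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def choose_b_prediction(current, past_pred, future_pred):
    """Pick the prediction mode with the smallest SAD (hypothetical rule)."""
    candidates = {
        "forward": past_pred,
        "backward": future_pred,
        "interpolated": average(past_pred, future_pred),
    }
    mode = min(candidates, key=lambda m: block_sad(current, candidates[m]))
    return mode, candidates[mode]


# A block that brightens steadily from the past frame to the future frame.
past_block = [[100] * 4 for _ in range(4)]
future_block = [[120] * 4 for _ in range(4)]
current_block = [[110] * 4 for _ in range(4)]

mode, prediction = choose_b_prediction(current_block, past_block, future_block)
```

Here the current block lies midway between the two references, so the interpolated candidate matches it exactly.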
However, in the conventional method of encoding the entire image, a motionless portion such as the background must be repeatedly transmitted, which wastes code length. For example, in a videophone, video conference, or the like, the only object that actually moves is a person, and the background does not move. In an I-frame, which is sent at given time intervals, the motionless background image is also sent, thus wasting codes. FIG. 28 shows such an example.
FIG. 28 shows a frame in which a person faces a television camera in a room. A person 1051 and a background 1050 undergo identical encoding in a single frame. Since the background 1050 is motionless, almost no codes are generated for it when motion compensation is used, but the background 1050 is encoded whenever an I-frame is sent. For this reason, codes are repeatedly and wastefully sent even for a motionless portion. Moreover, in an I-frame that follows a large motion of the person 1051, for which a large code length has been generated, a sufficiently large code length cannot be allocated. For this reason, coarse quantization coefficients must be set in that I-frame, and the image quality of even the motionless background deteriorates.
Hence, as in MPEG-4, the background and object may be encoded separately to improve the encoding efficiency. In this case, since an object image sensed at another place can be synthesized, a frame may be formed by synthesizing another person 1052 into the frame shown in FIG. 28, as shown in FIG. 29.
However, the synthesized image (portion 1052) still looks unnatural due to color cast arising from the characteristics of the image sensing device, and the observer may find it incongruous. For example, when the image of the person 1052 is captured by a device that tends toward a green cast, while the image of the person 1051 is captured by a device that tends toward a red cast, the color cast is conspicuous in the image obtained by synthesizing these two images, resulting in a very unnatural image.
Also, an image obtained by synthesizing images sensed with different contrasts, caused by environmental differences such as illumination conditions or by the characteristics of the image sensing devices, looks unnatural, and the observer may find it incongruous. For example, when the image of the person 1052 is sensed under sunlight, while the image of the person 1051 is sensed under artificial light, the two images have a very large contrast difference, resulting in a very unnatural image.
The present invention has been made in consideration of the aforementioned problems, and has as its object to provide an image processing apparatus and method, which can easily synthesize a plurality of images and can generate a synthesized image with high image quality, and a computer-readable memory.
In order to achieve the above object, an image processing apparatus according to the present invention comprises the following arrangement.
That is, an image processing apparatus comprises:
first feature extraction means for extracting a first feature from first encoded data of a first image;
second feature extraction means for extracting a second feature from second encoded data of a second image;
first decoding means for obtaining a first reconstructed image by decoding the first encoded data;
second decoding means for obtaining a second reconstructed image by decoding the second encoded data;
correction means for correcting one of the first and second reconstructed images on the basis of the first and second features; and
synthesis means for synthesizing the first and second reconstructed images.
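The arrangement listed above (feature extraction from encoded data, decoding, correction, and synthesis) can be sketched as follows for the color cast case. Everything concrete here is an assumption made for illustration: the encoded data is modeled as a toy container holding per-block DC coefficients per channel alongside the pixels, the feature is the mean DC value per channel, the correction is a per-channel gain, and all names are hypothetical. Matching channel means is a crude form of color balancing and is not claimed to be the correction used by the invention.

```python
def extract_feature(encoded):
    """Mean DC value per channel, computed without decoding pixels."""
    n = len(encoded["dc"])
    return tuple(sum(block[c] for block in encoded["dc"]) / n
                 for c in range(3))


def decode(encoded):
    """Stand-in decoder: the toy container already holds the pixels."""
    return [row[:] for row in encoded["pixels"]]


def correct(image, own_feature, target_feature):
    """Scale each channel so that its mean matches the target feature."""
    gains = [t / o if o else 1.0 for o, t in zip(own_feature, target_feature)]
    return [[tuple(min(255, max(0, round(p[c] * gains[c]))) for c in range(3))
             for p in row] for row in image]


def synthesize(first, second, left):
    """Paste the second image into the first, at row 0 and column `left`."""
    out = [row[:] for row in first]
    for y, row in enumerate(second):
        out[y][left:left + len(row)] = row
    return out


# First image: red cast. Second image: green cast.
first_encoded = {"dc": [(1600, 800, 800)],
                 "pixels": [[(200, 100, 100)] * 4 for _ in range(2)]}
second_encoded = {"dc": [(800, 1600, 800)],
                  "pixels": [[(100, 200, 100)] * 2 for _ in range(2)]}

f1 = extract_feature(first_encoded)
f2 = extract_feature(second_encoded)
second_corrected = correct(decode(second_encoded), f2, f1)
result = synthesize(decode(first_encoded), second_corrected, left=2)
```

In this toy case the green-cast image is pulled to the same channel balance as the red-cast image before the two are combined, so the pasted region no longer stands out by hue.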
In order to achieve the above object, an image processing method according to the present invention comprises the following arrangement.
That is, an image processing method comprises:
the first feature extraction step of extracting a first feature from first encoded data of a first image;
the second feature extraction step of extracting a second feature from second encoded data of a second image;
the first decoding step of obtaining a first reconstructed image by decoding the first encoded data;
the second decoding step of obtaining a second reconstructed image by decoding the second encoded data;
the correction step of correcting one of the first and second reconstructed images on the basis of the first and second features; and
the synthesis step of synthesizing the first and second reconstructed images.
In order to achieve the above object, a computer-readable memory according to the present invention comprises the following arrangement.
That is, a computer-readable memory that stores program codes of image processing, has:
a program code of the first feature extraction step of extracting a first feature from first encoded data of a first image;
a program code of the second feature extraction step of extracting a second feature from second encoded data of a second image;
a program code of the first decoding step of obtaining a first reconstructed image by decoding the first encoded data;
a program code of the second decoding step of obtaining a second reconstructed image by decoding the second encoded data;
a program code of the correction step of correcting one of the first and second reconstructed images on the basis of the first and second features; and
a program code of the synthesis step of synthesizing the first and second reconstructed images.
In order to achieve the above object, an image processing apparatus according to the present invention comprises the following arrangement.
That is, an image processing apparatus comprises:
supply means for supplying first and second encoded image data to be synthesized;
adjustment means for adjusting a density or color of at least one of the first and second encoded image data supplied by the supply means; and
output means for outputting the first and second encoded image data adjusted by the adjustment means.
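One way to adjust density without returning to the pixel domain is to operate on transform coefficients, as sketched below. This assumes that the encoded image data consists of orthonormal 8×8 DCT coefficients that have already been entropy-decoded and dequantized, which the text above does not state; `adjust_density` and the other names are hypothetical. Because the DCT is linear, scaling every coefficient scales every pixel, and adding to the DC term shifts every pixel by the same amount.

```python
import math

N = 8


def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    return [[sum(block[y][x] * _basis(u, y) * _basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    return [[sum(coeffs[u][v] * _basis(u, y) * _basis(v, x)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def adjust_density(coeffs, gain=1.0, offset=0.0):
    """Apply pixel' = gain * pixel + offset using DCT coefficients only."""
    out = [[c * gain for c in row] for row in coeffs]
    # The DC term of an orthonormal N x N DCT equals N times the block mean,
    # so a uniform pixel offset adds N * offset to it.
    out[0][0] += N * offset
    return out


# Check the coefficient-domain adjustment against the pixel-domain result.
block = [[60 + (3 * x + 5 * y) % 50 for x in range(N)] for y in range(N)]
adjusted = adjust_density(dct_2d(block), gain=1.2, offset=-10.0)
pixels = idct_2d(adjusted)
max_error = max(abs(pixels[y][x] - (1.2 * block[y][x] - 10.0))
                for y in range(N) for x in range(N))
```

The comparison at the end confirms that the adjusted coefficients decode to the same pixels as applying the gain and offset directly, up to floating-point error. A color adjustment could follow the same pattern with a separate gain per color component.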
In order to achieve the above object, an image processing method according to the present invention comprises the following arrangement.
That is, an image processing method comprises:
the supply step of supplying first and second encoded image data to be synthesized;
the adjustment step of adjusting a density or color of at least one of the first and second encoded image data supplied in the supply step; and
the output step of outputting the first and second encoded image data adjusted in the adjustment step.
In order to achieve the above object, a computer-readable memory according to the present invention comprises the following arrangement.
That is, a computer-readable memory that stores program codes of image processing, has:
a program code of the supply step of supplying first and second encoded image data to be synthesized;
a program code of the adjustment step of adjusting a density or color of at least one of the first and second encoded image data supplied in the supply step; and
a program code of the output step of outputting the first and second encoded image data adjusted in the adjustment step.
In order to achieve the above object, an image processing apparatus according to the present invention comprises the following arrangement.
That is, an image processing apparatus for synthesizing a plurality of images, comprises:
background feature extraction means for extracting a background feature from encoded data of at least one background image;
object feature extraction means for extracting an object feature including statistical information of image information from encoded data of at least one object image;
background decoding means for generating a reconstructed background image by decoding the encoded data of the background image;
object decoding means for generating a reconstructed object image by decoding the encoded data of the object image;
correction means for correcting the reconstructed object image on the basis of the background and object features; and
synthesis means for synthesizing the reconstructed background image and the reconstructed object image corrected by the correction means.
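The background/object arrangement above can be sketched as follows for the contrast case. The assumptions are stated plainly: the statistical information is taken to be the mean and standard deviation of per-block luminance averages (which an encoder's DC coefficients would provide), the correction is a linear mapping that matches those two statistics, and the object shape is given by a binary mask. None of these choices, nor the function names, come from the text above.

```python
import math


def stats(values):
    """Mean and standard deviation of a list of block averages."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def correct_object(obj, obj_stats, bg_stats):
    """Map object luminance so its mean and spread match the background."""
    (obj_mean, obj_std), (bg_mean, bg_std) = obj_stats, bg_stats
    gain = bg_std / obj_std if obj_std else 1.0
    return [[min(255, max(0, round((p - obj_mean) * gain + bg_mean)))
             for p in row] for row in obj]


def synthesize(background, obj, mask, top, left):
    """Overlay object pixels on the background where the mask is set."""
    out = [row[:] for row in background]
    for j, row in enumerate(obj):
        for i, p in enumerate(row):
            if mask[j][i]:
                out[top + j][left + i] = p
    return out


# Block averages standing in for values read from the encoded data.
bg_feature = stats([80, 100, 120, 100])     # low-contrast background
obj_feature = stats([130, 160, 190, 160])   # brighter, higher-contrast object

background = [[100] * 3 for _ in range(3)]
obj = [[150, 170], [160, 160]]
mask = [[1, 0], [1, 1]]

corrected = correct_object(obj, obj_feature, bg_feature)
frame = synthesize(background, corrected, mask, top=1, left=1)
```

Only the object is modified, which matches the arrangement above: the background is decoded as-is, and the corrected object is overlaid where the mask is set.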
In order to achieve the above object, an image processing method according to the present invention comprises the following arrangement.
That is, an image processing method for synthesizing a plurality of images, comprises:
the background feature extraction step of extracting a background feature from encoded data of at least one background image;
the object feature extraction step of extracting an object feature including statistical information of image information from encoded data of at least one object image;
the background decoding step of generating a reconstructed background image by decoding the encoded data of the background image;
the object decoding step of generating a reconstructed object image by decoding the encoded data of the object image;
the correction step of correcting the reconstructed object image on the basis of the background and object features; and
the synthesis step of synthesizing the reconstructed background image and the reconstructed object image corrected in the correction step.
In order to achieve the above object, a computer-readable memory according to the present invention comprises the following arrangement.
That is, a computer-readable memory that stores program codes of image processing for synthesizing a plurality of images, has:
a program code of the background feature extraction step of extracting a background feature from encoded data of at least one background image;
a program code of the object feature extraction step of extracting an object feature including statistical information of image information from encoded data of at least one object image;
a program code of the background decoding step of generating a reconstructed background image by decoding the encoded data of the background image;
a program code of the object decoding step of generating a reconstructed object image by decoding the encoded data of the object image;
a program code of the correction step of correcting the reconstructed object image on the basis of the background and object features; and
a program code of the synthesis step of synthesizing the reconstructed background image and the reconstructed object image corrected in the correction step.