In recent years, MPEG-4 has been internationally standardized as an encoding scheme for moving images. In a conventional moving image encoding scheme represented by MPEG-2, encoding is done for respective rectangular frames or fields. By contrast, MPEG-4 can encode image data with an arbitrary shape as an object. Details of MPEG-4 are described in Sukeichi Miki, “All About MPEG-4”, Kogyo Chosakai Publishing Co., Ltd., the international standard ISO/IEC 14496-2, and the like.
That is, a technique for extracting an objective region with an arbitrary shape is indispensable for recent moving image encoding.
As an extraction method of an objective region, a method of extracting the objective region from the difference between a stored background image and an input image is known. An example of such a method is described in Japanese Patent Laid-Open No. 5-334441, “Moving Object Extraction Apparatus”, and the like. FIG. 10 is a block diagram showing the arrangement of a conventional image processing apparatus that extracts an objective region from the difference between the stored background image and an input image.
Referring to FIG. 10, an input unit 1001 is an image sensing device such as a camera or the like, which senses a scene or the like that includes a target object. A moving image sensed without any object is input from the input unit 1001 to a background image generation unit 1002, which generates a background image by calculating the average of a plurality of frame images that form the moving image. The generated background image is stored in a background image storage unit 1003.
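The averaging performed by the background image generation unit 1002 can be sketched as follows. This is an illustrative sketch only: the patent does not specify an implementation, and the use of NumPy, grayscale frames, and the toy pixel values are all assumptions.

```python
import numpy as np

# Hypothetical stack of frames sensed without any object
# (shape: N x height x width, grayscale for simplicity).
frames = np.stack([
    np.full((4, 4), 100.0),
    np.full((4, 4), 102.0),
    np.full((4, 4),  98.0),
])

# Background image = per-pixel average over all frames, as
# performed by the background image generation unit 1002.
background = frames.mean(axis=0)
```

Averaging over many object-free frames suppresses sensor noise that would otherwise appear in a single stored frame.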
An image difference unit 1004 calculates differences between an image sensed by the input unit 1001 and the background image stored in the background image storage unit 1003 for respective pixels. Each pixel value of the generated differential image is compared with an arbitrary threshold value Th. If the absolute value of a given pixel value of the differential image is larger than the threshold value Th, that pixel is set to “1”; otherwise, it is set to “0”. In this manner, the region with pixel values = 1 in the generated image serves as mask information indicating the objective region. An object extraction unit 1005 extracts an object from the sensed image in accordance with this mask information.
The principle of object extraction will be described in detail below. Let Pc(x, y) be the pixel value of an input image at a point of a coordinate position (x, y) on an image plane, and Pb(x, y) be the pixel value of the background image at that point. At this time, the difference between Pc(x, y) and Pb(x, y) is calculated, and its absolute value is compared with a given threshold value Th.
For example, the discrimination formula is described by:

|Pc(x, y)−Pb(x, y)|≦Th  (1)
If the difference absolute value is equal to or smaller than the threshold value Th in formula (1) above, since this means that the difference between Pc and Pb is small, Pc is determined to be a background pixel. On the other hand, if the difference absolute value is larger than the threshold value Th, Pc is determined to be a pixel of an object to be detected. By making the aforementioned discrimination at all points on a frame, detection for one frame is completed.
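The per-pixel discrimination of formula (1) and the resulting mask can be sketched as below. The NumPy arrays, the threshold value, and the sample pixel values are assumptions chosen for illustration.

```python
import numpy as np

Th = 20  # threshold value; an arbitrary choice for illustration

# Pb: stored background image; Pc: sensed (monitored) image.
Pb = np.array([[100, 100],
               [100, 100]], dtype=np.int16)
Pc = np.array([[105,  98],
               [200, 100]], dtype=np.int16)

# Formula (1): a pixel is background if |Pc - Pb| <= Th.
# mask = 1 marks object pixels, mask = 0 marks background pixels.
mask = (np.abs(Pc - Pb) > Th).astype(np.uint8)

# Object extraction per unit 1005: keep sensed pixels where mask = 1.
extracted = np.where(mask == 1, Pc, 0)
```

Here only the pixel with value 200 deviates from the background by more than Th, so it alone is marked as an object pixel.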
This process will be described using an example of display images. FIG. 23A shows an example of the background image, and FIG. 23B shows a sensed image which is being monitored. When the value at a given point P1b on the background image is compared with that at a point P1c on the monitor frame at the same position, the difference absolute value is equal to or nearly equal to zero. Hence, formula (1) holds, and it is determined that the pixel at the position P1c is a background pixel.
On the other hand, if the value of another point P2b is compared with that of a point P2c, since the difference absolute value becomes large, it is determined that the pixel at that position is not a background pixel, i.e., it is an object pixel.
FIG. 23C shows the result obtained after the aforementioned process is executed for all points on the sensed image, where a pixel determined to be an object pixel is set to “1”, and a pixel determined to be a background pixel is set to “0”. In FIG. 23C, a black portion indicates the background, and a white portion indicates an object.
However, when the input unit 1001 has an automatic focus adjustment function (e.g., to improve the image quality of a moving image) and an object is located near the camera, the focal point position when sensing only the background differs from that when sensing an image that includes the object. Therefore, the conventional image processing apparatus shown in FIG. 10 cannot normally extract the object.
FIGS. 11A and 11B show examples of images taken when sensing the background and when sensing an object using a camera with the automatic focus adjustment function. FIG. 11A shows an image generated as a background image by sensing only the background. In this case, the focal point of the camera matches an instrument at the center of the frame. On the other hand, FIG. 11B shows a scene in which a person stands near the camera in the same setting as FIG. 11A. In this case, since the focal point of the camera matches the person, the background is out of focus.
FIG. 12 shows a differential image generated from the two images shown in FIGS. 11A and 11B. In FIG. 12, black indicates that the difference between the two images is zero, and pixels closer to white indicate larger differences between the two images. In the differential image shown in FIG. 12, since the background other than the person is out of focus, differences are generated over the entire image. Hence, it is difficult to extract only the object from that image.
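The defocus problem can be illustrated numerically: blurring shifts pixel values near edges, so formula (1) fails on background pixels even though no object has entered. The crude one-dimensional box blur below is an assumption standing in for a camera's actual defocus (a 2-D point-spread function), and all values are illustrative.

```python
import numpy as np

Th = 10  # threshold value; an arbitrary choice for illustration

# Sharp stored background containing a high-contrast vertical edge.
sharp = np.zeros((5, 8))
sharp[:, 4:] = 100.0

# Simulated defocus of the later sensed frame: a crude horizontal
# box blur (an assumption; real defocus is a 2-D point spread).
blurred = (np.roll(sharp, 1, axis=1) + sharp
           + np.roll(sharp, -1, axis=1)) / 3.0

# Differences exceed Th along the edge although no object entered,
# so background pixels are falsely marked as object pixels.
false_object = np.abs(blurred - sharp) > Th
```

Columns adjacent to the edge are misclassified, which is why a defocused background produces spurious differences across the frame.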
On the other hand, when the input unit 1001 has an automatic exposure adjustment function to improve the image quality of a moving image, the aperture value changes in correspondence with the brightness of an object, and the conventional image processing apparatus shown in FIG. 10 likewise cannot normally extract an object.
FIG. 34A shows the brightness when the background image undergoes automatic exposure correction. FIG. 34B shows an example wherein an incoming object is darker than the background; because automatic exposure increases the brightness of the incoming object, the brightness of the background portion also increases. FIG. 34C shows an example wherein an incoming object is brighter than the background; because automatic exposure decreases the brightness of the incoming object, the brightness of the background portion also decreases.
In this manner, the actual background portions of FIGS. 34B and 34C differ from the stored background image explained using FIG. 34A, and even when difference absolute values between background portions are calculated, large differences are generated, resulting in a determination error indicating that the portion of interest is not background. In other words, it becomes difficult to extract a specific object from image data that has undergone the automatic exposure adjustment process.
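The exposure problem can likewise be illustrated numerically: a global brightness shift applied by automatic exposure moves every background pixel away from the stored background, so formula (1) fails everywhere. The uniform +40 brightness lift and all other values below are illustrative assumptions.

```python
import numpy as np

Th = 20  # threshold value; an arbitrary choice for illustration

# Stored background image (as in FIG. 34A).
stored = np.full((3, 3), 100.0)

# Same background portion after automatic exposure lifted every
# pixel by +40 to brighten a dark incoming object (an assumed,
# purely illustrative shift, as in FIG. 34B).
actual = stored + 40.0

# Formula (1) now fails over the entire background portion, so
# every background pixel is misclassified as an object pixel.
misclassified = np.abs(actual - stored) > Th
```

Since the shift (40) exceeds Th (20) at every pixel, the whole background is falsely reported as object, matching the determination error described above.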