1. Field of the Invention
The present invention relates to an object detecting system based on multiple-eye images.
2. Description of Related Art
Technology for extracting only a target object from an image signal (called "segmentation") is now being researched for application to image interpretation, image editing in units of extracted objects, and image databases.
A first clue for this purpose is a so-called image region segmentation scheme. In particular, a scheme exploiting a clustering technique not only gives a good region segmentation result, but is also resistant to noise. For example, reference can be made to a five-dimensional clustering technique proposed by N. Izumi, H. Morikawa and H. Harashima, "Combining Color and Spatial Information for Segmentation", 1991 Spring Meeting of Japan Society of Electronics, Information and Communication, D-680 (1991). This proposed clustering technique uses a five-dimensional feature space consisting of the tristimulus values R, G and B and the pixel position (x, y). Here, assume that the color information of an attentive pixel is (R, G, B), the position information of the attentive pixel is (x, y), the mean color of the cluster at index "n" is (R_n, G_n, B_n), and the positional center of gravity of the cluster at index "n" is (x_n, y_n). Under this assumption, a distance d_n between the attentive pixel and the "n"th cluster is defined as follows:

$$d_n^2 = w_0 \left[ (R - R_n)^2 + (G - G_n)^2 + (B - B_n)^2 \right] + w_1 \left[ (x - x_n)^2 + (y - y_n)^2 \right] \qquad (1)$$

where w_0 and w_1 are weighting coefficients.
Furthermore, the distance d_n is calculated for a plurality of clusters in the neighborhood of the attentive pixel, and it is concluded that the attentive pixel belongs to the cluster "n" having the minimum distance d_n. It is reported that, with this arrangement, region segmentation of the image signal can be realized without losing local information and with high resistance to noise. Object detection can then be realized by collecting statistically similar clusters to generate an integrated cluster. However, this approach cannot avoid a considerable amount of erroneous detection, because the color information of only one image is used.
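The distance of equation (1) and the minimum-distance assignment described above can be sketched as follows. This is only an illustrative sketch: the names `Cluster`, `squared_distance` and `assign_pixel`, the default weights, and the way the neighboring clusters are passed in as a list are assumptions made for the example and do not come from the cited paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Cluster:
    """Hypothetical cluster record: mean color and positional center of gravity."""
    color: Tuple[float, float, float]   # (R_n, G_n, B_n)
    centroid: Tuple[float, float]       # (x_n, y_n)


def squared_distance(color, pos, cluster, w0=1.0, w1=1.0):
    """Squared distance d_n^2 of equation (1) between a pixel and one cluster."""
    color_term = sum((c - cn) ** 2 for c, cn in zip(color, cluster.color))
    pos_term = sum((p - pn) ** 2 for p, pn in zip(pos, cluster.centroid))
    return w0 * color_term + w1 * pos_term


def assign_pixel(color, pos, neighbors: Sequence[Cluster], w0=1.0, w1=1.0):
    """Return the index of the neighboring cluster with the minimum distance."""
    return min(
        range(len(neighbors)),
        key=lambda n: squared_distance(color, pos, neighbors[n], w0, w1),
    )


# Example: a reddish pixel near the origin is assigned to the red cluster.
clusters = [Cluster((255, 0, 0), (0, 0)), Cluster((0, 0, 255), (10, 10))]
nearest = assign_pixel((250, 5, 5), (1, 1), clusters)
```

Raising w_1 relative to w_0 makes the assignment favor spatially compact clusters, while raising w_0 favors color similarity.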
Furthermore, Y. Yokoyama and Y. Miyamoto, "Image segmentation for video coding using motion information", 1994 Autumn Meeting of Japan Society of Electronics, Information and Communication, D-150 (1994), reports an extension of the above-mentioned clustering algorithm to moving pictures. According to this report, a seven-dimensional feature space is considered, consisting of the three luminance and color difference signals Y, Cr and Cb, the pixel position (x, y), and a two-dimensional displacement vector. Here, if it is assumed that the displacement vector derived for each pixel is (v_x, v_y), and the displacement vector derived for each cluster "n" is (v_{n,x}, v_{n,y}), a distance d_n between the attentive pixel and the "n"th cluster is defined as follows: ##EQU1##
where w_0, w_1 and w_2 are weighting coefficients.
Furthermore, the distance d_n is calculated for a plurality of clusters in the neighborhood of the attentive pixel, and it is concluded that the attentive pixel belongs to the cluster "n" having the minimum distance d_n. In this second proposal, the RGB space of the first proposal is changed to the YCrCb space, but this change does not have a substantial influence on the result of the region segmentation. It is reported that, with this arrangement, the encoding efficiency of motion compensated prediction is improved in comparison with the clustering of the first proposal, which does not use the displacement vector information. In addition, as in the first proposal, object detection can be realized by collecting statistically similar clusters to generate an integrated cluster. However, this second proposal likewise cannot avoid a considerable amount of erroneous detection, because the region segmentation is not stable against variation in the motion information used.
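The text above does not reproduce the distance equation of this second proposal. Purely as an assumption, by analogy with equation (1) and the three weighting coefficients named above, one plausible form adds a weighted squared difference of displacement vectors to the color and position terms. The sketch below illustrates that assumed form only; the function name, the dictionary keys and the default weights are hypothetical and are not taken from the cited report.

```python
def squared_distance_7d(ycc, pos, motion, cluster, w0=1.0, w1=1.0, w2=1.0):
    """Assumed seven-dimensional analogue of equation (1).

    ycc    -- (Y, Cr, Cb) of the attentive pixel
    pos    -- (x, y) of the attentive pixel
    motion -- (v_x, v_y) displacement vector of the attentive pixel
    cluster -- dict with hypothetical keys 'ycc', 'pos', 'motion' holding
               the corresponding per-cluster values
    """
    color_term = sum((a - b) ** 2 for a, b in zip(ycc, cluster["ycc"]))
    pos_term = sum((a - b) ** 2 for a, b in zip(pos, cluster["pos"]))
    motion_term = sum((a - b) ** 2 for a, b in zip(motion, cluster["motion"]))
    return w0 * color_term + w1 * pos_term + w2 * motion_term


# Example with a cluster at the origin of the feature space.
origin = {"ycc": (0, 0, 0), "pos": (0, 0), "motion": (0, 0)}
d2 = squared_distance_7d((1, 2, 2), (3, 4), (1, 1), origin, w0=1, w1=2, w2=3)
```

Under this assumed form, noisy displacement vectors perturb the motion term directly, which is consistent with the instability against motion variation noted above.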
A second clue for the object detection is a structure restoration scheme using focal information. For example, reference can be made to A. P. Pentland: "A New Sense for Depth of Field", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 4, pp. 523-531 (July 1987), the disclosure of which is incorporated in its entirety into this application. This approach proposes to conduct edge detection, to seek the brightness gradient in the neighborhood of each edge, and to obtain the depth of the object on the basis of the degree of the brightness gradient. However, this approach still cannot solve the problem of erroneous detection, because (1) a satisfactory precision cannot be obtained concerning the degree of blurring in the neighborhood of the edge, (2) the depth cannot be sought in portions other than the edge, and (3) the detected edge does not necessarily constitute a closed region.
A third clue for the object detection is a structure restoration scheme using a stereo image. For example, reference can be made to U. R. Dhond and J. K. Aggarwal: "Structure from Stereo--A Review", IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 6, pp. 1489-1510 (Nov./Dec., 1989), the disclosure of which is incorporated in its entirety into this application. This paper explains how to obtain disparity information from stereo images by using relaxation and dynamic programming. In this third approach, however, since the disparity detection involves errors, the problem of erroneous detection still cannot be solved.
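To make the notion of disparity concrete, the following sketch estimates a per-pixel disparity along one scanline by minimizing a sum of absolute differences (SAD) over a small window. This is deliberately not the relaxation or dynamic programming methods covered by the cited review; it is a simpler substitute chosen for illustration, and the function name, window size and search range are assumptions.

```python
def scanline_disparity(left, right, window=1, max_disp=4):
    """Per-pixel disparity on one scanline by SAD block matching.

    For each x in the left row, the window centered at x is compared with
    the right-row window centered at x - d for d = 0..max_disp, and the d
    with the smallest SAD is kept. Pixels whose window never fits inside
    both rows keep the default disparity 0.
    """
    width = len(left)
    disparities = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            valid = True
            for k in range(-window, window + 1):
                xl, xr = x + k, x + k - d
                if not (0 <= xl < width and 0 <= xr < width):
                    valid = False
                    break
                cost += abs(left[xl] - right[xr])
            if valid and cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Example: the left row is the right row shifted by two pixels.
right_row = [0, 0, 0, 9, 5, 2, 0, 0, 0, 0]
left_row = [0, 0, 0, 0, 0, 9, 5, 2, 0, 0]
disp = scanline_disparity(left_row, right_row)
```

In the textured part of the row the shift of two pixels is recovered, whereas in the flat regions every candidate matches equally well and the estimate is arbitrary, which is one source of the disparity errors mentioned above.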
As another topic concerning the use of the focal information, Japanese Patent Application No. Heisei-7-249961 entitled "Three-Dimensional Image Coding System" and its corresponding U.S. patent application Ser. No. 08/720,378 filed on Sep. 27, 1996, now U.S. Pat. No. 5,696,551, the disclosure of which is incorporated in its entirety into this application, proposes a system of obtaining a sharpened image from a plurality of images having different focus positions, by utilizing the clustering technique. By using this system, it is possible to get clues for the object detection, such as to which of the focal images each pixel belongs, and at what depth each pixel is positioned.
As still another topic concerning the use of the focal information, J. Katto and M. Ohta: "Three-Dimensional Picture Coding with Focusing", 1995 Image Coding Symposium (PCSJ95), Oct. 2, 1995, pp. 65-66, the disclosure of which is incorporated in its entirety into this application, and which will be referred to as "Literature 1" hereinafter, proposes a system of obtaining disparity information by utilizing the clustering technique. By using this system, it is possible to get clues for the object detection, such as to which of the disparity ranges each pixel belongs, and at what depth each pixel is positioned.
However, the existing object detecting systems based on image signals cannot sufficiently reduce the erroneous detection.