Conventionally, in a monitoring system such as a security system, various sensors such as a monitor camera and an infrared sensor are used. By using the monitor camera and sensor, the presence or absence of an intruder in a building or the like can be easily monitored or detected from a remote place.
In recent years, image processing techniques have advanced dramatically with the digitization of images. As a result, a specific portion of an image can be enhanced or clipped, and desired images can be synthesized. For example, in live coverage of a baseball game, the technique of arbitrarily replacing the advertisement image behind the batter's box and broadcasting the resulting images is in practical use.
Further, because of the progress in communication techniques of recent years, the amount of information transferred via a communication line such as the Internet is increasing. Particularly, the amount of image information is incomparably larger than that of character information. Therefore, in order to reduce the amount of image information transmitted, various image compressing techniques for compressing an image signal, transmitting the compressed image signal, and decompressing the image signal on the reception side have been developed.
For example, as a compression encoding system for still images, the JPEG (Joint Photographic Experts Group) system has been adopted as an international standard. In the JPEG system, the total amount of image information is reduced by thinning out information in accordance with a predetermined rule, for example by subsampling the color-difference components and coarsely quantizing high-frequency components. Similarly, as a compression encoding system for moving images, the MPEG (Moving Picture Experts Group) system has been adopted as an international standard. In the MPEG system, the total amount of image information is reduced mainly by encoding only the changes between successive frames, that is, the parts of the image that are in motion.
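The idea of processing only the parts of an image that are in motion can be illustrated with a small sketch. This is not the MPEG algorithm, which relies on motion-compensated prediction and transform coding; it is only a simplified frame-difference scheme, and the function names `encode_delta` and `decode_delta` and the tiny sample frames are assumptions made for illustration.

```python
# Simplified illustration only: frame differencing, not actual MPEG coding.
# Frames are small grids of pixel values (hypothetical sample data).

def encode_delta(previous, current):
    """List (row, col, value) for every pixel that differs from the previous frame."""
    return [
        (r, c, value)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if previous[r][c] != value
    ]

def decode_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changed pixels."""
    frame = [list(row) for row in previous]
    for r, c, value in delta:
        frame[r][c] = value
    return frame

# A single bright pixel moves one position to the right between two frames.
frame0 = [[0, 0, 0, 0],
          [0, 9, 0, 0],
          [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0],
          [0, 0, 9, 0],
          [0, 0, 0, 0]]

delta = encode_delta(frame0, frame1)
print(len(delta), "changed pixels out of", len(frame1) * len(frame1[0]))
```

In this toy case only the two changed pixels need to be transmitted instead of all twelve, yet what is transmitted is still pixel data rather than any description of what the object is.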
However, to recognize the occurrence of an accident or a crime, a human being still has to watch the monitor image from a monitor camera. That is, neither the monitor camera nor the monitor image itself recognizes the occurrence of an accident or the like. Therefore, even if a monitor camera is installed, the occurrence of an accident or the like will be missed if the person in charge is not watching the monitor image.
Also, although a security sensor such as an infrared sensor can detect intrusion of something, it is difficult to recognize “what” has been detected. Because of this, security sensors often give out false alarms. That is, the security sensor detects not only an intruder but also intrusion of an animal such as a dog.
In the final analysis, the cause of these problems is that “what” the object is cannot be recognized automatically.
Furthermore, in order to enhance or clip a specific portion of a digital image by image processing, the operator has to designate the specific portion. Also, no matter how a digital image is processed, the image itself is merely a set of pixel signals. Consequently, “what” the object in an image is must still be recognized by a human being, just as in the case of the above-described monitor camera.
Incidentally, as an image recognizing technique, the optical character reader (OCR) has been put to practical use. The objects to be recognized by the OCR are usually characters on a plain white sheet of paper. The OCR automatically recognizes characters by a pattern matching method in which a character pattern clipped from an input image is compared with reference patterns.
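The pattern matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular OCR product: the 5x3 binary reference patterns, the pixel-agreement score, and the names `REFERENCE_PATTERNS`, `match_score`, and `recognize` are all hypothetical.

```python
# Hypothetical 5x3 binary reference patterns ("1" = ink, "0" = white paper).
REFERENCE_PATTERNS = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}

def match_score(candidate, reference):
    """Count the pixels on which the clipped pattern and a reference agree."""
    return sum(
        c == r
        for cand_row, ref_row in zip(candidate, reference)
        for c, r in zip(cand_row, ref_row)
    )

def recognize(candidate):
    """Return the label of the best-matching reference pattern and its score."""
    best = max(
        REFERENCE_PATTERNS,
        key=lambda label: match_score(candidate, REFERENCE_PATTERNS[label]),
    )
    return best, match_score(candidate, REFERENCE_PATTERNS[best])

# A character pattern clipped from an input image, with one noisy pixel.
noisy_t = ("111", "010", "011", "010", "010")
print(recognize(noisy_t))
```

The sketch presupposes that the character has already been clipped cleanly from a plain background, which is the assumption that holds for printed text on white paper.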
However, when the image of an object existing in three-dimensional space is to be recognized, the background of the object is not limited to plain white and often consists of a succession of lines formed by the outlines of neighboring objects. In this case, it is often difficult to clip the image of an individual object from the background. Therefore, a three-dimensional object cannot easily be recognized merely by directly applying a conventional pattern matching technique such as that of the OCR.
Also, because conventional image compressing techniques still process and transmit image signals, the volume of compressed image information transmitted remains much larger than that of character information. As a result, problems remain in that transferring image information takes a long time and places a heavy burden on the transmission line.
Incidentally, existing image recognizing techniques cannot do what a human being does: recognize a three-dimensional object from two-dimensional image information of that object, read a large amount of three-dimensional information about the object from the two-dimensional image information, and infer the three-dimensional object from the information thus read. That is, although current two-dimensional image recognizing techniques are fairly advanced, existing techniques can recognize only the name and kind of an object. It is difficult for them to recognize an object separately from other objects and to measure physical quantities of the object three-dimensionally, as a human being does.
Therefore, suppose that three-dimensional recognition in the true sense is realized, that is, recognition that covers not only the name and kind of an object but also its various attributes, its three-dimensional shape, and its position in three-dimensional coordinates. By combining such recognition with current computer technology, it becomes possible to realize an artificial intelligence technique that selects a target object from a plurality of existing objects, recognizes and measures the object, and further derives one final conclusion from the positional relations and the semantic relations among the objects, as a human being does daily.