1. Field of the Invention
The present invention relates to a method and apparatus for appropriately processing visual information, which can be applied to, for example, an input unit, an image encoding and decoding unit, an image recognition unit, an image restoring unit, a monitoring unit, an autonomous vehicle or an autonomous robot.
2. Related Background Art
Living organisms are able to recognize the surrounding environment as accurately as necessary by using a finite number of processing units, and to deal with the recognized environment. The dynamic range of each signal required to recognize the environment is very wide if all possible situations are assumed. Taking visual information as an example, the visual sensors of a living organism are, as a matter of course, finite, whereas the environment extends in all azimuths. Therefore, a living organism having no transferring means must input signals with the required resolution for all azimuths in order to recognize the surrounding environment. If a living organism has a transferring means, that is, a means for changing the observation parameters of the sensor, the load on its visual recognition system can be reduced considerably. The reason is that only the places considered to be important for recognition need to be input with sufficiently high resolution, and input is not required elsewhere.
A conventional image input apparatus, such as a CCD camera or a scanner, has been arranged to sample a subject image uniformly. An image input apparatus of this type can obtain image data of finite regions with a certain resolution. If an image is considered to be a portion of visual information, the essential issue in processing visual information is the estimation of three-dimensional visual information from the obtained two-dimensional image. To cope with this issue, the following two types of approaches have been taken.
Among the research and development on the visual system of living organisms carried out energetically in the 1980s, a major portion of the investigations using mathematical models can be said to have originated from the ideas of Marr (D. Marr: "Vision" W. H. Freeman and Co. NY (1982)). This research has been called "Computational Vision" and has subsequently been developed by means of ideas from statistical physics, such as the Regularization Theory, Markov Random Field, Line Process and application of a renormalization group. However, in the foregoing discussion, a finite number of image data items given in advance are treated as the visual information, and the three-dimensional structure is estimated from sets of two-dimensional images. This corresponds to estimating a three-dimensional world by looking at, for example, a photograph or a picture. The problem of estimating the three-dimensional structure from only the given information is ill-posed because the solution is indeterminate. Accordingly, the problem has been coped with by using knowledge.
On the other hand, a methodology was suggested at the same time in which the vision input system is controlled so as to prepare information sufficient for recognition, and the environment is then recognized, that is, Animate Vision disclosed by Ballard (D. H. Ballard: "Behavioural constraints on animate vision", Image and Vision Computing, Vol. 7, No. 1, pp.3-9 (1989)). This methodology is intended to overcome the ill-posed character of the visual information input first by means of input data obtained by using another observation parameter. As the observation parameters, the direction of the optical axis of an optical system and zooming can be employed. The most important issue is to determine "the subject to be searched next" and "a place to be observed next", that is, a method of controlling the observation parameters.
1. Method Disclosed by Ballard et al. (D. H. Ballard and C. M. Brown: "Principles of Animate Vision", CVGIP: IMAGE UNDERSTANDING, Vol. 56, No.1, pp.3-21 (August 1992))
The vision environment recognition system comprising an image input apparatus uses two types of image input: foveal vision, which samples a small region near the optical axis with a high resolution, and peripheral vision, which samples a large region away from the optical axis with a low resolution. Recognition of an object is thus taken to succeed without exception once the object is captured in foveal vision. Knowledge data is expressed by a tree structure, such as an IS-A tree or a part-of tree, and a probability structure is introduced into the relationships between objects. In accordance with the tree structure and the probability structure, a utility function is defined that relates the quantity of information obtained after a certain operation has been completed to the energy consumed in performing that operation, and the utility function is used to determine the next operation.
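The selection of a next operation by means of such a utility function can be illustrated by the following minimal sketch. It is not the formulation of Ballard et al.; it assumes, purely for illustration, that the utility is the expected information gain (in bits) minus a weighted energy cost, and the names expected_information_gain, select_next_operation and energy_weight are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_information_gain(prior, likelihoods):
    """Expected reduction in entropy over the hypotheses.

    prior[h] is P(hypothesis h); likelihoods[o][h] is
    P(observation o | hypothesis h) for the operation considered.
    """
    gain = entropy(prior)
    for row in likelihoods:
        p_obs = sum(l * p for l, p in zip(row, prior))
        if p_obs == 0:
            continue  # this observation can never occur
        posterior = [l * p / p_obs for l, p in zip(row, prior)]
        gain -= p_obs * entropy(posterior)
    return gain


def select_next_operation(prior, operations, energy_weight=0.1):
    """Return the name of the operation with the highest utility.

    ASSUMPTION: utility = information gain - energy_weight * energy.
    The source only states that a utility function relates the two.
    """
    def utility(op):
        gain = expected_information_gain(prior, op["likelihoods"])
        return gain - energy_weight * op["energy"]

    return max(operations, key=utility)["name"]
```

With a small energy weight, an informative but costly operation such as moving the fovea is preferred; with a large energy weight, a cheap but less informative peripheral observation is preferred.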
2. The system disclosed by Ballard et al. employs a method of directly searching for the object to be found next. Wixson et al. have suggested an indirect searching method as an observation point control method for searching for an intended object (L. E. Wixson and D. H. Ballard: "Using intermediate objects to improve the efficiency of visual search", Int'l. J. Computer Vision, 12:2/3, pp.209-230 (1994)). The indirect searching method performs a search in accordance with the spatial position relationship between an object identified by an observation and the intended object. Assuming that the intended object is a coffee cup and the identified objects are a desk, a chair and a blackboard, the input system is controlled in such a manner that the position of the desk, which has the most significant spatial position relationship with the coffee cup, is further observed with a high resolution.
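The coffee cup example can be illustrated by the following minimal sketch. It is not the algorithm of Wixson et al.; it assumes, for illustration only, that the spatial position relationship is summarized as a single strength value per pair of objects, and the function name next_fixation, the image coordinates and the strength values are all hypothetical.

```python
def next_fixation(target, detections, relation_strength):
    """Choose where to look next with high resolution.

    detections maps each identified object name to its (x, y) position.
    relation_strength maps (target, object name) to an assumed strength
    of the spatial relationship; missing pairs count as 0.
    Returns (object name, position), or None if no identified object
    is related to the target.
    """
    best = None
    best_strength = 0.0
    for name, position in detections.items():
        strength = relation_strength.get((target, name), 0.0)
        if strength > best_strength:
            best, best_strength = (name, position), strength
    return best


# Illustrative values only; they are not taken from the cited paper.
detections = {"desk": (120, 340), "chair": (400, 360), "blackboard": (300, 50)}
relation_strength = {
    ("coffee cup", "desk"): 0.7,
    ("coffee cup", "chair"): 0.2,
    ("coffee cup", "blackboard"): 0.05,
}
fixation = next_fixation("coffee cup", detections, relation_strength)
```

Here the desk is selected, so the region at its position would be observed next with a high resolution.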
A system disclosed by Brooks et al. (R. A. Brooks: "New Approaches to Robotics", Science, Vol.253, pp.1227-1232 (1991)) comprises at least two basic processing programs establishing the connection between sensor inputs and actuator outputs. Tani et al. have suggested a system structured such that rules existing in time-sequence signal vectors of sensor inputs are acquired by learning and the rules are used in behavior scheduling (see Japanese Patent Laid-Open No. 6-274224). According to the foregoing method, a system adaptable to an unknown environment can be constituted. Moreover, a mechanism is provided by which, even if a plurality of possible actions exist, one of the actions is selected.
In addition to the foregoing conventional and representative theories, the following proposals have been made:
R. Rimey and C. M. Brown: "Task-Oriented Vision with Multiple Bayes Nets", in "Active Vision", A. Blake and A. Yuille (Eds.) MIT Press (1992),
S. Geman and D. Geman: "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Trans. on Pattern Anal. Machine Intell., Vol. 6, No. 6, pp.721-741 (November 1984),
B. Gidas: "A Renormalization Group Approach to Image Processing Problems", IEEE Trans. on Pattern Anal. Machine Intell., Vol. 11, No. 2, pp.164-180 (February 1989),
Kawato and Inui: "Computational Theory of the Visual Cortical Areas", IEICE Trans., Vol. J73-D-II, No. 8, pp. 1111-1121 (August 1990),
D. V. Lindley: "On a measure of the information provided by an experiment", Ann. Math. Stat., vol. 27, pp.986-1005 (1956),
K. J. Bradshaw, P. F. McLauchlan, I. D. Reid and D. W. Murray: "Saccade and pursuit on an active head/eye platform", Image and Vision Computing, Vol. 12, no. 3, pp.155-163 (April 1994), and
J. G. Lee and H. Chung: "Global path planning for mobile robot with grid-type world model", Robotics and Computer-Integrated Manufacturing, Vol. 11, no.1, pp.13-21 (1994).
However, since a major portion of the foregoing computational theories deals with information obtainable from given images or sets of images, the obtained results are only estimated values. Moreover, since the world is described by using observer-oriented coordinate systems, the treatment of movable objects is excessively complex.
On the other hand, since Animate Vision uses an object-oriented coordinate system to describe the world, the treatment of movable objects can be relatively simplified. However, the observation point control, which is the most important control, encounters the following problems:
1. A method of recognizing a minimum unit of an object constituting knowledge has not been discussed. That is, the discussion has proceeded on the assumption that the recognition of the minimum unit is easy.
2. It is assumed that the knowledge is described by a knowledge engineer. That is, knowledge of environments that are not known to human beings cannot be given.
The system disclosed in, for example, Japanese Patent Laid-Open No. 6-274224 is a system in which knowledge is acquired by learning. However, since the input/output data and the structure of the neural network are of a general form, a hierarchical structure cannot always be acquired. Moreover, even if the neural network is capable of acquiring the hierarchical structure, an excessively long time can be expected to be required.