1. Field of the Invention
The present invention relates to a method and apparatus for processing visual information appropriately, which can be applied to, for example, an input unit, an image encoding and decoding unit, an image recognition unit, an image restoring unit, a monitoring unit, an autonomous vehicle or an autonomous robot.
2. Related Background Art
Living organisms are able to recognize the surrounding environment as accurately as necessary, and to deal with the recognized environment, by using a finite number of processing units. The dynamic range of each signal required to recognize the environment is very wide if all possible situations are assumed. Taking visual information as an example, the visual sensors of a living organism are, as a matter of course, finite, whereas the environment extends in all azimuths. Therefore, a living organism having no transferring means must input signals with the required resolution for all azimuths to recognize the surrounding environment. If a living organism has a transferring means, that is, a means for changing the observation parameters of the sensor, the load on its visual recognition system can be reduced considerably. The reason is that only the places considered important for recognition need to be input with sufficiently high resolution, and input is not required elsewhere.
A conventional image input apparatus, such as a CCD camera or a scanner, is arranged to sample a subject image uniformly. An image input apparatus of this type can obtain image data of finite regions with a certain resolution. If an image is considered to be a portion of visual information, the essential issue in processing visual information is the estimation of three-dimensional visual information from the obtained two-dimensional image. To cope with this issue, the following two types of approaches have been taken.
Among the research and development on the visual systems of living organisms carried out energetically in the 1980s, a major portion of the investigations using mathematical models can be said to have originated from the ideas of Marr (D. Marr: “Vision”, W. H. Freeman and Co., N.Y. (1982)). This research has been called “Computational Vision” and has subsequently been developed by means of ideas from statistical physics, such as the Regularization Theory, Markov Random Fields, the Line Process and the application of a renormalization group. However, in the foregoing discussion, a finite number of image data items given in advance are treated as the visual information, and the three-dimensional structure is estimated from sets of two-dimensional images. This corresponds to estimating a three-dimensional world by looking at, for example, a photograph or a picture. The problem of estimating the three-dimensional structure from only the given information is ill-posed because the solution is indeterminate. Accordingly, these approaches have coped with the problem by using knowledge.
On the other hand, a methodology has been proposed at the same time in which the vision input system is controlled to prepare information sufficient for recognition and the environment is then recognized, namely Animate Vision disclosed by Ballard (D. H. Ballard: “Behavioural constraints on animate vision”, Image and Vision Computing, Vol. 7, No. 1, pp. 3-9 (1989)). This methodology is intended to overcome the ill-posed nature of the visual information input first by means of input data obtained with other observation parameters. The direction of the optical axis of an optical system and zooming can be employed as observation parameters. The most important point is to determine “the subject to be searched for next” and “a place to be observed next”, that is, a method of controlling the observation parameters.
1. Method Disclosed by Ballard et al. (D. H. Ballard and C. M. Brown: “Principles of Animate Vision”, CVGIP: IMAGE UNDERSTANDING, Vol. 156, No. 1, pp. 3-21 (Aug. 1992)).
The vision environment recognition system, which comprises an image input apparatus, uses two types of image input methods: foveal vision, which samples a small region adjacent to the optical axis with a high resolution, and peripheral vision, which samples a large region away from the optical axis with a low resolution. Thus, an object can be recognized without exception if it can be captured in foveal vision. Knowledge data is expressed by a tree structure, such as an IS-A tree or a part-of tree, and a probability structure is introduced into the relationship between objects. A strategy is employed in which a utility function is defined, in accordance with the foregoing tree structure and probability structure, between the quantity of information obtained after a certain operation has been completed and the energy consumed to perform the operation; the utility function is then used to determine the next operation.
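The utility-based selection of the next operation can be illustrated with a minimal sketch. The Python below is not taken from Ballard and Brown: the action names, the numeric values, and the choice of a simple ratio of information gained to energy consumed as the utility are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate operation of the vision system (hypothetical)."""
    name: str
    expected_info_gain: float  # information expected after the operation, in bits (assumed)
    energy_cost: float         # energy consumed by the operation (assumed units)


def utility(action: Action) -> float:
    """Assumed utility: expected information gained per unit of energy."""
    if action.energy_cost <= 0:
        raise ValueError("energy_cost must be positive")
    return action.expected_info_gain / action.energy_cost


def select_next_action(candidates: list[Action]) -> Action:
    """Pick the candidate operation with the highest utility."""
    if not candidates:
        raise ValueError("no candidate actions")
    return max(candidates, key=utility)


# Illustrative candidates; the numbers are made up.
candidates = [
    Action("foveate_on_desk", expected_info_gain=2.0, energy_cost=4.0),
    Action("zoom_on_cup", expected_info_gain=1.5, energy_cost=1.0),
    Action("pan_to_blackboard", expected_info_gain=3.0, energy_cost=5.0),
]
best = select_next_action(candidates)
print(best.name)
```

In a fuller treatment the expected information gain would presumably be derived from the tree and probability structures mentioned above; here it is supplied directly as a number.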
2. The system disclosed by Ballard et al. employs a method of directly searching for the object to be searched for next. Wixson et al. have proposed an indirect searching method as an observation point control method for searching for a target object (L. E. Wixson and D. H. Ballard: “Using intermediate objects to improve the efficiency of visual search”, Int'l. J. Computer Vision, 12:2/3, pp. 209-230 (1994)). The indirect searching method performs a search in accordance with the spatial position relationship between an object identified by an observation and the intended object. Assuming that the intended object is a coffee cup and the identified objects are a desk, a chair and a blackboard, the input system is controlled in such a manner that the position of the desk, which has the most significant spatial position relationship with the coffee cup, is further observed with a high resolution.
A system disclosed by Brooks et al. (R. A. Brooks: “New Approaches to Robotics”, Science, Vol. 25, pp. 1227-1232 (1991)) comprises at least two basic processing programs that establish the connection between sensor inputs and actuator outputs. Tani et al. have proposed a system in which rules existing in time-sequence signal vectors of sensor inputs are acquired by learning and the rules are used in behavior scheduling (see Japanese Patent Laid-Open No. 6-274224). According to the foregoing method, a system adaptable to an unknown environment can be constructed. Moreover, a mechanism is provided by which, even if a plurality of possible actions exist, one of the actions is selected.
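The coffee cup example of indirect search can be sketched as follows. This is only an illustration of the idea, not the method of Wixson and Ballard: the relation table, its values, and the function name `next_fixation` are hypothetical.

```python
# Hypothetical strengths of the spatial relationship between a target object
# and each intermediate object; the values are illustrative assumptions.
SPATIAL_RELATION = {
    "coffee cup": {"desk": 0.7, "chair": 0.2, "blackboard": 0.05},
}


def next_fixation(target, identified):
    """Return the identified object most strongly related to the target.

    `identified` maps an object label to its (x, y) position in the scene.
    The returned position is where high-resolution observation would go next.
    """
    relation = SPATIAL_RELATION[target]
    known = [label for label in identified if label in relation]
    if not known:
        raise ValueError("no identified object is related to the target")
    best_label = max(known, key=lambda label: relation[label])
    return best_label, identified[best_label]


# Objects already identified at low resolution, with made-up positions.
identified = {"desk": (120, 80), "chair": (40, 95), "blackboard": (200, 20)}
label, position = next_fixation("coffee cup", identified)
print(label, position)
```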
In addition to the foregoing representative conventional theories, the following have been proposed:
R. Rimey and C. M. Brown: “Task-Oriented Vision with Multiple Bayes Nets”, in “Active Vision”, A. Blake and A. Yuille (Eds.), MIT Press (1992),
S. Geman and D. Geman: “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images”, IEEE Trans. on Pattern Anal. Machine Intell., Vol. 6, No. 6, pp. 721-741 (Nov. 1984),
B. Gidas: “A Renormalization Group Approach to Image Processing Problems”, IEEE Trans. on Pattern Anal. Machine Intell., Vol. 11, No. 2, pp. 164-180 (Feb. 1989),
Kawato and Inui: “Computational Theory of the Visual Cortical Areas”, IEICE Trans., Vol. J73-D-II, No. 8, pp. 1111-1121 (Aug. 1990),
D. V. Lindley: “On a measure of the information provided by an experiment”, Ann. Math. Stat., Vol. 27, pp. 986-1005 (1956),
K. J. Bradshaw, P. F. McLauchlan, I. D. Reid and D. W. Murray: “Saccade and pursuit on an active head/eye platform”, Image and Vision Computing, Vol. 12, No. 3, pp. 155-163 (Apr. 1994), and
J. G. Lee and H. Chung: “Global path planning for mobile robot with grid-type world model”, Robotics and Computer-Integrated Manufacturing, Vol. 11, No. 1, pp. 13-21 (1994).
However, since a major portion of the foregoing computational theories discusses information obtainable from given images (or sets of images), the obtained results are only estimated values. Since the world is described by using observer-oriented coordinate systems, the treatment of movable objects is too complex.
On the other hand, since Animate Vision uses an object-oriented coordinate system to describe the world, the treatment of movable objects can be relatively simplified. However, the observation point control, which is the most important control, encounters some problems, that is:
1. A method of recognizing the minimum unit of an object constituting knowledge has not been discussed. That is, the discussion has proceeded on the assumption that recognition of the minimum unit is easy.
2. It is assumed that the knowledge is described by a knowledge engineer. That is, knowledge of environments that is not known to human beings cannot be given.
The system disclosed in, for example, Japanese Patent Laid-Open No. 6-274224 is a system in which knowledge is acquired by learning. However, since the input/output data and the structure of the neural network are general, a hierarchical structure cannot always be acquired. Moreover, even if the neural network is capable of acquiring the hierarchical structure, an excessively long time can be expected to be required.
Accordingly, an object of the present invention is to provide an image information processing method and apparatus capable of quickly acquiring image information.
Another object of the present invention is to provide a variety of systems to each of which the image information processing method and apparatus are effectively applied.
According to one aspect, the present invention which achieves these objectives relates to a method of controlling an image information processing apparatus, comprising the steps of: optically receiving an image from an image input unit of the image information processing apparatus; detecting a feature from the received image; calculating a quantity of visual information in accordance with the position of the detected feature; and controlling the image input unit in such a manner that the quantity of visual information is increased.
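The idea of steering the image input so that the quantity of visual information increases can be sketched as follows. The passage above does not define the quantity; this sketch substitutes the expected information gain of an observation (the mutual information between a hypothesis and the observation, in the sense of the Lindley reference cited earlier) as an assumed stand-in. The hypotheses, likelihood tables and positions are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_information(prior, likelihood):
    """Mutual information I(H; O) between hypothesis H and observation O.

    prior[h] is P(H=h); likelihood[h][o] is P(O=o | H=h) for an observation
    made at one feature position.
    """
    n_hyp = len(prior)
    n_obs = len(likelihood[0])
    # Marginal distribution P(O=o) over observations.
    marginal = [sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
                for o in range(n_obs)]
    # I(H; O) = H(O) - H(O | H)
    conditional = sum(prior[h] * entropy(likelihood[h]) for h in range(n_hyp))
    return entropy(marginal) - conditional


def select_fixation(prior, likelihood_by_position):
    """Choose the feature position whose observation is expected to be most informative."""
    return max(
        likelihood_by_position,
        key=lambda pos: expected_information(prior, likelihood_by_position[pos]),
    )


# Two hypotheses about the scene and two candidate feature positions (made up).
prior = [0.5, 0.5]
likelihood_by_position = {
    (10, 20): [[0.5, 0.5], [0.5, 0.5]],  # observation tells nothing about H
    (64, 48): [[0.9, 0.1], [0.1, 0.9]],  # observation discriminates well
}
target = select_fixation(prior, likelihood_by_position)
print(target)
```

Directing the optical axis to `target` would then be the control step; how the apparatus actually moves the input unit is outside this sketch.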
According to another aspect, the present invention which achieves these objectives relates to an image information processing method comprising the steps of: monitoring a supplied image; calculating an evaluation value of each feature in the supplied image; detecting a feature, the evaluation value of which is higher than a predetermined value; moving a direction of an optical axis to the detected feature; acquiring data of an image near the detected feature; allotting an identifier to the acquired image data; and storing a set formed by the position of the detected feature, the data of the image near the feature, the time of detection and the allotted identifier.
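The sequence of steps in this aspect can be sketched as a small loop. This is a simplified illustration under stated assumptions: the evaluation value is taken to be local contrast, moving the optical axis is replaced by cropping a window around the feature, and the names `FeatureRecord`, `evaluation_value`, `crop` and `collect_features` are hypothetical.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class FeatureRecord:
    position: tuple     # (row, col) of the detected feature
    patch: list         # image data near the feature
    detected_at: float  # time of detection
    identifier: int     # identifier allotted to the image data


def evaluation_value(image, r, c):
    """Assumed evaluation value: absolute difference from the mean of 4 neighbours."""
    neighbours = [image[r - 1][c], image[r + 1][c], image[r][c - 1], image[r][c + 1]]
    return abs(image[r][c] - sum(neighbours) / 4)


def crop(image, r, c, half=1):
    """Stand-in for pointing the optical axis at (r, c): crop a small window."""
    return [row[c - half:c + half + 1] for row in image[r - half:r + half + 1]]


def collect_features(image, threshold, clock=time.time):
    """Store one record per interior pixel whose evaluation value exceeds the threshold."""
    ids = itertools.count(1)
    records = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            if evaluation_value(image, r, c) > threshold:
                records.append(
                    FeatureRecord((r, c), crop(image, r, c), clock(), next(ids))
                )
    return records


# A toy 5x5 image with a single bright point.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
records = collect_features(image, threshold=5.0)
print(records[0].position, records[0].identifier)
```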
According to yet another aspect, the present invention which achieves these objectives relates to an image information processing apparatus comprising: image input means for optically inputting an image; detection means for detecting a feature from the image supplied from the image input means; calculating means for calculating a quantity of visual information in accordance with the position of the feature detected by the detection means; and control means for controlling the image input means in such a manner that the quantity of visual information calculated by the calculating means is increased.
According to still another aspect, the present invention which achieves these objectives relates to an image information processing apparatus comprising: monitoring means for monitoring a supplied image; calculating means for calculating an evaluation value of each feature in the supplied image, which is being monitored by the monitoring means; detection means for detecting a feature, the evaluation value of which is higher than a predetermined value; moving means for moving a direction of an optical axis to the detected feature; acquiring means for acquiring data of an image near the feature detected by the detection means; and storage means which allots an identifier to the acquired image data so as to store a set formed by the position of the detected feature, the data of the image near the feature, the time of detection and the allotted identifier.
According to another aspect, the present invention which achieves these objectives relates to an image information processing apparatus comprising: image input means controlled with an input parameter to input an image; mapping means for discretizing the input image and mapping it to a multi-resolution space; feature detection means for detecting a feature from the input image; transform encoding means for transforming the mapped image into a local pattern about the detected feature; quantizing means for quantizing the transformed local pattern; knowledge acquiring means for obtaining temporal and spatial correlations between data items quantized by the quantizing means; and input parameter control means for modifying the input parameter in accordance with the quantized data and the correlations.
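Three of the stages named in this aspect (mapping to a multi-resolution space, quantizing a local pattern, and counting correlations between quantized data) can be sketched as follows. The concrete choices are assumptions, not the embodiment: 2x2 block averaging for the multi-resolution mapping, nearest-neighbour lookup in a hypothetical codebook for quantization, and a count of successive code pairs as a temporal correlation. Feature detection, transform encoding and input parameter control are omitted.

```python
from collections import Counter


def downsample(image):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    return [
        [(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


def multi_resolution(image, levels):
    """Map an image to a list of progressively coarser versions."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def quantize(pattern, codebook):
    """Return the index of the codebook vector nearest to the local pattern."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(pattern, codebook[i]))


def temporal_correlation(codes):
    """Count how often each quantized code is immediately followed by each code."""
    return Counter(zip(codes, codes[1:]))


# A toy 4x4 image made of four uniform 2x2 blocks.
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
pyramid = multi_resolution(image, levels=3)

codebook = [[0.0, 8.0], [4.0, 2.0]]  # hypothetical code vectors
codes = [quantize(row, codebook) for row in pyramid[1]]

# Hypothetical sequence of codes observed over time.
transitions = temporal_correlation([0, 1, 0, 1])
print(pyramid[1], pyramid[2], codes, transitions)
```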
Other objectives and advantages besides those discussed above shall be apparent to those skilled in the art from the description of a preferred embodiment of the invention which follows. In the description, reference is made to accompanying drawings, which form a part thereof, and which illustrate an example of the invention. Such example, however, is not exhaustive of the various embodiments of the invention, and therefore reference is made to the claims which follow the description for determining the scope of the invention.