The evolution of robots has produced machines of increased complexity and of greater ability. Compared to the first commercial models, which were simple blind servants that could be taught to execute pre-programmed movements, the mobile robots of today are built to interact with the world. The first visual systems, which were passive in nature and accepted any data that randomly fell into the field of view, have been replaced by active ones, able to select objects of interest and view them in an intelligent manner. When designers gave robots the means to move about, they raised issues that previously were irrelevant. One such issue is the co-ordination of an artificial vision system mounted on a mobile platform. At present, solutions to this problem have known drawbacks.
From a somewhat philosophical perspective, the current position of robot designers can be compared to that of Mother Nature in the early stages of biological system design, or evolution. In the course of designing a moving, seeing robot, the designer is faced with the same issues that Mother Nature had to address long ago. On the other hand, the designer is in a favourable position because he can sneak a peek at existing design strategies (namely nature's) to help guide his own work. It can be argued that designs in nature are not necessarily optimal, but they certainly are robust and serve as a good starting point.
For example, the control of computational vision systems is generally solved using a traditional engineering mindset. Even though knowledge of anatomy and physiology relevant to biological binocular control is now extensive, designers do not use existing natural systems to help guide their work. An artificial vision system is often organized into three modules: (i) an imaging system to acquire images, (ii) an image processing stage, and (iii) a controller. Generally, imaging system design, image processing and control are independent aspects of a vision system and can be designed independently, though the resulting system is highly dependent on each aspect.
Many known vision systems are binocular--for example, those disclosed by Ballard, D. H. in "Animate Vision," Artificial Intelligence, Vol. 48, pp. 57-86, 1991; by Brown, C., in "Gaze Controls Cooperating Through Prediction," Image and Vision Computing, Vol. 8, No. 1, pp. 10-17, 1990; and by Krotkov, E. P. in Active Computer Vision by Cooperative Focus and Stereo, Springer-Verlag, New York, 1989. ISBN 0-387-97109-3. Two common reasons for this choice are that primate vision systems work well and that knowledge about mammalian oculomotor control provides valuable insight into how a vision system might work. Nature, for instance, uses one controller for both eyes of a pair. Another advantage of modelling primate vision systems is their adaptability. For example, primate vision systems are not known to enter unstable modes of operation. Also, primate vision systems adapt extremely well to the loss of an eye or to having only a single available eye. Primate vision systems have a fovea--a specialised region of the retina giving such high visual acuity that it is used preferentially for seeing. Foveated animals generally possess image-stabilising reflexes, but also have other reflexes acting to bring selected points of interest to the fovea and then hold them there.
With regard to artificial vision systems, Rotstein and Rivlin (Rotstein H. P., and E. Rivlin, "Optimal Servoing for Active Foveated Vision," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Calif., U.S.A., pp. 177-182, 1996) demonstrated that the need for foveated vision sensors may be established based on control considerations, and that the optimal fovea size may actually be computed. The existence of a fovea presents an inherent trade-off: a small foveal window requires little image processing time but places tight demands on tracking performance; alternatively, tracking objectives may be relaxed by increasing the foveal zone, but at the expense of slower dynamics due to longer computational delays. Thus, the optimal fovea size is formulated as a maximisation problem involving the plant, computational delays and other hardware restrictions, and the expected bounds of target motion.
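By way of illustration only, the trade-off can be sketched with a toy model that is not taken from Rotstein and Rivlin. The sketch assumes a square foveal window of side w pixels, a processing delay that grows with the number of foveal pixels, and a target that must not drift more than half the window width during one processing cycle. All function names, parameters and values below are hypothetical.

```python
import math


def tolerable_speed(width_px, fixed_delay_s, per_pixel_s):
    """Largest target image speed (pixels/s) that keeps the target inside
    the foveal window for one processing cycle (toy model, assumed)."""
    # Assumed delay model: a fixed overhead plus a cost per foveal pixel.
    delay_s = fixed_delay_s + per_pixel_s * width_px ** 2
    # The target may drift at most half the window width during the delay.
    return (width_px / 2.0) / delay_s


def best_width(fixed_delay_s, per_pixel_s, candidates):
    """Pick the candidate window width that maximises the tolerable speed."""
    return max(
        candidates,
        key=lambda w: tolerable_speed(w, fixed_delay_s, per_pixel_s),
    )


if __name__ == "__main__":
    t0, k = 0.004, 1e-6  # hypothetical delays in seconds
    w = best_width(t0, k, range(8, 257))
    print("best width (px):", w)
    print("analytic optimum (px):", math.sqrt(t0 / k))
```

Under these assumptions the tolerable target speed peaks near w = sqrt(fixed_delay_s / per_pixel_s). Smaller windows are limited by the tracking requirement and larger ones by the computational delay, which is the qualitative behaviour described above.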
Artificial vision systems to date are generally afoveate as a result of commercial imaging technology. However, for the sake of generality and of future development, research into computational vision systems has been conducted with the notion that someday vision systems will typically be foveated. There has already been work in the development of foveated vision sensors and application of spatially varying sensors in vision systems.
Controlling gaze is the operational purpose of some vision systems. Gaze is formally defined as the line of sight measured relative to the world co-ordinate system. The act of gazing at a single point in three-dimensional space is called fixating and the location at which the eyes are directed, the target, is termed the fixation point. The process of gaze control consists of orienting gaze in order to achieve a desired goal.
Within robotics circles, the problem of gaze control is functionally broken down into two problems: gaze holding and gaze shifting. Gaze holding refers to the facility to track a possibly moving target with a viewing system that may also be moving. It also includes compensating for external perturbations to the vision system (due to egomotion, for example) and implies smooth tracking movements. Gaze shifting refers to the rapid redirection of gaze for the purpose of shifting attention to a possibly new target in the visual field. To a physiologist this nomenclature is awkward. Gaze holding implies that the gaze of a vision system is held constant, which is consistent with the definition of gaze. In order to part with this confusing terminology, "gaze holding" is referred to herein and in the claims which follow as target stabilisation, and "gaze shifting" is referred to herein and in the claims which follow as target acquisition, which is more descriptive of the action occurring.
In the case of a frontal, binocular vision system, target stabilisation requires that the line of sight of each eye or imaging device be directed at the same point of interest. Regardless of the mobility of a robot, a vision system that maintains its gaze on a target benefits from improved visual interpretation of the world and, consequently, can interact better therewith.
There are three tasks involved in target stabilisation. A first is locating a fixation point; a second task consists of extracting fixation errors for each imaging device; and the third task is concerned with a control strategy used to servo the gaze successfully.
Vergence involves co-ordination of two eyes under near-target conditions, where proper viewing requires the crossing of the visual axes. Both stereopsis, an ability to visually perceive the third dimension, which depends on each eye receiving a slightly different image of the same object, and bifoveal fixation of a single target require precise alignment of the visual axes. This task is the responsibility of the vergence system.
Generally, the first task involves image analysis, which provides a target location, and the second task involves extracting retinal errors or other data for provision to a controller, which performs the third task. The third task, controlling gaze, is often performed by analysing each part of gaze motion separately for each imaging device and then summing appropriate control signals for provision to a plant for controlling imaging device motion.
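The division into three tasks might be sketched as follows. The sketch is illustrative only: the brightest-pixel detector merely stands in for whatever image analysis a real system uses, a single proportional term replaces the sum of several control signals, the gain is arbitrary, and all function names are hypothetical. Each imaging device is handled separately, as in the conventional approach described above.

```python
def locate_target(image):
    """Task 1 (stand-in): return (row, col) of the brightest pixel."""
    _, row, col = max(
        (value, r, c)
        for r, pixels in enumerate(image)
        for c, value in enumerate(pixels)
    )
    return row, col


def retinal_error(image, target):
    """Task 2: offset of the target from the image centre, in pixels."""
    centre_row = (len(image) - 1) / 2.0
    centre_col = (len(image[0]) - 1) / 2.0
    return target[0] - centre_row, target[1] - centre_col


def servo_command(error, gain=0.5):
    """Task 3 (simplified): proportional command from the retinal error."""
    return gain * error[0], gain * error[1]


def stabilise_step(left_image, right_image, gain=0.5):
    """Run the three tasks once for each imaging device."""
    commands = []
    for image in (left_image, right_image):
        target = locate_target(image)
        error = retinal_error(image, target)
        commands.append(servo_command(error, gain))
    return tuple(commands)


if __name__ == "__main__":
    left = [[0] * 5 for _ in range(5)]
    right = [[0] * 5 for _ in range(5)]
    left[1][3] = 9   # target above and to the right of centre
    right[2][2] = 9  # target already centred
    print(stabilise_step(left, right))
```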
Target-stabilisation systems should be able to follow a moving target without necessarily recognising it first. Consequently, active vision systems essentially work on the principle that the only knowledge of the target is that the "eyes" are initially pointed at it. The target stabilisation problem then is one of maintaining fixation of the moving target from a moving viewing system.
Target stabilisation is known to be advantageous. For example, in Coombs, D. J., and C. M. Brown, "Cooperative Gaze Holding in Binocular Vision," IEEE Control Systems Magazine, Vol. 11, No. 4, pp. 24-33, 1991, which is hereby incorporated by reference, the usefulness of target stabilisation is summarised. Image stabilisation allows reduced blur when imaging a moving target. Fixating an object of interest brings it near the optical axis of each eye and minimises geometric distortions. Tracking of a target improves operation of many stereo vision algorithms. Also, stabilising on a moving object in a stationary scene causes the object to "pop-out" as a result of motion blur related to the un-stabilised parts of the scene. There are other advantages and applications of image stabilisation.
Biological gaze control strategies are based upon operating modalities rather than tasks. In fact, only two modalities are known to be used: the fast-phase and slow-phase modalities. These are distinguished by their tactics and operating frequency range.
The slow-phase modality is also known as "slow control" and "smooth pursuit" and is responsible only for target stabilisation. It produces "smooth eye movements" or slow phases--so called because of the low operating bandwidth. Smooth eye movements are largely regarded as sensorimotor reflexes. The following review articles give an overview of slow-phase system response and understanding in biology: Kowler, E., "The Role of Visual and Cognitive Processes in the Control of Eye Movement," in Eye Movements and Their Role in Visual and Cognitive Processes, Chapter 1, E. Kowler (Ed.), Elsevier Science Publishers BV (Biomedical Division), Amsterdam, pp. 1-70, 1990; Lisberger, S. G., E. J. Morris, and L. Tychsen, "Visual Motion Processing and Sensory-Motor Integration for Smooth Pursuit Eye Movements," Annual Review of Neuroscience, Vol. 10, pp. 97-129, 1987; and Robinson, D. A., "Control of Eye Movements," in Handbook of Physiology Section 1: The Nervous System Vol. II Motor Control, Part 2, V. B. Brooks Ed., American Physiological Society, Bethesda, Md., pp. 1275-1320, 1981.
The fast-phase modality produces rapid eye movements. The operating bandwidth is much wider than that of the slow-phase modality and eye movement dynamics are faster than those of the eye plant. This modality corresponds to the target acquisition task in gaze control. The fast-phase system is influenced by cognitive factors, even more so than the slow-phase system. A good overview of the fast-phase modality in biological systems is found in the following sources: Leigh, R. J., and D. S. Zee, The Neurology of Eye Movements, 2nd ed., F. A. Davis Co., Philadelphia, 1991. ISBN 0-8036-5528-2; Lisberger, S. G., E. J. Morris, and L. Tychsen, "Visual Motion Processing and Sensory-Motor Integration for Smooth Pursuit Eye Movements," Annual Review of Neuroscience, Vol. 10, pp. 97-129, 1987; and Robinson, D. A., "Control of Eye Movements," in Handbook of Physiology Section 1: The Nervous System Vol. II Motor Control, Part 2, V. B. Brooks Ed., American Physiological Society, Bethesda, Md., pp. 1275-1320, 1981.
Historically, two categories of rapid eye movements were thought to exist. They were classified as saccades and quick phases, depending on the context under which the movements were evoked. Eventually, it was discovered that saccades and quick phases were, in fact, produced by the same neural circuitry. In accordance with this, rapid eye movements are referred to herein and in the claims that follow as "fast phases".
In a foveated vision system, the need for two operating modalities comes as a result of the conflicting goals when following a moving target as described in Coombs, D., and C. Brown, "Real-Time Binocular Smooth Pursuit," International Journal of Computer Vision, Vol. 11, No. 2, pp. 147-164, 1993. It is generally believed that the target pursuit system of primates does not favour either the position or velocity control goals. The slow-phase modality is used to minimise slip and, when the target image deviates too far from the fovea, the fast-phase modality is invoked to quickly reacquire a target--be it a new one or the same one that is falling out of view. Modality switching is a clever non-linear solution to the dilemma of how to minimise both velocity and position error simultaneously, since smooth eye movements alone cannot achieve both goals and saccadic movements cannot reduce motion blur since they do not match velocity. It is more accurate to say that position and velocity errors each contribute to both slow and fast phases; only their relative importance is variable.
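Modality switching of this kind might be sketched in one dimension as follows. This is a deliberately simplified illustration and not a model of the primate system or of any cited work: here the slow phase reduces only velocity slip and the fast phase corrects only position, whereas, as noted above, both errors contribute to both phases in practice. The switching threshold, the gain and all names are hypothetical.

```python
def track(target_positions, dt, fovea_half_width, slip_gain=0.8):
    """Follow a 1-D target, switching between slow and fast phases.

    Returns a list of (mode, eye_position) pairs, one per sample.
    """
    eye_pos = 0.0
    eye_vel = 0.0
    prev_target = target_positions[0]
    log = []
    for target in target_positions:
        position_error = target - eye_pos
        if abs(position_error) > fovea_half_width:
            # Fast phase: the target has left the foveal zone, so gaze is
            # redirected rapidly (modelled as an instantaneous jump).
            eye_pos = target
            mode = "fast"
        else:
            # Slow phase: reduce retinal slip by matching target velocity.
            target_vel = (target - prev_target) / dt
            slip = target_vel - eye_vel
            eye_vel += slip_gain * slip
            eye_pos += eye_vel * dt
            mode = "slow"
        prev_target = target
        log.append((mode, eye_pos))
    return log


if __name__ == "__main__":
    # Target starts outside the fovea, then drifts at a constant speed.
    targets = [5.0 + 0.1 * i for i in range(20)]
    for mode, eye in track(targets, dt=0.1, fovea_half_width=1.0):
        print(mode, round(eye, 3))
```

In this run a single fast phase acquires the target, after which slow phases alone keep it within the assumed foveal zone. A small residual position error remains, which is the kind of error a further fast phase would correct if it grew beyond the threshold.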
As briefly mentioned above, eye movements may be elicited either by visual or non-visual stimuli. Benefits of a dual-modality control strategy are observed in both instances.
A good overview of biological oculomotor control is presented in Galiana, H. L., and J. S. Outerbridge, "A Bilateral Model for Central Neural Pathways in Vestibuloocular Reflex," Journal of Neurophysiology, Vol. 51, No. 2, pp. 210-241, 1984, which is hereby incorporated by reference.
The mammalian eye is essentially a globe held in a socket allowing three degrees of freedom per eye: rotations in the vertical and horizontal planes and rotations about the line of sight. A pair of eyes is capable of only three types of motion: conjugate, vergence and torsional movements. Inputs to the brain are transformed by sensors that respond to a specific visual or vestibular stimulation. In total there are three types of stimuli to process. Significantly, regardless of the nature of the excitation, similar eye movements arise.
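Conjugate and vergence movements are commonly expressed as the mean and the difference of the two horizontal eye angles. A minimal sketch of that decomposition follows. The sign convention (both angles measured in the same rotational sense, vergence taken as left minus right) is an assumption, the function names are hypothetical, and torsional movement is omitted.

```python
def to_conjugate_vergence(left_deg, right_deg):
    """Split two horizontal eye angles into conjugate and vergence parts."""
    conjugate = (left_deg + right_deg) / 2.0  # common rotation of both eyes
    vergence = left_deg - right_deg           # angle between the visual axes
    return conjugate, vergence


def to_left_right(conjugate_deg, vergence_deg):
    """Recover the individual eye angles from the two components."""
    left = conjugate_deg + vergence_deg / 2.0
    right = conjugate_deg - vergence_deg / 2.0
    return left, right


if __name__ == "__main__":
    conj, verg = to_conjugate_vergence(10.0, 4.0)
    print(conj, verg)
    print(to_left_right(conj, verg))
```

The two functions are exact inverses, so a pair of eye angles and a conjugate and vergence pair carry the same information.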