Biological visual systems are complex and sophisticated information-processing systems that obtain raw sensory data from the external environment and extract meaningful information. Emulation of the functions and capabilities of various biological visual systems in electronic sensing and control devices is of great scientific interest and commercial importance. Integrated electronic sensors that emulate or are inspired by biological visual functions are sometimes referred to as "neuromorphic vision chips". These sensors can have applications in a number of fields such as machine vision, automotive guidance, robotics, and remote sensing. In a typical neuromorphic vision chip, a visual stimulus is received and is converted into an electrical signal in a desired format by using one or more photosensors. This electrical signal is then further processed on the same chip. Neuromorphic vision chips thus may be classified as a special type of "smart vision chips" which in general include all integrated circuits performing certain on-chip signal processing of visual data.
In a system interacting with a dynamic environment, a smart vision chip must often respond in "real time", i.e., within a time short enough that the scene does not change significantly. For example, such a system may need to rely on information extracted from the received visual data to guide its behavior. For a neuromorphic vision chip, "real time" usually means the response time of the biological system that the chip emulates. In many applications that partially emulate the human visual system, a neuromorphic vision chip must respond within a few tens of milliseconds to be considered "real time".
Typical neuromorphic vision sensor architectures designed for high response speed use compact VLSI designs that integrate the image-sensing circuitry with the visual computation circuits on a single chip. The visual computation circuits can be arranged so that local processing elements are distributed throughout the chip, close to their respective inputs. Such a circuit arrangement can support fully parallel processing with a small amount of wiring. Parallel architectures can at least partially reduce the need for high bandwidth in high-speed processing.
One class of smart vision chips, including neuromorphic vision chips, extracts motion in a visual field. Motion information is an important component of visual information and can be critical to many applications that require tracking moving objects or determining their physical extent. In addition, a variety of image-processing tasks, such as segmentation and depth estimation, may be considerably simplified in dynamic scenes if motion data is available. The motion algorithm and the chip architecture are the two main factors that affect the performance of a motion sensor.
Analog processing can offer advantages over digital processing in motion sensors. For example, an analog motion sensor can be made more compact than a digital motion sensor of comparable processing power because an analog signal has a higher information content. Also, analog circuits can be configured to consume less power than their digital counterparts. See, for example, Mead, "Neuromorphic electronic systems", Proc. IEEE, Vol. 78, pp. 1629-1636 (1990). Small size and low power consumption are desirable for motion-sensing arrays because they allow high densities of motion-processing elements and thus high imaging resolution.
In many applications, analog processing is considered inferior to digital processing because of its limited precision. In visual motion sensing, however, digital approaches lose their usual precision advantage because of the inherently high noise in optical signals and the fundamental computational limitations of estimating the velocity field from the optical flow of a visual scene. See, for example, Verri and Poggio, "Motion field and optical flow: Qualitative properties," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 11(5), pp. 490-498 (1989). The resulting errors and uncertainties cannot be reduced simply by improving the precision of individual processing elements. However, certain a priori information, e.g., the piecewise rigidity of the environment, can be used to obtain some redundancy, which may be used for error correction and noise reduction.
Algorithms for computing visual motion may be classified into different categories according to biological evidence or computational implications. Given the many fundamental difficulties in reliably estimating a velocity field from purely visual data (i.e., optical flow) and the various trade-offs involved in designing a sensing circuit, the choice of algorithm should be based on the specifications of the system and its operating environment. One classification relevant to the present disclosure distinguishes between gradient and correspondence algorithms.
Gradient schemes extract the velocity of an image feature from approximations of the temporal and spatial derivatives of the local brightness distribution. Because the calculation of derivatives is sensitive to circuit offsets, noise, and illumination levels, gradient algorithms can be difficult to implement robustly with analog circuitry. An example of a gradient implementation is the two-dimensional analog motion sensor of Tanner and Mead. See, Tanner and Mead, "An integrated analog optical motion sensor", VLSI Signal Processing II, S. Y. Kung, ed., pp. 59-76, New York, IEEE Press (1986).
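As a minimal sketch of the gradient idea (not the Tanner and Mead circuit), the one-dimensional brightness-constancy relation I_x·v + I_t = 0 can be solved per pixel from two sampled frames. The function name `gradient_velocity` and the finite-difference choices are illustrative assumptions. The second call adds a small uniform offset to one frame, of the kind a circuit might introduce, and shows how it biases the estimate.

```python
def gradient_velocity(frame_prev, frame_next, dx=1.0, dt=1.0, eps=1e-9):
    """Hypothetical 1-D gradient estimator: solve I_x * v + I_t = 0 per pixel.

    frame_prev, frame_next: brightness samples of one pixel row at t and t + dt.
    Returns one velocity per interior pixel (None where I_x is ~0).
    """
    velocities = []
    for i in range(1, len(frame_prev) - 1):
        # Spatial derivative: central difference, averaged over both frames.
        i_x = ((frame_prev[i + 1] - frame_prev[i - 1])
               + (frame_next[i + 1] - frame_next[i - 1])) / (4.0 * dx)
        # Temporal derivative: forward difference between the two frames.
        i_t = (frame_next[i] - frame_prev[i]) / dt
        # A flat brightness profile has no gradient, so velocity is undefined.
        velocities.append(None if abs(i_x) < eps else -i_t / i_x)
    return velocities


ramp = [float(x) for x in range(10)]       # brightness ramp I = x
moved = [b - 2.0 for b in ramp]            # ramp shifted 2 pixels in +x
offset = [b + 0.5 for b in moved]          # same motion plus a 0.5 offset
print(gradient_velocity(ramp, moved)[:3])  # [2.0, 2.0, 2.0]
print(gradient_velocity(ramp, offset)[:3]) # [1.5, 1.5, 1.5]
```

The offset leaves I_x unchanged but shifts I_t, so every estimate is wrong by the same amount. This illustrates why derivative-based schemes are sensitive to analog offsets.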
Another category, correspondence methods, estimates motion by comparing the position of a pattern at different times (i.e., "spatial correspondence") or by comparing the timing of a pattern at different positions (i.e., "temporal correspondence"). While digital implementations of correspondence algorithms typically use spatial correspondence, most analog implementations and well-understood biological systems do not intrinsically need to discretize time and thus use temporal correspondence.
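Spatial correspondence, the form typical of digital implementations, can be sketched as a search for the pixel shift that best aligns two frames taken one frame interval apart. The function `best_shift`, the squared-difference cost, and the unit pixel pitch and frame interval are illustrative assumptions.

```python
def best_shift(frame_a, frame_b, max_shift):
    """Hypothetical spatial-correspondence search.

    Returns the integer shift d (pixels) minimizing the mean squared difference
    between frame_a[x] and frame_b[x + d]. Requires max_shift < len(frame_a).
    """
    n = len(frame_a)
    best_d, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        # Compare only the pixels that still overlap after shifting.
        pairs = [(frame_a[x], frame_b[x + d]) for x in range(n) if 0 <= x + d < n]
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


frame_t0 = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
frame_t1 = [0, 0, 0, 0, 0, 1, 5, 1, 0, 0]          # pattern moved 2 pixels in +x
shift = best_shift(frame_t0, frame_t1, max_shift=3)
pixel_pitch, frame_interval = 1.0, 1.0             # assumed units
print(shift, shift * pixel_pitch / frame_interval) # 2 2.0
```

Because time is sampled in whole frames, the measurable velocities are quantized to multiples of pixel_pitch / frame_interval. Temporal correspondence avoids this quantization by timing a pattern directly.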
Correspondence methods can be further divided into correlation methods and token-based methods. Correlation methods operate on any type of image structure and hence, like gradient methods, produce a dense map of velocity estimates. However, correlation methods usually exhibit better numerical stability than gradient methods, since correlation is based on multiplication and integration, rather than on differentiation. Token-based methods only respond to a particular class of image features, by first making a decision about their presence at a given location in space and time. At the expense of producing only sparse velocity maps, token-based methods can be made to operate quite robustly.
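To make "multiplication and integration" concrete, the following sketch estimates the delay between two adjacent photoreceptor signals. It sums their products over a range of candidate lags and keeps the lag with the largest sum. This delay-search form, the function name `correlation_lag`, and the sampled-signal model are assumptions for illustration; actual analog correlation schemes are built differently.

```python
def correlation_lag(s_a, s_b, max_lag):
    """Hypothetical temporal-correlation estimator for two adjacent photoreceptors.

    Returns the lag (samples) maximizing sum_t a[t] * b[t + lag], where a and b
    are the mean-subtracted signals. A positive lag means the pattern reached
    location a before location b.
    """
    mean_a = sum(s_a) / len(s_a)
    mean_b = sum(s_b) / len(s_b)
    a = [x - mean_a for x in s_a]  # remove DC so steady illumination does not dominate
    b = [x - mean_b for x in s_b]
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Multiply and integrate (sum) over the overlapping samples; no derivatives.
        score = sum(a[t] * b[t + lag] for t in range(len(a)) if 0 <= t + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


bump = [0.0, 1.0, 2.0, 1.0, 0.0]
s_a = [0.0] * 3 + bump + [0.0] * 22   # brightness peak reaches a at sample 5
s_b = [0.0] * 6 + bump + [0.0] * 19   # and b at sample 8
print(correlation_lag(s_a, s_b, max_lag=6))  # 3
```

Given an assumed sample period dt and receptor spacing, the velocity would be spacing / (lag * dt). The lag itself does not change if both signals are scaled by the same contrast factor. The peak score, however, grows with the square of contrast, so a scheme that reports the score rather than the lag inherits the contrast dependence noted below.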
Many attempts have been made to implement temporal-correspondence algorithms in VLSI circuits, ranging from pure correlation schemes, through algorithms that perform correlation-type motion computation on extracted image tokens, to token-based time-of-travel methods. Most of these earlier analog VLSI motion sensors either responded robustly only to stimuli of very high contrast or produced an output signal that did not encode pure velocity but depended strongly on contrast and/or illumination.
On the other hand, token-based time-of-travel correspondence algorithms have been implemented with compact circuits that unambiguously encode one-dimensional velocity over considerable ranges of velocity, contrast, and illumination. In these implementations, a temporal-edge detector responsive to dark-to-bright edges was used as the feature extractor in the input stage.
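A token-based time-of-travel scheme can be sketched in two steps. First, a temporal-edge detector decides when a dark-to-bright edge is present at each location. Second, the velocity is taken as the location spacing divided by the delay between the two decisions. The thresholded-derivative detector, the function names, and the 1 ms sampling period are hypothetical choices for illustration.

```python
def edge_time(signal, dt, threshold):
    """Hypothetical temporal-edge detector: time of the first sample at which
    brightness rises faster than `threshold` (a dark-to-bright edge), else None."""
    for k in range(1, len(signal)):
        if (signal[k] - signal[k - 1]) / dt > threshold:
            return k * dt
    return None


def time_of_travel_velocity(sig_a, sig_b, dt, pixel_pitch, threshold):
    """Velocity from the delay between edge tokens at two adjacent locations.
    Positive means motion from location a toward location b."""
    t_a = edge_time(sig_a, dt, threshold)
    t_b = edge_time(sig_b, dt, threshold)
    if t_a is None or t_b is None or t_a == t_b:
        return None  # no token at one location, or delay not resolvable
    return pixel_pitch / (t_b - t_a)


dt = 1e-3  # assumed 1 ms sampling period
for contrast in (1.0, 0.3, 0.05):
    sig_a = [0.0] * 10 + [contrast] * 20   # edge reaches a at sample 10
    sig_b = [0.0] * 15 + [contrast] * 15   # edge reaches b at sample 15
    print(contrast, time_of_travel_velocity(sig_a, sig_b, dt, 1.0, threshold=100.0))
```

Above the detection threshold, the result (about 200 pixel pitches per second here) does not depend on contrast, because only the timing of the tokens is used. Below the threshold, no token is produced and no velocity is reported. This is the sparse, decision-based behavior of token methods.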
Kramer et al. developed a facilitate-and-sample ("FS") circuit, which uses an edge signal to produce a sharp voltage spike and a logarithmically decaying voltage signal at each detector location. See, Kramer et al., "An analog VLSI velocity sensor," Proc. 1995 IEEE Int'l Symp. Circuits and Systems, pp. 413-416, Seattle, May 1995. The voltage spike from one location was used to sample the analog voltage of the slowly decaying signal of an adjacent location; the sampled voltage was a measure of the relative time delay between the triggering of the two signals and thus of the edge velocity.
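The facilitate-and-sample principle can be sketched numerically under an assumed decay law. Here the facilitation voltage drops by a fixed amount per e-fold of elapsed time, V(t) = V_RESET - SLOPE * ln(1 + t / TAU). This law and all three constants are hypothetical stand-ins, not values from Kramer et al. The spike from the neighboring location samples this voltage, and inverting the law recovers the delay and hence the velocity.

```python
import math

# Hypothetical circuit constants (assumptions, not values from the cited work).
V_RESET = 1.0   # facilitation voltage right after the edge at location 1 (V)
SLOPE = 0.1     # voltage drop per e-fold increase in elapsed time (V)
TAU = 1e-3      # time constant of the assumed decay law (s)


def facilitation_voltage(elapsed):
    """Assumed logarithmic decay law V(t) = V_RESET - SLOPE * ln(1 + t / TAU)."""
    return V_RESET - SLOPE * math.log1p(elapsed / TAU)


def sampled_voltage(delay):
    """The spike at location 2 samples location 1's decaying voltage after `delay`."""
    return facilitation_voltage(delay)


def velocity_from_sample(v_sample, pixel_pitch):
    """Invert the assumed decay law to recover the delay; velocity = pitch / delay."""
    delay = TAU * math.expm1((V_RESET - v_sample) / SLOPE)
    return pixel_pitch / delay


for delay in (0.002, 0.02, 0.2):  # delays spanning two decades
    v = sampled_voltage(delay)
    print(f"delay {delay} s -> sampled {v:.3f} V -> {velocity_from_sample(v, 1e-4):.4g} m/s")
```

Under these assumed constants, the sampled voltage changes by less than half a volt while the delay spans two decades. This compressive encoding is one way a logarithmic decay can cover a wide velocity range with a bounded voltage swing.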
Sarpeshkar et al. implemented a facilitate-and-trigger ("FT") circuit in which an edge signal is used to generate a voltage pulse of fixed amplitude and duration at each edge-detector location. See, Sarpeshkar et al., "Visual motion computation in analog VLSI using pulses," in Advances in Neural Information Processing Systems 5, pp. 781-788, San Mateo, Calif., Morgan Kaufmann (1993). The pulses from two adjacent locations were fed into two motion circuits, one for each direction. For motion in the preferred direction, such a motion circuit outputs a pulse with a duration equal to the overlap time of the input pulses, while for motion in the null direction (opposite to the preferred direction), no output pulse is generated.
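The pulse-overlap behavior described above can be modeled with idealized timing. Each location emits a pulse of fixed duration T at its edge time. A circuit preferring motion from a to b outputs the overlap, T minus the delay, and outputs nothing in the null direction. The function names and the rectangular-pulse model are assumptions; the sketch ignores the analog details of the actual circuit.

```python
def overlap_output(t_first, t_second, pulse_width):
    """Idealized motion circuit preferring first -> second.

    In the preferred direction the output lasts for the overlap of two pulses
    of duration `pulse_width`; in the null direction there is no output.
    """
    delay = t_second - t_first
    if delay < 0:
        return 0.0  # null direction: no output pulse
    return max(0.0, pulse_width - delay)


def ft_velocity(t_a, t_b, pulse_width, pixel_pitch):
    """Signed velocity from the two direction-selective circuits (a->b positive)."""
    for sign, (t1, t2) in ((1, (t_a, t_b)), (-1, (t_b, t_a))):
        out = overlap_output(t1, t2, pulse_width)
        if 0.0 < out < pulse_width:
            # Overlap = pulse_width - delay, so the delay can be recovered.
            return sign * pixel_pitch / (pulse_width - out)
    return None  # no overlap (edge too slow) or simultaneous arrival


# Edge reaches a at 0 ms and b at 4 ms; assumed 10 ms pulses and 1 mm spacing.
print(ft_velocity(0.0, 0.004, pulse_width=0.01, pixel_pitch=1e-3))
```

The fixed pulse width sets the slowest measurable speed. Once the delay exceeds T, the pulses no longer overlap and the model reports nothing.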
The above circuits were shown to measure the velocities of sharp edges of medium to high contrast. In particular, the output signal was essentially independent of contrast and of the global illumination level over a considerable range. For lower contrasts and more gradual edges, the response began to decrease, so that the speed was underestimated. See, Kramer et al., "Pulse-based analog VLSI velocity sensors," IEEE Trans. Circuits & Systems II: Analog and Digital Signal Processing, Vol. 44(2), pp. 86-101 (1997).
Many of the motion sensors known in the art are subject to a temporal-aliasing criterion and a spatial-aliasing criterion. Because these sensors use fixed time and space parameters for motion computation, their speed detection ranges in the temporal and spatial domains may be limited.
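A simplified illustration of how fixed parameters bound the detectable speeds uses a model of a two-location time-of-travel sensor. The model assumes the spacing between the two locations is fixed and that the circuit can only time delays within a fixed window [min_delay, max_delay]. This model and the function names are assumptions and do not capture the full aliasing criteria.

```python
def speed_range(pixel_pitch, min_delay, max_delay):
    """Speeds (v_min, v_max) a two-location time-of-travel sensor can report.

    Assumed model: fixed spacing pixel_pitch between the two locations, and a
    circuit that can only time delays in [min_delay, max_delay].
    """
    if not 0 < min_delay < max_delay:
        raise ValueError("require 0 < min_delay < max_delay")
    return pixel_pitch / max_delay, pixel_pitch / min_delay


def measurable(speed, pixel_pitch, min_delay, max_delay):
    """True if |speed| falls inside the fixed-parameter detection range."""
    v_min, v_max = speed_range(pixel_pitch, min_delay, max_delay)
    return v_min <= abs(speed) <= v_max


print(speed_range(1.0, 0.5, 4.0))      # (0.25, 2.0)
print(measurable(0.1, 1.0, 0.5, 4.0))  # False: too slow for the timing window
```

In this model the ratio v_max / v_min equals max_delay / min_delay regardless of the spacing. With fixed parameters, the only way to widen the speed range is to widen the timing window.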