Conventional cameras generally observe a scene as a sequence of images acquired at a fixed frame rate. This image-based representation is an inefficient and redundant encoding of the scene that wastes power, memory, and time. Because the exposure time is the same for every image, conventional cameras also have difficulty imaging scenes with a high dynamic range, i.e., scenes containing both very dark and very bright regions.
Dynamic vision sensors (DVS) differ from conventional cameras. A DVS does not output actual pixel intensity values. Instead, a DVS signals pixel-level changes, i.e., whether the intensity of a pixel increases or decreases. DVS are inspired by a model of the human retina, see Delbruck et al., “A silicon early visual system as a model animal,” Vision Research, 2004. These sensors do not send entire images at fixed frame rates. Instead, a DVS sends only pixel-level changes as binary 0 and 1 events, similar to signals from retinal ganglion cells, at exactly the time the changes occur. For example, a 0 indicates a decrease and a 1 indicates an increase in the intensity of a particular pixel. As a result, the events are transmitted at microsecond time resolution, equivalent to or better than a conventional high-speed camera running at thousands of frames per second. That is, instead of wastefully sending entire images at fixed frame rates, in applications of concern to the invention, the DVS transmits only the local pixel-level changes caused by movement of the sensor at the time the events occur.
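The event stream described above can be modeled with a minimal sketch. The field names and the decoding step below are illustrative assumptions for exposition, not the interface of any particular DVS:

```python
from collections import namedtuple

# A DVS event: pixel coordinates, a microsecond timestamp, and a binary
# polarity (1 = intensity increase, 0 = intensity decrease).
# These field names are hypothetical, chosen only for illustration.
Event = namedtuple("Event", ["x", "y", "t_us", "polarity"])

def split_by_polarity(events):
    """Separate a stream of events into ON (brightening) and OFF
    (darkening) changes."""
    on = [e for e in events if e.polarity == 1]
    off = [e for e in events if e.polarity == 0]
    return on, off

# A tiny example stream: three events at microsecond timestamps.
stream = [Event(10, 20, 1, 1), Event(10, 21, 5, 0), Event(11, 20, 9, 1)]
on_events, off_events = split_by_polarity(stream)
```

Note that, unlike a frame, this representation carries no intensity values at all: only the location, time, and sign of each change is transmitted.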
In U.S. 20150160737, “Apparatus and method for recognizing gesture using sensor,” 2015, a dynamic vision sensor (DVS) is used for gesture recognition. In U.S. 20140354537, “Apparatus and method for processing user input using motion of object,” 2014, a DVS is used for tracking moving objects. Conventional procedures, such as optical flow, have been developed for DVS, see EP 2795580, “Method of estimating optical flow on the basis of an asynchronous light sensor,” 2014.
Some methods extend visual odometry to DVS using a conventional camera, see Censi et al., “Low-latency event-based visual odometry,” ICRA, 2014. Censi et al. disclose a visual odometry procedure used with a DVS and a complementary metal-oxide semiconductor (CMOS) camera. A novel calibration procedure performs spatio-temporal calibration of the DVS and the CMOS camera. Their approach estimates the relative motion of the DVS events with respect to a previous CMOS frame. The method is accurate for rotation estimates, whereas translation measurements tend to be noisy.
SLAM techniques and 3D reconstruction methods have also been developed for DVS, see Weikersdorfer et al., “Simultaneous localization and mapping for event-based vision systems,” Computer Vision Systems, 2013, and Carneiro et al., “Event-based 3D reconstruction from neuromorphic retinas,” Neural Networks, 2013. The Lucas-Kanade tracking procedure for optical flow estimation has been extended for the DVS. Certain registration methods, such as iterative-closest-point (ICP), have been extended for DVS to control micro-grippers. DVS can also be useful for evasive maneuvering of quadrotors. A pair of DVS can be used as a stereo camera for reconstructing objects near a quadrotor for predicting and avoiding collisions.
Epipolar geometry provides a relationship between corresponding points in a pair of images acquired from different viewpoints. In the case of an asynchronous DVS, this concept cannot be directly applied. Recently, a general version of epipolar geometry applicable to an asynchronous event-based stereo configuration has been developed, see Benosman et al., “Asynchronous event-based Hebbian epipolar geometry,” IEEE Trans. Neural Network, 2011.
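For context, the relationship that epipolar geometry provides between frame-based images can be stated compactly. This is the standard two-view formulation, not the event-based extension of Benosman et al.:

```latex
\mathbf{x}'^{\top} F \, \mathbf{x} = 0
```

where $\mathbf{x}$ and $\mathbf{x}'$ are the homogeneous image coordinates of the same scene point in the two views and $F$ is the $3 \times 3$ fundamental matrix, so a point in one image constrains its correspondence to a line in the other. Because an asynchronous DVS produces no simultaneous image pairs, this constraint cannot be evaluated directly, which motivates the event-based generalization cited above.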
Stereo matching can be performed using event histograms, and the estimated depths can be used for gesture recognition. Beyond stereo, some methods use more than two event sensors for 3D reconstruction.
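Once stereo matching establishes a disparity for an event, its depth follows from the standard pinhole stereo relation. The sketch below shows that relation only, not the histogram-based matching itself, and the parameter values are hypothetical:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: depth Z = f * b / d, where f is the
    focal length in pixels, b the baseline between the two sensors in
    meters, and d the disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical example: 200-pixel focal length, 10 cm baseline,
# and a matched event pair with a 4-pixel disparity.
z = depth_from_disparity(4.0, 200.0, 0.10)  # -> 5.0 meters
```

The same relation applies whether the matched features are conventional image patches or event histograms; only the matching step differs.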
DVS-based solutions are also known for particle-filter based localization and mapping. It is possible to build a mosaic of a scene solely using a DVS without any additional sensors. This is achieved by tracking the sensor motion and using the estimated motion for registering and integrating the data.
However, as a drawback, the cost of a typical DVS is in the range of several thousand dollars.