The output of a DVS is an event-based representation of changes in sensed luminance. Generally, the output of a DVS is a stream of events in which each event is associated with a particular state, i.e., the location of the event within the image sensor array and a binary state indicating a positive or a negative change in luminance. A certain number of DVS events are sampled to form an image in which pixel locations containing one or more events are set to non-zero values and all other pixel locations are set to zero. The value of each non-zero pixel may be determined by different techniques. For example, each non-zero pixel may be represented by a vector u that may include a timestamp, the pixel coordinates, and the latest event state change, i.e., +1 for a positive change in luminance or −1 for a negative change in luminance. Alternatively, a non-zero pixel may be represented by the number of events appearing at that location, or by the arrival time of the latest event.
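The frame-forming step described above may be illustrated with a minimal sketch. The names Event, events_to_frame and the mode strings are hypothetical and chosen only for illustration. The sketch assumes that each event carries a timestamp, pixel coordinates and a polarity of +1 or −1, and it simplifies the vector u to its polarity component in the "polarity" mode.

```python
from typing import List, NamedTuple, Union


class Event(NamedTuple):
    """One DVS event (field names are assumptions for this sketch)."""
    t: float       # timestamp; units are not specified in the text
    x: int         # column within the sensor array
    y: int         # row within the sensor array
    polarity: int  # +1 for a positive luminance change, -1 for a negative one


def events_to_frame(events: List[Event], width: int, height: int,
                    mode: str = "polarity") -> List[List[Union[int, float]]]:
    """Accumulate sampled events into a frame; pixels with no event stay 0.

    mode "polarity":  pixel holds the latest event state change (+1 or -1)
    mode "count":     pixel holds the number of events at that location
    mode "timestamp": pixel holds the arrival time of the latest event
    """
    if mode not in ("polarity", "count", "timestamp"):
        raise ValueError("unknown mode: " + mode)
    frame: List[List[Union[int, float]]] = [[0] * width for _ in range(height)]
    # Process events in time order so that "latest" is well defined.
    for ev in sorted(events, key=lambda e: e.t):
        if mode == "polarity":
            frame[ev.y][ev.x] = ev.polarity
        elif mode == "count":
            frame[ev.y][ev.x] += 1
        else:
            frame[ev.y][ev.x] = ev.t
    return frame


sample = [Event(0.001, 1, 0, +1), Event(0.002, 1, 0, -1), Event(0.003, 2, 1, +1)]
polarity_frame = events_to_frame(sample, width=3, height=2, mode="polarity")
count_frame = events_to_frame(sample, width=3, height=2, mode="count")
```

In the "timestamp" mode, an event with a timestamp of exactly zero would be indistinguishable from an empty pixel, so an implementation may prefer a separate mask or an offset; that choice is outside what the text specifies.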
A conventional DVS is an asynchronous sensor without time integration, so DVS frames must be formed over a certain sampling time, or frame-integration time, so that changes between temporally adjacent frames may be compared to estimate camera movement. The major difficulties associated with DVS camera movement estimation or tracking include: (1) the features within each DVS frame may be sparse and highly variable, so feature-based image matching may be difficult (if possible at all), which makes movement estimation accuracy unreliable; and (2) because key features are not extracted, corresponding landmarks are not available throughout the DVS movement. Accordingly, it may be difficult to cross-check a current estimate of the camera movement or pose, and it may be difficult to refer to landmarks to reduce sensor-movement estimation drift.
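The role of the frame-integration time may be illustrated with a minimal sketch that partitions an asynchronous event stream into temporally adjacent groups, one group per frame. The function name split_by_integration_time and the tuple layout (t, x, y, polarity) are assumptions made only for this illustration, and fixed-length, non-overlapping windows are one possible choice among several.

```python
from typing import List, Tuple

# (timestamp, x, y, polarity); integer timestamps, e.g. microseconds, are assumed
RawEvent = Tuple[int, int, int, int]


def split_by_integration_time(events: List[RawEvent],
                              integration_time: int) -> List[List[RawEvent]]:
    """Group events into consecutive windows of length integration_time.

    Window k holds events with t0 + k*T <= t < t0 + (k+1)*T, where t0 is the
    earliest timestamp and T is integration_time. Windows that receive no
    events are kept as empty lists so that frame indices stay aligned in time.
    """
    if integration_time <= 0:
        raise ValueError("integration_time must be positive")
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return []
    t0 = ordered[0][0]
    windows: List[List[RawEvent]] = []
    for ev in ordered:
        index = (ev[0] - t0) // integration_time
        while len(windows) <= index:
            windows.append([])
        windows[index].append(ev)
    return windows


stream = [(0, 4, 7, +1), (5, 4, 8, -1), (10, 5, 8, +1), (35, 9, 2, -1)]
windows = split_by_integration_time(stream, integration_time=10)
```

A short integration time yields many windows with few events each, which reflects the sparsity noted in difficulty (1); a long integration time yields denser frames at the cost of temporal resolution.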