Apparatuses and methods consistent with the present invention relate to the field of computer vision and, in particular, to a method and device for processing Dynamic Vision Sensor (DVS) events.
Recently, Dynamic Vision Sensor (DVS) cameras have come into use in fields such as computer vision, artificial intelligence and robotics. Compared with a conventional Complementary Metal-Oxide-Semiconductor (CMOS) sensor or a Charge Coupled Device (CCD) sensor, a DVS camera has the following three features, which make it better suited to applications with high real-time requirements.
1) Event asynchrony. Unlike the CMOS/CCD sensor, the DVS camera has an asynchronous imaging process, in which each pixel can autonomously and individually generate events according to the change in illumination intensity of a scene. Therefore, whereas image data from all pixels must be processed for a CMOS/CCD sensor, in a DVS camera only image data from a smaller number of pixels need be processed, which can be done quickly. As a result, the DVS camera responds to scene changes far more quickly than the CMOS/CCD sensor, so that the DVS camera fundamentally makes it possible to propose and realize super-real-time vision algorithms.
2) Event sparsity. Unlike the CMOS/CCD sensor, the DVS camera is a motion-sensitive sensor and captures only boundary or outline events of an object that is in relative motion and whose change in illumination reaches a certain threshold. Therefore, the scene content can be described by only a few events. Compared with the CMOS/CCD sensor, the content to be processed for the DVS camera is greatly reduced, so that computation overhead is saved to a large extent and computation efficiency is improved.
3) Illumination robustness. The events generated by the DVS are related to the change in illumination intensity of a scene. When the illumination change in a scene is greater than a given threshold, the DVS generates a corresponding event describing the change in scene content. Therefore, the DVS camera is a sensor that is robust to illumination change. An increase in illumination intensity does not cause scene texture attenuation or a mirror effect in the DVS camera, so that the influence of illumination, texture and other factors is reduced to a large extent.
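The threshold-driven event generation described in the three features above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the sensor's actual circuitry: the scene is compared as two intensity grids rather than asynchronously per pixel, and the function name `generate_events`, the `(x, y, polarity)` event layout and the polarity sign itself are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical event layout: (x, y, polarity). Polarity (+1 brighter,
# -1 darker) is an assumption added for illustration.
Event = Tuple[int, int, int]


def generate_events(reference: List[List[float]],
                    current: List[List[float]],
                    threshold: float) -> List[Event]:
    """Emit an event for every pixel whose intensity change exceeds threshold.

    Simplification: a real DVS pixel fires asynchronously; here two
    intensity grids of the same size are compared in one pass.
    """
    events: List[Event] = []
    for y, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for x, (ref, cur) in enumerate(zip(ref_row, cur_row)):
            delta = cur - ref
            if abs(delta) > threshold:
                events.append((x, y, 1 if delta > 0 else -1))
    return events


reference = [[10, 10, 10],
             [10, 10, 10]]
current = [[10, 30, 10],
           [2, 10, 11]]

# Only two of the six pixels changed by more than the threshold,
# so only two events describe the scene change.
events = generate_events(reference, current, threshold=5)
print(events)
```

Pixels whose change stays below the threshold produce no output at all, which is the source of the sparsity noted above; a uniform scene yields an empty event list.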
Although the DVS camera has features such as low delay, low power consumption and high dynamic range, there is still a great difference in the imaging principle and image generation process between the DVS camera and the conventional optical camera. The event asynchrony and sparsity of the DVS camera inevitably result in an inconsistent event distribution and an inconsistent number of contained events across different DVS event maps, wherein each image corresponds to a DVS event map and each DVS event map contains several events. Because of this, the corresponding DVS event map sequence does not have the temporal consistency that is common in optical camera sequences, wherein temporal consistency means that the event distribution and the number of events in the event maps corresponding to two adjacent image frames in a DVS event map sequence are consistent.
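The notion of temporal consistency defined above, namely agreement in both the number of events and the event distribution between two adjacent event maps, can be made concrete with a rough sketch. The two measures used here (a count ratio and a pixel-set overlap) and the thresholds `min_count_ratio` and `min_overlap` are hypothetical stand-ins chosen for illustration; the text does not prescribe a particular metric.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (x, y) location of an event in an event map


def count_ratio(map_a: Set[Pixel], map_b: Set[Pixel]) -> float:
    """Ratio of the smaller event count to the larger one (1.0 = equal)."""
    if not map_a and not map_b:
        return 1.0
    return min(len(map_a), len(map_b)) / max(len(map_a), len(map_b))


def distribution_overlap(map_a: Set[Pixel], map_b: Set[Pixel]) -> float:
    """Jaccard overlap of event locations, a crude proxy for distribution."""
    union = map_a | map_b
    if not union:
        return 1.0
    return len(map_a & map_b) / len(union)


def is_temporally_consistent(map_a: Set[Pixel], map_b: Set[Pixel],
                             min_count_ratio: float = 0.8,
                             min_overlap: float = 0.5) -> bool:
    """Both the event count and the event distribution must roughly agree."""
    return (count_ratio(map_a, map_b) >= min_count_ratio
            and distribution_overlap(map_a, map_b) >= min_overlap)


# Two adjacent event maps: the second one is missing one event.
frame_1 = {(0, 0), (1, 0), (2, 0), (3, 0)}
frame_2 = {(0, 0), (1, 0), (2, 0)}

print(count_ratio(frame_1, frame_2))           # 3 of 4 events
print(distribution_overlap(frame_1, frame_2))  # 3 shared of 4 locations
print(is_temporally_consistent(frame_1, frame_2))
```

An exact pixel overlap is sensitive to small motions between frames, so a practical check would likely compare events within a spatial neighborhood; the sketch is only meant to show that count and distribution are separate conditions.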
If a DVS event map sequence without temporal consistency is used for subsequent pose recognition processing or three-dimensional reconstruction processing, the result of the processing may be neither accurate nor robust. Therefore, generating a DVS event map sequence with temporal consistency is a challenge to be solved.