A Dynamic Vision Sensor (DVS) is a novel Complementary Metal Oxide Semiconductor (CMOS) image sensor. Unlike a conventional CMOS or Charge-coupled Device (CCD) sensor, which generates images directly, the DVS generates events according to the change in illumination intensity of a scene. Pixel points whose change in contrast, caused by the change in illumination intensity, exceeds a preset threshold are taken as event pixel points, and pixel points whose change in contrast does not exceed the preset threshold are taken as non-event pixel points, so as to generate a DVS image.
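The thresholding described above can be illustrated with a minimal sketch. It is not the sensor's actual circuit behavior: a real DVS operates asynchronously per pixel, whereas this sketch compares two intensity frames. The function name `dvs_event_mask`, the use of log intensity as the contrast measure, and the default `threshold` value are all assumptions made for illustration.

```python
import math


def dvs_event_mask(prev_frame, curr_frame, threshold=0.2, eps=1e-6):
    """Classify each pixel point as event (True) or non-event (False).

    Assumption: the change in contrast is approximated by the absolute
    change in log intensity between two frames. `threshold` is a
    hypothetical preset threshold, not a value taken from any sensor.
    """
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        row = []
        for p, c in zip(prev_row, curr_row):
            change = abs(math.log(c + eps) - math.log(p + eps))
            row.append(change > threshold)
        mask.append(row)
    return mask


# Two tiny 2x3 intensity frames (made-up values for illustration).
prev = [[100, 100, 100],
        [100, 100, 100]]
curr = [[100, 150, 100],
        [101, 100, 60]]

mask = dvs_event_mask(prev, curr)
event_count = sum(sum(row) for row in mask)
print(mask, event_count)
```

Only the two pixel points whose intensity changed markedly are classified as event pixel points; the pixel point that changed from 100 to 101 stays below the threshold and remains a non-event pixel point.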
Image vision processing methods based on a dual-camera DVS image are widely applied in fields such as object recognition, 3D scene modeling, image rendering, stereoscopic television and aided driving.
In the existing image vision processing methods, it is generally required to acquire a dual-camera DVS frame image (i.e., a dual-camera frame image). An existing image vision processing method comprises the operations of: photographing and generating a left-camera frame image by a left-camera DVS camera, and photographing and generating a right-camera frame image by a right-camera DVS camera; and, determining the parallax between pixel points in the left-camera frame image and the matched pixel points in the right-camera frame image, and determining depth information of the matched pixel points according to the determined parallax. The parallax between pixel points in the left-camera frame image and the matched pixel points in the right-camera frame image is determined mainly by a frame image matching technology based on local feature similarity, non-local feature similarity or global feature similarity.
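The step of determining depth information from the parallax can be sketched with the standard triangulation relation for a rectified stereo pair, Z = f * B / d. The text above does not specify a formula, so this relation, the function name `depth_from_disparity`, and the numeric values are assumptions chosen for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return depth in meters for one matched pixel pair, or None.

    Assumptions: the two cameras are rectified (matched pixel points
    lie on the same image row), the focal length is given in pixels,
    and the baseline between the cameras is given in meters.
    """
    if disparity_px <= 0:
        # No valid match, or a point at infinity: depth is undefined.
        return None
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: 400 px focal length, 0.25 m baseline.
near = depth_from_disparity(20.0, 400.0, 0.25)
far = depth_from_disparity(5.0, 400.0, 0.25)
unmatched = depth_from_disparity(0.0, 400.0, 0.25)
print(near, far, unmatched)
```

Because depth is inversely proportional to parallax, a matching error of a few pixels on a small parallax produces a large depth error, which is why mismatched pixel points degrade the depth information so strongly.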
However, the DVS generates only a small number of (i.e., sparse) event pixel points, and the event pixel points generated by the left and right DVS cameras are inconsistent in distribution, in amount, or in both. Therefore, pixel points within most regions of the left-camera frame image and the right-camera frame image are non-event pixel points.
On one hand, since the non-event pixel points have a small change in contrast, and there is little difference in contrast between the non-event pixel points, particularly in a scene with a high illumination intensity (e.g., backlight) or a low illumination intensity (e.g., at night or in a dark room), it is difficult to distinguish between the non-event pixel points. Therefore, in the existing image vision processing methods, matching between non-event pixel points, or between event pixel points and non-event pixel points, in the left-camera and right-camera frame images is very likely to result in mismatching. On the other hand, when there is a repetitive texture structure (e.g., checkerboard texture) in a frame image, due to the repetition of the texture, a non-event pixel point in one camera frame image has a plurality of matchable pixel points in the other camera frame image, which is also very likely to result in mismatching. The depth information determined according to the mismatched non-event pixel points is wrong, and these non-event pixel points are very likely to become noise points. As a result, the accuracy of the depth information of pixel points in the whole frame image is reduced greatly. Consequently, subsequent processing operations based on the depth information of pixel points in the frame image are adversely impacted, or even fail.
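The ambiguity caused by a repetitive texture can be reproduced with a small one-dimensional sketch. It uses a sum of absolute differences (SAD) over a window as the local similarity measure; this measure, the function name `sad_costs`, and the synthetic rows are assumptions for illustration and are not the specific matching technology of any existing method.

```python
def sad_costs(left_row, right_row, x, half_window, max_disparity):
    """Return {disparity: SAD cost} for the left pixel point at column x.

    A candidate disparity d compares the window around left_row[x]
    with the window around right_row[x - d].
    """
    costs = {}
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # the window would leave the image
        costs[d] = sum(
            abs(left_row[x + k] - right_row[xr + k])
            for k in range(-half_window, half_window + 1)
        )
    return costs


# Synthetic stripe texture with a period of 4 pixels.
pattern = [0, 0, 255, 255]
right_row = pattern * 5
true_disparity = 2
# The left row is the right row shifted by the true disparity.
left_row = [right_row[(x - true_disparity) % len(right_row)]
            for x in range(len(right_row))]

costs = sad_costs(left_row, right_row, x=12, half_window=1, max_disparity=10)
lowest = min(costs.values())
best = [d for d, cost in costs.items() if cost == lowest]
print(best)
```

Several candidate disparities, spaced one texture period apart, share the same lowest cost, so a matcher based on local similarity alone cannot tell which one is correct.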
In addition, in the existing image vision processing methods, the parallax and depth information of the matched pixel points can be calculated only after the pixel points in the dual-camera frame images are matched. However, due to occlusion between the different objects being photographed in some scenes (e.g., close shooting or macro shooting), the dual-camera frame images are not completely consistent. That is, some non-event pixel points in one camera frame image do not have matchable pixel points in the other camera frame image. Therefore, in the existing image vision processing methods, the depth information of these unmatchable non-event pixel points cannot be determined, and these non-event pixel points are very likely to become noise points. As a result, the accuracy of the depth information of pixel points in the whole frame image is reduced greatly. Consequently, subsequent processing operations based on the depth information of pixel points in the frame image are adversely impacted, or even fail.