The present invention relates to methods for estimating optical flow in imaging techniques.
Optical flow is an approximation of the motion in a sequence of images varying over time. The first work on optical flow was done by engineers in the field of television and by persons interested in the modeling of biological vision. Since then, these techniques have found their place in a wide variety of disciplines, including computer vision and robot navigation. In particular, they are used to perform motion detection, object segmentation, time-to-collision calculations, motion-compensated encoding, etc.
Optical flow is a visual measurement notoriously affected by noise. It is commonly expressed as a velocity map within the image sequence. However, estimating such a map requires solving an ill-posed problem, i.e. one that includes too many unknowns in relation to the number of equations. Consequently, additional hypotheses and constraints must be applied in order to estimate the flow vectors, and these hypotheses and constraints are not always valid. Furthermore, the inevitable presence of stochastic noise in unfiltered natural image sequences gives rise to various difficulties when optical flow is used in the control loop of a mobile robot.
Optical flow techniques can be divided into four categories (cf. J. L. Barron, et al., “Performance of Optical Flow Techniques”, International Journal of Computer Vision, Vol. 12, No. 1, pp. 43-77):
    energy-based methods express optical flow according to the outputs of velocity-adapted filters defined in the Fourier domain;
    phase-based methods estimate the image velocities in terms of band-pass filter outputs;
    correlation-based methods seek the best match between small spatial neighborhoods in temporally adjacent images;
    differential, or gradient-based, methods use spatio-temporal derivatives of the image intensity and a hypothesis of constant illumination.
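By way of illustration only, the differential category can be sketched as follows. The sketch assumes the constant-illumination constraint Ix·u + Iy·v + It = 0 and resolves the two unknowns (u, v) by a least-squares fit over a whole window, in the manner of a Lucas-Kanade style estimator; this particular estimator, the function names and the synthetic test pattern are assumptions introduced for the example and are not taken from the cited reference.

```python
def make_frame(size, shift_x=0.0, shift_y=0.0):
    """Hypothetical test pattern: a quadratic bowl centred on the frame,
    displaced by (shift_x, shift_y) pixels."""
    c = (size - 1) / 2.0
    return [[(x - c - shift_x) ** 2 + 2.0 * (y - c - shift_y) ** 2
             for x in range(size)] for y in range(size)]


def estimate_flow(frame0, frame1):
    """Estimate one flow vector (u, v) for the interior of two frames.

    Each pixel contributes one equation Ix*u + Iy*v + It = 0 in two
    unknowns; the equations are pooled and solved in the least-squares
    sense. Returns None when the pooled system is singular, i.e. when the
    additional constraint does not remove the ambiguity.
    """
    h, w = len(frame0), len(frame0[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0  # spatial derivative in x
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0  # spatial derivative in y
            it = frame1[y][x] - frame0[y][x]                  # temporal derivative
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] by Cramer's rule.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


flow = estimate_flow(make_frame(7), make_frame(7, 0.5, 0.25))
```

When the intensity gradient has the same direction everywhere in the window (for example a pure horizontal ramp), the pooled system is singular and the sketch returns None, which is one concrete instance of the ill-posed nature of the estimation discussed above.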
The majority of work done on the design, comparison and application of optical flow techniques concentrates on correlation-based or gradient-based approaches. However, all these methods are intrinsically slow to execute, so that they are poorly adapted to the real-time execution constraints that exist in a certain number of applications.
Another motion detection solution relies on a visual sensor known as EMD (Elementary Motion Detector). EMDs are based on motion detection models reproducing supposed vision mechanisms of insects. Two adjacent photoreceptors supply image signals that are fed to a bank of temporal high-pass and low-pass filters. The high-pass filters remove the constant (DC) component of the illumination, which does not carry any motion information. The signal is then split between two channels, only one of which includes a low-pass filter. The delay applied by the low-pass filter is employed to supply a delayed image signal which is then correlated with that of the adjacent non-delayed channel. Finally, a subtraction between the two channels supplies a response that is sensitive to the direction of motion, and which can therefore be employed to measure visual motion. Motion detection by an EMD is sensitive to image contrast, the amplitude of the detected motion being larger when the contrast is high. This degrades the precision of the measurement of visual motion. Due to this lack of precision, EMDs are not suitable for general navigation applications, especially for tasks requiring fine motion control.
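The EMD processing chain described above may be sketched, purely as an illustration, as follows. The first-order discrete filters, their coefficients (hp_alpha, lp_alpha), the use of a product as the correlation, and the sinusoidal test stimulus are all assumptions made for the example; actual EMD implementations may differ.

```python
import math


def high_pass(signal, alpha):
    """First-order high-pass filter: removes the constant component."""
    out = [0.0]
    for k in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[k] - signal[k - 1]))
    return out


def low_pass(signal, alpha):
    """First-order low-pass filter, used here as the delay element."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def emd_response(left, right, hp_alpha=0.9, lp_alpha=0.3):
    """Correlate each delayed channel with the adjacent non-delayed
    channel, then subtract the two products to obtain a signed response."""
    a = high_pass(left, hp_alpha)
    b = high_pass(right, hp_alpha)
    a_delayed = low_pass(a, lp_alpha)
    b_delayed = low_pass(b, lp_alpha)
    return [ad * bn - bd * an
            for ad, bd, an, bn in zip(a_delayed, b_delayed, a, b)]


def drifting_grating(n, period, lag, amplitude=1.0):
    """Hypothetical stimulus: the right photoreceptor sees the same
    sinusoid as the left one, `lag` samples later."""
    left = [amplitude * math.sin(2 * math.pi * k / period) for k in range(n)]
    right = [amplitude * math.sin(2 * math.pi * (k - lag) / period) for k in range(n)]
    return left, right


left, right = drifting_grating(200, 20, 5)
forward = sum(emd_response(left, right))   # positive for left-to-right motion
backward = sum(emd_response(right, left))  # same magnitude, opposite sign
```

Because the response is a product of two signals, doubling the stimulus amplitude multiplies the output by four although the motion is unchanged, which illustrates the contrast sensitivity noted above.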
Unlike conventional cameras that record successive images at predefined sampling instants, biological retinas transmit very little redundant information about the scene to be visualized, and do so in an asynchronous way. Event-based asynchronous vision sensors deliver compressed digital data in the form of events. A general presentation of such sensors can be consulted in “Activity-Driven, Event-Based Vision Sensors”, T. Delbrück, et al., Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2426-2429. Event-based vision sensors have the advantage of removing redundancy, reducing latency and increasing the dynamic range compared to conventional cameras.
The output of such a vision sensor can consist, for each pixel address, of a sequence of asynchronous events representing changes in scene reflectance at the moment they occur. Each pixel of the sensor is independent and detects changes in intensity above a threshold since the emission of the last event (for example a contrast of 15% on the logarithm of the intensity). When the intensity change exceeds the fixed threshold, an ON or OFF event is generated by the pixel according to whether the intensity is increasing or decreasing. Since the sensor is not sampled on a clock like a conventional camera, it can take into account the sequencing of the events with a very high degree of temporal precision (for example on the order of 1 μs). If such a sensor is used to reconstruct an image sequence, an image rate of several kilohertz can be attained, as opposed to a few tens of hertz for conventional cameras.
Although event-based vision sensors have promising prospects, to this day no practical method exists that is well adapted to determining optical flow on the basis of signals delivered by such sensors. In “Frame-free dynamic digital vision”, Proceedings of the International Conference on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, University of Tokyo, 6-7 Mar. 2008, pp. 21-26, T. Delbrück suggests the use of “labelers” to give additional significance to the detected events, such as contour orientations or directions of motion, without however supplying any information that might make it possible to envision the estimation of an optical flow.
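The per-pixel event generation principle described above can be illustrated by the following minimal sketch. It assumes that the 15% contrast corresponds to a threshold of 0.15 on the natural logarithm of the intensity, that the pixel is observed through a hypothetical list of timestamped intensity samples, and that at most one event is emitted per sample; these are simplifications made for the example and do not describe any particular sensor.

```python
import math


def generate_events(samples, threshold=0.15):
    """Emit ON/OFF events for one independent pixel.

    samples: list of (timestamp, intensity) pairs with intensity > 0.
    Returns a list of (timestamp, polarity), polarity being +1 for an ON
    event (intensity increasing) and -1 for an OFF event (decreasing).
    """
    events = []
    _, first_intensity = samples[0]
    reference = math.log(first_intensity)  # log intensity at the last event
    for timestamp, intensity in samples[1:]:
        change = math.log(intensity) - reference
        if abs(change) >= threshold:
            events.append((timestamp, 1 if change > 0 else -1))
            reference = math.log(intensity)  # restart from the new level
    return events


# Small variations produce nothing; only sufficiently large changes
# since the last event generate output.
events = generate_events([(0, 100.0), (1, 105.0), (2, 120.0), (3, 121.0), (4, 90.0)])
```

In this example only the samples at timestamps 2 and 4 give rise to events (ON then OFF), the intermediate samples being below the threshold relative to the last emitted event, so that a static or slowly varying pixel produces no data.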
In the article “Asynchronous frameless event-based optical flow” which appeared in March 2012 in the “Neural Networks” periodical, Vol. 27, pp. 32-37, R. Benosman et al. describe the estimation of optical flows on the basis of events detected by an asynchronous sensor. The algorithm used is gradient-based and relies on the solving of a system of equations in which the spatial gradients at a given pixel of coordinates (x, y) are estimated by the difference between the events having arisen at this pixel (x, y) and those having arisen at the pixels of coordinates (x−1, y) and (x, y−1) at the same instant.
A need exists for a method for estimating optical flow that makes it possible to obtain estimations faster than is known to be practicable with conventional cameras. There is also a need for a method for estimating optical flow on the basis of signals output by an event-based vision sensor, in order to be able to use the various techniques and applications that have been developed relying on optical flow.