Identifying and using points of interest in an image is a well-known function in the field of embedded vision.
Three main functions that make use of points of interest in an image are usually distinguished: a first function that detects points of interest in an image or a sequence of images, a second function that calculates descriptors for the points of interest detected by the first function, and a third function that pairs points of interest arising from two distinct images on the basis of the calculated descriptors.
FIG. 1 shows diagrammatically how these three functions are chained together in a device for pairing points of interest between two images (image 1, image 2).
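The chaining of the three functions can be illustrated by a minimal Python sketch. Here `detect`, `describe` and `pair` are hypothetical placeholders for any concrete method (SIFT, SURF, ...); the function name `pair_images` and the convention that `pair` returns index pairs into the two descriptor lists are assumptions made for illustration only.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) position in the image plane


def pair_images(
    image1: Any,
    image2: Any,
    detect: Callable[[Any], List[Point]],
    describe: Callable[[Any, Point], Optional[Any]],
    pair: Callable[[Sequence[Any], Sequence[Any]], List[Tuple[int, int]]],
) -> List[Tuple[Point, Point]]:
    """Chain detection, description and pairing for two images.

    `pair` is assumed to return (i, j) index pairs into the two descriptor lists.
    """

    def detect_and_describe(image: Any) -> Tuple[List[Point], List[Any]]:
        points, descriptors = [], []
        for p in detect(image):          # first function: detection
            d = describe(image, p)       # second function: description
            if d is not None:            # skip points that cannot be described
                points.append(p)
                descriptors.append(d)
        return points, descriptors

    pts1, desc1 = detect_and_describe(image1)
    pts2, desc2 = detect_and_describe(image2)
    # Third function: pairing, mapped back from descriptor indices to positions.
    return [(pts1[i], pts2[j]) for i, j in pair(desc1, desc2)]
```

Note that in this naive form both images are fully detected and described before any pairing starts, which is the whole-image chaining discussed further below.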
The first function, detection, consists in determining the position in an image of noteworthy points, called points of interest, whose position may change over time but whose presence is relatively stable from one image to the next. These points of interest may be, for example, corners, edges or other elements of a scene that a given detection method is able to characterize. Known detection methods produce, for each point of interest, at least its position in the image plane and a detection score. Other data characterizing the point of interest, such as its orientation or its scale factor, can also be generated.
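As an illustration of a detector that outputs a position and a detection score per point, the following sketch uses the Harris corner response. Harris is swapped in here only as a simple, well-known example; it is not the detector of SIFT or SURF, and the 3x3 summation window, the constant `k = 0.05` and the zero threshold are illustrative assumptions.

```python
from typing import List, NamedTuple


class InterestPoint(NamedTuple):
    x: int        # column in the image plane
    y: int        # row in the image plane
    score: float  # detection score (Harris response in this sketch)


def detect_harris(image: List[List[float]], k: float = 0.05,
                  threshold: float = 0.0) -> List[InterestPoint]:
    """Harris corners with a 3x3 summation window and 3x3 non-maximum suppression."""
    h, w = len(image), len(image[0])
    # Central-difference gradients (left at 0 on the one-pixel border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    # Response R = det(M) - k * trace(M)^2, M summed over a 3x3 window.
    r = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    # Keep strict local maxima whose response exceeds the threshold.
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            if v > threshold and all(
                v > r[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
            ):
                points.append(InterestPoint(x, y, v))
    return sorted(points, key=lambda p: p.score, reverse=True)
```

On a bright square over a dark background, this sketch returns one point per corner of the square; on a uniform image it returns nothing, which illustrates why the number of points depends on image content.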
The aim of the second function, descriptor calculation, is to describe each point of interest robustly. Existing methods are often based on analyzing the neighborhood of pixels around the position of the point of interest. For each point of interest, a vector structure called a descriptor is generated, whose size depends on the method used.
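A descriptor built from the pixel neighborhood can be sketched in the style of BRIEF (the binary descriptor cited further on in this document): each bit records the result of an intensity comparison between two pixels of the neighborhood. This is a simplified assumption-laden sketch; the patch radius of 7, the 256 bits, the seeded pseudo-random choice of pixel pairs and the absence of the pre-smoothing used by actual BRIEF are all illustrative choices, and the names `make_test_pairs` and `binary_descriptor` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Offset = Tuple[int, int]  # (dx, dy) relative to the point of interest


def make_test_pairs(n_bits: int = 256, radius: int = 7,
                    seed: int = 0) -> List[Tuple[Offset, Offset]]:
    """Draw n_bits pairs of offsets inside a (2r+1)x(2r+1) neighborhood.

    The pairs are drawn once with a fixed seed and reused for every point,
    so that descriptors from different images are comparable bit by bit.
    """
    rng = random.Random(seed)

    def offset() -> Offset:
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(n_bits)]


def binary_descriptor(image: List[List[int]], x: int, y: int,
                      pairs: List[Tuple[Offset, Offset]],
                      radius: int = 7) -> Optional[int]:
    """Bit i is 1 if the first pixel of pair i is darker than the second.

    Returns None when the neighborhood would extend outside the image.
    """
    h, w = len(image), len(image[0])
    if not (radius <= x < w - radius and radius <= y < h - radius):
        return None
    bits = 0
    for i, ((dx1, dy1), (dx2, dy2)) in enumerate(pairs):
        if image[y + dy1][x + dx1] < image[y + dy2][x + dx2]:
            bits |= 1 << i
    return bits
```

The size of the descriptor is fixed by the method (here `n_bits`), independently of the image, which matches the remark that the descriptor size varies only with the method used.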
Numerous methods exist for detecting and describing points of interest in an image. Two examples are U.S. Pat. No. 6,711,293, which describes the method known by the acronym SIFT (“Scale Invariant Feature Transform”), and European patent EP2027558, which describes the method known by the acronym SURF (“Speeded Up Robust Features”).
The aim of the third function, pairing, is to associate each point of interest detected in an image with the points of interest detected in a reference image. The reference image can be an earlier image of the same stream as the analyzed image, an image from another stream that corresponds to a snapshot of the same scene, or an image from a statistical database. The pairing step uses the descriptors calculated in the second step to associate two points of interest according to a proximity or resemblance criterion.
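For binary descriptors stored as integers, a common proximity criterion is the Hamming distance (number of differing bits). The following minimal sketch pairs each descriptor of the analyzed image with its nearest descriptor in the reference image and keeps the pair only if the distance does not exceed a threshold; the function names and the default `max_distance = 64` are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def pair_points(desc_a: Sequence[int], desc_b: Sequence[int],
                max_distance: int = 64) -> List[Tuple[int, int, int]]:
    """Brute-force nearest-neighbor pairing.

    Returns (index in desc_a, index in desc_b, distance) for every point of
    desc_a whose closest descriptor in desc_b is within max_distance.
    """
    matches = []
    for i, da in enumerate(desc_a):          # outer loop: analyzed image
        best_j, best_d = None, None
        for j, db in enumerate(desc_b):      # inner loop: reference image
            d = hamming(da, db)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j, best_d))
    return matches
```

The two nested loops make the cost proportional to the product of the numbers of points in the two images, and the inner loop needs the complete descriptor set of the reference image before it can run.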
Methods for pairing points of interest apply in particular to object tracking, where a pairing is performed between an image N and an earlier image N−k of the same image stream; this is referred to as temporal pairing. Another envisaged application is stereovision, where a pairing is carried out between two images of the same scene taken from two different viewpoints, in two different image streams; this is referred to as spatial pairing. Still other applications pair an image with a prerecorded model.
The invention, described in greater detail hereinafter, pertains to a particular architecture of a calculation device that is compatible with any method for detecting, describing and pairing points of interest.
The number of points of interest detectable in an image is not deterministic: it depends on the intrinsic characteristics of the image and cannot be predicted. Furthermore, depending on the method used, calculating the descriptor of a single point of interest is relatively complex, so calculating the descriptors for all the points of interest of an entire image is very time-consuming. In embedded applications subject to a real-time processing constraint, the descriptor calculation step may therefore cause the pairing architecture as a whole to miss its real-time requirements.
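The real-time argument can be made concrete with simple arithmetic: if describing one point costs a fixed time, only a bounded number of points can be described per frame period, whereas the number of detected points is unbounded. The sketch below assumes, hypothetically, 30 frames per second and 20 µs per descriptor, and ignores the time taken by detection and pairing.

```python
def max_describable_points(fps: float, cost_per_descriptor_us: float) -> int:
    """Largest number of descriptors that fit in one frame period."""
    frame_period_us = 1_000_000 / fps
    return int(frame_period_us // cost_per_descriptor_us)


def meets_real_time(n_points: int, fps: float,
                    cost_per_descriptor_us: float) -> bool:
    """True if describing all n_points fits within one frame period."""
    return n_points <= max_describable_points(fps, cost_per_descriptor_us)
```

With these hypothetical figures, at most 1666 points can be described per frame: an image yielding 1500 points meets the constraint, but one yielding 2500 points does not, and which case occurs is only known once the image has been processed.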
Another problem arises because the detection and description steps operate on neighborhoods of pixels of the image and therefore involve storing a large portion of the image, or quite often the complete image. However, a memory dimensioned to hold a complete image is often incompatible both with the low-latency constraint imposed by the real-time requirement and with the limited memory areas available on an embedded architecture.
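The memory gap can be quantified. A square neighborhood of radius r centered on any point of a given row spans only 2r+1 image rows, whereas a frame buffer holds every row. The sketch below compares the two for a hypothetical 1280×720 image with one byte per pixel and a hypothetical neighborhood radius of 15 pixels (31×31 neighborhood).

```python
def full_frame_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Memory needed to hold a complete image."""
    return width * height * bytes_per_pixel


def neighborhood_rows_bytes(width: int, radius: int,
                            bytes_per_pixel: int = 1) -> int:
    """Memory needed to hold the 2r+1 full rows spanned by a
    (2r+1)x(2r+1) neighborhood centered on any pixel of one row."""
    return width * (2 * radius + 1) * bytes_per_pixel
```

Under these assumptions a full frame takes 921,600 bytes, while 31 rows take 39,680 bytes, about 4.3% of the frame buffer.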
The problem addressed by the invention therefore consists in designing a real-time architecture for implementing a method for selecting and describing, and/or pairing, points of interest with a non-deterministic distribution in an image received as a real-time stream, while complying with the constraints of execution speed, low latency and minimal memory area.
A first known solution, which implements a method for pairing points of interest purely in software, consists in chaining the three steps described hereinabove while processing the entire image. For the reasons mentioned previously, this solution is not compatible with an embedded hardware architecture subject to real-time constraints.
Hardware implementations of such algorithms are also known. The document “Implementation of High Performance hardware architecture of OpenSURF algorithm on FPGA, Xitian Fan, 2013” describes the implementation of a method whose detection and description of points of interest are based on the SURF algorithm. Points of interest are extracted without any sorting and every detected point of interest is described, so the processing time depends on the number of points of interest extracted. This solution therefore has the same drawbacks as those already mentioned.
The solution described in the document “An embedded system-on-chip architecture for real time visual detection and matching, Jianhui Wang, 2013” is also known. This document describes an implementation of a method comprising a step of detecting points of interest based on the SIFT algorithm and a step of describing and pairing points of interest based on the BRIEF (Binary Robust Independent Elementary Features) algorithm. The detected points of interest are placed in a FIFO queue and processed one by one by the description step. The computational load and synchronization are managed by a mechanism that tests the position of each point of interest against the image neighborhood area held in memory. If the point of interest is received before the neighborhood memory area has been refreshed to cover it, a blocking wait is engaged. Conversely, if the point of interest is received too late, it is not processed and the mechanism passes directly to the next point of interest.
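The wait-or-drop behavior of such a FIFO mechanism can be illustrated with a deliberately simplified, hypothetical timing model; it is not a reproduction of the cited architecture. The assumptions are: image rows stream into a neighborhood memory holding `window_rows` rows, one row every `row_cycles` cycles; each descriptor needs rows y−r to y+r and costs `desc_cycles` cycles; points leave the FIFO in raster order.

```python
from typing import List, Tuple


def simulate_fifo_description(point_rows: List[int], window_rows: int,
                              radius: int, row_cycles: int, desc_cycles: int
                              ) -> Tuple[List[int], List[int], int]:
    """Hypothetical model of FIFO-based description with a sliding row window.

    Returns (rows of described points, rows of dropped points,
    idle cycles spent in blocking waits).
    """
    assert window_rows >= 2 * radius + 1, "window must hold a full neighborhood"
    described, dropped = [], []
    cycle = idle = 0
    for y in point_rows:                      # FIFO order = raster order
        newest_row = cycle // row_cycles      # last row fully in memory
        if y + radius > newest_row:
            # Point received in advance: blocking wait until its last row arrives.
            wait_until = (y + radius) * row_cycles
            idle += wait_until - cycle
            cycle, newest_row = wait_until, y + radius
        oldest_row = newest_row - window_rows + 1
        if y - radius < oldest_row:
            # Point received too late: its first rows were already overwritten.
            dropped.append(y)
            continue
        cycle += desc_cycles                  # describe the point
        described.append(y)
    return described, dropped, idle
```

With `window_rows=5`, `radius=2`, `row_cycles=10` and `desc_cycles=4`, ten points on row 10 give three described points and seven dropped ones, while three points on rows 10, 20 and 30 are all described but leave the description step idle for 312 of the 324 elapsed cycles.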
This solution is not very robust to a non-homogeneous local distribution of the points of interest in the image. If many points of interest are located in one area of the image, many of them risk being lost. Likewise, if no point of interest lies in a large area of the image, a significant time slot goes unused. Furthermore, the pairing step contains two nested loops over the set of points of interest, so all the points of interest of an image must be calculated before a pairing with another image can begin. This introduces high latency for stereovision applications, in which the two images are processed simultaneously. Finally, two image memories are required.