Identifying a particular object in a collection of images is a challenging problem because an object's visual appearance may vary with changes in viewpoint or lighting conditions, or because of partial occlusion. Solutions that perform relatively well with small collections already exist, as do solutions for larger collections that demand significant processing resources.
For example, a method of identification of objects in images has been described in WO2011161084. The method comprises a feature extraction stage, an indexing stage of reference images, and a stage of recognition of objects present in the query image. WO2011161084 describes voting in a reduced space as part of the object recognition stage. However, the voting process described is performed in rotation and scale space. In other words, the accumulators in which the votes are aggregated have two dimensions, one corresponding to the rotation of the matched objects and another to the scaling between the matched objects. Such accumulators have relatively high memory requirements: with R rotation bins, S scaling bins, and a 32-bit floating point representation of the votes accumulated in each bin, each accumulator requires at least R×S×32 bits. This particularly limits image recognition systems implemented on mobile platforms, e.g. mobile devices such as mobile phones.
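The R×S×32-bit lower bound above can be made concrete with a short calculation. The following is a minimal sketch only: the function names, the bin counts, and the number of accumulators are hypothetical values chosen for illustration and are not taken from WO2011161084.

```python
def accumulator_bits(rotation_bins: int, scale_bins: int, bits_per_vote: int = 32) -> int:
    """Lower bound, in bits, on the size of one rotation x scale accumulator."""
    return rotation_bins * scale_bins * bits_per_vote


def accumulators_bytes(num_accumulators: int, rotation_bins: int, scale_bins: int,
                       bits_per_vote: int = 32) -> int:
    """Lower bound, in bytes, for several accumulators of the same shape.

    The number of accumulators held at once is an assumption of this sketch;
    the text above only gives the per-accumulator bound.
    """
    total_bits = num_accumulators * accumulator_bits(rotation_bins, scale_bins, bits_per_vote)
    return (total_bits + 7) // 8  # round up to whole bytes


# Illustrative values only: R = 16 rotation bins, S = 8 scaling bins.
one = accumulator_bits(16, 8)               # 16 * 8 * 32 bits
many = accumulators_bytes(100_000, 16, 8)   # 100,000 hypothetical accumulators
print(one, "bits per accumulator;", many, "bytes in total")
```

With these illustrative values a single accumulator is small, but the total grows linearly with the number of accumulators that must be held at once, which is what makes the approach costly on memory-constrained devices.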
FIG. 1 shows an inverted file structure as disclosed in WO2011161084 for the indexing stage. The indexing stage involves extraction of local features from reference images and their organisation into a structure that allows fast matching with features extracted from the query images. This process consists of (i) key-point extraction, (ii) post-processing, (iii) assignment of key-points to visual words, (iv) estimation of voting weights, and (v) addition of key-points to the inverted file structure as so-called hits. Adding a new reference object to the database involves adding hits representing its key-points to the inverted file structure. In the inverted file structure there is one list (hit list) for every visual word, which stores all occurrences (hits) of that word in the reference images.
An implementation of the initial voting approach from WO2011161084 accesses memory addresses in random order. The random order is acceptable when all accessed memory can be fetched from the CPU core cache (e.g. the L1 cache on most Intel CPUs), which is much faster than accessing the main system memory. However, when the collection of reference images is large and the accessed memory is much larger than what fits in the CPU core cache, the cache is constantly flushed. This not only makes CPU memory caching useless for the voting process, but also affects all other running recognition processes that benefit from CPU memory caching. As a result, for large collections a naive implementation of the voting from WO2011161084 is several times slower than would be expected from a linear extrapolation of the speed obtained for a small number of reference images.
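The inverted file structure described above, with one hit list per visual word, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of WO2011161084: the `Hit` fields, the `InvertedFile` class, and the simplified `vote` method (which accumulates one score per reference image rather than voting in rotation and scale space) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One occurrence of a visual word in a reference image (fields are assumed)."""
    image_id: int
    keypoint_id: int
    weight: float  # voting weight estimated at indexing time


class InvertedFile:
    def __init__(self) -> None:
        # One hit list per visual word.
        self._hit_lists = defaultdict(list)

    def add_reference_image(self, image_id, keypoints) -> None:
        """Add a reference image; `keypoints` is a sequence of (visual_word, weight)."""
        for keypoint_id, (word, weight) in enumerate(keypoints):
            self._hit_lists[word].append(Hit(image_id, keypoint_id, weight))

    def hits(self, word):
        """Return the hit list for a visual word (empty if the word was never seen)."""
        return self._hit_lists.get(word, [])

    def vote(self, query_words):
        """Simplified voting: sum hit weights per reference image."""
        scores = defaultdict(float)
        for word in query_words:
            for hit in self.hits(word):
                # Consecutive hits may belong to unrelated images, so these
                # updates touch score entries in an effectively random order.
                scores[hit.image_id] += hit.weight
        return dict(scores)


index = InvertedFile()
index.add_reference_image(0, [(5, 1.0), (7, 0.5)])
index.add_reference_image(1, [(5, 2.0)])
print(index.vote([5, 7]))
```

The inner loop of `vote` illustrates the access pattern discussed above: the hit lists are read sequentially, but the per-image entries they update are scattered, which is where cache behaviour degrades once the collection no longer fits in the CPU cache.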
It is desirable to provide devices and methods for image recognition that at least partially solve the aforementioned problems.