Biometric information is used in a large number of applications today. Biometric identifiers such as fingerprints, palm prints, iris representations, facial images, DNA and voice samples may be used for several purposes, such as identification and authentication.
Generating data representations of such information often results in very large amounts of data. Data tables larger than 25 GByte are not uncommon. Searching within databases comprising biometric information may therefore be time consuming and may demand a high level of processing power.
Traditionally, searching within biometric databases is performed by using numeric search keys within sorted index tables. Other types of search algorithms occur, for example linear search, where the key is tested against entry values stored in the index table using an index, or pointer, that is incremented or decremented with a step of 1 until one or more matching entries are found or until the whole index table has been searched without finding any matching entry. Often the tables are placed in a main memory able to hold large amounts of data. Modern computers and servers often have access to several related processing units (cores), each comprising a small area of fast memory, so-called cache memory. In addition, a main memory such as a common Random Access Memory (RAM) may be shared by the related processing units. Since cache memories are limited in terms of memory capacity, performing searches within large data volumes is time consuming, given that the searches need to be redirected to the main memory. A plurality of simultaneously or almost simultaneously occurring inquiries to a main memory, originating from threads executing in different cores, may significantly degrade the performance of a computer or server comprising the main memory.
For example, a modern processor like the Intel i7 or similar may comprise 4 cores each having an 8 MB cache memory and a shared main memory of 32 GB. A database comprising biometric information may be of a size around 20 GB or sometimes more. An index table comprising main key values is associated with the database. Each main key value is associated with an index value pointing to information comprised in the database. A main search key typically comprises 32 bits, but may be smaller or larger than that. To perform a search relating to biometric information, a numeric search for the 32 bit main search key is typically performed in the index table. If the index table comprises 10^9 key values, which is not uncommon when it comes to biometric applications, most of the steps within the numeric search will address the main memory, since the size of the memory allocated for the key values, indexes and/or related data being searched in every sub step of the search is larger than the size of the cache memory.
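The size mismatch described above can be checked with a short calculation. The following is a minimal sketch, assuming that the key count in the text is read as 10^9 (one billion), that only the 32 bit keys are counted (index values and related data would add to the total), and that "8 MB" means 8 x 2^20 bytes. The variable names are illustrative only.

```python
# Back-of-the-envelope sizing using the figures quoted in the text.
KEY_BITS = 32                 # width of a main search key
NUM_KEYS = 10**9              # assumed reading of the key count in the text
CACHE_BYTES = 8 * 2**20       # 8 MB cache memory (binary megabytes assumed)

key_bytes = KEY_BITS // 8
table_bytes = NUM_KEYS * key_bytes      # keys only, excluding index values
ratio = table_bytes / CACHE_BYTES       # how many times the keys exceed the cache

print(f"key column size: {table_bytes / 10**9:.1f} GB")
print(f"keys exceed the cache by a factor of about {ratio:.0f}")
```

Under these assumptions the key column alone is several hundred times larger than the cache memory, so only a small fraction of the table can be cache resident at any time.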
Modern CPUs have three levels of cache: L1, L2 and L3. Depending on the manufacturer's chip design, the L1 cache can typically provide the CPU with data on every clock cycle. The L2 cache may typically be able to feed data to the L1 cache every third cycle, while the L3 cache can typically feed data only every 12 cycles. The three levels exchange data based on CPU needs. For the Intel i7-4770 CPU, the L1 data cache is 32 KByte, the L2 cache is 256 KByte, and the L3 cache is 8 MByte.
Finally, the main memory can typically feed data to the L3 cache once every 50 cycles, that is, at 2 percent efficiency. This is why it is so important, even for a single core application, to access memory in such a way that access to main memory is minimized.
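The efficiency figure follows directly from the cycle counts. As a small illustration, assuming the typical cycle counts quoted above and taking one delivery per cycle as 100 percent, the relative delivery rate of each level can be computed as follows (the dictionary name is illustrative only):

```python
# Cycles between successive data deliveries, as quoted in the text.
CYCLES_PER_DELIVERY = {"L1": 1, "L2": 3, "L3": 12, "main memory": 50}

# Relative rate: deliveries per cycle, as a percentage of one delivery per cycle.
relative_rate = {
    level: 100 / cycles for level, cycles in CYCLES_PER_DELIVERY.items()
}

for level, pct in relative_rate.items():
    print(f"{level:12s} {pct:5.1f} %")
```

The entry for main memory evaluates to 2 percent, matching the figure given in the text.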
The so-called bsearch method is implemented in most runtime libraries, for example in Microsoft's C Runtime Library. bsearch, applied to a sorted array of records, operates in such a way that the array is stepped through with a step size initially equal to half of the array size. For each step forwards or backwards in the sorted array, the step size is divided by 2. Applications using array sizes larger than the cache sizes will suffer degraded performance, due to the CPU having to wait for data to be fed from main memory. The larger the array size, the more the application will suffer from such degradation.
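The halving behaviour described above can be illustrated with a textbook binary search that records which positions it probes. This is a minimal sketch and not the implementation found in any particular runtime library; the function name binary_search_probes and the sample data are hypothetical.

```python
def binary_search_probes(sorted_keys, key):
    """Textbook binary search.

    Returns (index of key or None, list of probed positions).
    """
    lo, hi = 0, len(sorted_keys) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            lo = mid + 1      # continue in the upper half
        else:
            hi = mid - 1      # continue in the lower half
    return None, probes


# Stand-in for a sorted index table: 1,000,000 even numbers.
keys = range(0, 2_000_000, 2)
index, probes = binary_search_probes(keys, 1_234_566)

# Distance between consecutive probes; it roughly halves at every step.
jumps = [abs(b - a) for a, b in zip(probes, probes[1:])]
print(index, len(probes), jumps[:4])
```

The first few probes are hundreds of thousands of entries apart, so in a table much larger than the cache each of them is likely to touch a different region of main memory. Only the last few probes, where the step size has become small, fall close enough together to benefit from cached data.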
Large servers may comprise as many as 64 or 128 cores or more. The performance of such servers when performing numeric searches in large databases will be significantly degraded, not only because of the cores having to wait for data from main memory as described above, but also when inquiries in search threads, executing in parallel in the same core, or in other cores, collide when accessing addresses not contained in the cache memories of the cores.
The memory space within a cache memory is shared between program code and data. The address space contained within the cache memory is normally not under software control.
A method and system for matching two biometric images is described in US 2006/0104493 A1. An index table is generated for a first biometric image. A second image is selected and a number of patterns for each minutia of the second image are generated. Searching for matching patterns is then performed by usage of the generated pattern for each minutia to address the generated index table and then generating a match score for the second image.
In U.S. Pat. No. 6,711,562 B1 cache sensitive search tree (CSS-tree) index structures for improving search performance are disclosed. A search tree index system and method for locating a particular key value stored in a sorted array of key values is described.
Further prior art is known from WO 2008/030166, which discloses a method for searching a database comprising data related to a plurality of fingerprints, and from EP 1156432, which discloses an apparatus, data structure and recording medium for data retrieval by accessing retrieval tables.
Another method for accessing information based on high speed indexing is disclosed in WO 01/65418 A1. A search string, in this case a URL or URI, defining a resource, such as an image, a document or similar, is divided into segments, a mathematical operation is applied to each segment, and the resulting numerical values, i.e. the keys, are used as indexes in lookup tables. One drawback of using lookup tables is that all possible values must be represented in the index tables. For example, if the key value is 8 bits, the table size must be 2^8 entries. If MD5, as mentioned in WO 01/65418 A1, is used to produce the key value, tables of size 2^128 must be used, which is practically impossible. Apart from MD5, other methods, such as CRC4, CRC8 and CRC12, are mentioned for producing the keys. None of these other methods guarantees that two different input strings do not generate identical keys. If identical keys are derived from different strings, means must be added to distinguish the correct resource before returning the result of the search. These drawbacks, table size and key uniqueness, lead to performance degradation. Further, WO 01/65418 A1 performs a 1:1 search.
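Both drawbacks can be demonstrated with a short sketch, assuming that the table sizes in the text are read as 2^8 and 2^128. The CRC-8 routine below uses the polynomial 0x07 with a zero initial value, which is an assumption, since the polynomial used in WO 01/65418 A1 is not stated here. The URLs and function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first (polynomial 0x07 assumed for illustration)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def direct_table_entries(key_bits: int) -> int:
    """A directly indexed lookup table needs one slot per possible key."""
    return 2 ** key_bits


def first_collision(strings):
    """Return two different strings sharing a CRC-8 key, or None."""
    seen = {}
    for s in strings:
        k = crc8(s.encode())
        if k in seen and seen[k] != s:
            return seen[k], s
        seen[k] = s
    return None


# 257 distinct hypothetical URLs but only 256 possible 8-bit keys,
# so at least two URLs must share a key.
urls = [f"http://example.com/img/{i}" for i in range(257)]
print(direct_table_entries(8), direct_table_entries(128))
print(first_collision(urls))
```

An 8 bit key keeps the table small but cannot distinguish more than 256 resources, whereas a 128 bit key avoids most collisions but calls for a table that cannot be stored. Any key narrower than the set of possible inputs therefore requires an additional comparison step to confirm the correct resource.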