Database search is one of the most important problems in information retrieval. Over the years, several methods have been proposed to address this problem both in the context of text retrieval and in the context of image retrieval and object recognition. Four such prior art methods are described in the following documents, each of which is incorporated by reference herein in its entirety as background: NISTER, D. et al. “Scalable Recognition with a Vocabulary Tree,” believed to be published in CVPR, 2006, pp. 1-8; ROBERTSON, S. E. et al. “Simple, proven approaches for text retrieval”, Technical Report, Number 356, University of Cambridge, UK, December, 1994, pp. 1-8; FANG, H. et al. “Formal Study of Information Retrieval Heuristics”, SIGIR '04, Jul. 25-29, 2004, Sheffield, South Yorkshire, UK, pp. 1-8; and ZHOU, H. et al. “Okapi-Chamfer Matching For Articulate Object Recognition”, Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05), 2005, pp. 1-8. Another such method is described in U.S. Pat. No. 7,725,484 granted to Nister et al. on May 25, 2010, entitled “Scalable object recognition using hierarchical quantization with a vocabulary tree” that is incorporated by reference herein in its entirety, as background.
Several model-based methods have been developed for rank ordering documents in a database, such as vector space models, logic-based models, and probabilistic models. Despite considerable progress in model-based approaches, it has been shown that a carefully designed metric based on Term-Frequency (TF) and Inverse-Document-Frequency (IDF) performs well in most applications. Metrics such as the Okapi score, the pivoted normalization score, and the normalized distance score (which are described next) have been tested with very good performance in the text retrieval literature, and normalized distance scores have been shown to work well for image retrieval.
Normalized distance scores are described briefly below, using the following notation:
        $N$: total number of documents in the database
        $N_i$: total number of documents in the database that contain the ith word (or visual word in the case of image retrieval)
        $m_{ij}$: number of occurrences of the ith word in document-j
        $n_i$: number of occurrences of the ith word in the query document
        $W$: total number of words in the database

Normalized Distance Score:

$$ s(q, d_j) = \left\| \frac{q}{\|q\|_p} - \frac{d_j}{\|d_j\|_p} \right\|_p^p $$

where $q = [q_1 \; q_2 \; \ldots \; q_W]$ and $d_j = [d_{1j} \; d_{2j} \; \ldots \; d_{Wj}]$, and

$$ d_{ij} = m_{ij} \log \frac{N}{N_i} $$

wherein $m_{ij}$ is the Term-Frequency (TF) and $\log \frac{N}{N_i}$ is the Inverse-Document-Frequency (IDF), and

$$ q_i = n_i \log \frac{N}{N_i} $$
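By way of illustration only, the normalized distance score may be sketched in Python as follows. The function names, the toy three-document database, the choice p=1, and the handling of an all-zero vector are assumptions made for this sketch and are not taken from the cited references. The score is treated here as a distance, so a smaller value indicates a closer match.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Return log(N / N_i) for every word that occurs in at least one document."""
    n_docs = len(docs)                                       # N
    doc_freq = Counter(w for doc in docs for w in set(doc))  # N_i per word
    return {w: math.log(n_docs / n_i) for w, n_i in doc_freq.items()}

def weight_vector(words, idf, vocab):
    """TF-IDF vector over vocab: (occurrence count) * log(N / N_i)."""
    counts = Counter(words)
    return [counts[w] * idf[w] for w in vocab]

def normalized_distance(q, d, p=1):
    """|| q/||q||_p - d/||d||_p ||_p^p; p=1 is an assumed default."""
    def normalize(v):
        norm = sum(abs(x) ** p for x in v) ** (1.0 / p)
        # Assumption: an all-zero vector is left unchanged rather than divided.
        return [x / norm for x in v] if norm > 0 else list(v)
    return sum(abs(a - b) ** p for a, b in zip(normalize(q), normalize(d)))

# Hypothetical toy database: each document is a list of (visual) words.
docs = [["cat", "dog"], ["cat", "fish"], ["bird"]]
query = ["dog"]

idf = idf_weights(docs)
vocab = sorted(idf)
q = weight_vector(query, idf, vocab)
scores = [normalized_distance(q, weight_vector(doc, idf, vocab), p=1) for doc in docs]
best = min(range(len(docs)), key=scores.__getitem__)
```

In this sketch, with p=1 the score lies between 0 and 2. Document 0, the only one sharing a word with the query, scores roughly 0.54, while the two documents with no word in common with the query score 2.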
Computing the weights dij and qi described above requires knowing both N, the total number of documents in the database, and Ni, the number of those documents that contain the ith word. In application scenarios where the number of documents in the database is fixed, the weights dij can be pre-computed and stored in the database, together with the IDF values needed to form qi. They can then be used at query time to find the documents in the database that are most relevant to the query.
Inventors of the current patent application note that in scenarios where the number of documents in a database changes with time, the weights would normally need to be re-computed each time the content of the database changes, and a complete re-computation can be very expensive. Accordingly, the current inventors believe that there is a need for a new approach to how weights are computed and how index information is maintained, as described below.
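The re-computation burden noted above may be illustrated with a short Python sketch. The helper name and the toy documents are hypothetical. The sketch merely shows that adding a single document changes N, and therefore changes the IDF factor log(N/Ni) of every word already in the database, even when the new document shares no word with the existing ones.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Return log(N / N_i) for every word that occurs in at least one document."""
    n_docs = len(docs)                                       # N
    doc_freq = Counter(w for doc in docs for w in set(doc))  # N_i per word
    return {w: math.log(n_docs / n_i) for w, n_i in doc_freq.items()}

# Hypothetical toy database with N = 3 documents.
docs = [["cat", "dog"], ["cat", "fish"], ["bird"]]
before = idf_weights(docs)

# Add one document that shares no words with the existing ones (N becomes 4).
docs.append(["horse"])
after = idf_weights(docs)

# Words whose IDF factor, and hence every stored weight d_ij, is now stale.
changed = sorted(w for w in before if not math.isclose(before[w], after[w]))
```

Here `changed` contains all four words of the original database, so every pre-computed weight stored for the original three documents would have to be refreshed.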