Electronic data is being created and recorded in staggering amounts as our world becomes increasingly computerized. Unfortunately, finding particular data within discrete data sets becomes increasingly difficult as the amount of data grows. Efficiently searching for relevant data, whether in databases or in distributed environments such as the World Wide Web (the “Web”), typically involves accessing one or more electronic indexes. In many computing environments, the index is created and maintained by commercially available database products. In the context of the Web, indexes are created and maintained by a variety of search engines accessible via the Internet. The challenge in most environments is keeping the indexes current, so that they reflect the data as it is added, removed, and updated in the environment.
Inverted indexes are a type of index used in databases and search engines for indexing many-to-many relationships. An inverted index typically consists of a plurality of records, with each record having a key and one or more associated references. Each reference indicates the presence of the key in the referenced material. For example, an index of Web pages may contain many records, each having a word identifier as the key and one or more references to the Uniform Resource Locators (“URLs”) of the Web documents that contain the word.
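The record structure described above can be illustrated with a minimal sketch, assuming each document is identified by its URL and that the word itself serves as the key (a real index would more likely use a numeric word identifier). The tokenizer here is a deliberately simple, hypothetical one: it lowercases text and splits on whitespace without stripping punctuation. The function names `build_inverted_index` and `lookup` are illustrative only.

```python
from collections import defaultdict


def build_inverted_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word (the key) to the set of URLs (the references) of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in documents.items():
        # Hypothetical tokenizer: lowercase and split on whitespace only.
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def lookup(index: dict[str, set[str]], word: str) -> set[str]:
    """Return the references recorded for a key, or an empty set if the key is absent."""
    return index.get(word.lower(), set())


docs = {
    "http://example.com/a": "conventional indexes associate keywords",
    "http://example.com/b": "keywords alone are not enough",
}
idx = build_inverted_index(docs)
print(sorted(lookup(idx, "keywords")))  # ['http://example.com/a', 'http://example.com/b']
```

Using a set for each record's references keeps a document from being listed twice when a word appears in it more than once, which matches the many-to-many relationship the index is meant to capture.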
Conventional indexes typically associate index “keywords” with electronic documents. For example, the keyword “conventional” would be associated with this document if it were indexed by one of these conventional indexing systems. The presence of a keyword in a document, however, does not guarantee the relevance of the document to a given search. The word “conventional” would also be associated with every other document in which it has been used. With billions of documents in an ever-expanding digital universe, and a limited number of words used to construct those documents, simple keyword searches seem destined to bury relevant materials within huge piles of irrelevant materials. The problem of finding relevant materials within large datasets of irrelevant materials has long been recognized, and various approaches have been taken to refine keyword searches. For example, some approaches calculate and use the proximity of one keyword to another in a document. Another approach is to generate statistical models associating keywords with each other.
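The proximity refinement mentioned above can be sketched generically: compute the smallest number of word positions separating two keywords in a document, on the assumption that a smaller distance is one plausible signal of relevance. This is an illustrative single-pass technique under that assumption, not the method of any particular system; the function name `min_keyword_distance` and the whitespace tokenizer are hypothetical.

```python
from typing import Optional


def min_keyword_distance(text: str, kw1: str, kw2: str) -> Optional[int]:
    """Smallest gap in word positions between kw1 and kw2, or None if either is absent."""
    words = text.lower().split()  # hypothetical tokenizer: whitespace split only
    kw1, kw2 = kw1.lower(), kw2.lower()
    last1: Optional[int] = None  # most recent position of kw1
    last2: Optional[int] = None  # most recent position of kw2
    best: Optional[int] = None
    for pos, word in enumerate(words):
        if word == kw1:
            last1 = pos
        if word == kw2:
            last2 = pos
        # Only re-check the distance when one of the keywords was just seen.
        if (word == kw1 or word == kw2) and last1 is not None and last2 is not None:
            distance = abs(last1 - last2)
            if best is None or distance < best:
                best = distance
    return best


print(min_keyword_distance("keyword search engines index the web", "keyword", "web"))  # 5
print(min_keyword_distance("the web keyword", "keyword", "web"))  # 1
```

Tracking only the most recent position of each keyword is sufficient, because the closest pair always involves the latest occurrence of one keyword and the nearest preceding occurrence of the other.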
The indexing and searching of electronic information remains one of the preeminent challenges of our day. There is an unmet need for improved systems and methods for generating useful indexes and efficiently searching those indexes to find relevant materials.