Many techniques are used to help Internet users locate desired information. Some techniques organize content based on a hierarchy of categories. A user may then navigate through a series of hierarchical menus to find desired content. Search engines on the Internet are tools for locating relevant content. In response to a user's query, a search engine returns a rank-ordered list of results with hypertext links.
The goal of search engine implementation is to optimize the speed of the query, that is, to quickly find the documents in which a particular word occurs. One implementation creates a forward index, which stores a list of the words in each document. The forward index is then inverted to create an inverted index, a data structure that lists the documents per word instead of the words per document as in the forward index.
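The relationship between the two indexes can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercase normalization, and integer document identifiers, none of which are specified above; the function names and sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_forward_index(documents):
    """Map each document identifier to the list of words in that document."""
    # Assumption: words are separated by whitespace and compared case-insensitively.
    return {doc_id: text.lower().split() for doc_id, text in documents.items()}


def invert(forward_index):
    """Invert a forward index: map each word to the documents that contain it."""
    inverted = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    # Store each posting list as a sorted list of document identifiers.
    return {word: sorted(doc_ids) for word, doc_ids in inverted.items()}


# Hypothetical sample documents, for illustration only.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dog",
}

forward_index = build_forward_index(documents)   # words per document
inverted_index = invert(forward_index)           # documents per word
```

Here `forward_index[2]` lists the words of document 2, while `inverted_index["dog"]` lists every document in which "dog" occurs. This sketch records only document membership; it does not store word positions or hit lists.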
After the inverted index is created, a query can be resolved by jumping directly to the word identifier in the inverted index via random access. Random access is generally regarded as faster than sequential access.
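The difference between the two access patterns can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular engine: a Python dictionary stands in for random access by word identifier, a loop over every document stands in for sequential access, and both small indexes, along with the names `lookup` and `scan`, are hypothetical.

```python
# Hypothetical forward index: document identifier -> words in the document.
forward_index = {
    1: ["the", "quick", "brown", "fox"],
    2: ["the", "lazy", "dog"],
    3: ["quick", "brown", "dog"],
}

# Hypothetical inverted index for the same documents: word -> document identifiers.
inverted_index = {
    "the": [1, 2],
    "quick": [1, 3],
    "brown": [1, 3],
    "fox": [1],
    "lazy": [2],
    "dog": [2, 3],
}


def lookup(word):
    """Resolve a one-word query by jumping straight to the word's entry."""
    # A dictionary lookup does not visit unrelated entries (random access).
    return inverted_index.get(word.lower(), [])


def scan(word):
    """Resolve the same query by examining every document in turn."""
    # Every word list is inspected, so the work grows with the collection size.
    target = word.lower()
    return sorted(doc_id for doc_id, words in forward_index.items() if target in words)
```

Both functions return the same documents for a given word, but `lookup` touches a single entry of the inverted index while `scan` must read the word list of every document.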
Google™ is a popular search engine because users believe it is fast and accurate in comparison to other search engines. However, the forward index of Google™ is its fully inverted index with position information. The forward index is accessed by a document identifier, and the inverted index is accessed by an index term identifier. The inverted index and the forward index of Google™ have separate hit lists, which store the word identifier and the document identifier.
It is desirable to provide an improved search engine resulting in more accurate searches and results.