Computer databases now serve as storehouses for diverse types of information and media, including documents, images, audio files, videos, and practically any other type of information capable of being converted to a digital format. The interconnected nature of today's computing environment gives users nearly instant access to this information regardless of their physical location.
Search interfaces serve as gateways to the vast information stored in these databases. Because of the tremendous amount and diversity of digital data now accessible, identifying relevant documents quickly and efficiently from a keyword search is a difficult task. This is particularly so because search systems have to index hundreds of millions of web pages and respond to millions of queries each day.
Indexes are the mechanisms that search systems use to find relevant documents as quickly as possible, and they are generated from continuous crawls of the World Wide Web. Building and maintaining an index for the entire web requires collecting and organizing raw documents that together amount to several tens of terabytes of data. As a result, using an index to respond to queries quickly and efficiently while still identifying pertinent search results remains a challenge.