Advances in digital data storage have allowed large amounts of data to accumulate. For instance, the ever-falling cost of network storage has allowed exponential growth in the number of internet-accessible web documents. This massive accumulation of data makes it challenging to present meaningful information to users and systems.
Traditionally, users have relied on information retrieval systems such as web search engines, which employ methods such as basic word counting to rank results and natural language processing to present alternate queries. A user who enters a query is typically provided results from a variety of sources that have varying degrees of relevance to the user's actual information goal. For example, a user entering a search for "Washington" may get results related to the state of Washington, President George Washington, the city of Washington, D.C., George Washington University, the Washington Post, and the Washington Wizards. Furthermore, the results will vary in their degree of relevance to each of the topics. For example, some resulting documents may make only casual references to President George Washington, while others will have President George Washington as their main topic. While all of the results are legitimate possibilities, the user is likely seeking information related to only one of the topics, or possibly a topic not included in the list above.
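The limitation described above can be illustrated with a minimal sketch of ranking by basic word counting. The function names, the tokenization rule, and the four toy documents are assumptions made purely for illustration; they do not represent any particular search engine's implementation.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def term_count_score(query, document):
    """Score a document by how often the query terms occur in it."""
    counts = Counter(TOKEN_PATTERN.findall(document.lower()))
    return sum(counts[term] for term in TOKEN_PATTERN.findall(query.lower()))


def rank(query, documents):
    """Return (score, name) pairs, highest score first, ties broken by name."""
    scored = [(term_count_score(query, text), name)
              for name, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy corpus (illustrative text, not real search results).
documents = {
    "state": "Washington is a state in the Pacific Northwest. "
             "Washington borders Oregon.",
    "president": "George Washington was the first president. "
                 "Washington led the Continental Army.",
    "casual": "The museum tour briefly mentions George Washington.",
    "unrelated": "Oregon is known for its coastline.",
}

ranking = rank("Washington", documents)
for score, name in ranking:
    print(score, name)
```

In this sketch the document about the state and the document about the president receive the same score for the query "Washington", so counting words alone cannot tell which topic the user intended. The document with only a casual reference still appears in the results, just with a lower score.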