People have access to a great deal of information. However, finding the particular information they desire in any given situation can be very difficult. For example, a large amount of information is accessible to people over the Internet in the form of web pages. The number of such web pages can be on the order of millions or more. Additionally, the web pages available are constantly changing, with some pages being added, others being deleted, and others being modified.
Thus, when someone desires to find some information, such as the answer to a question, the ability to extract particular information from this large information source becomes very important. Processes and techniques have been developed to allow users to search for information over the Internet, and are commonly made available to the user in the form of search engines. However, the accuracy of such search engines can be lacking, due in large part to the extremely broad range of content on the web pages that are searched. For example, some web pages include copyright and other business-related notices, and some web pages include advertisements. Such business-related and advertising data is not always relevant to the underlying content of the web page, and thus can reduce the accuracy of the searching process if it is considered. By way of another example, different web pages can vary greatly in length, and some may include multiple topics while others contain a single topic.
These characteristics of web pages can reduce the accuracy of search processes. Thus, it would be beneficial to have a way to increase the accuracy of document searches.