A. Field of the Invention
Concepts described herein relate to search engines and, more particularly, to segmenting documents for indexing by a search engine.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web pages in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide links to high-quality, relevant results (e.g., web pages) to the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web pages. Web pages that contain the user's search terms are “hits” and are returned to the user as links.
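The term matching described above can be illustrated with a minimal sketch. It assumes a small in-memory corpus keyed by URL and whitespace tokenization; the names build_index and find_hits and the example.com URLs are hypothetical and chosen only for illustration, and a production search engine would be considerably more elaborate.

```python
from collections import defaultdict


def build_index(corpus):
    """Map each lowercase term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in corpus.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def find_hits(index, query):
    """Return the pages ("hits") that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the pages matching the first term, then intersect.
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical pre-stored corpus of web pages (URL -> page text).
corpus = {
    "example.com/pizza": "best pizza in the mission district",
    "example.com/sushi": "fresh sushi in the mission district",
    "example.com/tacos": "late night tacos downtown",
}
index = build_index(corpus)
```

Under these assumptions, find_hits(index, "mission pizza") returns only the pizza page, because it is the only page containing both terms.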
In an attempt to increase the relevance and quality of the web pages returned to the user, a search engine may attempt to sort the list of hits so that the most relevant and/or highest quality pages appear at the top of the list. For example, the search engine may assign a rank or score to each hit, where the score is designed to correspond to the relevance and/or importance of the web page.
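As a rough illustration of scoring and sorting hits, the sketch below uses a simple count of query-term occurrences as the score. This scoring rule is an assumption made for the example, not a method described in this document, and the names score and rank_hits and the sample pages are hypothetical.

```python
def score(text, query):
    """Hypothetical relevance score: total occurrences of query terms in the page."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank_hits(hits, query):
    """Sort (url, text) hits so that the highest-scoring page comes first."""
    return sorted(hits, key=lambda hit: score(hit[1], query), reverse=True)


# Hypothetical hits for the query "pizza" (URL, page text).
hits = [
    ("example.com/a", "pizza place with great pizza and more pizza"),
    ("example.com/b", "pizza and pasta"),
    ("example.com/c", "pasta pasta pizza pizza"),
]
ranked = [url for url, _ in rank_hits(hits, "pizza")]
```

A score designed to reflect importance as well as relevance would combine additional signals; the term count here only demonstrates the assign-a-score-then-sort pattern.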
Local search engines may attempt to return relevant web pages within a specific geographic region. One type of document that is particularly useful for local search engines is the business listing, such as a business listing found in a yellow pages directory. When indexing a business listing, it may be desirable to associate other information with the business listing, such as discussions or reviews of the business that are found in other web pages. For example, a web page may include a list of restaurants in a particular neighborhood and a short synopsis or review of each restaurant. It is desirable for the local search engine to accurately associate the text corresponding to each restaurant with that restaurant. Doing so can, for example, increase the search engine's knowledge of the business and thus allow it to potentially provide more relevant results to the user.