A search engine, or search engine program, is a widely used mechanism for allowing users to search vast numbers of documents for information. Automated general search engines locate documents, such as web pages, by matching terms from a user-entered search query against an indexed corpus of web pages. A conventional network search engine, such as the Google™ search engine, returns a search result set in response to the search query submitted by the user. The search result set can comprise a ranked list of documents, with a link to each document and a summary of each document. The search engine can rank or sort the individual articles or documents in the result set based on a variety of measures, such as the number of times the search terms appear in the document and the number of documents that contain a link to the document. For example, one known method, described in an article entitled “The Anatomy of a Large-Scale Hypertextual Search Engine,” by Sergey Brin and Lawrence Page, assigns a degree of importance to a document, such as a web page, based on the link structure of the web page.
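As an illustration only, the two ranking measures mentioned above (search term occurrences and the number of documents linking to a document) can be combined in a minimal sketch such as the following. The corpus layout, the function names `inbound_link_counts` and `rank`, the `link_weight` parameter, and the additive scoring formula are all assumptions made for this example. It is not the method of the Brin and Page article, which derives importance from the link structure rather than from a simple count of linking documents.

```python
from collections import Counter


def inbound_link_counts(corpus):
    """Count, for each URL, how many other documents link to it."""
    counts = Counter()
    for url, doc in corpus.items():
        for target in set(doc["links"]):  # count each linking document once
            if target != url:             # ignore self-links
                counts[target] += 1
    return counts


def rank(corpus, query, link_weight=1.0):
    """Return URLs of matching documents, highest score first.

    Hypothetical score: term occurrences + link_weight * inbound links.
    """
    terms = query.lower().split()
    inbound = inbound_link_counts(corpus)
    scored = []
    for url, doc in corpus.items():
        words = Counter(doc["text"].lower().split())
        term_hits = sum(words[t] for t in terms)
        if term_hits == 0:
            continue  # document contains none of the query terms
        scored.append((term_hits + link_weight * inbound[url], url))
    # Sort by descending score, breaking ties alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


# Hypothetical three-document corpus.
corpus = {
    "a.example": {"text": "digital camera for sale", "links": ["b.example"]},
    "b.example": {"text": "camera review", "links": []},
    "c.example": {"text": "gardening tips", "links": ["b.example"]},
}
print(rank(corpus, "digital camera"))  # ['b.example', 'a.example']
```

In this toy corpus the review page outranks the shopping page because two documents link to it, even though it contains fewer query terms. Setting `link_weight=0` ranks by term occurrences alone.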
Many documents, such as web pages, present items for sale. Such shopping documents allow users to purchase items, either directly, such as by clicking on a link, or indirectly. Users wishing to compare prices on an item from different vendors can enter a query for the item in a general search engine and obtain a list of relevant documents. Similarly, there may be different versions of the item, and the user may desire to see which version each vendor carries. In order to compare prices or versions, the user must visit every document presenting the item for sale. Additionally, the search result set may include documents that are not shopping documents but only discuss the item, such as reviews.
It is desirable to present to the user certain attributes of an item, such as price, version, and an image, from relevant documents in a search result set in response to a search query for the item. Manually searching through the documents to extract attributes of the item can be extremely time consuming and is impractical for a large number of documents.
Thus, a need exists to automatically extract product information from a document in response to a search query from a user.