1. Field of the Invention
The invention relates to systems for creating catalogs of goods and services over a communications network. More specifically, the invention is directed to a method and system for aggregating content for an on-line catalog system.
2. Description of the Related Art
The Internet is a worldwide network of computers linked together by various hardware communication links, all running a standard suite of protocols known as TCP/IP (transmission control protocol/Internet protocol). The growth of the Internet over the last decade has been explosive, fueled for the most part by the widespread use of software viewers known as browsers and of HTTP (hypertext transfer protocol), which together allow a simple GUI (graphical user interface) to be used to communicate over the Internet. Browsers generally reside on the computer used to access content on the Internet, i.e. the client computer. HTTP is an application-level protocol carried over TCP/IP and provides users access to files of various formats, typically pages written in a standard page description language known as HTML (hypertext markup language), and more recently XML (extensible markup language) and XHTML (extensible hypertext markup language), a reformulation of HTML into XML. The collection of servers on the Internet using HTTP has become known as the “World Wide Web” or simply the “Web.”
As known and appreciated in the art, there are presently millions of Web pages with various content. Tools have been developed to allow the user to search these Web pages to find those having content of interest. One way to locate the desired Web pages is to use a “search engine” which will search for Web pages having a particular keyword or keywords. Search engines typically have three components: a crawler (such as a robot, bot or automated site searcher), an index, and a software program which presents the results of the search to the user. The crawler automatically “crawls” from Web server to Web server and the sites hosted thereon to gather URLs and other information, such as the text of each page, that the search engine can use in searches for keywords. When the information gathering by the crawler is completed, the information regarding the Web pages is stored in the search engine's databases and indexed. When a user seeking information from the Web types one or more keywords into a search field of the search engine, the search engine's software program utilizes algorithmic functions and criteria to find keyword matches in the information stored in the databases. Some programs search all of the text of each page while other programs merely search the URLs and/or titles of the pages. The software program then sorts through the results of the search and provides prioritized results to the user based on the relevancy of each Web page. Search engine software programs differ in the methods used for determining a Web page's relevancy. For example, the software may view the “meta tag” of the page, include a counter for counting the number of keyword occurrences in the text of the page, and/or consider the Web page's popularity as well as other factors such as whether the Webmaster of the Web page has made special arrangements to have the Web page displayed as a result of the search.
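By way of illustration only, the index and relevancy steps described above can be sketched as follows. This is a minimal sketch, not the method of any particular search engine: it assumes the crawler has already gathered page text, and it ranks pages solely by the number of keyword occurrences. The names `tokenize`, `build_index`, `search` and the sample URLs are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: keyword -> {url: occurrence count}.

    `pages` maps each URL to the page text gathered by a crawler
    (the crawling step itself is assumed to have already run).
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


def search(index, query):
    """Return URLs matching any query keyword, most occurrences first.

    Relevancy here is only the total keyword count; ties are broken
    alphabetically by URL so that results are deterministic.
    """
    scores = Counter()
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical crawled pages, used purely for illustration.
PAGES = {
    "http://example.com/a": "Red shoes and blue shoes for sale",
    "http://example.com/b": "Blue hats",
    "http://example.com/c": "Garden tools",
}
INDEX = build_index(PAGES)
RESULTS = search(INDEX, "blue shoes")
```

In this sketch the page at `http://example.com/a` ranks ahead of `http://example.com/b` for the query "blue shoes" because it contains more keyword occurrences. A practical search engine would weigh further factors such as meta tags and page popularity, as noted above.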
One of the primary applications of the Web has been shopping, i.e. the purchase of products (goods and services). Virtually every major commercial “bricks and mortar” merchant has established a Web site for the showcase and sale of its products. Further, many manufacturers sell products directly over the Web. Finally, a plethora of on-line merchants, not previously existing in the bricks and mortar world, have come into existence. As a result, virtually every product is available for purchase over the Web from a plurality of merchants. This situation has increased the efficiency of markets by permitting shoppers to readily compare products and terms of sale from plural merchants without the need to travel physically to the merchant locations.
However, in order to compare products and terms of different merchants, one must “visit” the various merchant Web sites individually. This requires knowledge of the URL of each merchant Web site or the use of a search engine, which can be cumbersome and inaccurate. It is possible to open the various sites in different browser windows for better comparison. However, the differing formats of the merchant Web sites render it tedious to compare products and terms directly. When a purchase decision is made, the purchase or purchases must be made through the individual merchant Web sites. Further, the shopper is ordinarily required to log in to each merchant Web site, by entering a username and password for example, prior to making a purchase, and then proceed to the next site. For example, if the shopper decides to buy three items from three different merchants, three login procedures and three buy procedures, i.e. procedures for effecting a purchase on the merchant Web sites, must be manually executed through the three merchant Web sites and their respective proprietary interfaces.
It is well known to integrate a plurality of Web sites into a single environment known as a “shopping portal.” Shopping portals ordinarily include a Web server presenting an integrated interface displaying plural products from various merchants. Accordingly, conventional shopping portals facilitate comparison shopping and thus increase market efficiency. In order to provide an integrated shopping experience, it is known to prepare a catalog of product offerings from various merchants organized in a taxonomy of product categories. However, since the various merchants and other parties having product information records store that information in differing data formats and layouts, collection of information for a product catalog is a tedious and labor-intensive task requiring many manual operations.
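The format problem noted above can be illustrated with a short sketch, offered only as an example under stated assumptions and not as the system of the invention. It assumes two hypothetical merchants, one supplying product records as CSV and the other as JSON, with different field names, price units and category labels. All field names, the sample data, and the `TAXONOMY` mapping are invented for illustration.

```python
import csv
import io
import json

# Hypothetical feeds: each merchant uses its own layout and field names.
MERCHANT_A_CSV = "sku,name,price_usd,category\nA1,Trail Shoe,59.99,Footwear\n"
MERCHANT_B_JSON = (
    '[{"id": "B7", "title": "Trail Shoe", "cost_cents": 5499, "dept": "shoes"}]'
)

# Hypothetical mapping from merchant category labels to one shared taxonomy.
TAXONOMY = {"footwear": "Apparel/Shoes", "shoes": "Apparel/Shoes"}


def categorize(label):
    """Map a merchant-specific category label onto the shared taxonomy."""
    return TAXONOMY.get(label.strip().lower(), "Uncategorized")


def from_merchant_a(text):
    """Normalize merchant A's CSV rows (prices given in dollars)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "merchant": "A",
            "sku": row["sku"],
            "name": row["name"],
            "price_cents": round(float(row["price_usd"]) * 100),
            "category": categorize(row["category"]),
        }


def from_merchant_b(text):
    """Normalize merchant B's JSON records (prices given in cents)."""
    for item in json.loads(text):
        yield {
            "merchant": "B",
            "sku": item["id"],
            "name": item["title"],
            "price_cents": int(item["cost_cents"]),
            "category": categorize(item["dept"]),
        }


def build_catalog(records):
    """Group normalized records by category, cheapest offer first."""
    catalog = {}
    for record in records:
        catalog.setdefault(record["category"], []).append(record)
    for offers in catalog.values():
        offers.sort(key=lambda r: r["price_cents"])
    return catalog


RECORDS = list(from_merchant_a(MERCHANT_A_CSV)) + list(from_merchant_b(MERCHANT_B_JSON))
CATALOG = build_catalog(RECORDS)
```

Each additional merchant format requires its own hand-written adapter in a sketch of this kind, which reflects the manual effort described above.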