1. Field of the Invention
The invention relates to systems for purchasing goods and services over a communications network. The invention also relates to an automated purchase support method and apparatus for seamlessly integrating plural merchants into an on-line shopping system. More specifically, the invention is directed to a method and system for aggregating content for an on-line purchasing system.
2. Description of the Related Art
The Internet is a worldwide network of computers linked together by various hardware communication links all running a standard suite of protocols known as TCP/IP (transmission control protocol/Internet protocol). The growth of the Internet over the last several years has been explosive, fueled for the most part by the widespread use of software viewers known as browsers and HTTP (hypertext transfer protocol) which allow a simple GUI (graphical user interface) to be used to communicate over the Internet. Browsers generally reside on the computer used to access content on the Internet, i.e. the client computer. HTTP is an application-level protocol of the TCP/IP suite and provides users access to files of various formats using a standard page description language known as HTML (hypertext markup language), and more recently XML (extensible markup language) and XHTML (extensible hypertext markup language), a reformulation of HTML into XML. The collection of servers on the Internet using HTTP has become known as the “World Wide Web” or simply the “Web.”
Through HTML, XHTML, and interactive programming protocols, the author of a particular Web page is able to make information available to viewers of the Web page by placing the Web page on an Internet Web server. The network path to the server is identified by a URL (Uniform Resource Locator) and, generally, any client running a Web browser can access the Web server by using the URL. A client computer running a browser can request a display of a Web page stored on a Web server by issuing a URL request through the Internet to the Web in a known manner. A URL consistent with the present invention may be a simple URL of the form:
 <protocol identifiers>://<server path>/<web page path>
A “protocol identifier” of “http” specifies the conventional hyper-text transfer protocol. A URL request for a secure Internet transaction typically utilizes the secure protocol identifier “https,” assuming that the browser running on the client and the Web server control program running on the Web server support and implement the secure sockets layer discussed below. The “server path” is typically of the form “prefix.domain,” where the prefix is typically “www” to designate a Web server and the “domain” is the standard Internet sub-domain.top-level-domain of the Web server. The optional “web page path” is provided to specifically identify a particular hyper-text page maintained on the Web server. In response to a received URL identifying an existing Web page, the Web server can return the Web page, subject to the HTTP protocol, to the client computer for display on the client computer. Such a Web page typically incorporates both textual and graphical information including embedded hyper-text links that permit the user of the client computer to readily select a next URL or send other data over the Internet. Further, a Web page can have embedded applets, written in Java™ or another programming language, to present animation and/or audio.
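By way of illustration only, the three parts of a simple URL described above can be separated with the Python standard library. The function name `describe_url` and the page path `catalog/index.html` are hypothetical and are used here only for the purpose of the example; the host name follows the vendor example used elsewhere in this description.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a simple URL into the three parts named in the text (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "protocol_identifier": parts.scheme,        # e.g. "http" or "https"
        "server_path": parts.netloc,                # e.g. "www.vendor.com"
        "web_page_path": parts.path.lstrip("/"),    # optional; empty if absent
        "secure": parts.scheme == "https",          # secure protocol identifier
    }

# Hypothetical page path on the example vendor host.
info = describe_url("https://www.vendor.com/catalog/index.html")
print(info)
```

A URL that omits the optional web page path, such as `http://www.vendor.com`, yields an empty `web_page_path` in this sketch.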
The URL issued from the client computer may also be of a complex form that identifies a CGI (common gateway interface) program (or script) on the Web server. The CGI program permits interactivity between the client computer and the Web server via HTTP. CGI is a standard for external gateway programs to interface with information servers, such as Web servers. A plain HTML document that the Web server delivers is substantially static. A CGI program, on the other hand, is executed in real-time so that it can process data or execute commands, such as executing a buy procedure which authorizes a purchase of products through a commerce Web server. An HTML form definition reference that identifies a CGI program is commonly of the form:
 <form action=http://www.vendor.com/cgi-bin/buy.cgi method=post>
A hyper-text link of this form directs the execution of the buy.cgi program on the Web server in response to a command from the client computer. For example, buy.cgi can be a buy procedure of the Web server. The Web has become ubiquitous in businesses and homes because it has proven to be convenient for various applications, such as news and data delivery, conducting banking and investment transactions, and the like. The Web and its authoring, transmission, and display protocols, such as browsers, HTML, CGI, Active Server Pages™, and Java™, have become a worldwide standard for information exchange.
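As a minimal sketch only, and not a description of any actual buy procedure, the following shows how a CGI-style program might process a posted form. The function name `handle_buy` and the form fields `item` and `quantity` are assumptions made for illustration. The environment variables `REQUEST_METHOD` and `CONTENT_LENGTH` are those conventionally supplied to CGI programs by the Web server.

```python
import io
from urllib.parse import parse_qs

def handle_buy(environ, body_stream):
    """Hypothetical stand-in for buy.cgi: parse a POSTed form and build a response."""
    if environ.get("REQUEST_METHOD") != "POST":
        return ("Status: 405 Method Not Allowed\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "POST required\n")
    # CONTENT_LENGTH counts bytes; for the ASCII form data used here that
    # equals the number of characters read from the text stream.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    fields = parse_qs(body_stream.read(length))
    item = fields.get("item", ["unknown"])[0]          # assumed field name
    quantity = int(fields.get("quantity", ["1"])[0])   # assumed field name
    return ("Content-Type: text/plain\r\n\r\n"
            f"Order received: {quantity} x {item}\n")

# Simulate the request a Web server would hand to the program.
body = "item=widget&quantity=2"
response = handle_buy(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))},
    io.StringIO(body),
)
print(response)
```

Unlike a static HTML document, the response in this sketch is computed at request time from the submitted form data, which is the distinction drawn in the preceding paragraphs.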
As known and appreciated in the art, there are presently millions of Web pages with various content. Tools have been developed to allow the user to search these Web pages to obtain the various Web pages having the various content of interest. One way to locate the desired Web pages is to use a “search engine” which will search for Web pages having a particular keyword or keywords. Search engines typically have three components: a crawler (such as a robot, bot or automated site searcher), an index, and a software program which presents the results of the search to the user. The crawler automatically “crawls” from Web server to Web server and the sites hosted therein to gather URLs and other information such as the text of the page that the search engine can use in the searches for keywords. When the information gathering by the crawler is completed, the information regarding the Web pages is stored in the search engine's databases and indexed. When a user seeking information from the Web types a keyword or keywords into a search field of the search engine, the search engine's software program then utilizes algorithmic functions and criteria to find keyword matches in the information stored in the databases. Some programs search all of the text of each page while other programs merely search the URLs and/or titles of the pages. The software program then sorts through the results of the search and provides prioritized results to the user based on relevancy of the Web page. Various search engine software programs differ in their methods used for determining a Web page's relevancy.
For example, the software may view the “meta tag” of the page, include a counter for counting the number of keyword occurrences in the text of the page, and/or consider the Web page's popularity as well as other factors such as whether the Webmaster of the Web page has made special arrangements to have the Web page displayed as a result of the search.
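The index and keyword-counting relevancy described above may be sketched as follows. This is an illustrative simplification: the page URLs and texts are invented sample data, the function names are hypothetical, and relevancy is scored solely by the number of keyword occurrences, which is only one of the factors mentioned above.

```python
from collections import Counter, defaultdict
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each keyword to {url: occurrence count} for text gathered by a crawler."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index

def search(index, query):
    """Return URLs ordered by total keyword occurrences (ties broken by URL)."""
    scores = Counter()
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Invented sample pages standing in for crawler output.
pages = {
    "http://a.example/cameras": "digital camera reviews and camera prices",
    "http://b.example/phones": "phone with a built in camera",
    "http://c.example/books": "cook books",
}
index = build_index(pages)
results = search(index, "camera")
print(results)
```

In this sketch the first page ranks ahead of the second because the keyword occurs twice in its text; the third page contains no match and is omitted from the results.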
One of the primary applications of the Web has been shopping, i.e. the purchase of goods and services (collectively, products). Virtually every major commercial “bricks and mortar” merchant has established a Web site for the showcase and sale of its products. Further, many manufacturers sell products directly over the Web. Finally, a plethora of on-line merchants, not previously existing in the bricks and mortar world, have come into existence. As a result, virtually every product is available for purchase over the Web from a plurality of merchants. This situation has increased the efficiency of markets by permitting shoppers to readily compare products and terms of sale from plural merchants without the need to travel physically to the merchant locations.
However, in order to compare products and terms of different merchants, one must “visit” the various merchant Web sites individually. First, this requires knowledge of the URLs for each merchant Web site or the use of a search engine, which can be cumbersome and inaccurate. It is possible to open the various sites in different browser windows for better comparison. However, the various formats of each merchant Web site render it tedious to compare products and terms directly. When a purchase decision is made, the purchase or purchases must be made through the individual merchant Web sites. Further, ordinarily the shopper is required to log in to each merchant Web site, by entering a username and password for example, prior to making a purchase and then proceed to the next site. For example, if the shopper decides to buy three items from three different merchants, three log-in procedures and three buy procedures, i.e. procedures for effecting a purchase on the merchant Web sites, must be manually executed respectively through the three merchant Web sites and their proprietary interfaces.
Recently, it has become known to integrate a plurality of Web sites into a single environment known as a “shopping portal.” Shopping portals ordinarily include a Web server presenting an integrated interface displaying plural products from various merchants. Accordingly, conventional shopping portals facilitate comparison shopping and thus increase market efficiency. However, conventional shopping portals merely serve as a gateway to the individual merchant Web sites. In particular, when a purchasing decision is made, the shopper is directed to the merchant Web site and the purchase is completed manually through the merchant Web site using the merchant Web site buy procedures and interface. Accordingly, when purchases are made from more than one merchant, conventional shopping portals require that the shopper execute the orders using different interfaces at the respective merchant Web sites.
U.S. Pat. No. 5,895,454 discloses an interface for merchant Web sites. A shopper connects to a remote merchant Web site through a shopping server. When a product is selected from a merchant server, a transaction notification is transmitted to a database on the shopping server. When the shopper is finished shopping, the shopping server transmits purchase orders corresponding to the selected products to the merchant Web sites to complete the orders on behalf of the shopper. However, the system disclosed in U.S. Pat. No. 5,895,454 requires that the shopper navigate each merchant Web site individually to select products for purchase and thus fails to address the complexities of content aggregation.
Therefore, there exists an unfulfilled need for a way to seamlessly integrate a plurality of on-line merchants into a single shopping interface to thereby facilitate comparison shopping in an on-line environment. There also exists an unfulfilled need for a way to provide important product information to the shopper to facilitate an informed purchase decision by the shopper and for a method for efficiently gathering such product information from a networked computer environment.
It is an object of the invention to seamlessly integrate plural on-line merchants into a single shopping interface.
It is another object of the invention to facilitate comparison shopping in an on-line environment.
It is another object of the invention to provide important product information to the shopper to facilitate an informed purchase decision by the shopper.
It is another object of the invention to provide a method for efficiently gathering product information from a networked computer environment.
It is another object of the invention to provide a method for gathering product information from various sources including manufacturer's product specification sources and merchant's information sources.
It is still another object of the invention to provide a method for accessing and utilizing the gathered product information to effect commerce in a networked computer environment to facilitate the purchase decision of the shopper.
These and other objects are achieved by a method of aggregating product information from a plurality of sources in a networked computer environment regarding products of a product category comprising the steps of causing a crawler originating from a server interconnected to the networked computer environment to visit the plurality of sources and gathering product phrase information from each of the plurality of sources via the crawler, where the crawler utilizes computational linguistics to gather the product phrase information which includes a phrase and at least one characteristic of the phrase. The characteristics of the phrase may be at least one of frequency, location, font size, font style, font case, font effects, font color, collocation and co-occurrence of the phrase in each of the plurality of sources. In addition, the plurality of sources may include at least one of a manufacturer's product specifications source, a product literature source, and a merchant's information source. The crawler may include a product literature crawler as well as a product offerings crawler. The method in accordance with one embodiment of the present invention may also include the step of applying statistical analysis to the product phrase information to rank each phrase and the step of determining whether each phrase is a product property. The method may further include the steps of determining whether each product property is evaluative, numeric or discrete, and the product properties may also be ranked. Moreover, the method in accordance with another embodiment of the present invention may include a validation step where information stored in a product offerings database is cross-referenced with information stored in a products database to determine whether any product identified in the product offerings database is absent from the products database.
A new product record may then be created in the products database based on information stored in the product offerings database.
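Purely as a hedged sketch of the steps summarized above, and not as the disclosed implementation, the following gathers phrases with two of the named characteristics (frequency and font case), ranks them, and performs the validation step. Every name here is hypothetical. Word pairs stand in for the computational linguistics, a simple count of sources stands in for the statistical analysis, plain dictionaries stand in for the two databases, and the crawler itself is omitted, with invented text standing in for its output.

```python
from collections import Counter
import re

def gather_phrases(source_text, n=2):
    """Return {phrase: {"frequency", "upper_case"}} for n-word phrases in one source."""
    words = re.findall(r"[A-Za-z0-9]+", source_text)
    info = {}
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        phrase = " ".join(w.lower() for w in gram)
        entry = info.setdefault(phrase, {"frequency": 0, "upper_case": False})
        entry["frequency"] += 1
        # Font case: record whether the phrase ever appears entirely in capitals.
        entry["upper_case"] = entry["upper_case"] or all(w.isupper() for w in gram)
    return info

def rank_phrases(per_source):
    """Rank phrases by number of sources containing them, then by total frequency."""
    sources, total = Counter(), Counter()
    for info in per_source:
        for phrase, entry in info.items():
            sources[phrase] += 1
            total[phrase] += entry["frequency"]
    return sorted(sources, key=lambda p: (-sources[p], -total[p], p))

def validate(offerings, products):
    """Create a product record for each offered product absent from `products`."""
    created = []
    for offer in offerings:
        if offer["product_id"] not in products:
            products[offer["product_id"]] = {"name": offer["name"]}
            created.append(offer["product_id"])
    return created

# Invented text standing in for two crawled sources.
texts = ["OPTICAL ZOOM 3x. Optical zoom lens included.",
         "Camera with optical zoom and flash."]
per_source = [gather_phrases(t) for t in texts]
ranked = rank_phrases(per_source)

# Invented records standing in for the two databases.
products = {"cam-100": {"name": "Camera 100"}}
offerings = [{"product_id": "cam-100", "name": "Camera 100"},
             {"product_id": "cam-200", "name": "Camera 200"}]
created = validate(offerings, products)
print(ranked[0], created)
```

In this sketch a phrase found in more sources ranks higher, so a candidate product property such as "optical zoom" surfaces first, and the offering absent from the products dictionary results in a newly created record.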
Yet another aspect of the present invention is a computer architecture for executing the above-described aggregation method and for allowing shoppers to utilize product information to make an informed purchase of goods and services over a communications network.