1. Field of the Invention
The present invention relates generally to a computer architecture and method for collecting, analyzing and/or transforming Internet and/or electronic commerce data for storage in a data storage area for subsequent retrieval and analysis. More particularly, it relates to such an architecture and method that supports the collection and analysis of Internet and/or electronic commerce data over or from the World Wide Web for Internet Service Providers (ISPs).
2. Background of the Related Art
More and more people are using the Internet to communicate, to advertise, and to shop for and purchase goods. Sales of Internet services are growing rapidly. The projected number of users through the year 2000 is already having a dramatic impact on the communications industry, both as an opportunity to win new business and as a threat to traditional revenue sources. FIG. 1 illustrates the corresponding increase in sales of World Wide Web (WWW or web) servers, summarized below:
______________________________________
WORLD WIDE WEB SERVER SALES FORECAST
______________________________________
              1995    1996    1997    1998    1999
Intranet       475   2,673   5,483   9,210  13,133
Internet       621     979   1,410   1,777   2,159
Total        1,096   3,652   6,893  10,987  15,292
______________________________________
The explosive growth in PCs, servers and Internet-related software has created a need for companies to better understand their customers' needs. To understand these needs, many gigabytes of data must be collected and analyzed to determine how best to serve each customer.
Market and industry analysts alike believe that the Internet will prove to be the most significant innovation since the light bulb and the automobile, and that the way daily business is conducted will be permanently changed by this technology. Many technology-based companies in the computer industry are racing to define new products and services that use the Internet as a vehicle to increase market share and revenue while increasing productivity and cutting operational costs. FIG. 2 illustrates the estimated growth in web users over the next several years.
To meet the need to digest the vast amount of information on the web, companies have designed many browsers and millions of web pages to access, retrieve and use this information. In addition to the Internet, companies have set up local "intranets" for storing and accessing the data used to run their organizations. However, the sheer volume of available information poses increasingly difficult challenges for conventional approaches.
A major difficulty is that information on the web is often dispersed across many sites on the network, and visiting all of these sites is time-consuming for a user. One conventional approach to accessing this information more effectively is the search engine. A search engine is actually a set of programs accessible at a site within a network, for example a company's local area network (LAN) or the Internet and World Wide Web. One program, called a "robot" or "spider," traverses the network in advance in search of documents and builds large index files of the keywords found in those documents.
A user of the search engine formulates a query comprising one or more keywords and submits the query to another program of the search engine. In response, the search engine inspects its own index files and displays a list of documents that match the query, typically as hyperlinks. When the user activates one of the hyperlinks to view the document, the user exits the search engine's site, ending the search process.
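The two-program arrangement just described (a spider that builds a keyword index in advance, and a query program that consults only that index) is essentially an inverted index. The following is a minimal illustrative sketch, not the method of any particular search engine; the function names `tokenize`, `build_index` and `search` are hypothetical, and it assumes simple lower-case regex tokenization and that a document matches only if it contains every query keyword. Real engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric keywords (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Spider-side step: map each keyword to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Query-side step: return sorted IDs of documents containing every query keyword."""
    keywords = tokenize(query)
    if not keywords:
        return []
    # Intersect the posting sets; an unknown keyword contributes an empty set.
    matches = set.intersection(*(index.get(k, set()) for k in keywords))
    return sorted(matches)


if __name__ == "__main__":
    docs = {
        "page1": "Online shopping and purchasing goods",
        "page2": "Shopping for a new automobile",
        "page3": "History of the light bulb",
    }
    idx = build_index(docs)
    print(search(idx, "shopping"))        # ['page1', 'page2']
    print(search(idx, "shopping goods"))  # ['page1']
```

Because matching is purely on shared keywords, "page2" is returned for "shopping" even though it concerns automobiles; a one-word query cannot distinguish a relevant document from one that merely contains the same word.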
Search engines, however, have their drawbacks. For example, search engines are oriented toward discovering textual information only. In particular, they are not well suited to indexing information contained in structured databases (e.g., relational databases), voice-related information, audio-related information, and the like. Moreover, combining data from incompatible data sources is difficult in conventional search engines.
Another disadvantage of conventional search engines is that irrelevant information is aggregated with relevant information. For example, it is not uncommon for a search engine on the web to locate hundreds of thousands of documents in response to a single query. Many of those documents are found only because they coincidentally contain a keyword from the query, and sifting through thousands of such results is a daunting task.
Accordingly, we have determined that there is a need to effectively collect, translate, consolidate and/or clean data and/or provide useful marketing information indicative of events occurring on the web. For example, data that indicates where a user has been in prior sessions may be useful in designing future products accessible via the web. We have also determined that there is a need for an architecture and method to support and analyze Internet and/or electronic commerce over or from the World Wide Web for ISPs and CSPs.
We have further determined that there is a need for an architecture and method to correlate user, application, and access functions. We have also determined that there is a need for a tool set that can easily communicate with, or become a subset of, an existing scalable data warehouse to provide Internet marketing decision support. Unfortunately, conventional architectures and techniques are unable to organize and present this information efficiently.