The World Wide Web is currently a subject of intense and rapidly growing interest.
The World Wide Web is composed of interconnected data sources that are accessible to computer users through data-communication networks such as the Internet. The data available on the World Wide Web have been assembled by private individuals, commercial companies, government agencies, and special interest organizations. Much of this assembled information is organized into Web pages. A Web site is a collection of Web pages (and possibly other data which, together with Web pages, are generically referred to as Web components) offered by a sponsoring entity, herein referred to as the site owner.
Large Web sites are typically organized hierarchically. For example, corporate Web sites often consist of smaller Web sites, each providing information about a business unit of the parent company.
The Web site itself resides on one or more server hosts. Web components stored on the server host are offered to users of the World Wide Web through a software program known as a Web server. A network user downloads data from a Web site through a browser, a software program running on the client host. The browser establishes contact with the Web server and issues a request for data stored on the server host, and the requested data is then downloaded from the server host into the browser. This data typically includes a HyperText document specifying the information the browser requires to display the Web page (e.g., formatting information specifying the structure of the page, or URLs of images that are to be placed on the page), embedded client software programs that run inside the browser (e.g., Java bytecode), and other content to be downloaded to the client computer or displayable through client software programs that add to the browser's functionality (sometimes referred to as "browser plug-ins").
A visit to a Web site is defined as a series of downloads, from a specified Web server by a fixed client browser, that are contiguous in time. Each request for a Web component made by a client browser during the course of a visit is referred to as a hit. (In at least some cases, it may be useful to distinguish separate visits by considering the dormancy period between successive hits by a given client browser. A dormancy period exceeding a threshold of, e.g., fifteen minutes, may be taken to indicate the end of a visit.)
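The dormancy rule above can be illustrated with a minimal Python sketch. It assumes that one client browser's hit times are already available as sorted `datetime` values; the function name `split_into_visits` and the fifteen-minute default are illustrative choices, not part of any particular product. A gap strictly greater than the threshold starts a new visit, matching the "exceeding a threshold" wording.

```python
from datetime import datetime, timedelta

# Illustrative default: the fifteen-minute dormancy threshold mentioned above.
DORMANCY_THRESHOLD = timedelta(minutes=15)


def split_into_visits(hit_times, threshold=DORMANCY_THRESHOLD):
    """Split one client browser's hit times into visits.

    Assumes hit_times is sorted in ascending order. A dormancy period
    strictly greater than `threshold` between successive hits ends the
    current visit and starts a new one.
    """
    visits = []
    for t in hit_times:
        if visits and t - visits[-1][-1] <= threshold:
            visits[-1].append(t)  # still within the current visit
        else:
            visits.append([t])  # first hit, or dormancy exceeded: new visit
    return visits
```

For example, hits at 00:03:09, 00:05:00 and 00:40:00 on the same day would be split into two visits, the first containing two hits and the second one hit, because the 35-minute gap exceeds the threshold.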
Commercial Web servers have the option of recording client requests in a logfile, generating a separate entry for each hit. In many cases, the logfile resides (at least temporarily) on the server host. The information collected in this file can include the hostname or host address of the visiting client, the time of the hit, and the name of the requested data file. An illustrative record of a client request is given below:
    147.atlanta-02.ga.dial-access.att.net    hostname
    --                                       userid and authentication (not shown here)
    [30/Nov/1997:00:03:09-0500]              date and time
    GET                                      request method
    /work/work.html                          name of page requested
    HTTP/1.0                                 protocol used
    200                                      return code
    9391                                     number of bytes transferred
    http://biz.yahoo.com/lucent.html         referral page
    Mozilla/2.02E (Macintosh; U; 68K)        agent used (browser)
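A record like the one above can be split into its fields mechanically. The following minimal Python sketch assumes the server writes the widely used "combined" log format, in which the request line, referral page and agent are enclosed in double quotes and a space separates the time from the time-zone offset; those conventions, the regular expression and the field names are assumptions for illustration rather than a description of any specific server's output.

```python
import re
from datetime import datetime

# Assumed layout: combined log format (quoted request line, referrer, agent).
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return one logfile record as a dict of fields, or None if it does not match."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Convert the textual fields that have a natural typed form.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record
```

Under these assumptions, the illustrative record would appear in the logfile as `147.atlanta-02.ga.dial-access.att.net - - [30/Nov/1997:00:03:09 -0500] "GET /work/work.html HTTP/1.0" 200 9391 "http://biz.yahoo.com/lucent.html" "Mozilla/2.02E (Macintosh; U; 68K)"`, and `parse_log_line` would return its hostname, time, requested page, return code, byte count, referral page and agent as separate fields.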
From a visitor's point of view, it is clear that a visit to a given Web site begins with an initial request to the Web server (the entry point), consists of a number of consecutive downloads, and ends when the visitor either: (i) begins to request pages from a different Web site, or (ii) stops browsing altogether. The visitor's final request is referred to as the exit point.
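Once the hits of a single visit are known, its entry point and exit point follow directly from their order. The minimal Python sketch below assumes each visit is given as a chronological list of `(time, page)` pairs; the function name and the returned keys are hypothetical. Note that the logfile records only requests, so the computed duration runs from the entry point to the exit point and cannot include the time the visitor spent viewing the final page.

```python
from datetime import datetime


def summarize_visit(hits):
    """Summarize one visit given its hits as chronological (time, page) pairs."""
    if not hits:
        raise ValueError("a visit has at least one hit")
    entry_time, entry_page = hits[0]  # initial request: the entry point
    exit_time, exit_page = hits[-1]  # final request: the exit point
    return {
        "entry_point": entry_page,
        "exit_point": exit_page,
        "hits": len(hits),
        # Time between the first and last request only; viewing time
        # on the exit page is not recorded in the logfile.
        "duration": exit_time - entry_time,
    }
```

For a visit that requests `/work/work.html` and then a hypothetical page `/work/jobs.html` one minute later, the entry point is `/work/work.html`, the exit point is `/work/jobs.html`, and the recorded duration is sixty seconds.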
The server host, on the other hand, experiences hits from many users simultaneously, and it records all requests chronologically. Consequently, the server host mixes visit information from different clients in the logfile. Because of this, it is not immediately evident, from an examination of the raw logfile, which hits correspond to which visit. Even the length of a given visit is not immediately evident. The lack of this information prevents the compilation of higher-level summaries of usage patterns.
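One way to recover visit information from the mixed logfile is first to demultiplex the chronological records by client, after which each client's hits can be split into visits using the dormancy threshold described above. The minimal Python sketch below assumes records have already been parsed into dicts with `host`, `agent`, `time` and `path` keys (hypothetical field names), and it approximates a client browser by the (hostname, agent) pair. That approximation is an assumption: several users behind one proxy may share a hostname, and one user's address may change between hits.

```python
from collections import defaultdict


def group_hits_by_client(records):
    """Demultiplex chronologically mixed log records into per-client hit lists.

    Each record is a dict with "host", "agent", "time" and "path" keys.
    Returns a dict mapping (host, agent) to a time-ordered list of
    (time, path) pairs.
    """
    by_client = defaultdict(list)
    for rec in records:
        client = (rec["host"], rec["agent"])  # assumed client identity
        by_client[client].append((rec["time"], rec["path"]))
    # A single server's logfile is already chronological; sorting by time
    # (stable, so ties keep log order) also covers logs merged from several hosts.
    for hits in by_client.values():
        hits.sort(key=lambda hit: hit[0])
    return dict(by_client)
```

The design keeps the grouping step separate from visit splitting, so the client-identity rule can be changed (for example, to include a cookie value where one is logged) without touching the dormancy logic.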
Some software tools are commercially available for summarizing and displaying data describing Web-site usage. Often, these packages require the running of a special client software program in order to view the usage data.
One drawback of such a tool is that only those users who have installed the client software will have access to the Web site's usage information, even though many geographically separated people may have a legitimate interest in this information. This group may include, e.g., content providers, Web designers, and even visitors.
A second drawback is that such a specialized client approach can become impractical because of the cost to install and maintain the client program for each interested party.
A third drawback is that the presentation of the reported information is divorced from the immediate context of the Web site itself. Thus, although it may be convenient for the user to move quickly from a statistic about some Web page to the page itself, it is much less convenient for the user to move from any desired page or a feature of such a page to a corresponding statistic.
Other software tools provide reports, in the form of HyperText documents, on the usage of selected (such as the most popular) pages. Information from these reports can be displayed via the user-side browser, and links are provided for viewing the selected Web pages. However, these software tools also fail to provide convenient access from a Web page to the statistics that pertain to it.