The invention relates to tools, for use, e.g., by a content provider for a Web site, for summarizing and displaying information descriptive of usage patterns exhibited by visitors to the Web site.
The World Wide Web is currently a subject of intense and rapidly growing interest.
The World Wide Web is composed of interconnected data sources that are accessible to computer users through data-communication networks such as the Internet. The data available on the World Wide Web have been assembled by private individuals, commercial companies, government agencies, and special interest organizations. Much of this assembled information is organized into Web pages. A Web site is a collection of Web pages (and possibly other data which, together with Web pages, are generically referred to as Web components) offered by a sponsoring entity, herein referred to as the site owner.
Large Web sites are typically organized hierarchically. For example, corporate Web sites often consist of smaller Web sites, each providing information about a business unit of the parent company.
The Web site itself resides on one or more server hosts. Web components stored on the server host are offered to users of the World Wide Web through a software program known as a Web server. A network user downloads data from a Web site through a browser, a software program running on the client host. The browser establishes contact with the Web server and issues a request for data stored on the server host. This results in data from the server host being downloaded into the browser. This data is typically a HyperText document specifying information required by the browser to display the Web page (i.e., formatting information specifying the structure of the page, or URLs of images that are to be placed on the page), embedded client software programs which run inside the browser (e.g., Java bytecode), and other content to be downloaded to the client computer or displayable through client software programs that add to the browser's functionality (sometimes referred to as "browser plug-ins").
A visit to a Web site is defined as a series of downloads, from a specified Web server by a fixed client browser, that are contiguous in time. Each request for a Web component made by a client browser during the course of a visit is referred to as a hit. (In at least some cases, it may be useful for distinguishing separate visits to consider the dormancy period between successive hits by a given client browser. A dormancy period exceeding a threshold of, e.g., fifteen minutes, may be taken to indicate the end of a visit.)
Commercial Web servers have the option of recording client requests in a logfile, generating a separate entry for each hit. In many cases, the logfile resides (at least temporarily) on the server host. The information collected in this file can include the hostname or host address of the visiting client, the time of the hit, and the name of the requested data file. An illustrative record of a client request is given below:
From a visitor's point of view, it is clear that a visit to a given Web site begins with an initial request to the Web server (the entry point), consists of a number of consecutive downloads, and ends when the visitor either: (i) begins to request pages from a different Web site, or (ii) stops browsing altogether. The visitor's final request is referred to as the exit point.
The server host, on the other hand, experiences hits from many users simultaneously, and it records all requests chronologically. Consequently, the server host mixes visit information from different clients in the logfile. Because of this, it is not immediately evident, from an examination of the raw logfile, which hits correspond to which visit. Even the length of a given visit is not immediately evident. The lack of this information prevents the compilation of higher-level summaries of usage patterns.
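The separation of interleaved logfile entries into visits can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of any particular embodiment: it assumes each hit has already been parsed into a client, a time, and a requested file name, it identifies a visitor by client host alone, and it uses the fifteen-minute dormancy threshold mentioned above only as an example value. The names `Hit`, `split_into_visits`, and `DORMANCY`, and the sample hits, are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple

# Example dormancy threshold (assumed; fifteen minutes is one possible value).
DORMANCY = timedelta(minutes=15)

class Hit(NamedTuple):
    client: str       # hostname or host address of the visiting client
    time: datetime    # time of the hit
    path: str         # name of the requested data file

def split_into_visits(hits, dormancy=DORMANCY):
    """Group chronologically mixed hits into per-client visits."""
    by_client = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h.time):
        by_client[hit.client].append(hit)

    visits = []
    for client_hits in by_client.values():
        current = [client_hits[0]]
        for hit in client_hits[1:]:
            # A gap longer than the threshold ends the current visit.
            if hit.time - current[-1].time > dormancy:
                visits.append(current)
                current = []
            current.append(hit)
        visits.append(current)
    return visits

# Hypothetical hits from two clients, interleaved as in a raw logfile.
t0 = datetime(1998, 1, 1, 12, 0)
hits = [
    Hit("a.example.com", t0, "/index.html"),
    Hit("b.example.com", t0 + timedelta(minutes=1), "/work.html"),
    Hit("a.example.com", t0 + timedelta(minutes=2), "/work.html"),
    Hit("a.example.com", t0 + timedelta(minutes=40), "/index.html"),
]
visits = split_into_visits(hits)
for visit in visits:
    # The first and last hits of a visit are its entry and exit points.
    print(visit[0].client, visit[0].path, "->", visit[-1].path, len(visit))
```

In this sketch the third request from `a.example.com` arrives 38 minutes after the second, so it is treated as the start of a new visit. Once hits are grouped this way, the length of each visit and its entry and exit points can be read directly from each group.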
Some software tools are commercially available for summarizing and displaying data describing Web-site usage. Often, these packages require the running of a special client software program in order to view the usage data.
One drawback of such a tool is that only those users who have installed the client software will have access to the Web site's usage information, even though many geographically separated people may have a legitimate interest in this information. This group may include, e.g., content providers, Web designers, and even visitors.
A second drawback is that such a specialized client approach can become impractical because of the cost to install and maintain the client program for each interested party.
A third drawback is that the presentation of the reported information is divorced from the immediate context of the Web site itself. Thus, although it may be convenient for the user to move quickly from a statistic about some Web page to the page itself, it is much less convenient for the user to move from any desired page or a feature of such a page to a corresponding statistic.
Other software tools provide reports, in the form of HyperText documents, on the usage of selected (such as the most popular) pages. Information from these reports can be displayed via the user-side browser, and links are provided for viewing the selected Web pages. However, these software tools also fail to provide convenient access from a Web page to the statistics that pertain to it.
We have provided a mechanism for rapid and convenient access from any selected Web page to the usage information that pertains to it, and from any selected display of usage information to the Web page or pages to which it pertains. Respective displays of Web-site content and of usage information can coexist on the screen of, e.g., the user's personal computer. Designation of an item of interest (by, e.g., clicking a mouse) in one of the respective displays results in the updating of information in the other display to correspond to the designated item. Moreover, our mechanism makes it readily achievable to synchronize one of the respective displays with the other. That is, as the user browses through one of the displays, the information in the other is automatically updated to correspond to that in the first display.
Thus, in a broad aspect, our invention involves a system for displaying information pertaining to the usage of Web pages. The system comprises first and second Web sites. The first Web site comprises plural Web-component files, each having a name in a Web-site directory. The second Web site comprises plural statistics files, each containing usage information about a corresponding Web-component file or sub-directory of Web-component files. The system further comprises a computing device that has a display screen, is operable by a user, and is in communication with the first and second Web sites. The computing device is operated under the control of Web-browser software effective for displaying, on the screen, Web components of the respective Web sites. Significantly, the computing device is effective for requesting and retrieving, from either of the Web sites, data that correspond to user-designated Web components, and it is effective for directing a data request to either of the Web sites in response to user-designation of a Web component from the other Web site.
Our preferred access mechanism involves a relationship between the Web site and the database in which the usage information is stored. As is well known, each Web component (i.e., Web page or one of its basic data building blocks) resides in a file, accessible through its URL. According to our access mechanism, the database that contains the usage information is organized such that each record is indexed by, and thus is retrievable under, the name of the corresponding file in the Web site directory.
Thus, in specific embodiments of the invention, each statistics file is indexed by the name of the corresponding Web file or sub-directory of Web files, and the computing device uses a common name of a file or file directory when it directs a data request to one Web site in response to user-designation of a Web component from the other Web site.
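As a rough sketch of this shared-name scheme, the following maps the URL of a Web component to the URL of its statistics file, and back, by keeping the path and changing only the host. The hostnames `www.example.com` and `stats.example.com`, the sample path, and the function name `counterpart_url` are hypothetical; the text does not specify how the second Web site is addressed.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames for the first (content) and second (statistics) Web sites.
CONTENT_HOST = "www.example.com"
STATS_HOST = "stats.example.com"

def counterpart_url(url):
    """Return the URL with the same file name on the other Web site."""
    parts = urlsplit(url)
    if parts.netloc == CONTENT_HOST:
        other = STATS_HOST
    elif parts.netloc == STATS_HOST:
        other = CONTENT_HOST
    else:
        raise ValueError(f"unknown host: {parts.netloc}")
    # The path (directory and file name) is the common index; only the host changes.
    return urlunsplit((parts.scheme, other, parts.path, "", ""))

stats_url = counterpart_url("http://www.example.com/products/work.html")
page_url = counterpart_url(stats_url)
print(stats_url)
print(page_url)
```

Because the same name indexes both sites, no separate lookup table is needed in this sketch: a request directed to one site can be derived from the component designated on the other.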
In further embodiments of the invention, we additionally provide a mechanism for distinguishing, by respective visit, visit information from a Web-server logfile, and for extracting informative usage statistics from such information.
Browser: a software program that runs on a client host and is used to request Web pages and other data from server hosts. These data can be downloaded to the client's disk or displayed on the screen by the browser.
Client Host: a computer that requests Web pages from server hosts, and generally communicates through a browser program.
Content Provider: a person responsible for providing the information that makes up a collection of Web pages.
Embedded Client Software Programs: software programs that comprise part of a Web site and that get downloaded into, and executed by, the browser.
Hit: the event of a browser requesting a single Web component.
Host: a computer that is connected to a network such as the Internet. Every host has a hostname (e.g., mypc.mycompany.com) and a numeric IP address (e.g., 123.104.35.12).
HTML (HyperText Markup Language): the language used to author Web pages. In its raw form, HTML looks like normal text, interspersed with formatting commands. A browser's primary function is to read and render HTML.
HTTP (HyperText Transfer Protocol): protocol used between a browser and a Web server to exchange Web pages and other data over the Internet.
HyperText: text annotated with links to other Web pages (e.g., HTML).
IP (Internet Protocol): the communication protocol governing the Internet.
Logfile: a file residing on the Web site in which the Web server logs information about browsers requesting Web components. The logfile typically contains one line per hit.
Pageview: the event of a browser downloading some or all of the Web components that make up a Web page and displaying the Web page. A pageview often consists of several hits.
Referral Page: the URL of the Web page containing the HyperText link that led a visitor to the data currently being viewed. In most commercial browsers, the BACK button returns the visitor to this referral page.
Server Host: a computer on the Internet that hands out Web pages through a Web server program.
URL (Uniform Resource Locator): the address of a Web component or other data. The URL identifies the protocol used to communicate with the server host, the IP address of the server host, and the location of the requested data on the server host. For example, "http://www.lucent.com/work.html" specifies an HTTP connection with the server host www.lucent.com, from which is requested the Web page (HTML file) work.html.
UWU Server: in connection with the present invention, a special Web server in charge of distributing statistics describing Web traffic.
Visit: a series of requests to a fixed Web server by a single person (through a browser), occurring contiguously in time.
Visitor: a person operating a browser and, through it, visiting a Web site.
Web Component: a basic data building block that makes up a Web page. A Web component may contain text, HyperText, images, embedded client software programs, or other data displayable by a browser (such as, for example, QuickTime videos).
Web Designer: a person, typically one skilled in graphical design, who is in charge of designing Web pages.
Web Master: the (typically, technically trained) person in charge of keeping a server host and Web server program running.
Web Page: a canonical piece of multimedia information on a Web site. A Web page is typically an HTML document comprising other Web components, such as images.
Web Server: a software program running on a server host, for handing out Web pages.
Web Site: a collection of Web pages residing on one or multiple server hosts and accessible through the same hostname (such as, for example, www.lucent.com).
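The parts of a URL named in the URL definition above can be illustrated with the example given there. This is only a sketch using Python's standard `urllib.parse` module; the variable name `parts` is arbitrary.

```python
from urllib.parse import urlsplit

# The example URL from the URL definition.
parts = urlsplit("http://www.lucent.com/work.html")

print(parts.scheme)    # protocol used to communicate with the server host
print(parts.hostname)  # the server host
print(parts.path)      # location of the requested data on the server host
```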