1. Field of the Invention
This invention relates generally to computer network communications and, more particularly, to analysis of Internet traffic to a website.
2. Description of the Related Art
The best-known computer network in use today is probably the Internet. When a network user begins a communication session over the Internet, the user can request data files from an Internet-connected computer called a file server or website server. The file server provides data files, typically comprising website pages, that are requested by the user. The web pages are typically written in a type of programming code called hypertext mark-up language (html), and can be viewed or displayed through a graphical user interface (GUI) program. The network connections and collection of html files are commonly referred to as the “World Wide Web” and the GUI program to view the files is called a web browser. A collection of related files under a common Internet network domain location is commonly referred to as a website. The files available at a website can include a mixture of text, image, video, and audio data.
The Internet is being used more and more often for commercial purposes. Thus, websites have become an important means for businesses and individuals to disseminate new product and service information, public relations news, advertising, and even to transact business by receiving orders and payments over the Internet. Many Internet users who maintain websites carefully monitor the pattern of web browser requests for files, or web pages, from their website. As with most forms of advertising, a goal of providing a website is to have a large number of visitors to the website to view the commercial presentation. As a result, various software programs and monitoring services have been developed to track such file requests, which are generally referred to as web traffic, and to produce website traffic analysis reports. The Internet users who maintain websites will be referred to generally as website owners.
Some website traffic analysis tools comprise software that collects traffic statistics from the computer at which the website files are stored, or hosted. Website owners typically have, at a minimum, access to the computer on which their sites are hosted. Some website owners maintain their website on their own computer systems, but many lease storage space on a remote computer system from a web-hosting provider. Web hosting providers typically impose restrictions on the use of their systems, so that installation of third party software on the website hosting computer, or webserver, is usually precluded. This limits the appeal of website traffic analysis software that must be installed at a website hosting computer.
There are generally two common types of website analysis tools available: log-based tools and Internet-based tools.
Most website file servers can be configured to store information in a log file for every website page request they receive. Statistics concerning every request for a page from the website are recorded in the log file, thereby creating a record that can be analyzed to produce a website traffic report. The statistics typically include time of day, browser location, referring link, and the like. The creation of the log file will occur automatically, as html documents are requested by browsers. The log file can be analyzed periodically to process and summarize the collected statistics. The steps for retrieving html documents from a website are summarized below in Table 1.
A common Internet device called a proxy server or web cache often complicates the process of creating a log file in response to html requests. A proxy server stores static web content (html data files) at a central location on the Internet to improve web performance by decreasing response times and also to decrease capacity requirements at individual web file servers by distributing the storage load. If a web browser is communicating via a proxy server, then the steps required to obtain an html document are as listed in the Table 2 summary below:
It should be noted that Step 5 may result in a proxy server or web cache serving the html document directly to the web browser. In this case the webserver never receives a request and thus never records the request in its log file. Some proxies may notify the web server of these requests, depending on proxy and web-server configurations.
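The effect described above can be illustrated with a minimal Python sketch. The `OriginServer` and `CachingProxy` classes are hypothetical stand-ins, not part of any real webserver or proxy product, and the sketch assumes a proxy that never notifies the webserver of requests it serves from its cache.

```python
class OriginServer:
    """Hypothetical webserver that logs every request it actually receives."""

    def __init__(self, pages):
        self.pages = pages
        self.log = []

    def get(self, path, client):
        self.log.append((client, path))
        return self.pages[path]


class CachingProxy:
    """Hypothetical proxy that serves cached copies without contacting the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path, client):
        if path not in self.cache:
            # Cache miss: the origin receives (and logs) this request.
            self.cache[path] = self.origin.get(path, client)
        # Cache hit: the origin is never contacted, so nothing is logged.
        return self.cache[path]


origin = OriginServer({"/index.html": "<html>home</html>"})
proxy = CachingProxy(origin)

# Three different browsers view the same page through the proxy.
for client in ("browser-a", "browser-b", "browser-c"):
    proxy.get("/index.html", client)

print(len(origin.log))  # 1: only the first of the three page views was logged
```

Under these assumptions, a log-based report would show one request where three page views actually occurred.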
The log-based tools that exist today examine logs to form a statistical analysis of a website. Log analysis tools are run locally on the computer or on a system to which the log files are transferred. The steps that are taken to analyze the logs are summarized below in Table 3.
There are several drawbacks to using log analysis tools. For example, if the log analysis tools are installed locally on the website file server, then the website owner can only use tools that are available for the platform of the hosting file server computer. These tools typically do not provide a real-time view of the website traffic data; rather, they simply analyze the traffic log data files when the tools are executed. In addition, website owners who operate very high-traffic sites often disable logging, because of inadequate computer resources to operate both the file server function and the traffic logging function. This severely reduces the effectiveness of log-based analysis. As noted above, website owners who do not host their own sites may need to obtain permission to install traffic analysis software at their hosting provider. Finally, the system resources required to perform the log analysis function are such that log analysis can impede website performance, slowing down responses to web browsers requesting pages from the website.
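The batch character of log analysis can be sketched in Python as follows. This is an illustration only, separate from the tables referenced in this section. It assumes log entries in the widely used Common Log Format with appended referrer and user-agent fields; the sample lines, the `LOG_PATTERN` expression, and the `summarize` function are assumptions made for the example and do not reproduce any particular tool.

```python
import re
from collections import Counter

# Assumed layout: host, identity, user, [time], "request", status, size,
# "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per page and per referring link in a finished log."""
    page_counts = Counter()
    referrer_counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        page_counts[match.group("path")] += 1
        if match.group("referrer") not in ("", "-"):
            referrer_counts[match.group("referrer")] += 1
    return page_counts, referrer_counts


# Invented sample entries for illustration.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "http://search.example/" "Mozilla/4.0"',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.0"',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /products.html HTTP/1.0" '
    '200 5120 "http://www.example.com/index.html" "Mozilla/4.0"',
]

pages, referrers = summarize(SAMPLE_LOG)
print(pages.most_common())
```

The report reflects only what was in the log file at the moment `summarize` ran, which is why such tools do not give a real-time view.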
If the website owner chooses to transfer the log files to another system for processing, then there are also drawbacks to that technique. For example, transferring the traffic logs is a cumbersome process in that a single day's log file for a busy website often represents hundreds of megabytes of data. In addition, a separate system is required for receiving the transferred log files.
Rather than using web traffic logs, website owners may choose to use the services of an Internet-based traffic analysis service. Such services typically are notified of each website visit through Internet communications features of the http specification and prepare statistical reports for client website owners.
The Internet-based analysis typically relies on special html code inserted into the home page of a website. Web html documents usually contain a mixture of text and graphical elements that are only retrieved by opening subsequent connections and retrieving files, such as images or frames. These graphical elements are displayed in the user's browser by retrieving the source location attribute (also referred to as the source Uniform Resource Locator, or URL) of the graphical element, opening a connection to the source location, and retrieving the data bytes that comprise the image file.
Internet-based analysis tools can count requests for a source file regardless of the actual location of the source file. Thus, a graphical element can be retrieved from a server that is in a different location from the primary website file server. The html code embedded by Internet-based analysis tools usually contains a graphical element, as described above, with a source location attribute directing a browser to the computers of the traffic analysis service. When the website home page is visited, all of the images on the home page will be requested by the browser, including the graphical element source. Headers that are automatically supplied with the browser request will reveal information about the website visitor.
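A minimal Python sketch of this arrangement follows. The service address `stats.example.com`, the `site` query parameter, and the functions `tracking_snippet` and `visitor_info` are hypothetical names chosen for illustration; the choice of headers to read is likewise an assumption. Note that the HTTP header carrying the referring URL is spelled `Referer`.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical address of the traffic analysis service (an assumption).
TRACKER_URL = "http://stats.example.com/hit.gif"


def tracking_snippet(site_id):
    """Build the html graphical element a website owner would embed in a page."""
    src = TRACKER_URL + "?" + urlencode({"site": site_id})
    return '<img src="%s" width="1" height="1" alt="">' % escape(src, quote=True)


def visitor_info(request_headers):
    """Pick out header fields the browser supplies with the image request."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    return {
        # For an embedded image, the referring URL is the page being viewed.
        "page_viewed": headers.get("referer", ""),
        "browser": headers.get("user-agent", ""),
        "language": headers.get("accept-language", ""),
    }


print(tracking_snippet("site-42"))

info = visitor_info({
    "Referer": "http://www.example.com/index.html",
    "User-Agent": "Mozilla/4.0",
})
print(info["page_viewed"])
```

Because the image is fetched from the analysis service rather than from the website file server, the service sees the request headers without any software being installed at the hosting computer.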
The information about the website visitor is gathered and stored at the traffic analysis computers, and then the requested data, such as a graphical image, is returned to the website visitor's browser. Those skilled in the art will be familiar with so-called Internet “cookies”, which can be used to timestamp the website visitor. The timestamp value can uniquely identify a website visitor. The html code sometimes contains scripts that gather more information (such as monitor resolution, etc.) about the website visitor. This additional information is typically included with the request headers. In this way, Internet-based traffic analysis tools can provide traffic statistics in real-time by recording each request for a website page when the request occurs. That is, each website visit generates a “hit” statistic that can be accumulated from all the visitors to produce a count of requests for the website home page. These hits are interpreted as a count of the number of visits to the website.
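The timestamp cookie and hit count described above can be sketched in Python using the standard `http.cookies` module. The cookie name `visitor`, the `HitCounter` class, and the microsecond timestamp format are assumptions made for illustration; the sketch also assumes that no two first visits arrive with the same timestamp.

```python
import time
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor"  # hypothetical cookie name


class HitCounter:
    """Hypothetical per-site tally kept by the traffic analysis service."""

    def __init__(self):
        self.hits = 0
        self.visitors = set()

    def record(self, cookie_header, now=None):
        """Count one hit; return (visitor_id, Set-Cookie value or None)."""
        self.hits += 1
        cookie = SimpleCookie()
        cookie.load(cookie_header or "")
        if COOKIE_NAME in cookie:
            # Returning browser: reuse the timestamp it was given earlier.
            visitor_id = cookie[COOKIE_NAME].value
            set_cookie = None
        else:
            # First request: the timestamp itself serves as the identifier.
            visitor_id = "%.6f" % (time.time() if now is None else now)
            fresh = SimpleCookie()
            fresh[COOKIE_NAME] = visitor_id
            set_cookie = fresh[COOKIE_NAME].OutputString()
        self.visitors.add(visitor_id)
        return visitor_id, set_cookie


counter = HitCounter()
first_id, header = counter.record("", now=962000000.0)
second_id, _ = counter.record(header, now=962000060.0)  # browser returns the cookie
print(counter.hits, len(counter.visitors))  # 2 1
```

In this sketch the two requests yield two hits but one distinct visitor, which shows how a raw hit count differs from a count of visitors.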
Unfortunately, Internet-based tools only collect statistics when a page that contains the html code is viewed, so that visits to all other pages are ignored. To alleviate this problem, some traffic analysis services allow the html code to be placed on multiple pages, thereby counting a website visitor when there is a request for any page on which the website owner has placed the html graphical element code. Thus, the Internet-based analysis tools can be an improvement over log-based tools, because the Internet tools do not require the transfer of large log files for processing. Typical Internet-based tools, however, do not account for, and do not collect data on, multiple visits to a website by the same visitor. In addition, conventional Internet-based traffic analysis tools do not indicate the actual sequence, or path, followed by a website visitor from page to page of a website. The website path taken by visitors can be very important in identifying the most viewed or least popular pages of a website.
From the discussion above, it should be apparent that there is a need for efficient analysis of website traffic patterns with sufficient detail so that visits to each page of a website can be accounted for and a visitor's path through the pages of a website can be tracked, all without unduly taxing the resources of the website and traffic analysis computers. The present invention fulfills this need.
The present invention provides an Internet-based analysis tool that can follow, in real-time, the flow of traffic through a website. For every website page requested by a website visitor, the state of the visitor's browser is recorded. The state includes the clock time and an indication of every page at the website visited by the browser during the current visit. In this way, data relating to the path visitors take through a website can be accurately collected and studied. The state of the visitor's browser path is maintained in a traffic analysis cookie that is passed between a traffic analysis file server and the visitor browser with every website page requested from the website file server for viewing. The cookie is maintained in a size that can be passed from server to browser and back again without negatively impacting server performance and without negatively impacting browser performance. If desired, the functions of website file server and traffic analysis cookie management can be handled by respective website file server computers and traffic analysis computers. The data in the traffic analysis cookie can follow the visitor browser through independent web file servers, regardless of how the pages of a website might be distributed in storage. In this way, the present invention permits page-by-page analysis of website traffic patterns for each website visitor, without unduly taxing the resources of the website and traffic analysis computers.
In one aspect of the invention, the first page of a website that is viewed by a visitor defines the start of that visitor's path at the website. Page requests, transferred data, and cookie exchanges are made through the hypertext transfer protocol (http) specification. Thus, each page request contains an http referrer header with the URL with which the respective connection is associated. Through the use of referring data, a computer system constructed in accordance with the invention can monitor the page-by-page flow as a visitor “travels” through the website. Every page requested and viewed by that visitor is recorded and maintained in the traffic analysis cookie, which is passed back and forth between the visitor browser and the traffic analysis server to maintain a record of the visitor website path. New data pertaining to the site's dynamics is thereby collected and analyzed with every passing of the cookie. This process continues until the visitor leaves the website, or until the cookie has expired.
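The idea of accumulating a visitor path in a cookie can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the encoding used by the invention: the JSON layout, the field names `start`, `last`, and `path`, and the functions `update_path_cookie` and `read_path` are all hypothetical. The sketch also omits cookie expiry and any measure for keeping the cookie small, since this section does not specify how those are handled.

```python
import json
from urllib.parse import quote, unquote, urlsplit


def update_path_cookie(cookie_value, referrer, now):
    """Return a new cookie value with the referring page appended to the path."""
    if cookie_value:
        state = json.loads(unquote(cookie_value))
    else:
        # No cookie yet: this page defines the start of the visitor's path.
        state = {"start": now, "path": []}
    # The referring URL of the request identifies the page being viewed.
    state["path"].append(urlsplit(referrer).path or "/")
    state["last"] = now  # clock time of the most recent page request
    # Percent-encode so the value contains only characters safe in a cookie.
    return quote(json.dumps(state, separators=(",", ":")))


def read_path(cookie_value):
    """Recover the ordered list of pages from a cookie value."""
    return json.loads(unquote(cookie_value))["path"]


# Simulate the cookie being passed back and forth over three page views.
cookie = ""
for offset, referrer in enumerate([
    "http://www.example.com/index.html",
    "http://www.example.com/products.html",
    "http://www.example.com/order.html",
]):
    cookie = update_path_cookie(cookie, referrer, now=1000 + offset)

print(read_path(cookie))  # ['/index.html', '/products.html', '/order.html']
```

Because the path travels with the browser in the cookie, the server in this sketch needs no per-visitor session storage to reconstruct the sequence of pages.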
The collection of website path data in the cookie permits the data to be analyzed and available for viewing in real-time. This allows the website owner to monitor website traffic patterns in real time. It also permits the website owner to monitor changes to a website in real time, reducing the time it takes to understand the impact of the changes. In another aspect of the invention, website owners can follow a browser path over multiple file servers, so long as the web pages contain the html code necessary to maintain the cookie with its path data. That is, the invention permits monitoring of website traffic even for websites that are hosted across several independent file servers.
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiment, which illustrates, by way of example, the principles of the invention.