1. Field of the Invention
The present invention relates generally to website traffic data collection, and more specifically to improved techniques for collecting traffic data from multiple sources and aggregating the collected data.
2. Description of the Related Art
Website providers often wish to collect data that describes usage and visitation patterns for their websites and for individual web pages within the sites. Such information can be extremely valuable in developing usage statistics for various purposes, including for example estimating server load, determining advertising rates, identifying areas of websites that are in need of redesign, and the like.
Several companies provide third-party traffic statistics services. A content provider can sign up with such a company to obtain traffic statistics without having to install usage-tracking software at their own servers. The content provider includes, in their web pages, scripts that cause users' browsers to communicate with the third-party services so that web activity can be tracked. The third-party services operate servers that detect individual user “hits” and thereby estimate traffic at the content provider's web pages.
One commonly used technique for third-party collection of usage data is to include, in each web page to be tracked, a small image, such as a single-pixel image that is not intended to be noticed by the user. Normally, images in web pages are served from the content provider's server, along with other content. The single-pixel image, however, which is specifically included in web pages for tracking purposes and normally does not contain any meaningful content, is served from a tracking server operated by the third-party traffic statistics service. In most cases, the single-pixel image is transparent, so as to be as unobtrusive as possible.
When a user navigates to a web page, the web page's HTML code causes the user's browser to send a request for the single-pixel image. The tracking server receives the request and logs the request as a user visit to the web page. It is known in the art to embed identifier codes within the image requests, so that the tracking server can detect individual users and discern additional identifying information about each user.
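By way of illustration only, the embedding of identifier codes within an image request might be sketched as follows. The tracking host, the path `pixel.gif`, and the parameter names `page` and `vid` are hypothetical and are not taken from any particular prior-art system.

```python
import html
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical tracking host, used only for illustration.
TRACKING_HOST = "https://tracker.example.com"


def build_pixel_url(page_id: str, visitor_id: str) -> str:
    """Encode the page and visitor identifiers into the image request URL."""
    query = urlencode({"page": page_id, "vid": visitor_id})
    return f"{TRACKING_HOST}/pixel.gif?{query}"


def build_pixel_tag(page_id: str, visitor_id: str) -> str:
    """Return the HTML tag a content provider would place in a tracked page."""
    url = html.escape(build_pixel_url(page_id, visitor_id), quote=True)
    return f'<img src="{url}" width="1" height="1" alt="">'


def decode_pixel_request(url: str) -> dict:
    """Recover the identifier codes on the tracking-server side."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in params.items()}
```

In this sketch the tracking server needs no prior knowledge of the page: everything it logs is carried in the query string of the image request.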
Referring now to FIG. 1, there is shown an example of a system 100 for website traffic data collection according to the prior art. User 112 interacts with client machine 107, which runs a software application such as browser 110 for accessing web pages. In response to a user 112 command, client machine 107 issues a web page request 111 that is transmitted via the Internet to content server 101. In response to request 111, content server 101 transmits HTML code 102 to client machine 107. Browser 110 interprets received HTML code 102 to display the requested web page on client machine 107.
As is well known in the art, HTML code 102 typically includes tags and pointers that specify additional content items to be included in the displayed web page. For example, HTML code 102 may include a pointer to an image, sound, applet, or other content item. For each of these auxiliary content items, browser 110 automatically sends a request to the server specified by the pointer. For many content items, the specified server may be content server 101.
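The manner in which a browser determines which server to contact for each auxiliary content item can be approximated with a short sketch. The class and function names below are hypothetical, and the set of tags examined is a simplifying assumption; an actual browser handles many more element types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceCollector(HTMLParser):
    """Collect the URLs of auxiliary content items referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Simplifying assumption: only these tags carry a "src" pointer.
        if tag in ("img", "script", "audio", "embed"):
            src = dict(attrs).get("src")
            if src:
                # Relative pointers resolve against the page's own server.
                self.urls.append(urljoin(self.base_url, src))


def hosts_contacted(html_text: str, base_url: str) -> list:
    """Return, in document order, the host each auxiliary request goes to."""
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    return [urlsplit(url).netloc for url in collector.urls]
```

A relative pointer such as `/logo.png` resolves to the content server, whereas an absolute pointer may name any other server, which is what allows a third-party server to receive a request for each page view.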
As discussed above, HTML code 102 also includes a pointer to a transparent one-pixel image, or other unobtrusive element, that is used for traffic data collection purposes. The pointer may reference tracking server 106, which is typically a separate server operated by the third-party website traffic statistics service. In response to the pointer embedded in HTML code 102, client machine 107 issues a request 105 for the one-pixel image to tracking server 106. Tracking server 106 records the request in a log 108, together with additional information associated with the request (such as the date and time, and possibly identifying information that may be encoded in request 105 or in a cookie that accompanies or forms a part of request 105). Thus, tracking server 106 records the occurrence of a “hit” to the web page. Tracking server 106 also transmits the requested one-pixel image 109 to client machine 107 so that request 105 is satisfied.
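The behavior of a tracking server of this kind can be sketched, under stated assumptions, as a single request handler. The function name, the log record fields, and the parameter and cookie name `vid` are hypothetical; the log is modeled as an in-memory list rather than as any particular storage format.

```python
import base64
from datetime import datetime, timezone

# A 1x1 transparent GIF, decoded once at start-up.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(query_params: dict, cookies: dict, log: list, now=None):
    """Record one hit in the log, then return the image so the request is satisfied."""
    timestamp = now or datetime.now(timezone.utc)
    log.append({
        "time": timestamp.isoformat(),
        "page": query_params.get("page"),
        # Identifying information may arrive in a cookie or in the request itself.
        "visitor": cookies.get("vid") or query_params.get("vid"),
    })
    headers = {
        "Content-Type": "image/gif",
        # Discourage caching so that repeat views generate new requests.
        "Cache-Control": "no-store",
    }
    return 200, headers, TRANSPARENT_GIF
```

The `Cache-Control` header is a design choice of this sketch rather than a feature described above; without it, a cached image could suppress later requests and cause hits to be undercounted.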
Similar techniques can be used for tracking responses to e-mail messages. An e-mail message sender can include single-pixel images in HTML e-mail messages, and can insert unique parameters or other identifying codes in the image path. Typically, the path points to a tracking server. Upon receipt of such an e-mail message, the user's e-mail client sends a request for the single-pixel image to the tracking server, which notes the unique identifying code (if any) and tracks the user's receipt of the e-mail message. Identifying codes can be cross-referenced to e-mail addresses, in order to verify receipt of, and/or response to, an e-mail message by a user having a specific e-mail address.
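The cross-referencing of identifying codes to e-mail addresses might be sketched as follows. The function names, the tracking host, and the parameter name `c` are hypothetical, and the use of random tokens is one possible way of generating unique codes, not a method stated above.

```python
import secrets

# Hypothetical tracking host, used only for illustration.
TRACKING_HOST = "https://tracker.example.com"


def issue_tracking_codes(addresses: list) -> dict:
    """Assign each recipient a unique, URL-safe code; return code -> address."""
    return {secrets.token_urlsafe(16): address for address in addresses}


def pixel_url_for(code: str) -> str:
    """Build the image path placed in the HTML e-mail sent to one recipient."""
    return f"{TRACKING_HOST}/open.gif?c={code}"


def recipients_who_opened(code_to_address: dict, requested_codes: list) -> set:
    """Cross-reference codes seen by the tracking server to e-mail addresses."""
    return {
        code_to_address[code]
        for code in requested_codes
        if code in code_to_address  # ignore unknown or malformed codes
    }
```

Because each recipient receives a different image path, a single request at the tracking server suffices to associate the event with one e-mail address.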
In both web browsing and e-mail message applications, the tracking server can process the data stream generated by the loading of these one-pixel images in order to provide detailed usage statistics about web pages or e-mail messages. Various types of analysis techniques can be applied to these usage statistics so as to provide added value to content providers. Cookies can be stored on user machines so that repeat visitors can be identified as such.
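A minimal sketch of such processing is given below, assuming that each log record is a dictionary with hypothetical `page` and `visitor` fields, the latter holding a cookie-derived identifier. Treating any visitor with more than one logged request as a repeat visitor is a simplification adopted for this example only.

```python
from collections import Counter


def summarize_log(log: list):
    """Return (hits per page, set of repeat visitors) for a tracking log."""
    hits_per_page = Counter(entry["page"] for entry in log)

    # Count requests per cookie identifier; records without one are skipped.
    requests_per_visitor = Counter(
        entry["visitor"] for entry in log if entry.get("visitor")
    )
    repeat_visitors = {
        visitor for visitor, count in requests_per_visitor.items() if count > 1
    }
    return hits_per_page, repeat_visitors
```

A production analysis would typically group requests into sessions before classifying a visitor as a repeat visitor; that step is omitted here for brevity.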
Existing usage tracking techniques suffer from limitations, however. In particular, the unpredictable nature of Internet connectivity and availability has been the source of many problems when collecting usage data. If, for example, a portion of the Internet fails, or if for some other reason the image request does not reach the tracking server, the user's website visit may not be properly recorded. In addition, the delivery of the web page to the user may be delayed if the tracking server fails to promptly transmit the single-pixel image to the user's browser. In some cases, such a failure may even result in an error message after a time-out period during which the browser does not receive the content. Such limitations and failures result from the use of a centralized tracking server to which all tracking image requests are sent.
What is needed, then, is a distributed usage tracking technique that allows for the use of multiple tracking servers. What is further needed is a usage tracking technique that provides appropriate redundancy so as to improve reliability of tracking data. What is further needed is a technique for aggregating usage tracking data from multiple tracking servers so as to provide an accurate representation of total traffic at a website.