1. Field of the Invention
The present invention relates to a system and method for analyzing traffic to a website.
2. Background of the Related Art
Programs are available for analyzing traffic to a website. One such program is described in co-pending U.S. patent application Ser. No. 09/679,297, filed Oct. 4, 2000, entitled “System and Method for Monitoring and Analyzing Internet Traffic”, which is incorporated herein by reference for all purposes and is assigned in common with the present application. Such programs can generally be classified into two categories, log-based tools and Internet-based tools; the aforementioned system is an example of a log-based tool.
Log-based tools for analyzing traffic to a website are generally operated by the owner of the website or by the owner's hosting provider. The raw data for log-based tools typically comes from the web servers hosting the website being analyzed. As visitors to the website request web pages, files, and embedded content, these web servers are typically configured to automatically make entries in one or more log files, each entry describing a request. Log-based tools read these log files as the source of raw data for the analysis.
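As an illustration of how a log-based tool consumes its raw data, the following is a minimal sketch that assumes the web server writes entries in the NCSA Common Log Format; real servers are often configured with other or extended formats. The names `parse_line`, `count_requests`, and `LogEntry` are hypothetical and do not come from the referenced applications. Note that a request for a PDF file is recorded like any other request.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional

# Assumed log format: NCSA Common Log Format
#   host ident authuser [date] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class LogEntry:
    """One request as recorded by the web server (hypothetical structure)."""
    host: str
    time: datetime
    method: str
    path: str
    status: int
    size: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed format."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    parts = match.group("request").split()
    if len(parts) < 2:
        return None  # malformed request line, e.g. "-"
    size = match.group("size")
    return LogEntry(
        host=match.group("host"),
        time=datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=parts[0],
        path=parts[1],
        status=int(match.group("status")),
        size=0 if size == "-" else int(size),  # "-" means no bytes were sent
    )


def count_requests(lines: Iterable[str]) -> Dict[str, int]:
    """Count requests per path, skipping lines that cannot be parsed."""
    counts: Dict[str, int] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry.path] = counts.get(entry.path, 0) + 1
    return counts
```

For example, `count_requests(open("access.log"))` would tally every request the server logged, including downloads that carry no script. Requests served entirely by an intermediate cache never reach the server and so never appear in such a file.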
Internet-based tools, such as that described in U.S. patent application Ser. No. 09/326,475, entitled “Internet Website Traffic Flow Analysis”, by C. Glommen and B. Barrelet, are generally operated by the owner of the tool and provided as a service to which website owners can subscribe. To generate a source of data for the service, the website owner typically copies JavaScript code provided by the service provider into the content of the website being analyzed. As visitors to the website request web pages, the embedded JavaScript code collects information and then calls a second web server, operated by the service provider, to transmit the collected information.
Both log-based tools and Internet-based tools have drawbacks. One drawback of log-based tools is that some of the traffic generated by visitors to the website may be intercepted by caching systems, which are designed to improve Internet performance, before those requests reach the web server hosting the website. When this happens, the web server never receives the request and therefore does not make an entry in the log file, leaving the data incomplete. Internet-based tools, on the other hand, benefit from being triggered by the visitor's web browser, so that even if a request is handled by a caching system, the JavaScript code in the content still triggers the transmission of data to the service provider.
One shortcoming of Internet-based tools is their inability to record and analyze requests for content that cannot carry JavaScript, such as PDF documents and other downloads. Because these file formats do not include any JavaScript capabilities, requests for them never trigger the transmission of data to the service provider. Log-based tools, however, typically do see these requests, since the requests are still handled by the web server hosting the website. In general, Internet-based tools track only content that supports embedded scripts, such as HTML, whereas log-based tools can also see requests for other content.
One of the difficulties with traditional log-based systems is tracking unique visitors, sessions, and loyalty metrics. Uniquely identifying a new visitor and a new session is increasingly difficult because of the growing use of proxy systems, which can mask visitors' IP addresses. And even when a visitor and session are uniquely identified, scanning potentially huge volumes of data for previous sessions can be a barrier to calculating visitor loyalty.
The above references are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features and/or technical background.