The Internet is a collection of disparate computer systems which use a common protocol to communicate with each other. A common use of the Internet is to access World Wide Web (web) pages. Web pages are typically stored on a server and remotely accessed by a client over the Internet using a web browser.
A web site is a collection of web pages. A web site typically includes a home page and a hierarchy of follow-on web pages that are accessible through the home page. The web pages are connected to each other by hypertext links. The links allow a user to browse the web pages of a web site by selecting the links between them. Distinct web sites may be identified by distinct associated Internet domain names.
To increase user visitations and revenue, web sites have become very sophisticated. Web sites typically include web pages that provide information to users, advertise products or services to users and/or provide site search functions for users. A problem for web site owners is to determine how successful the web site is, for example, whether the informational or other needs of users are met and whether the users are purchasing advertised goods and services.
Programs for analyzing traffic on a network server, such as a worldwide web server, are known in the art. In these prior art systems, the program typically runs on the web server that is being monitored. Data is compiled, and reports are generated on demand—or are delivered from time to time via email—to display information about web server activity, such as the most popular page by number of visits, peak hours of website activity, most popular entry page, etc. Alternatively, data is logged on the web server that is being monitored and the logs are transferred to another computer, where they are compiled and analyzed.
Alternatively, web sites use client side script, such as javascript, which is embedded into the web pages to monitor traffic. Such a script can collect information and submit it to the server, where the information is analyzed and stored. The benefits of using a client side script are that cached visits, such as BACK button navigation, can be monitored, and that non-human (“robot”) traffic is not monitored, since robots do not normally execute the client side code that is referenced or embedded inside a web page.
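As an illustrative sketch, the kind of per-page information such a client side script might gather can be expressed as a small javascript function. The field names and structure here are hypothetical, not drawn from any particular prior art product, and the document and window objects are passed in explicitly so the logic can also be exercised outside a browser.

```javascript
// Hypothetical sketch of per-page data collection by a client side
// script. The doc and win parameters stand in for the browser's
// document and window objects.
function collectPageData(doc, win) {
  return {
    url: doc.location ? doc.location.href : "",   // page visited
    referrer: doc.referrer || "",                 // referring page
    title: doc.title || "",
    screenWidth: win.screen ? win.screen.width : 0,
    screenHeight: win.screen ? win.screen.height : 0,
    timestamp: Date.now()                         // time of day
  };
}
```

In a browser, the call would simply be `collectPageData(document, window)`, typically followed by submission of the resulting object to a server.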
Analyzing activity on a worldwide web server from a different location on a global computer network (such as “Internet”) is also known in the art. To do so, a provider of remote web-site activity analysis (“service provider”) generates javascript code that is distributed to each subscriber to the service (“subscriber” herein). The subscriber copies the code into each web-site page that is to be monitored. When a visitor to the subscriber's web site (“client” or “visitor”) loads one of the web-site pages into his or her computer, the javascript code collects information, including time of day, referring page, page visited, etc. The code then calls a server operated by the service provider—also located on the network—and transmits the collected information thereto as a URL parameter value. Information is also transmitted in a known manner via a cookie.
Each subscriber has a password to access a page on the service provider's server. This page includes a set of tables that summarize—possibly in real time—the activity on the subscriber's web site.
Because of limitations in javascript browser technology, special, non-trivial techniques are used to transmit the information when the recipient is located in a different domain than the web server on which the web site is located. Such techniques usually append the information to be transferred to an HTTP request for an image or some other web resource. This resource is located on the server of the service provider, and as a result the request arrives there rather than at the web server storing the web site. This is in contrast to more straightforward techniques available for sending data to the web server on which the web site is located, such as XMLHttpRequest, which is a standard method for submitting data to a web server known to those skilled in the art. It should be clarified that whenever a third party service operator is involved, a cross domain operation usually has to be supported.
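A cross domain technique of the type described above can be sketched as follows: the collected data is encoded into the query string of an image URL hosted on the service provider's server, and the browser is then asked to fetch that image, carrying the data to the provider as URL parameter values. The endpoint name below is hypothetical.

```javascript
// Sketch of cross-domain reporting via an image request. The data
// object is serialized into URL query parameters appended to the
// address of a (hypothetical) tracking image on the service
// provider's server.
function buildBeaconUrl(endpoint, data) {
  var parts = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      parts.push(
        encodeURIComponent(key) + "=" + encodeURIComponent(String(data[key]))
      );
    }
  }
  return endpoint + "?" + parts.join("&");
}

// In a browser, the request itself would be issued with:
//   new Image().src = buildBeaconUrl("https://stats.example.com/t.gif", data);
// The browser fetches the image from the service provider's domain,
// delivering the encoded data without any same-domain restriction.
```

Because the request targets an ordinary image resource, it is not subject to the same-domain restrictions that constrain methods such as XMLHttpRequest.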
Modern web site traffic analysis tools have been useful for tracking page-to-page navigation, e.g. where a visitor downloads one page and then clicks a link to transition to another page. Each click of a link causes the web browser to send a request over the Internet for the new web page, which is then downloaded from the web page server storing the web page and loaded within the browser running on the visitor's computer. The operation of conventional browsers such as Microsoft's Internet Explorer and Netscape Navigator is well known in the art. The active javascript within these pages reports back information every time a new page is loaded into the visitor computer's web browser.
Additional methods which include installation of executable routines on the visitor computer have been adopted in the past to monitor activities on computers, all with varying amounts of success. For example, Microsoft has developed Browser Helper Objects, which are a particular type of ActiveX® components, that can be adopted for monitoring purposes in Microsoft's Internet Explorer browser. (ActiveX® is a registered trademark of Microsoft Corporation, Redmond, Wash.). However, utilization of ActiveX® routines requires these executable routines be downloaded permanently onto a user's browser and further requires the user's affirmative response to a prompt requesting authorization to install the software. If the user declines, the activity of targeted web-based pages and transactions cannot be monitored, and the developer of such pages and transactions is limited regarding the amount of relevant data that can be recorded and evaluated to improve performance.
Developers have also coded and inserted monitoring applets within web pages to run on user browsers to monitor the performance of the browser while the pages are active on the browser. However, such applets generally can measure performance events only within the page in which the applet was embedded and therefore have limited value monitoring such browser-level events as navigating to a new page or page access aborts. Furthermore, because of limitations in browser technology, any data gleaned during these page applet-based monitoring functions can only be sent back to the web server originating the web page. Such a limitation imposes additional network communications load between the browser and the web server and adds processing load to the web server that must receive and somehow process the monitoring data. Additionally, should the web server go down or should the connection between the browser and the web server be lost following the download of the page to the browser, any monitoring data will likely be lost.
Due to the limitations of applets and ActiveX® controls, the preferred method of collecting information about the visitor on the client side is client side script, such as javascript. Javascript is allowed by default in most browsers and does not require authorization from the user in order to be executed. However, it should be noted that authorization can be requested from the user, if desired, before performing any javascript operation.
Prior art publications limit the information collected on the client side to “per-page” data such as: URL, referrer, load time, IP address, browser type, screen resolution, etc. This “per-page” data resembles data that was previously collected on the server side in the form of web logs. A web server only knows about page requests, so web logs contain only “per-page” data. Client side data collection is not limited to “per-page” data, but evolved as such because it was initially developed as a replacement for traditional server logs. For example, data accessible to client side scripts includes, but is not limited to: mouse movement, scrolling of the web page, resizing of the browser window, click events, keyboard use, etc. (“per-action” data).
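A minimal sketch of how such “per-action” data might be buffered on the client side is given below. The function and field names are illustrative only, and the DOM event wiring in the trailing comment assumes a browser environment.

```javascript
// Illustrative buffer for "per-action" data: each recorded entry
// stores the action type, an arbitrary detail value, and a timestamp.
function createActionLog() {
  var events = [];
  return {
    record: function (type, detail) {
      events.push({ type: type, detail: detail, t: Date.now() });
    },
    count: function () { return events.length; },
    byType: function (type) {
      return events.filter(function (e) { return e.type === type; });
    }
  };
}

// Browser wiring (illustrative):
//   var log = createActionLog();
//   window.addEventListener("scroll", function () { log.record("scroll", window.scrollY); });
//   document.addEventListener("click", function (e) { log.record("click", e.target.tagName); });
```

The buffered entries could then be transmitted to a server periodically or when the page is left, rather than one request per action.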
Additionally, prior art publications often assume a linear model of web browsing, in which a visitor goes from web page A to web page B to web page C. In reality, a visitor may open several windows and then switch from one to another in any way he likes. This linear model originates in the linear nature of the web server logs from which traffic analysis evolved. One result of this linear model is that the time a visitor spends on a web page is measured as the time that passes from the load event to the unload event. However, this interval usually does not represent the actual time the user spent interacting with the page, but rather the time the page was open.
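One way to approximate the actual interaction time, rather than the load-to-unload interval, is to accumulate only the periods during which the page window is active. The sketch below is illustrative; the clock function is injected so the accumulation logic does not depend on a browser, and the focus/blur wiring in the trailing comment is a hypothetical example.

```javascript
// Illustrative interaction timer: totals only the time between
// focus and blur, rather than the whole load-to-unload interval.
// now() is an injected clock returning milliseconds.
function createInteractionTimer(now) {
  var active = false, started = 0, total = 0;
  return {
    focus: function () {
      if (!active) { active = true; started = now(); }
    },
    blur: function () {
      if (active) { active = false; total += now() - started; }
    },
    elapsed: function () {
      return active ? total + (now() - started) : total;
    }
  };
}

// Browser wiring (illustrative):
//   var timer = createInteractionTimer(Date.now);
//   window.addEventListener("focus", timer.focus);
//   window.addEventListener("blur", timer.blur);
```

With several windows open, each page would accumulate time only while its own window is the active one, which is closer to the time the visitor actually spent interacting with it.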
Web site owners are increasingly interested in information about their visitors. Most web sites employ traditional methods and, hence, to compete they must find new ways to gather insights about how their users interact with their web sites.
Another option available to web site owners is to conduct web usability testing, which is done similarly to beta testing of software. Usually, people are paid to use the web site, and their actions and feedback are recorded with special software and hardware. Such processes usually take place in laboratories designed for this purpose. The disadvantages of this “active” approach are that it is expensive and that users might behave differently than they would when not monitored. It should be noted that the retail and supermarket industries regularly use both active and passive methods to analyze customer behavior.
The following U.S. patents and patent applications provide a brief description of some prior art monitoring solutions: U.S. Pat. No. 6,112,240 of Pogue et al., U.S. patent application publication serial number 2002/0143931 of Smith et al., and U.S. patent application publication serial number 2004/0054715 of Cesario and U.S. Pat. No. 6,944,660 of Eshghi et al.
Accordingly, it would be desirable to provide a system and method for tracking and analyzing web site traffic that is client side based, supports cross-domain operation, and collects information beyond traditional “per-page” data.