Web clients, such as Web browsers, request resources from Web servers. Owners of Web servers use Web analytics tracking software to generate reports of this user activity on the site. Web analytics packages generally track aggregate data, such as the number of requests for a given Web page (see Web Page, below), and user-specific data, such as the sequence of Web pages requested by Web clients during a visit.
Terminology
This section defines terminology used throughout this document.
Web Protocol and HTTP
A Web protocol is a communications technology used to communicate between devices connected to the Internet. Web protocols include HTTP, HTTPS, Web services, AJAX, WebDAV, and others including those not yet developed. Because most Web communications occur over the HTTP protocol, this document uses the term HTTP as a synonym for Web protocol, but the invention that is the subject of this document could be used with any Web protocol.
Web Client
A Web client is a device that can access content on the Internet using one or more Web protocols. A Web client could be a Web browser, an email client, an RSS reader, or a mobile or other digital device.
Web Server
A Web server is a device that responds to requests placed over the Internet using Web protocols. A server can include multiple virtual or physical machines that respond to Web protocol requests, for example by serving the pages of a Web site, RSS data, or any other data. Such Web servers often sit behind a load balancer, which is a device that distributes the workload among them.
Web Site
A Web site is a collection of Web pages, Web application components, and other resources residing on a Web server that collectively implement a complete Web solution. Each Web site typically corresponds to a single domain or subdomain.
Web Analytics
Web analytics refers to a software system that tracks and reports requests from Web clients to one or more Web servers. Web analytics data include aggregate information, such as the total number of visitors or the number of visits to a specific page. They also include sequential information, such as the sequence of Web protocol requests sent by each user (such requests typically being interpreted as Web pages).
Session
A session is a sequence of interactions between a Web client and a Web server. One example is the series of Web pages (protocol requests) on a Web server that a single user accesses with a Web client between opening and closing that Web client.
URL, Protocol, Domain, Path, and Query String Parameters
A URL identifies a resource on a Web server. A URL can include a Web protocol specification, a domain, a top-level domain, a subdomain, a port, a path, an extension, any number of query string parameters, an anchor, and various characters used to separate these tokens. For example, the URL http://www.domain.tld:port/path/to/resource.html?key1=value1&key2=value2#anchor contains the following tokens:
http represents a Web protocol specification.
www represents a subdomain within a domain.
domain represents a domain.
tld represents a top-level domain.
port represents a port to be used by the TCP/IP protocol.
/path/to/resource represents a path.
html represents an extension.
The question mark character (“?”) separates the path from the query string parameters.
key1 represents the name of a query string parameter.
value1 represents the value of the query string parameter named key1.
key2 represents the name of a query string parameter.
value2 represents the value of the query string parameter named key2.
anchor represents an anchor identifier.
All other characters separate these various tokens.
URLs can have even more properties than described above and are well described in various Internet RFCs.
HTTP Request/Response, Primary Request, and External Request
Web clients interact with Web servers by placing requests over the Internet using HTTP or other Web protocols. An HTTP request includes a URL, as well as additional information such as state information (e.g. cookies) and details about the Web client placing the request. The original URL requested by the Web client is the primary HTTP request. The Web server responds using the HTTP protocol, typically with a message containing HTML or XML markup. Elements in the response to the primary request can contain references to other resources, such as images, style sheets, JavaScript source code, multimedia, and other components. These resources can reside on the same Web server or on additional Web servers. As the Web client processes the response from the Web server, it places additional HTTP requests for these components to construct the Web page.
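The following Python sketch illustrates how primary and secondary requests relate. It scans the markup returned for a primary request and lists the additional URLs a Web client would typically request. It assumes a small, hypothetical set of referencing tags (img, script, stylesheet link, audio, video, source). Real Web clients also follow CSS url() references, srcset, fonts, iframes, and more. Hyperlinks (a elements) are not fetched until the user follows them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical subset of tags whose attribute triggers an additional request.
REFERENCING_TAGS = {"img": "src", "script": "src", "link": "href",
                    "audio": "src", "video": "src", "source": "src"}


class SubresourceCollector(HTMLParser):
    """Collects the URLs a Web client would request while rendering a primary response."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower().split():
            return  # only stylesheet links are treated as secondary requests here
        attr = REFERENCING_TAGS.get(tag)
        if attr and attrs.get(attr):
            # Relative references are resolved against the primary request URL.
            self.urls.append(urljoin(self.base_url, attrs[attr]))


def secondary_requests(primary_url, html):
    collector = SubresourceCollector(primary_url)
    collector.feed(html)
    collector.close()
    return collector.urls


page = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.net/app.js"></script>
</head><body>
<img src="images/logo.png" alt="logo">
<a href="/about.html">About</a>
</body></html>"""

for url in secondary_requests("http://www.example.com/index.html", page):
    print(url)
```

The link to /about.html is not listed because it only produces a new primary request if the user clicks it.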
After a Web client loads all of the resources that make up a Web page, logic within the Web page can cause the Web client to place additional HTTP requests, such as by using AJAX (see JavaScript and AJAX, below). The term request simply indicates which device initiates the communication. It does not always mean that the Web client has requested a resource from the Web server, and the request sometimes conveys more information than the response. The Web client may process certain responses from the Web server using logic rather than rendering them in the Web page displayed in the Web client.
A Web client can initiate a Web request for a number of reasons. A user may launch the Web client, causing it to request the user's home page and all the resources it references. The user could enter data into a search Web component and click a button, causing the Web client to place another request to the same Web server or another Web server. The user could click a link in the search results, causing the Web client to request another Web page, or triggering custom logic such as JavaScript that could initiate one or more HTTP requests.
Each HTTP request corresponds to a URL, but can include additional information such as the type, version, and features of the Web client, the Web client's language preferences, the state of Web components in the current Web page, cookies identifying the user or controlling other features (see Cookies, below), and other information.
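Each request pairs a URL with additional information, so a server-side component might split it roughly as in the following Python sketch. The example URL, the header values, and the function name describe_request are hypothetical. The domain split is deliberately naive and would need a public-suffix list to handle TLDs such as co.uk.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def describe_request(url, headers):
    """Split a request into its URL tokens plus the extra data carried in its headers."""
    parts = urlsplit(url)
    # Naive host split: assumes a dotted host name with a single-label TLD ("com", not "co.uk").
    labels = parts.hostname.split(".")
    path, ext = posixpath.splitext(parts.path)
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),
        "domain": labels[-2],
        "tld": labels[-1],
        "port": parts.port,            # None when the URL omits the port
        "path": path,
        "extension": ext.lstrip("."),
        "query": dict(parse_qsl(parts.query)),
        "anchor": parts.fragment,      # Web clients do not send the anchor to the server
        "client": headers.get("User-Agent"),
        "languages": headers.get("Accept-Language"),
        "cookies": headers.get("Cookie"),
    }


request = describe_request(
    "http://www.example.com:8080/path/to/resource.html?key1=value1&key2=value2#anchor",
    {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-US,en;q=0.8",
     "Cookie": "uid=abc123"},
)
print(request["domain"], request["port"], request["path"], request["query"])
```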
A Web page may contain markup or other content that causes the Web client to place one or more External Requests to achieve Web analytics functionality. For example, a Web page may contain JavaScript that causes the Web client to place an HTTP request not for a Web page, image, or other content resource to display within the primary Web page, but to transmit information about the primary request to a Web analytics server.
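An external request is usually just a URL whose query string carries data about the primary request. In a Web page this URL is typically built and requested by JavaScript. The Python sketch below only shows how such a URL might be assembled. The endpoint and the parameter names url, ref, and cid are hypothetical; real analytics services define their own.

```python
from urllib.parse import urlencode

# Hypothetical analytics endpoint; real services define their own URLs and parameter names.
ANALYTICS_ENDPOINT = "https://analytics.example.net/collect.gif"


def beacon_url(primary_url, referrer=None, client_id=None):
    """Build the URL of an external request that reports a primary request."""
    params = {"url": primary_url}
    if referrer:
        params["ref"] = referrer
    if client_id:
        params["cid"] = client_id
    # urlencode percent-encodes the primary URL so it travels as one parameter value.
    return ANALYTICS_ENDPOINT + "?" + urlencode(params)


print(beacon_url("http://www.example.com/news?id=7", referrer="https://search.example.org/"))
```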
Web Page (Page)
The term Web page, or page, describes the result of a complete HTTP request and response, including any additional HTTP requests and responses required to load images, CSS files, and other resources. The term Web page includes both the data and communications required to represent and transmit the Web page, and the user's experience of that collection of resources as a single logical component in a Web client. Web clients other than Web browsers, such as RSS readers and mobile devices, can access Web pages.
Web Component
A Web component is a feature within a Web page, such as an image, a search box, a login form, a shopping cart, a multimedia resource, or another type of component. A Web page typically consists of a hierarchy of numerous Web components, such as an HTML table containing a login form containing a username text input component, which may exist within a Web page that also contains a search box, a survey, any number of links, and any number of additional Web components.
Component Impression
A component impression is a single distribution of a Web resource that makes up only part of a Web page, such as the inclusion of an advertisement. A Web page can include multiple component impressions.
Form Post and Postback
Certain events within Web components in a Web page can cause the Web client to place additional HTTP requests posting the state of the current Web page to a Web server. The term form post indicates a Web client posting the state of a Web form to the Web server, but this is not the only use of form posts. A Web page can post to itself or another Web page. A postback occurs when a Web page posts to itself instead of posting to a different Web page.
For example, a Web page may contain a survey Web component comprising a question, a series of answers, and a button to submit the response to the survey. When the user submits their response, the survey Web component could direct the Web client to post the survey data to a different Web page that returns survey results to render in the Web client (a form post). Alternatively, it could post to the same Web page, which causes the Web client to render the same information (a postback). A Web analytics engine that tracks every HTTP request would record two requests for the same Web page in the case of a postback. The survey Web component could use AJAX to cause additional HTTP requests during the form post or postback.
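An analytics engine might tell these cases apart by comparing each POST with the page currently displayed. The following Python sketch assumes a hypothetical, already-ordered list of one client's primary requests. It treats a POST to the exact URL of the current page as a postback. Real systems may need to ignore query strings or framework-specific state fields when comparing URLs.

```python
from dataclasses import dataclass


@dataclass
class PrimaryRequest:
    method: str  # "GET" or "POST"
    url: str


def classify(requests):
    """Label one client's ordered primary requests as navigation, form post, or postback."""
    labels = []
    current_page = None  # URL of the page currently rendered in the Web client
    for req in requests:
        if req.method.upper() == "POST":
            labels.append("postback" if req.url == current_page else "form post")
        else:
            labels.append("navigation")
        current_page = req.url
    return labels


visit = [
    PrimaryRequest("GET", "/news"),
    PrimaryRequest("POST", "/news"),            # survey posted back to the news page
    PrimaryRequest("POST", "/survey-results"),  # survey posted to a results page
]
print(classify(visit))
```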
Page Impression
A page impression is a single visit to a Web page. A page impression can contain multiple component impressions. When a user takes an action within a page that results in a postback, the page returned to the Web client may contain new component impressions, or the same component impressions, possibly containing data updated because of the postback.
Navigation
Navigation occurs when the user causes the Web client to request a Web page. For example, when a user opens a Web client, the Web client typically places an HTTP request for a Web page; the user has navigated to that Web page. When the user clicks on a link in that Web page, the Web client places another HTTP request for another Web page. Various other user actions can result in the Web client navigating to a Web page.
Users can navigate between Web pages in predictable and unpredictable ways. A user could access the home page of a Web site, and select a link from that page to another page within the Web site. A user could use a bookmark or enter a URL into their Web client to access a Web page directly. A user could select a link to a Web page in search results on the Web site or an Internet search engine.
While in most cases navigation refers to requesting different Web pages in sequence, in some cases a user may navigate to the same Web page twice in sequence. For example, a Web page may contain a link to itself, or another feature that when accessed, causes the Web client to place another primary HTTP request for the same Web page. A user would not typically navigate twice in sequence to the same Web page, but a user might navigate to the same page twice unintentionally, such as by clicking a link that they do not realize references the same Web page that they are already viewing. A Web analytics engine may report a postback as a user navigating twice to a single Web page.
JavaScript and AJAX
Web components can contain Web page programming code in Web scripting languages, most commonly JavaScript. Various system and user events, such as the Web client completing loading of a Web page or a user clicking on a Web component, can trigger JavaScript that controls the behavior of the Web client, including Asynchronous JavaScript and XML (AJAX). AJAX refers to using logic in the page to place HTTP requests and process responses instead of having the Web client load the response directly. A Web server or Web analytics server responds to HTTP requests from AJAX components with XML, code, or data in other formats. The Web component placing the AJAX HTTP request can process the data or execute the code returned from the Web server, such as to update the user interface shown on the Web page. Some Web clients do not support JavaScript, and it is possible for the user to disable JavaScript in others.
Action
An action represents a user activity within a Web component in a Web page. A user action raises an event.
Event
An event represents something that happened within a Web component in a Web page, such as the user action of clicking a button component, or a multimedia component reaching the end of a video. Events within a Web component can cause the Web client to place additional HTTP requests that do not cause the Web client to load a new Web page, but instead update a Web component in the current Web page, or in the case of Web analytics, record a user action. A single page impression can involve numerous actions and events, such as the action of a user clicking the play button in a multimedia component, and the event of that video reaching its conclusion.
Cookie
When a Web client requests a resource from a Web server, the response can include a cookie. The Web client includes that cookie in subsequent requests to the same Web server. Each cookie is associated with a domain (possibly including a subdomain), and the Web client only transmits the cookie when requesting a URL in the same domain that provided the cookie. A cookie can include various types of information, including a token that uniquely identifies the user of the Web client. Some Web clients do not support cookies, and users can prevent other Web clients from accepting cookies. In practice, however, many Web sites do not function correctly when cookies are disabled. Web clients support session cookies, which disappear when the user closes the Web client, and permanent cookies, which persist across multiple sessions.
(If cookies are disabled, other means can provide the same functionality, such as embedding a session token in the domain name or in another part of the URL of every request in that session.)
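The Python sketch below uses the standard http.cookies module to illustrate these mechanisms. It shows a session cookie, a persistent cookie, the Cookie header a client sends back, and the URL-based fallback for clients without cookies. The cookie names sid and uid, the domain example.com, and the one-year lifetime are hypothetical choices.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def issue_cookies(user_token):
    """Return Set-Cookie header values for a session cookie and a persistent cookie."""
    jar = SimpleCookie()
    jar["sid"] = secrets.token_hex(8)            # no Max-Age/Expires: a session cookie
    jar["uid"] = user_token                      # identifies the user of the Web client
    jar["uid"]["max-age"] = 60 * 60 * 24 * 365   # persists for one year, across sessions
    jar["uid"]["domain"] = "example.com"         # hypothetical issuing domain
    jar["uid"]["path"] = "/"
    return [morsel.OutputString() for morsel in jar.values()]


def read_cookie_header(header):
    """Parse the Cookie header that the Web client sends on later requests."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


def add_session_token(url, token, name="sid"):
    """Cookie-less fallback: carry the session token in the URL's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [(name, token)]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(issue_cookies("abc123"))
print(read_cookie_header("sid=1f2e3d; uid=abc123"))
print(add_session_token("http://www.example.com/cart?item=42", "1f2e3d"))
```

A site using the URL fallback would have to rewrite every link it emits so that the token is carried into each subsequent request.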
Redirect
A Web server can respond to an HTTP request with a redirect, which forwards the Web client to another URL.
Referrer
A referrer is the URL of a Web page containing a Web component, such as a link, that when accessed causes the Web client to place another primary HTTP request. HTTP transmits this URL in the Referer request header. Not all HTTP requests include a referrer; for example, no referrer is sent when the user opens their home page or activates a bookmark in the Web client.
Web Server Log
Most Web servers log each HTTP request to a log file or other data storage mechanism, and provide options to include various information about the request including the date and time, the requested URL, the Web client, the referrer, cookies, the response status such as success or error, and other details.
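Many Web servers can write the widely used Combined Log Format (the name used by the Apache HTTP Server; other servers offer similar layouts). The Python sketch below parses one such line. The regular expression assumes exactly that field order, and the sample line uses a documentation IP address and hypothetical URLs. Cookies are not part of this format and would require a custom log format.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request line" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one Combined Log Format line into a dictionary, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %b expects English month abbreviations (the C locale default).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
          '200 2326 "https://search.example.org/?q=widgets" "Mozilla/5.0"')
entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["url"], entry["status"], entry["referrer"])
```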
Web Analytics Server
A Web analytics server is a server that provides Web analytics functionality: it collects data about requests and analyzes and reports on that data. Some Web analytics servers obtain information by parsing Web server logs. Other Web analytics servers are themselves Web servers, but instead of serving content, they use external requests embedded in primary Web pages to track those primary Web page requests. Other systems use a combination. Developers using such Web analytics solutions embed resources such as markup or script in primary Web pages to cause the Web client to place an additional external request. This external request often uses the URL of an image on the Web analytics server, with query string parameters carrying data relevant to the Web analytics server, such as the URL of the primary request. The Web client also sends any cookies it holds for the Web analytics server's domain. The Web analytics server uses this information to track the primary request. It may respond with a cookie to support additional Web analytics tracking functionality, but the response includes no content to display as a Web component within the primary Web page. Other Web analytics servers may use AJAX or other technologies instead of image URLs.
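A minimal tracking endpoint might be written as a Python WSGI application, as in the sketch below. It records the query-string data and the visitor cookie, then answers with a transparent 1x1 GIF. The parameter names url and ref, the cookie name vid, the two-year lifetime, and the in-memory HITS list standing in for a database are all hypothetical. Current Web clients often restrict or block cookies set by such third-party servers; where they are allowed at all, they typically need at least the SameSite=None and Secure attributes.

```python
import base64
import uuid
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

# 1x1 transparent GIF: the response carries nothing meant to be seen in the page.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

HITS = []  # stand-in for the analytics database


def analytics_app(environ, start_response):
    query = parse_qs(environ.get("QUERY_STRING", ""))
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    known = "vid" in cookies
    visitor = cookies["vid"].value if known else uuid.uuid4().hex
    HITS.append({
        "visitor": visitor,
        "page": query.get("url", [None])[0],      # URL of the primary request
        "referrer": query.get("ref", [None])[0],
        "agent": environ.get("HTTP_USER_AGENT"),
    })
    headers = [("Content-Type", "image/gif"),
               ("Content-Length", str(len(PIXEL))),
               ("Cache-Control", "no-store")]     # every page view must reach the server
    if not known:
        # Persistent cookie so later requests from this client can be linked together.
        headers.append(("Set-Cookie", f"vid={visitor}; Max-Age=63072000; Path=/"))
    start_response("200 OK", headers)
    return [PIXEL]


def start_response(status, headers):
    print(status, dict(headers).get("Set-Cookie"))


body = analytics_app({"QUERY_STRING": "url=http%3A%2F%2Fwww.example.com%2Fnews",
                      "HTTP_USER_AGENT": "ExampleBrowser/1.0"}, start_response)
print(HITS[-1]["page"], len(b"".join(body)))
```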
Most Web servers primarily serve content; Web analytics servers primarily track information about Web clients. Web analytics engines use HTTP requests not to request resources from Web servers, but to report information from a Web client. Most Web analytics servers track the sequence of pages visited by each user, but do not report every HTTP request, such as requests for images, style sheets and JavaScript include files. Web analytics servers can report AJAX requests if the site owner has programmed the Web site to send information from the AJAX function to the Web analytics server.
Web analytics servers commonly include a cookie in their responses to requests from Web clients in order to track individual users. Such analytics engines use permanent cookies, which allow the Web analytics system to track individual users through multiple sessions. A Web analytics server can track a user's actions on multiple Web sites that use a common Web analytics server. For example, if a user accesses both cnn.com and nytimes.com, neither site can know that this user is also accessing the other. If both sites use a Web analytics facility such as Google Analytics, however, Google Analytics can use its cookie to know what content this user has accessed on both sites. By using a persistent cookie, the analytics engine can track the user's actions across multiple sites in multiple browsing sessions.
Google Analytics thereby becomes a global surveillance facility for tracking user navigation across Web sites and Web client sessions. From a privacy standpoint this is clearly undesirable for the person visiting the Web sites, who typically does not know that this tracking is taking place. It also defeats the purpose of the security built into cookies, namely that a cookie is intended to be readable only by the domain that issued it.
One may expect that in the not-too-distant future, Web browsers will safeguard the privacy of individuals by blocking requests to Google Analytics and similar global surveillance services.
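The cross-site tracking described above can be illustrated by showing how a shared analytics server could group hits by its own visitor cookie, as in the following Python sketch. The hit records, cookie values, and site names site-a.example and site-b.example are hypothetical. The point is only that a single identifier, issued by the analytics domain, ties together page requests from otherwise unrelated sites.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def sites_per_visitor(hits):
    """Group tracked page URLs by visitor cookie and by the site that reported them."""
    profile = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        site = urlsplit(hit["page"]).hostname
        profile[hit["visitor"]][site].append(hit["page"])
    return {visitor: dict(sites) for visitor, sites in profile.items()}


hits = [
    {"visitor": "v1", "page": "https://www.site-a.example/politics/story-1"},
    {"visitor": "v1", "page": "https://www.site-b.example/world/story-9"},
    {"visitor": "v2", "page": "https://www.site-a.example/sports"},
    {"visitor": "v1", "page": "https://www.site-a.example/politics/story-2"},
]
print(sites_per_visitor(hits)["v1"])
```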
Application Programming Interface (API)
An Application Programming Interface, or API, defines conventions for a computer programming language by which developers access the features of a software system.
Web Content Management (WCM)
A Web Content Management system (WCM) provides software residing on one or more Web servers that helps WCM users update the content of one or more Web sites. WCM systems comprise two logical components: a content delivery component, which responds to requests for content on Web sites, and a content management component, which provides facilities for developers and WCM users to build and manage the Web components and the content of the Web sites.
Content Management System (CMS)
A Content Management System (CMS) provides software that allows CMS users to maintain content, or data within an information system. While the term CMS can include Enterprise Content Management (ECM) and other implementations, for the purposes of this document, CMS and WCM are synonymous.
User Profile Properties
Many Web sites manage information about users, such as location, gender, age, areas of interest, and others. Collectively, these attributes are known as user profile properties.
Web Analytics Strategies
Existing Web analytics engines use the following primary strategies to record user actions on Web sites: parsing Web server logs, embedding code or resource requests in Web pages, and request processing.
Existing Web analytics solutions using both the Web server log parsing and the embedded external request approaches provide several primary types of reports:
Aggregate data for a given time period (total number of page requests, number of requests for a specific page, most popular pages, etc.).
The sequence of primary HTTP requests from a single user.
A sequence of loosely defined actions and events, typically reported separately from the sequence of primary HTTP requests from a single user.
Web Server Log Parsing
One approach to capturing data for Web analytics engines is to parse Web server logs containing records of HTTP requests. Each time a Web client requests a resource, such as a Web page, from a Web server, the Web server logs various pieces of information about the request. A Web analytics engine parses these logs to generate reports. Logged information can include a timestamp, the requested URL, the IP address of the client, Web client details, and other information about the request and the Web client. Because logging occurs on the Web server, clients cannot subvert this approach to gathering Web analytics data. However, the approach provides relatively little detail about the user's actions beyond the sequence of HTTP requests.
For example, a Web server logs a request from a Web client for the home page of a Web site. The response includes references to a number of images, stylesheets, JavaScript, and other resources. The Web server may log each of these requests, though they provide little useful data for reporting purposes. The recorded URL of each request may contain query string parameters that the Web analytics server can use to determine additional information. When the user clicks a link in the Web page or causes the Web client to submit a post, the Web server logs the request, and can log requests for the resources referenced in the response.
All HTTP requests appear similar to the Web server. Other than the amount and specific information transmitted, there is little difference between a request for the home page, a request for an image or other resource referenced by the home page, and an AJAX request. The Web server logs all of these events in the same way, though it responds to these requests differently. It is the responsibility of the Web analytics engine to process the Web logs to generate meaningful reports.
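The following Python sketch shows one way to turn raw log entries into the report types listed earlier. It assumes the entries have already been parsed into dictionaries with ip, agent, time, url, and status keys. It treats a hypothetical list of file extensions as secondary resources. Without cookies, it approximates a user by IP address plus Web client string, which merges users behind a shared proxy and splits users whose address changes.

```python
import posixpath
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Extensions treated as secondary resources rather than Web pages (an assumption).
RESOURCE_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".svg", ".ico", ".woff2"}


def is_page_request(entry):
    """Keep successful requests whose path does not look like a secondary resource."""
    path = urlsplit(entry["url"]).path
    return entry["status"] == 200 and posixpath.splitext(path)[1].lower() not in RESOURCE_EXTENSIONS


def build_reports(entries):
    """Return aggregate page counts and each approximate user's page sequence."""
    pages = sorted((e for e in entries if is_page_request(e)), key=lambda e: e["time"])
    counts = Counter(urlsplit(e["url"]).path for e in pages)
    sequences = defaultdict(list)
    for e in pages:
        sequences[(e["ip"], e["agent"])].append(urlsplit(e["url"]).path)
    return counts, dict(sequences)


log = [
    {"ip": "203.0.113.7", "agent": "A", "time": 1, "url": "/index.html", "status": 200},
    {"ip": "203.0.113.7", "agent": "A", "time": 2, "url": "/css/site.css", "status": 200},
    {"ip": "198.51.100.4", "agent": "B", "time": 3, "url": "/index.html", "status": 200},
    {"ip": "203.0.113.7", "agent": "A", "time": 4, "url": "/news?id=7", "status": 200},
]
counts, sequences = build_reports(log)
print(counts.most_common(1))
print(sequences[("203.0.113.7", "A")])
```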
Embedded External Requests
Another approach to capturing data for Web analytics is to cause the Web client to place external requests for tracking purposes. For example, a Web page could contain a reference to an image, either on the same Web server or another Web server that is also a Web analytics server. When the Web client requests the image, the Web analytics server logs or otherwise processes the request for Web analytics purposes. A Web page could also place external requests for JavaScript or other resources.
Events and JavaScript in Web components could also cause the Web client to send HTTP requests to a Web analytics server. For example, each page on a Web site could include JavaScript that executes whenever a Web client loads a page, or potentially while that page loads. This JavaScript causes the Web client to send an additional HTTP request to a Web analytics server either while the Web client is loading the page or after it completes loading the page. This HTTP request may include information such as the primary requested URL and information about the Web client. The Web analytics server that receives the HTTP request updates a Web analytics database with details about the primary page request.
With AJAX, developers can use this approach to record actions other than page requests for Web analytics purposes. For example, in addition to associating Web analytics JavaScript logic with the Web client event that indicates that the page has loaded, a developer could associate Web analytics logic with the user's action of clicking on a Web component within a Web page. AJAX allows the developer to use JavaScript to initiate HTTP requests that implement Web analytics functionality in response to events. These solutions typically report actions separately from navigation.
Because users can disable JavaScript or use tools to block HTTP messages to Web analytics servers, users can circumvent this approach to gathering Web analytics data. Many solutions that embed Web analytics script in pages also rely on cookies, which users can also defeat.
The Web client industry, which includes Web browser vendors, is placing increasing focus on user privacy. Some newly introduced technologies allow users to browse Web sites while blocking all external requests to facilities such as Google Analytics, LeadLander, Omniture, and other Web analytics and similar solutions.
Today's Web analytics is essentially based on the paradigm that developers first build the Web site and then add the code required to enable Web analytics functionality.
Request Processing
Most modern Web servers respond to primary requests not with static content, but by using a Web application server to execute logic that generates content dynamically. This logic can include calls to Web analytics APIs that pass information to the Web analytics server to track the primary request.
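In this approach the tracking call runs on the server while the page is generated, so it does not depend on JavaScript or on the Web client placing external requests. The Python WSGI middleware below is a minimal sketch. The track callable stands in for whatever Web analytics API a site uses and is hypothetical, as are the recorded field names and the news_page application.

```python
import time


class AnalyticsMiddleware:
    """WSGI middleware that reports each request to an analytics API, then serves the page."""

    def __init__(self, app, track):
        self.app = app
        self.track = track  # hypothetical analytics API call

    def __call__(self, environ, start_response):
        self.track({
            "method": environ.get("REQUEST_METHOD", "GET"),
            "path": environ.get("PATH_INFO", "/"),
            "query": environ.get("QUERY_STRING", ""),
            "referrer": environ.get("HTTP_REFERER"),
            "agent": environ.get("HTTP_USER_AGENT"),
            "cookies": environ.get("HTTP_COOKIE"),
            "time": time.time(),
        })
        return self.app(environ, start_response)


def news_page(environ, start_response):
    """Stand-in for the Web application that generates content dynamically."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Today's news</body></html>"]


recorded = []  # stand-in for the analytics server
app = AnalyticsMiddleware(news_page, recorded.append)
body = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/news", "QUERY_STRING": "id=7"},
           lambda status, headers: None)
print(recorded[0]["path"], recorded[0]["query"], b"".join(body)[:6])
```

Static files are often served without reaching the application, so such middleware tends to see mainly primary requests, which is usually what analytics reports need.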