1. Field of the Invention
This invention relates to monitoring the display and observation of content by a computer system. The invention also relates to monitoring the display and observation at a content display site of content that is provided by a content provider site over a network to the content display site. The invention further relates to the provision of updated and/or tailored content from a content provider site to a content display site so that the content provider's current content is always displayed at the content display site.
2. Related Art
A large amount of human activity consists of the dissemination of information by information providers (content providers) to information consumers (observers). Recently, computer networks have become a very popular mechanism for accomplishing information dissemination. The use of computer networks for information dissemination has necessitated or enabled new techniques to accomplish particular functions related to the dissemination of information.
For example, information providers of all types have an interest in knowing the extent and nature of observation of the information that they disseminate. Information providers that disseminate information over computer networks also have this interest. However, the use of networked computers for information dissemination can make it difficult to ascertain who is observing the disseminated information and how, since information can be accessed rapidly from a remote location by any of a large number of possible observers whose identity is often not predictable beforehand, and since control over the display of the information once disseminated may not be possible, practical or desirable.
Among information providers, advertisers have particular interest in knowing how and to what extent their advertisements are displayed and/or observed, since such knowledge can be a key element in evaluating the effectiveness of their advertising and can also be the basis for payment for advertising. Mechanisms for obtaining such information have been developed for advertisements disseminated in conventional media, e.g., audiovisual media such as television and radio, and print media such as magazines and newspapers. For example, the well-known Nielsen television ratings enable advertisers to gauge the number of people that likely watched advertisements during a particular television program. As advertising over a computer network becomes more common, the importance of developing mechanisms for enabling advertisers to monitor the display and observation of their advertisements disseminated over a computer network increases.
Previous efforts to monitor the display of advertising (or other content) disseminated over a computer network have been inadequate for a variety of reasons, including the limited scope of the monitoring information obtained, the ambiguous nature of the monitoring information, the incompleteness of the monitoring information, and the susceptibility of the monitoring information to manipulation. Review of some of the techniques that have previously been used to acquire monitoring information regarding the display of content (e.g., advertising) disseminated over a particular computer network—the World Wide Web portion of the Internet computer network—will illustrate the deficiencies of existing techniques for monitoring the display of content disseminated over a computer network.
FIGS. 1A and 1B are simplified diagrams of a network illustrating operation of a previous system for monitoring requests for content over the World Wide Web. In FIGS. 1A and 1B, a content provider site 101 (which can be embodied by, for example, a server computer) can communicate with a content display site 102 (which can be embodied by, for example, a client computer) over the network communication line 103. The server computer at the content provider site 101 can store content colloquially referred to as a “Web page.” The client computer at the content display site 102 executes a software program, called a browser, that enables selection and display of a variety of Web pages stored at different content provider sites. When an observer at the content display site 102 wishes to view a particular Web page, the observer causes the client computer at the content display site 102 to send a request to the appropriate server computer, e.g., the server computer at the content provider site 101, as shown in FIG. 1A. The server computers at content provider sites all include a software program (in the current implementation of the World Wide Web, this is an http daemon) that watches for such incoming communications. Upon receipt of the request, the server computer at the content provider site 101 transfers a file representing the Web page (which, in the current implementation of the World Wide Web, is an html file) to the client computer at the content display site 102, as shown in FIG. 1B. This file can itself reference other files (that may be stored on the server computer at the content provider site 101 and/or on other server computers) that are also transferred to the content display site 102. The browser can use the transferred files to generate a display of the Web page on the client computer at the content display site 102. 
The http daemon, in addition to initiating the transfer of the appropriate file or files to the content display site 102, also makes a record of requests for files from the server computer on which the daemon resides. The record of such requests is stored on the server computer at the content provider site 101 in a file 104 that is often referred to as a “log file.”
The exact structure and content of log files can vary somewhat from server computer to server computer. However, generally, log files include a list of transactions that each represent a single file request. Each transaction includes multiple fields, each of which is used to store a predefined type of information about the file request. One of the fields can be used to store an identification of the file requested. Additional fields can be used to store the IP (Internet Protocol) address of the client computer that requested the particular file, the type of browser that requested the file, a time stamp for the request (i.e., the date and time that the request was received by the server computer), the amount of time required to transfer the requested file to the client computer, and the size of the file transferred. Other information about file requests can also be stored in a log file.
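As an illustration only, the following minimal sketch shows how one log file transaction with the fields listed above might be represented and parsed. The tab-separated layout, the field order, and the names `Transaction` and `parse_transaction` are assumptions made for this example; actual log file formats vary from server computer to server computer.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transaction:
    """One log file transaction, i.e., one file request (hypothetical layout)."""
    client_ip: str           # IP address of the requesting client computer
    timestamp: datetime      # date and time the request was received
    requested_file: str      # identification of the file requested
    browser: str             # type of browser that made the request
    transfer_seconds: float  # time required to transfer the file
    bytes_sent: int          # size of the file transferred


def parse_transaction(line: str) -> Transaction:
    """Parse one tab-separated log line; the field order is an assumption."""
    ip, stamp, path, browser, seconds, size = line.rstrip("\n").split("\t")
    return Transaction(
        client_ip=ip,
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        requested_file=path,
        browser=browser,
        transfer_seconds=float(seconds),
        bytes_sent=int(size),
    )


# Made-up sample line, not taken from any real log file.
sample = "192.0.2.10\t1996-09-03 14:05:09\t/index.html\tExampleBrowser/1.0\t0.42\t5120"
record = parse_transaction(sample)
```

Note that every field describes the request or the transfer; none describes what the client computer did with the file afterward.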
Previous methods for monitoring the display of content distributed over the World Wide Web have used the information stored in the log file. For example, one previous method has consisted of simply determining the number of transactions in the log file and counting each as a “hit” on a Web page, i.e., a request for a Web page. The number of hits is deemed to approximate the number of times that the Web page has been viewed and, therefore, the degree of exposure of the content of the Web page to information consumers.
There are a number of problems with this approach, however. For example, as indicated above, a request for a Web page may cause, in addition to the request for an initial html file, requests for other files that are necessary to generate the Web page. If these other files reside on the same server computer as the initial html file, additional transactions are recorded in the log file. Thus, a request for a single Web page can cause multiple transactions to be recorded in the log file. As can be appreciated, then, the number of times that a Web page is transferred to a content display site can be far less than the number of transactions recorded in the log file. Moreover, without further analysis, there is no way to accurately predict the relationship between the number of transactions in the log file and the number of times that a Web page has been transferred to the content display site. Such inaccuracy can be very important to, for example, advertisers (whose cost of advertising is often proportional to the measured exposure of the advertising), since the measured exposure of their advertising (and, thus, its cost) may be based upon the number of hits on a Web page containing their advertisement.
A method to overcome this problem has been used. By analyzing the contents of the log file to determine which file was requested in each transaction, it may be possible to differentiate transactions in which the initial html file needed to generate a Web page is requested from transactions in which the requested file is one which is itself requested by another file, thus enabling “redundant” transactions to be identified and eliminated from the hit count. While such an approach can increase the accuracy of counting Web page hits, it still suffers from several problems.
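The difference between counting every transaction and counting only requests for an initial html file can be sketched as follows. This is a simplified illustration, not the analysis actually performed by any particular tool: it assumes that an initial file can be recognized by its file name extension, and the file names and function names are invented for the example.

```python
# Requested-file field from a made-up log: two views of /index.html (each of
# which also pulls in two image files) and one view of /news.html.
requested_files = [
    "/index.html", "/logo.gif", "/banner.gif",
    "/index.html", "/logo.gif", "/banner.gif",
    "/news.html",
]


def raw_hits(files):
    """Count every transaction as a hit (the simple method)."""
    return len(files)


def page_hits(files, page_suffixes=(".html", ".htm")):
    """Count only requests for an initial html file.

    Assumption: initial files are identified by extension, which is a
    simplification of inspecting which file each transaction requested.
    """
    return sum(1 for name in files if name.lower().endswith(page_suffixes))


total_transactions = raw_hits(requested_files)  # 7 transactions
web_page_requests = page_hits(requested_files)  # 3 Web page requests
```

In this made-up log, seven transactions correspond to only three Web page requests, so the simple method overstates the hit count by more than a factor of two.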
For example, log file analysis may result in some undercounting of Web page hits, apart from any overcounting. This is because, once transferred to a client computer at a content display site, the files necessary to generate a Web page can be stored (“cached”) on that client computer, thus enabling an observer at the content display site to view the Web page again without causing the client computer to make another request to the content provider server computer from which the Web page was initially retrieved. Consequently, the observer can view the Web page without causing transactions to be added to the log file, resulting in undercounting of the number of Web page hits.
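The undercounting caused by caching can be illustrated with a small simulation. The class names and the simple dictionary cache are assumptions made for this sketch; real browsers apply more elaborate caching rules.

```python
class ContentProviderServer:
    """Stands in for the server computer; records each request in a log."""

    def __init__(self):
        self.log_file = []  # one (client_ip, path) entry per file request

    def request(self, client_ip, path):
        self.log_file.append((client_ip, path))
        return f"<contents of {path}>"


class ClientComputer:
    """Stands in for the client computer with a simple local cache."""

    def __init__(self, ip, server):
        self.ip = ip
        self.server = server
        self.cache = {}
        self.views = 0  # number of times the observer viewed a page

    def view(self, path):
        # Only a cache miss produces a request that the server can log.
        if path not in self.cache:
            self.cache[path] = self.server.request(self.ip, path)
        self.views += 1
        return self.cache[path]


server = ContentProviderServer()
client = ClientComputer("192.0.2.10", server)
for _ in range(3):
    client.view("/index.html")
```

After three views, `client.views` is 3 while `len(server.log_file)` is 1: the second and third views were served from the cache and left no trace in the log file.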
Additionally, log files are subject to manipulation, either directly or indirectly. For example, an unscrupulous content provider could directly manipulate the log file by retrieving and editing the log file to add phony transactions, thus artificially increasing the number of Web page hits and making the Web page appear to be more popular than it really is. This problem can be ameliorated by causing the log files to be transferred periodically at predetermined times (e.g., each night at 12:00 midnight) from the server computer at the content provider site to a neutral network site; however, the log file can still be manipulated during the time between transfers.
A log file might be manipulated indirectly, for example, by programming one or more computers to continually request a Web page, thereby generating a large number of hits on that Web page. While the log file would contain transactions corresponding to actual file requests associated with the Web page, these requests would be artificial requests that would almost certainly not result in a display of the Web page, and certainly not in the observation of the Web page. Moreover, checking the contents of the log file for an unusually high number of requests from a particular IP address (i.e., client computer) may not enable such manipulation to be detected, since a large number of requests may legitimately come from a client computer that serves many users (for example, the proprietary network America Online™ has a handful of computers that are used by many users of that network to make connection to the Internet and World Wide Web).
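The ambiguity of a per-address check can be sketched as follows. The addresses, the threshold, and the function name `heavy_requesters` are invented for illustration; the point is only that a count per IP address cannot separate a shared computer from a computer programmed to generate requests.

```python
from collections import Counter


def heavy_requesters(client_ips, threshold):
    """Return the IP addresses with at least `threshold` requests."""
    counts = Counter(client_ips)
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Made-up client IP fields from a log file:
#   198.51.100.7 is a shared computer relaying requests for five real users,
#   203.0.113.9  is a single computer programmed to request the page repeatedly,
#   192.0.2.10   is an ordinary client computer.
client_ips = ["198.51.100.7"] * 5 + ["203.0.113.9"] * 5 + ["192.0.2.10"]

flagged = heavy_requesters(client_ips, threshold=5)
```

Both the shared computer and the automated requester are flagged with the same count, so the check by itself does not reveal which requests are artificial.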
It may be possible to identify the real origin of requests for content using “cookies.” A cookie enables assignment of a unique identifier to each computer from which requests really emanate by transferring the identifier to that computer with content transferred to that computer. Future requests for content carry this identifier with them. The identifier can be used, in particular, to aid in identification of indirect log file manipulation, as described above, and, more generally, to enable more robust log file analysis.
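A minimal sketch of the identifier mechanism is given below. The class `CookieServer`, the identifier format, and the in-memory log are assumptions for illustration; the sketch also accepts any presented identifier without verification, which a real system would need to consider.

```python
import itertools


class CookieServer:
    """Assigns an identifier on first contact and logs it with each request."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.log_file = []  # (client_ip, client_id, path)

    def handle(self, client_ip, path, cookie=None):
        # A request without a cookie comes from a computer not seen before,
        # so a new identifier is issued; otherwise the presented one is reused.
        client_id = cookie if cookie is not None else f"id-{next(self._next_id)}"
        self.log_file.append((client_ip, client_id, path))
        return client_id  # transferred back to the client with the content


server = CookieServer()
shared_ip = "198.51.100.7"  # two computers reach the server through one address

first = server.handle(shared_ip, "/index.html")           # first computer, new id
second = server.handle(shared_ip, "/index.html")          # second computer, new id
server.handle(shared_ip, "/index.html", cookie=first)     # first computer returns

distinct_computers = {client_id for _, client_id, _ in server.log_file}
```

Here three transactions from a single IP address resolve to two distinct identifiers, which is information the IP address field alone could not supply.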
Notwithstanding such enhancement, cookies do not overcome a fundamental problem with the use and analysis of log files to ascertain information regarding the display of content provided over the World Wide Web. That is, as highlighted by the overcounting problem associated with the above-described artifice and the undercounting problem associated with caching of content at the content display site, log files only store information about file requests. A log file does not even indicate whether the requested file was actually transferred to the requesting client computer (though, typically, such file transfer would occur). Nor does a log file include any information about how the file was used once transferred to the requesting client computer. In particular, log files do not provide any information regarding whether the content represented by the requested file is actually displayed by the client computer at the content display site, much less information from which conclusions can be deduced regarding whether—and if so, how—the content was observed by an observer. These limitations associated with the content of a log file cannot be overcome by a monitoring approach based on log file analysis. Moreover, log file analysis is calculation intensive, requiring hours in some instances to extract the desired information from the log file.
Another method of monitoring the display of content disseminated over the World Wide Web uses an approach similar to that of the Nielsen ratings system used in monitoring television viewing. In this method, monitoring software is added to the browser implemented on the client computers of a selected number of defined observers (e.g., families) to enable acquisition of data regarding advertising exposure on those computers. This information is then used to project patterns over the general population.
However, this approach also has several disadvantages. First, only a limited amount of data is collected, i.e., data is only obtained regarding a small number of information consumers. As with any polling method, there is no guarantee that the data acquired can be extrapolated to the general population, even if the observers selected for monitoring are chosen carefully and according to accepted sampling practices. Second, as the size of the World Wide Web (or other computer network for which this method is used) grows, i.e., as the number of content provider sites increases, the number of monitored observers necessary to ensure accurate representation of the usage of all content provider sites must increase, since otherwise there may be few or no observer interactions with some content provider sites upon which to base projections. It may not be possible to find an adequate number of appropriate observers to participate in the monitoring process, particularly given concerns with the attendant intrusion into the privacy of the selected observers. Third, installation of the monitoring software on a client computer to be compatible with a browser presents a number of problems. Such installation requires active participation by observers; since observers typically do not reap benefit from operation of the monitoring software, they may be reluctant to expend the effort to effect installation. The monitoring software must continually be revised to be compatible with new browsers and new versions of old browsers. To enable monitoring of a large number of client computers, the software must be tested for compatibility with a wide variety of computing environments. And, as currently implemented, such monitoring software is also dependent upon the computing platform used, making it necessary to revise the monitoring software for use with new computing platforms or risk skewing the demographics of the sample users.
In addition to desiring information regarding the display and observation of the content that they provide, content providers also often desire to provide content to a content display site that is particularly tailored for observation (e.g., according to various demographic characteristics of an expected observer) at that content display site. For example, text content should be expressed in a language that the observer can understand. If appropriate for the content, it is desirable to tailor the content according to, for example, the age, sex or occupation of the observer.
Such tailoring of content has previously been enabled by modifying the http daemon on a computer at the content provider site to cause a particular version of a set of content to be transferred to a requesting content display site based upon the IP address of that content display site. While such tailoring of content is useful, it is desirable to be able to tailor the presentation of content in additional ways not enabled by this approach.
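Selecting a version of the content from the requesting IP address might look like the following sketch. The address ranges, the file names, and the function `select_version` are hypothetical; the text above does not specify how addresses are mapped to versions.

```python
import ipaddress

# Hypothetical mapping from address ranges to tailored versions of a page.
VERSIONS = [
    (ipaddress.ip_network("192.0.2.0/24"), "index.fr.html"),
    (ipaddress.ip_network("198.51.100.0/24"), "index.de.html"),
]
DEFAULT_VERSION = "index.en.html"


def select_version(client_ip: str) -> str:
    """Pick the file to transfer based only on the requester's IP address."""
    address = ipaddress.ip_address(client_ip)
    for network, filename in VERSIONS:
        if address in network:
            return filename
    return DEFAULT_VERSION


french_page = select_version("192.0.2.44")
german_page = select_version("198.51.100.7")
fallback_page = select_version("203.0.113.9")
```

Because the IP address is the only input, this kind of selection cannot take account of anything else about the observer or the circumstances of display.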
Content providers also often desire to provide their content with the content of other content providers. For example, it is a common practice for content providers (referred to here as “primary content providers”) on the World Wide Web to include advertisements from other entities (referred to here as “secondary content providers”) as part of the content provider's Web page. In such situations, it is desirable for the secondary content provider to be able to easily update and/or appropriately tailor (e.g., according to characteristics of the requester) the content that they supply to the primary content provider. This could be accomplished by causing the primary content provider site, each time that the primary content provider receives a request for content that includes the secondary content, to contact the secondary content provider site to retrieve the secondary content (thus ensuring that updated, appropriately tailored secondary content is used) or to check whether updated or tailored secondary content is available (if so, the content is retrieved). (This method could also be modified so that content retrieval or a check for updated and/or tailored content is only performed according to a predetermined schedule.) However, neither the primary content provider nor the secondary content provider may want their systems burdened with the extra computational load of handling the multitude of requests that would be needed to effect this operation. Alternatively, the primary content provider could collect and store the updated and tailored content from the secondary content providers at the primary content provider site. However, the burden associated with collecting and managing the content from secondary content providers may be more than the primary content provider wants to shoulder.
One way that this functionality can be achieved without creating an undesirable burden on the primary or secondary content providing systems is by providing a secondary content storage site that can continually store the most recent content provided by a secondary content provider, as well as different sets of content tailored for particular situations (e.g., display by particular observers or at particular times). FIGS. 2A through 2D are simplified diagrams of a network illustrating the operation of such a system. In FIG. 2A, a content display site 202 makes a request over the network communication line 203 to the primary content provider site 201 for content that includes the secondary content. In FIG. 2B, the primary content provider site 201 transfers the file or files stored at the primary content provider site 201 that are necessary to generate a display of the primary content. These files include appropriate reference to a file or files stored at a secondary content storage site 204 that includes the most updated and/or appropriately tailored secondary content for display with the primary content. As shown in FIG. 2C, this reference causes the content display site 202 to request the secondary content from the secondary content storage site 204. In FIG. 2D, the secondary content is transferred from the secondary content storage site 204 to the content display site 202 for display at the content display site 202.
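The sequence of FIGS. 2A through 2D can be sketched as a small simulation. All class, function, and content names below are invented for illustration, and network transfer is reduced to ordinary method calls.

```python
class SecondaryContentStorageSite:
    """Holds the most recent secondary content under a reference name."""

    def __init__(self):
        self._content = {}
        self.requests_served = 0

    def update(self, reference, content):
        # Called on behalf of the secondary content provider.
        self._content[reference] = content

    def get(self, reference):
        self.requests_served += 1
        return self._content[reference]


class PrimaryContentProviderSite:
    """Serves primary content plus a reference to the secondary content."""

    def __init__(self, primary_content, secondary_reference):
        self.primary_content = primary_content
        self.secondary_reference = secondary_reference

    def get_page(self):
        return self.primary_content, self.secondary_reference


def display(primary_site, storage_site):
    """What the content display site does for one page view."""
    primary, reference = primary_site.get_page()  # FIGS. 2A and 2B
    secondary = storage_site.get(reference)       # FIGS. 2C and 2D
    return f"{primary} | {secondary}"


storage = SecondaryContentStorageSite()
primary_site = PrimaryContentProviderSite("Primary page text", "ad-slot-1")

storage.update("ad-slot-1", "Advertisement, version 1")
first_view = display(primary_site, storage)

storage.update("ad-slot-1", "Advertisement, version 2")
second_view = display(primary_site, storage)
```

The second view shows the updated advertisement even though nothing stored at the primary content provider site changed; it also shows that every page view depends on a transfer from the secondary content storage site.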
However, while this system can relieve the primary content provider of the burden of managing the acquisition, storage and provision of secondary content (a burden that can become rather onerous when many secondary content providers are providing content to the primary content provider), the system has a characteristic that can make it undesirable for many content providers. The secondary content storage site not only manages the secondary content, it also provides the secondary content when requests for primary content are made to the primary content provider. Moreover, the secondary content is frequently content, such as graphics files used to generate visual images (which frequently dominate advertisements), that has a high bandwidth requirement for transmission over the network. By taking control of the transmission of secondary content to the content display site, the secondary content storage site is also frequently taking control of the most bandwidth sensitive parts of the content provided by the primary content providers. The operator of the secondary content storage site may not provide a system that addresses the bandwidth requirements to the satisfaction of the primary content provider, so that the presentation of the combined primary and secondary content occurs more slowly than desired by the primary content provider. Thus, this approach causes the primary content provider to lose control of a critical aspect of their operation.