The invention relates to a method of monitoring the use of information provided over a computer network, and to computers for implementing the method.
A network is a means of communicating between two or more computers or processors, and can take many forms, including for example the internet, infra-red, radio signals and cabling. Any medium for transporting information from one computer to another can be regarded as a network.
Most of the existing technology in this field is based in the area of internet or intranet Web servers. The invention will therefore be described in the context of server logs as used in this area, but it is applicable across many other areas of client/server interaction. Another particular area of potential interest is in relation to e-mail.
Currently, when a computer which provides information (i.e. a server) fulfils an information request, it writes to a file at the server (a server log) whatever information it has about the request. Typically this will include the following details:
(a) what time the request reached the server;
(b) what information was requested;
(c) where the information was sent to; and
(d) how the user was referred to the information (i.e. the referrer).
Some servers, or programs designed to run in conjunction with servers, will also send (e) an item of information to identify the recipient, and will record this if it is subsequently included with any future information requests. There are currently many log analysers on the market which take the details written by the server and try to form a picture of what information users have been looking at, when, and for how long.
Since servers only see the client requests, the information on timing that a server log analyser provides can be inaccurate. The time spent by the user examining the information can only be estimated by looking at the difference in time between when one request was made by a user and when the next request was made by the same user. If a user stops viewing the information sent, and later returns to it to make another information request, the time spent elsewhere may be included in the estimate for time spent viewing the sent information. This can be extremely inaccurate, to the extent of reporting hours spent viewing the requested information when only seconds were actually spent.
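By way of illustration only, the estimation described above can be sketched as follows. The record layout, field names, addresses and page paths are assumptions chosen for the example and are not taken from any particular server or log analyser.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    """One server-log record; field names are illustrative assumptions."""
    received_at: datetime   # (a) time the request reached the server
    resource: str           # (b) what information was requested
    client: str             # (c) where the information was sent
    referrer: str           # (d) how the user was referred to it


def estimate_viewing_times(entries):
    """Estimate viewing time as the gap between a client's consecutive requests."""
    by_client = {}
    for entry in sorted(entries, key=lambda e: e.received_at):
        by_client.setdefault(entry.client, []).append(entry)

    estimates = []
    for client, reqs in by_client.items():
        # The last request of each client has no successor, so it gets no estimate.
        for current, following in zip(reqs, reqs[1:]):
            gap = following.received_at - current.received_at
            estimates.append((client, current.resource, gap))
    return estimates


start = datetime(2000, 1, 1, 9, 0, 0)
log = [
    LogEntry(start, "/index.html", "10.0.0.1", ""),
    LogEntry(start + timedelta(seconds=20), "/products.html", "10.0.0.1", "/index.html"),
    # The user goes elsewhere for three hours, then makes another request.
    LogEntry(start + timedelta(hours=3, seconds=20), "/contact.html", "10.0.0.1", "/products.html"),
]
estimates = estimate_viewing_times(log)
for client, resource, duration in estimates:
    print(client, resource, duration)
```

In this example the second page is reported as having been viewed for three hours, because the server cannot distinguish time spent viewing from time spent elsewhere; the final page receives no estimate at all.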
Another problem with timing is that over large networks such as the internet, particularly when accessed over low bandwidth connections, the time that the information takes to move from one computer to another can also become significant, and cannot be measured by server log analysis.
When copies of the information sent by the server are stored on the client computer and then viewed off-line, the server is not contacted and so no record of this viewing of the copies is kept. If either the server or the client is not connected to the network, then nothing will be seen using log analysis even if the computers are later reconnected across the network. This is another major deficiency in current methods.
With a Web server accessed by a standard browser, cached pages will often be accessed whenever the ‘back button’ or ‘forward button’ is used, so that no new request is sent to the server. Depending on individual settings, cached pages may also be used for any page that is revisited on a Web site. Other mechanisms may also cause caching, but in all cases it can lead to inaccurate timing reports being produced from server logs.
A proxy server is a server which takes a copy of information from a content server when it is first requested, and then passes it on to each client that requests it within a limited time period. Subsequent copies of any information passed through a proxy server do not involve any interaction with the content server and are therefore not recorded by the content server.
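The effect of a proxy server on the content server's log can be sketched as follows. This is a simplified model under stated assumptions: the class names, the fixed retention period and the client names are hypothetical, and real proxy servers apply considerably more elaborate caching rules.

```python
import time


class ContentServer:
    def __init__(self, pages):
        self.pages = pages
        self.log = []          # the only record a log analyser would see

    def get(self, path, client):
        self.log.append((client, path))
        return self.pages[path]


class ProxyServer:
    def __init__(self, origin, ttl_seconds):
        self.origin = origin
        self.ttl = ttl_seconds  # the limited period for which a copy is reused
        self.cache = {}         # path -> (content, time the copy was taken)

    def get(self, path, client, now=None):
        now = time.time() if now is None else now
        cached = self.cache.get(path)
        if cached is not None and now - cached[1] < self.ttl:
            # Served from the stored copy: the content server is not contacted.
            return cached[0]
        content = self.origin.get(path, client="proxy")
        self.cache[path] = (content, now)
        return content


origin = ContentServer({"/index.html": "<html>home</html>"})
proxy = ProxyServer(origin, ttl_seconds=60)
for i, client in enumerate(["alice", "bob", "carol"]):
    proxy.get("/index.html", client, now=1000.0 + i)
print(len(origin.log))
```

Three clients receive the page, but the content server's log holds a single entry, attributed to the proxy server rather than to any of the clients.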
For all these reasons, current methods of server log analysis are liable to be inaccurate and unreliable. We have appreciated that there is a need to provide a structure which can be used to provide more effective analysis of server usage, by overcoming at least the most significant of these problems.
The problem of proxy servers has been recognised by MatchLogic Inc., of 10333 Church Ranch Boulevard, Westminster, Colo. 80021, United States of America, which has produced a TrueCount system with a view to ameliorating the inaccuracies in counting resulting from proxy server use. In this system a small element of code is added to the header on the content pages to be counted. If the pages are cached on a proxy server, then whenever the proxy server delivers the stored content to a subsequent user, this added code element acts as a messenger and transmits a message to a special server set up to receive these messages. This, however, is only a limited solution to the problems enumerated above and does not enable the other difficulties to be overcome. In particular, the system can take no account of off-line viewing, and indeed has no need to, as its intended purpose is to determine by how many users a page, or more likely an advertising banner, has been accessed. It cannot give information on the length of time a page was viewed, whether on-line or off-line, or any information of a more complex nature.
International Patent Application Publication No. WO98/10349 describes a system for monitoring the display by a user of content (e.g. advertisements) received from a server. The system is designed inter alia to make it difficult for the content provider to manipulate the log file at the server, by setting up user computers automatically to access but not display the content, and to avoid undercounting of cached pages. This is achieved by monitoring at the user computer the display of web pages, rather than just requests for pages. When a page is requested, a program is transmitted to the user which causes the user computer to determine which part of the content is being displayed on the user's screen, and to note either the number of times the content is displayed or the start and finish times of such display. This information is then transmitted back to the content provider or to another location where the monitoring information is analysed.
There are a number of problems with this system. First, it still does not disclose how to handle the viewing of pages off-line. Secondly, it is limited to the monitoring of display, which is complex and may itself be inaccurate and not properly represent the effectiveness of the content, e.g. the advertisements, being displayed. Finally, there is no effective way of both ensuring that the monitoring data reaches the location where the monitoring information is analysed and also avoiding data being stored for long and/or indefinite periods at the user computer.
Reference may also be made to International Patent Application publication No. WO97/41673.
It is well known for web servers to send ‘cookies’ to user computers; these are stored on the user computer and provide information about the user to the server.
The invention is defined in the independent claims below, to which reference should now be made. Advantageous features of the invention are set forth in the appendant claims.
In a preferred embodiment of the invention, described in more detail below, a provider or sender computer transmits to a requester or receiver computer code which causes the requester computer to monitor each time a page is accessed or displayed, whether on-line or off-line, and to generate a log of such usage. The log includes not only events which occur when the requester computer is on-line to the provider computer but also events which occur off-line. When a subsequent request for information is made by the requester of the provider, the logged information is returned to or accessed by the provider computer, where it can be analysed.
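Purely as a sketch of the logging behaviour just described, and not as the implementation of the embodiment, the usage log might be modelled as below. The class name, event fields and request format are assumptions introduced for the example.

```python
from datetime import datetime


class UsageLog:
    """Log kept at the requester computer (hypothetical structure)."""

    def __init__(self):
        self.events = []

    def record(self, page, event, online, when):
        # Events are recorded whether or not the provider is reachable.
        self.events.append({
            "page": page,
            "event": event,
            "online": online,
            "time": when.isoformat(),
        })

    def attach_to_request(self, path):
        # The log accompanies the next request for information.
        # It is not cleared here: delivery has not yet been confirmed.
        return {"path": path, "usage_log": list(self.events)}


usage = UsageLog()
usage.record("/index.html", "display", online=True,
             when=datetime(2000, 1, 1, 9, 0, 0))
usage.record("/index.html", "display", online=False,
             when=datetime(2000, 1, 1, 21, 30, 0))
request = usage.attach_to_request("/products.html")
print(len(request["usage_log"]))
```

The off-line display at 21:30 would be invisible to a server log, but here it reaches the provider computer with the next request that the requester makes.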
The provider computer also accompanies each transmitted page with a version stamp, for example comprising a date/time code. The requester computer stores the latest received version stamp. On each occurrence of a specified event or events, it compares the version stamp in the page concerned with the stored version stamp. If the stored version stamp is older than the version stamp of the displayed page, the requester computer knows that the page has been newly received in response to a request it has sent to the provider computer, and is not a stored page. It therefore knows that its logged usage information, which will have been transmitted with that request, has reached the provider computer, and hence that it can clear the log, or at least which information in the log is redundant.
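The version stamp comparison can be illustrated by the following minimal sketch. It assumes that the version stamp is a date/time value, that the whole log was sent with the request which fetched the newer page, and that the log is cleared outright rather than merely marked as redundant; the class and method names are hypothetical.

```python
from datetime import datetime


class Requester:
    def __init__(self):
        self.stored_stamp = None   # latest version stamp received so far
        self.log = []              # usage events not yet known to be delivered

    def on_page_event(self, page_stamp, event):
        """Handle a specified event for a page carrying `page_stamp`."""
        fresh = self.stored_stamp is None or self.stored_stamp < page_stamp
        if fresh:
            # A newer stamp means the page answers a request sent to the
            # provider, and that request carried the log (assumption above).
            self.stored_stamp = page_stamp
            self.log.clear()
        # The current event itself has not yet been sent, so it is kept.
        self.log.append(event)
        return fresh


t1 = datetime(2000, 1, 1, 9, 0, 0)
t2 = datetime(2000, 1, 2, 9, 0, 0)
requester = Requester()
results = [
    requester.on_page_event(t1, "display page A"),           # newly received
    requester.on_page_event(t1, "display page A (stored)"),  # stored copy
    requester.on_page_event(t2, "display page B"),           # newly received
]
print(results, requester.log)
```

A stored page carries the same stamp as before, so the comparison fails and its events accumulate in the log; only the arrival of a page with a newer stamp causes the earlier entries to be discarded.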
Similar operations can be applied to the sending and receipt of e-mail messages, as described below.