Not applicable (no prior applications for this invention)
References to related patents and publications:
U.S. Patent Documents
U.S. Pat. No. 5,732,218 March 1998 Bland, et al. 395/200.54
U.S. Pat. No. 5,812,776 September 1998 Gifford 395/200.47
U.S. Pat. No. 5,812,769 September 1998 Graber et al. 395/200.12
U.S. Pat. No. 5,751,956 May 1998 Kirsch 395/200.33
U.S. Pat. No. 5,708,780 January 1998 Levergood, et al. 395/200.59
Foreign Patent Documents
WO9826571 June 1998 WIPO H04M15/00
WO9827502 June 1998 WIPO G06F19/00
Other bibliographic references
Pitkow. In Search of Reliable Usage Data on the WWW. Proceedings of the International WWW Conference, 1997.
Naor and Pinkas. Secure Efficient Metering. Proceedings of Eurocrypt, 1998.
Franklin and Malkhi. Auditable Metering with Lightweight Security. Proceedings of the Financial Cryptography Workshop, 1997.
Krawczyk, Bellare and Canetti. HMAC: Keyed Hashing for Message Authentication, IETF-RFC 2104, 1997.
Berners-Lee, et al. Hypertext Transfer Protocol -- HTTP/1.0, IETF-RFC 1945, 1996.
Berners-Lee, et al. Hypertext Transfer Protocol -- HTTP/1.1, IETF-RFC 2068, 1997.
Anderson and Kuhn. Tamper Resistance - a Cautionary Note. Proceedings of the Usenix Workshop on Electronic Commerce, 1996.
M. K. Reiter, V. Anupam and A. Mayer. Detecting Hit-shaving in Click-through Payment Schemes. Third Usenix Workshop on Electronic Commerce, Boston, 1998.
This invention relates to client-server computer communication systems. Specifically, this invention considers systems in which relevant data about communications between a server and many clients is logged and certified. Still more particularly, a preferred embodiment of this invention relates to systems on the World Wide Web (WWW) where a third party agency audits and certifies access statistics of an HTTP server receiving requests from a plurality of HTTP clients.
The Internet is currently the largest computer network, connecting several million computers that exchange information using the widespread TCP/IP protocol suite. The Internet's most popular information system is the World Wide Web (WWW, the Web). The Web is a client-server distributed hypermedia system based on the HTTP protocol. On-line Web advertisement represents a significant portion of Internet-related revenues. However, no reliable and widespread system for metering Web site accesses has emerged. As a consequence, a few well-known sites, such as the main news sites and the most effective search engines, dominate the market, because their popularity is taken as a given fact. For many sites of medium to high popularity, there is great potential for increasing advertisement profits, but investors need third party evaluation of purported access rates and characteristics.
Since it is reasonable to expect that advertisement profits grow with the site's popularity, the organization operating a Web site has an incentive to forge metering data. Forging the file that stores access data is not the only method to obtain false metering information. In the WWW setting, for example, there exists a technique called "IP spoofing" that allows an HTTP client, under specific circumstances, to masquerade as another client with a different IP address. This method could be used to produce many fictitious requests addressed to an HTTP server, which in turn would record an increased number of hits. For a third party to guarantee the correctness of Web server access statistics, it is necessary to provide an apparatus that prevents falsification of metering data by the organization operating the Web site. For obvious reasons of standardization, this goal should be reached without modifying the client, and with as few modifications as possible on the server side.
Much work has been devoted to the problem of understanding, summarizing and correcting access logs. The paper by Pitkow, published in the Proc. of the Int. WWW Conference, 1997, provides a good survey, with special reference to the use of proxies. U.S. Pat. No. 5,732,218 is also related to the issue of gathering Web site statistics, using both server-side and client-side arrangements. A number of commercial products for analyzing Web server logs are available. Most of the above work, however, does not deal with the intentional falsification by the organization running the server. For this problem, which is the subject of the present invention, two recent studies are of interest.
The work by Naor and Pinkas on "Secure Efficient Metering" (published in the Proc. of Eurocrypt 1998) is based on secret sharing among clients and transmission of secret shares from client to server, so that the server may prove its purported hit rate by reconstructing the original secret. This requires a special initialization of clients that is not practically feasible in a Web setting. The same problem may be observed in some embodiments of U.S. Pat. No. 5,732,218.
The work by Franklin and Malkhi on "Auditable Metering with Lightweight Security" (published in the Proc. of the Financial Cryptography Workshop, 1997) uses a timing scheme to raise the cost of false client requests, so that forging a high number of hits becomes unattractive. This scheme is also the subject of WIPO patent WO9826571 (1998). The technique by Franklin and Malkhi requires normal clients to perform extra computations solely for the purpose of auditing, and this may be unattractive for commercial servers.
The present invention allows third parties to verify hits in a secure way, without changing clients in any way, and with very minor overhead. The invention requires that the protocol used by clients and servers support some form of redirection or referral; in particular, it may be applied to the HTTP protocol, which allows for redirection. The redirection capabilities of HTTP were used in previous U.S. patents. U.S. Pat. No. 5,812,776 uses HTTP redirection to allow users to identify a resource by means of a traditional locator, such as a telephone number: a first server maps this locator to an IP address and automatically redirects the client to the actual resource available from a second server. Server-to-server interactions related to redirection have been used for access control in U.S. Pat. No. 5,708,780 and in WIPO patent WO9827502.
Redirection was used in U.S. Pat. Nos. 5,751,956 and 5,812,769 for tracking client requests, a goal that is related to the present invention as well as to WIPO patent WO9826571, discussed above. The same problem of tracking client request paths is studied in the work by Reiter et al., published in the proceedings of the Third Usenix Workshop on Electronic Commerce, 1998. These studies consider a situation where a user visits a first Web site, and then uses a hyperlink on said first Web site to connect to a second Web site. For example, said first Web site could be a search engine server, and the second Web site could be an electronic commerce facility advertised on said first Web site. Both sites are interested in keeping track of the user's actions in order to monitor advertisement effectiveness and agree on corresponding fees. Redirection and specific security measures are suggested in the above studies to make user tracking possible and reliable. However, redirection is used heavily in these schemes, and may significantly slow the response perceived by the client. Moreover, the above studies and patents do not solve the general problem of verifying the number and the characteristics of client connections received by a server.
The present invention provides a viable and general solution for auditing Web site popularity. The solution is very efficient, because it uses redirection with a small probability in order to cause external references of client accesses to an HTTP server. In a preferred embodiment, redirection can be limited to a very small percentage of client requests, because tamper-evident hardware is installed at the Web site location and occurrences of redirection cannot be predicted by the organization running the Web site.
The present invention addresses the problem of preventing the falsification of server access logs, either by direct modification of log files or by issuing artificial requests with spoofed addresses. One component of the invention is an authentication device that can exchange information with the application server and that cannot be opened or modified without detection. This can be achieved with tamper-evident hardware, or by placing the device in a physically protected site.
The certification agency will have its own trusted server. It should not reside on the same Local Area Network (LAN) and premises as the server to be certified. It should be controlled and protected by the certification agency. Clients can be anywhere on the network and should have access to both the server to be certified and the server controlled by the certification agency.
When a client requests a document, it uses the relevant application protocol to send its request, including, e.g., the path of the file to be retrieved or any other kind of resource locator. Once this request arrives at the server, a corresponding log line is passed to the authentication device. This line contains everything that needs to be certified: for example the requested document, a time stamp, and the client address.
For each such log line, the authentication device appends a sequence number and an internally generated random bit B, and uses internal secret keys to compute a corresponding Message Authentication Code (MAC). MACs should be computed efficiently and must be impossible to forge without knowledge of the relevant keys. A digital signature could also be used, but is less efficient. After calculating the MAC, the authentication device sends it to the server along with the generated random bit B. The server stores the MAC, the bit B, and the log line on an accessible medium.
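The behavior of the authentication device described above may be illustrated by the following minimal sketch. It is an assumption-laden model, not the disclosed implementation: the class name `AuthenticationDevice`, the `certify` method, the `seq|bit|line` message layout, the choice of HMAC-SHA256 (in the spirit of RFC 2104), and returning the sequence number to the server are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
import secrets


class AuthenticationDevice:
    """Hypothetical software model of the tamper-evident device."""

    def __init__(self, key: bytes, redirect_probability: float = 0.001):
        self._key = key                      # secret key, never leaves the device
        self._p = redirect_probability       # probability that B = 1
        self._seq = 0                        # last sequence number issued
        self._rng = secrets.SystemRandom()   # OS-backed randomness for the bit B

    def certify(self, log_line: str) -> tuple[int, int, str]:
        """Return (sequence number, bit B, MAC) for one log line."""
        self._seq += 1
        bit = 1 if self._rng.random() < self._p else 0
        # Assumed message layout: sequence number, bit, then the log line.
        message = f"{self._seq}|{bit}|{log_line}".encode("utf-8")
        mac = hmac.new(self._key, message, hashlib.sha256).hexdigest()
        return self._seq, bit, mac
```

In this sketch the server would call `certify` once per request and store the returned values next to the log line. A keyed hash is used here because the text notes that a digital signature, while possible, is less efficient.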
If B=0, the server handles the client request in a normal fashion, according to the application protocol specification. If B=1, the server redirects the client request to the server of the audit agency. The bit B should be generated in such a way that B=1 with low probability (e.g., 0.001 or less), so as to make the system as efficient as is necessary.
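For an HTTP server, the branch on B could look like the following sketch. The audit URL, the `return` query parameter used to carry the originally requested URL, and the function name `respond` are hypothetical; the source only states that the client is redirected to the audit agency's server when B=1. A 302 status with a `Location` header is one standard way to express a redirect in HTTP.

```python
from urllib.parse import urlencode

# Hypothetical address of the audit agency's trusted server.
AUDIT_URL = "http://audit.example.org/check"


def respond(bit: int, requested_url: str) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, extra headers) for a request, given the bit B."""
    if bit == 1:
        # Redirect to the agency; pass the original URL along (an assumption)
        # so that the agency can send the client back afterwards.
        location = AUDIT_URL + "?" + urlencode({"return": requested_url})
        return 302, {"Location": location}
    # B = 0: serve the request normally.
    return 200, {}
```

Because the redirect is an ordinary protocol response, an unmodified client follows it without any special initialization.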
If B=1 and if the client request is authentic, the server of the audit agency receives a request from the redirected client. The request is logged for later comparisons and the client is redirected again to the original site. After a predefined audit period, the institution running the server sends its log files and the corresponding MACs to the certification agency for verification purposes. The certification agency will accept these log files if every line marked with B=1 by the authentication device corresponds to an associated client record at the agency's site. Moreover, the MAC computed for every line must be correct, and the sequence numbers assigned to lines must be consecutive. If these conditions are violated repeatedly, certification could be denied, based on the agency's policy. When certification is granted for a particular log file, derivative works such as statistics will be certified as well.
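The three checks performed by the certification agency (correct MACs, consecutive sequence numbers, and a matching agency record for each redirected line) can be sketched as follows. Several details are assumptions made for illustration: the agency is assumed to share the device's MAC key, the MAC is assumed to be HMAC-SHA256 over a `seq|bit|line` message, and the agency's own records are modeled as a plain set of log lines, whereas a real matching rule would likely compare client address, document, and approximate time.

```python
import hashlib
import hmac
from typing import NamedTuple


class Entry(NamedTuple):
    seq: int    # sequence number assigned by the device
    bit: int    # random bit B
    line: str   # log line submitted by the server
    mac: str    # hex MAC returned by the device


def expected_mac(key: bytes, seq: int, bit: int, line: str) -> str:
    """Recompute the MAC under the assumed message layout."""
    message = f"{seq}|{bit}|{line}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_log(key: bytes, entries: list[Entry], agency_seen: set[str]) -> list[str]:
    """Return a list of problems found; an empty list means the log passes."""
    problems = []
    for i, e in enumerate(entries):
        if not hmac.compare_digest(e.mac, expected_mac(key, e.seq, e.bit, e.line)):
            problems.append(f"seq {e.seq}: bad MAC")
        if i > 0 and e.seq != entries[i - 1].seq + 1:
            problems.append(f"seq {e.seq}: sequence gap")
        if e.bit == 1 and e.line not in agency_seen:
            problems.append(f"seq {e.seq}: no matching agency record")
    return problems
```

Returning a list of problems rather than a single pass/fail value leaves room for the agency's policy, which according to the text may tolerate isolated mismatches and deny certification only after repeated ones.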
It will be evident to those skilled in the art that the bit B must be unpredictable to anyone but the certification agency, otherwise log lines where B=0 could be inserted or modified without detection. Consequently, the device that generates this bit and the corresponding MACs must either be tamper-evident or physically protected. In a preferred embodiment, a tamper-evident device is directly connected to the server, for increased performance.
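A rough calculation suggests why a small redirection probability can still deter large-scale forgery. The sketch below rests on simplifying assumptions that are not stated in the source: each of k forged requests passes through the device, each is marked B=1 independently with probability p, and a forged request marked B=1 never produces a matching record at the agency. Under those assumptions, at least one forged line is exposed with probability 1 - (1 - p)^k.

```python
def detection_probability(p: float, k: int) -> float:
    """Chance that at least one of k forged requests is marked B=1.

    Assumes independent marking with probability p per request.
    """
    return 1.0 - (1.0 - p) ** k


# Example: redirection probability 0.001 and 10,000 forged requests.
print(detection_probability(0.001, 10_000))
```

Under these assumptions, inflating a log by a commercially meaningful number of hits becomes very likely to leave at least one unmatched B=1 line, even though only about one request in a thousand is redirected.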
In summary, the present invention prevents undetected falsification of server access logs. It may be applied to any protocol supporting the "redirect" concept, including HTTP.