The majority of Internet revenues come from connectivity and advertisement fees, yet there are almost no means to secure the accounting processes that determine these fees against fraudulent behavior; for example, there is no reliable method for obtaining trustworthy usage information about a Web site. There is an enormous financial incentive for the Web site to inflate this data, and measurement methods should therefore be secure against malicious behavior by the site. Measurement methods based on sampling are relatively well protected from corrupt behavior of Web sites, but they do not provide meaningful data about small and medium scale sites.
There has been a considerable amount of work on securing online payments. However, most of the revenues from Internet ventures do not come from direct sales: by far the largest sums of money are those paid for advertising and for connectivity to the Internet. There are many different forecasts for the future distribution of Internet revenues, but many of them agree that advertising and connectivity will remain the major sources of income from the Internet. In light of these figures it is surprising how little research has been conducted on securing the accounting mechanisms used by advertising and connectivity providers.
Most of the revenues of Web sites come from advertisement fees. Although forecasts for the market share of online advertising differ, the estimates are that very large sums of money will be invested in this medium. As in every other advertising channel, Web advertisers must have a way to measure the effect of their ads, and this data affects the fees charged for displaying ads. Advertisers must therefore obtain accurate and impartial usage statistics about Web sites and about requests for pages that contain their ads. Web sites, on the other hand, have an obvious motivation to inflate their usage reports in order to demand more for displaying ads.
In the pre-Web world there were two main methods for measuring the popularity of media channels: sampling and auditing. Sampling, like the Nielsen rating system for TV programs, is survey based: it picks a representative group of users, checks their usage patterns, and derives usage statistics about all users. In traditional types of media like television this method makes sense, since users have a relatively limited number of viewing options to choose from. These types of media use broadcast, which operates in a one-to-many communication model. The Web operates in a many-to-many communication model and offers millions of Web pages to visit. Therefore, although sampling based metering services are offered for the Internet, they do not provide meaningful results for any but the most popular Web sites.
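The inadequacy of sampling for all but the largest sites can be illustrated with a back-of-the-envelope calculation. The panel size and audience figures below are hypothetical, chosen only to show the orders of magnitude involved:

```python
# Back-of-the-envelope: expected panel hits and relative error for
# sampling-based metering. All figures are hypothetical.
import math

PANEL_SIZE = 5_000          # monitored users in a Nielsen-style panel
POPULATION = 50_000_000     # total Web users

def expected_panel_visitors(site_audience: int) -> float:
    """Expected number of panel members who visit a site whose true
    audience is `site_audience` users, assuming a uniform random panel."""
    p = site_audience / POPULATION        # chance a given user visits
    return PANEL_SIZE * p

def relative_std_error(site_audience: int) -> float:
    """Std. deviation of the estimate divided by its mean (binomial model)."""
    p = site_audience / POPULATION
    mean = PANEL_SIZE * p
    return math.sqrt(PANEL_SIZE * p * (1 - p)) / mean

# A top site with 5,000,000 visitors: ~500 panel hits, ~4% relative error.
print(expected_panel_visitors(5_000_000), relative_std_error(5_000_000))

# A niche site with 10,000 visitors: about 1 panel hit, so the estimate
# is dominated by noise (relative error around 100%).
print(expected_panel_visitors(10_000), relative_std_error(10_000))
```

A site of interest to only tens of thousands of users is essentially invisible to a panel of a few thousand, which is why sampling cannot supply the reliable statistics that smaller sites would need.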
Auditing is performed by trusted third party agencies, like the Audit Bureau of Circulations (ABC), which audits print circulation. Although sites often publish such information about themselves, it should be taken with a grain of salt. The Coalition for Advertising Supported Information and Entertainment (CASIE) states in its guidelines for interactive media audience measurement that "Third party measurement is the foundation for advertiser confidence in information. It is the measurement practice of all other advertiser-supported media". A number of companies (like Nielsen/IPRO, NetCount, etc.) offer third party audit services for the Internet. They typically install some monitoring software at the server that operates the site. However, the reliability of such audit data depends on the site providing accurate data and not breaking into the monitoring module. Sites have a huge financial interest in exaggerating their popularity. The lesson learnt from software and pay-TV piracy is that such financial interests lead to corrupt behavior that overcomes any "lightweight security" mechanism.
Today most Web advertising is displayed on a very small number of highly popular Web sites, like "Yahoo!" or CNN. It is plausible that, in spite of the great financial motivation, such established sites will not provide inflated usage reports or break into the audit modules that report their activities.
However, while this may be true for the big sites, a large amount of advertising is displayed on smaller scale sites. It can also be argued that one of the main reasons driving advertisers to use only the biggest sites is the lack of reliable audit data on smaller scale sites. The Web is attractive precisely because one can set up a site of interest to perhaps only 10,000 users worldwide; this number may suffice to attract some advertisers, provided there are reliable usage statistics.
Advertisers can learn about the exposure of their ads by counting "click throughs", i.e. the number of users who clicked on ads in order to visit the advertiser's site. DoubleClick reported in 1996 that 4% of the visitors who view an ad for the first time actually click on it. This ratio changes according to the content of the ad, and therefore gives very limited information to the advertiser. Another method advertisers can use is to display the ads from their own server (even when they are displayed on other sites) and thus eliminate the risk of unreliable reports from sites. However, this method burdens the advertiser with sending its ads to all viewers and prevents the distribution of this task. The original communication pattern is not preserved, since a new channel (between the advertiser and the client) is used. The load on the advertiser's server is huge and is surely not acceptable for a one-time advertiser. This solution is non-scalable, introduces a single point of failure (the advertiser), and is also insecure against "fake" requests created by the site displaying the ads.
Currently there is no single accepted standard or terminology for Web measurement. Novak and Hoffman argue that standardization is a crucial first step toward successful commercial use of the Internet. They also claim that interactivity metrics, rather than the number of hits or the number of visitors, should be used to measure a site's popularity. The method of the present invention is defined to count the number of visits that a Web site receives. For purposes of presenting a general embodiment of the method of the present invention, it is not necessary to define a visit precisely. For example, a visit can be defined as a click, a user, a session exceeding some threshold of time or of page requests from a single user, or any similar definition. The main requirement is that the measurement be universal to all clients and can be consolidated: for instance, a detailed report of the path of pages that each client went through in its visit cannot be consolidated into a single result, whereas the number of clients whose visit lasted more than 15 minutes can be represented as a single number. The emphasis here is on obtaining reliable usage statistics even when servers may try to act maliciously, not on defining the type of statistics that are needed.
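The consolidation requirement can be made concrete with a small sketch. The log format below is hypothetical; the point is only that the chosen metric must reduce to a single aggregate number:

```python
# Illustration of the consolidation requirement: a usable "visit" metric
# must reduce to a single aggregate number. The log format is hypothetical.
from typing import List, Tuple

# Each entry: (client_id, visit_duration_in_minutes)
visit_log: List[Tuple[str, float]] = [
    ("client-a", 22.5),
    ("client-b", 3.0),
    ("client-c", 47.1),
    ("client-d", 15.0),
]

def count_long_visits(log, threshold_minutes: float = 15.0) -> int:
    """A consolidatable metric: one number summarizes all clients."""
    return sum(1 for _, duration in log if duration > threshold_minutes)

print(count_long_visits(visit_log))  # -> 2

# By contrast, a full per-client click path such as
#   {"client-a": ["/home", "/news", "/sports"], ...}
# cannot be consolidated into a single result, and so is unsuitable
# as a metering measurement in the sense used here.
```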
Pitkow discussed the problems caused by caching and by proxy usage, which hide usage information from Web servers. Possible solutions like temporal analysis, cache busting, and sampling were suggested.
Franklin and Malkhi were the first to consider the metering problem in a rigorous manner. Yet their solutions offer only "lightweight security": clients can refrain from helping servers count their visits, servers can inflate their count, and the variance of the measurement is relatively high. Such solutions cannot be applied if there are strong commercial interests to falsify the metering results.
Micropayments are an alternative method for financing online services. Their implementations are designed to be very efficient, so that their overhead is less than the value of the transactions. Micropayments can be used for Web metering: each visit would require the client to send a small sum of "money" to the server, which could then prove many visits by showing that it has earned a large sum of money. However, all the current suggestions for micropayment schemes require the communication from the merchant (i.e. the server) to the bank (i.e. the audit agency) to be of the same order as the number of payments that the merchant received. This means that the amount of information that the audit agency receives is of the order of the total number of visits to all the metered servers. The method of the present invention is a more efficient metering scheme, since there is no need to deduct "money" from clients' accounts.
The Internet is based on packet switching, i.e. there is no dedicated path between two parties communicating through the Internet; rather, each packet of information is routed separately. The Internet is essentially a network of networks, and packets are typically routed through several different networks. These properties complicate pricing and accounting mechanisms for Internet usage, and indeed the most common pricing method is to charge a fixed price that is independent of the actual number of packets transferred. Analysis based on pricing theory indicates that pricing Internet services according to actual usage (at least at times of network congestion) is superior in terms of network efficiency. Usage based pricing has the disadvantage of incurring accounting and billing costs. It is impractical to create detailed account reports (similar to telephone accounts) due to the huge number of packets. Some researchers suggest measuring usage using sampling, or only at times of congestion (however, even producing reports for a sample of, say, 1/1000 of the packets creates inconceivably large reports). MacKie-Mason and Varian also expect breakthroughs in the area of in-line distributed accounting that will lower the costs of Internet accounting.
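A rough sketch of sampling-based usage accounting, and of why even a 1/1000 sample produces very large reports, is the following (the packet stream and rates are hypothetical):

```python
# Sketch of sampling-based usage accounting: record only about 1 in every
# SAMPLE_RATE packets and scale the total up. All figures are hypothetical.
import random

SAMPLE_RATE = 1000  # record roughly 1 of every 1000 packets

def estimate_total_bytes(packet_sizes, sample_rate=SAMPLE_RATE, seed=0):
    """Estimate total traffic from an independent per-packet sample.

    Returns (estimated_total_bytes, number_of_accounting_records)."""
    rng = random.Random(seed)
    sampled = [s for s in packet_sizes if rng.random() < 1 / sample_rate]
    # Each sampled packet stands in for `sample_rate` packets.
    return sample_rate * sum(sampled), len(sampled)

# One million 500-byte packets: 5e8 bytes of true traffic.
packets = [500] * 1_000_000
estimate, n_records = estimate_total_bytes(packets)
print(estimate, n_records)

# Even this 1/1000 sample still yields ~1000 accounting records per
# million packets, which is why such reports are impractical at
# Internet scale.
```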
A problem that needs to be addressed is the notion of secure and efficient metering of the amount of service requested from servers by clients, in Web applications and the like. Such metering methods should be realized without substantial changes to the operation of clients and servers (though they may require a change in the client software and a registration process) or to their communication patterns.
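One way a server can prove a number of visits without per-visit communication to the audit agency is to collect, from visiting clients, shares of a secret distributed by the agency, in the spirit of the secret sharing schemes cited in the references (Shamir). The following is a simplified illustration of this idea under that assumption, not the full method of the invention:

```python
# Simplified illustration of metering via secret sharing (Shamir, 1979):
# the audit agency gives each client one share of a secret; a server that
# serves K distinct clients can reconstruct the secret, and presenting it
# proves K visits. A sketch only, not the full method of the invention.
import random

PRIME = 2**61 - 1  # prime field size (a Mersenne prime)
K = 5              # visit threshold the server must prove

def make_shares(n_clients, k=K, prime=PRIME, seed=1):
    """Audit agency: pick a random degree-(k-1) polynomial P; the secret
    is P(0) and client i receives the share (i, P(i))."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(prime) for _ in range(k)]  # coeffs[0] = secret
    def poly(x):
        return sum(c * pow(x, j, prime) for j, c in enumerate(coeffs)) % prime
    shares = {i: poly(i) for i in range(1, n_clients + 1)}
    return coeffs[0], shares

def reconstruct(points, prime=PRIME):
    """Server: Lagrange interpolation at x=0 from k shares {x: y}."""
    secret = 0
    xs = list(points)
    for xi in xs:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        # pow(den, prime-2, prime) is the modular inverse of den.
        secret = (secret + points[xi] * num * pow(den, prime - 2, prime)) % prime
    return secret

secret, shares = make_shares(n_clients=100)
# Each of K visiting clients hands its share to the server...
visiting = dict(list(shares.items())[:K])
assert reconstruct(visiting) == secret  # ...which proves K visits.
```

Note that the server's proof to the audit agency is a single value regardless of how many visits occurred, in contrast to the per-payment communication that micropayment-based metering would require.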
References
Aho A., Hopcroft J. and Ullman J., The design and analysis of computer algorithms, Addison-Wesley, 1974.
Ben-Or M., Goldwasser S. and Wigderson A., Completeness theorems for noncryptographic fault tolerant distributed computation, 20th STOC, 1988, 1-9.
Biham E. and Shamir A., Differential fault analysis of secret key cryptosystems, Crypto '97, LNCS 1294, 1997, 513-525.
Carter L. and Wegman M., Universal hash functions, J. of Computer and System Sciences, Vol. 18, 1979, 143-154.
Claffy K., Braun H.-W. and Polyzos G., Applications of sampling methodologies to wide-area network traffic characterization, TR CS93-275, UCSD, 1993.
Coalition for advertising supported information and entertainment, CASIE guiding principles of interactive media audience measurement, April 1997, available at http://www.commercepark.com/AAAA/casie/gp/guiding principles.html.
Desmedt Y. and Frankel Y., Threshold cryptosystems, Crypto '89, LNCS 435, 1990, 307-315.
Diffie W. and Hellman M. E., New directions in cryptography, IEEE Trans. on Information Theory, November 1976, 644-654.
Dwork C. and Naor M., Pricing via Processing or Combating Junk Mail, Crypto '92, LNCS 576, 1992, 114-128.
Estrin D. and Zhang L., Design considerations for usage accounting and feedback in internetworks, ACM Computer Communications Review, 20(5):56-66, 1990.
Fang W., Building an accounting infrastructure for the Internet, IEEE Global Internet, 1996, available at http://www.cs.princeton.edu/~wfang/Research/revised.ps.
Feldman P., A practical scheme for non-interactive verifiable secret sharing, 28th FOCS, 1987, 427-437.
Feldman P. and Micali S., An Optimal Probabilistic Protocol for Synchronous Byzantine Agreement, SIAM J. on Comp., Vol. 26, No. 4, 1997, 873-933.
Frankel Y., Gemmell P., MacKenzie P. D. and Yung M., Optimal-resilience proactive public-key cryptosystems, 38th FOCS, 1997, 384-393.
Franklin M. K. and Malkhi D., Auditable metering with lightweight security, Financial Cryptography '97, 1997.
Gupta A., Stahl D. O. and Whinston A. B., Pricing of services on the Internet, in: F. Phillips and W. Cooper (Eds.), IMPACT: How ICC Research Affects Public Policy and Business Markets, Greenwood Pub, 1994.
Jarecki S. and Odlyzko A., An efficient micropayment system based on probabilistic polling, Financial Cryptography '97, 1997.
Kilian J., Founding cryptography on oblivious transfer, 20th STOC, 1988, 20-31.
Lesk M., Projections for making money on the Web, Harvard Infrastructure Conference, Jan. 23-25, 1997, available at http://community.bellcore.com/lesk/iih/iih.html
MacKie-Mason J. K. and Varian H. R., Pricing the Internet, in: B. Kahin and J. Keller (Eds.), Public Access to the Internet, Prentice-Hall, 1994.
McCormac J., European Scrambling Systems 5, Waterford University Press, Waterford, 1996.
McEliece R. J. and Sarwate D. V., On sharing secrets and Reed-Solomon codes, Comm. ACM, 24(9):583-584, September 1981.
Merkle R., A certified digital signature, Crypto '89, LNCS 435, 1990, 218-238.
Murphy I. P., On-line ads effective? Who knows for sure?, Marketing News, 30(20):1-38, September 23, 1996.
Naor M. and Pinkas B., Secure and efficient metering, Eurocrypt '98, LNCS 1403, 1998.
Novak T. and Hoffman D., New metrics for web media: toward the development of web measurement standards, September 1996. Manuscript available at http://www2000.ogsm.vanderbilt.edu/novak/web.standards/webstand.html
Pedersen T. P., Non-interactive and information-theoretic secure verifiable secret sharing, Crypto '91, LNCS 576, 1991, 129-140.
Pitkow J., In search of reliable usage statistics on the WWW, Proc. of the 6th International WWW Conf., 1997, available at http://www6.nttlabs.com/HyperNews/get/PAPER126.html
Rabin T. and Ben-Or M., Verifiable secret sharing and multiparty protocols with honest majority, 21st STOC, 1989, 73-85.
Kinsman M., Web advertising 1997: market analysis and forecast, Cowles/Simba Information, Stamford, Conn., May 1997.
Shamir A., How to share a secret, Comm. ACM Vol. 22, No. 11, 1979, 612-613.
Wegman M. and Carter L., New hash functions and their use in authentication and set equality, J. of Computer and System Sciences, vol. 20, 1981, 265-279.
Yao A. C., How to generate and exchange secretes, 27th FOCS, 1986, 162-167.