A primary application of the present invention is to prevent or reduce spam, and therefore the following discussion of the background of the invention will focus on spam and methods of filtering it. (Other applications of the present invention include but are not limited to controlling account creation; controlling addurl at index/directory services (for adding a URL to an index or directory); controlling index mining; load control on free services such as stock quotes; preventing connection depletion attacks, in which a client maliciously requests access to a server; and strengthening passwords.) In addition, approaches such as digital postage, proof of work, and the like will be described. The Camram/Hashcash system is summarized at some length to set the stage for the subsequent description of the present invention. Further information about the approaches discussed below may be found in the references and at the websites (URLs) cited below. Paper copies of information relating to the Internet-based references cited below are being submitted herewith in an Information Disclosure Statement.
A. Spam
Electronic messaging, particularly electronic mail (e-mail) carried over the Internet, has become a preferred method of communication for many individuals and organizations. Unfortunately, e-mail recipients are increasingly being subjected to unsolicited and unwanted mass mailings. With the growth of Internet-based commerce, a wide and growing variety of electronic merchandisers is repeatedly sending unsolicited mail advertising their products and services to an ever-expanding universe of e-mail recipients. For example, Internet users who merely provide their e-mail addresses in response to seemingly innocuous requests for visitor information from various web sites often discover later, upon receiving unsolicited mail and much to their displeasure, that they have been added to electronic distribution lists. This can degrade the users' experience and can reduce the productivity of users who receive such unwanted e-mail, or spam, at their place of business.
Once a recipient finds himself on an electronic mailing list, he cannot readily, if at all, remove his address from it, which effectively guarantees that he will continue to receive unsolicited mail. This occurs because the sender either prevents the recipient from identifying the sender of a message (for example, by sending mail through a proxy server), and thereby precludes the recipient from contacting the sender to ask to be removed from the distribution list, or simply ignores any removal request the recipient has sent.
An individual can easily receive hundreds of pieces of unsolicited postal mail over the course of a year or less. By contrast, given the extreme ease and insignificant cost with which electronic distribution lists can be exchanged and e-mail messages disseminated to extremely large numbers of addressees, a single e-mail addressee included on several distribution lists can expect to receive a considerably larger number of unsolicited messages over a much shorter period of time. Furthermore, while many unsolicited e-mail messages are benign, others, such as pornographic, inflammatory and abusive material, are highly offensive to their recipients. All such unsolicited messages collectively constitute so-called “junk” mail or “spam”.
A rule-based textual e-mail classifier, specifically one involving learned keyword-spotting rules, is described in W. W. Cohen, “Learning Rules that Classify E-mail,” 1996 AAAI Spring Symposium on Machine Learning in Information Access, 1996. In this approach, a set of e-mail messages previously classified into different categories is provided as input to the system. Rules are then learned from this set in order to classify incoming e-mail messages into the various categories. At first blush, one might think to use a rule-based classifier to detect spam. Unfortunately, doing so could produce problematic and disappointing results, because rule-based classifiers suffer from deficiencies that, in practice, can severely limit their use in spam detection. First, rule-based spam detection systems typically require the user to manually construct appropriate rules to distinguish between interesting mail and spam, which is impractical for many e-mail recipients. Second, the characteristics of spam and non-spam e-mail may change significantly over time, whereas rule-based classifiers are static unless the user is willing to change the rules constantly. In this regard, it is well known that mass e-mail senders (“spammers”) routinely modify the content of their messages in a continual attempt to prevent recipients from setting up a filter to reject them.
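The brittleness of static keyword rules can be illustrated with a minimal sketch. The rules below are hypothetical and hand-written (Cohen's system learns its rules from labeled examples); the point is only that a trivially obfuscated spam message no longer matches.

```python
import re

# Hypothetical hand-written keyword-spotting rules: a message containing
# every keyword of a rule is assigned that rule's category.
RULES = [
    ({"free", "offer"}, "spam"),
    ({"meeting", "agenda"}, "work"),
]

def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def classify(text: str, default: str = "inbox") -> str:
    """Return the category of the first rule whose keywords all appear."""
    words = tokenize(text)
    for keywords, category in RULES:
        if keywords <= words:
            return category
    return default

print(classify("FREE offer, act now!"))   # spam
print(classify("Fr3e 0ffer, act now!"))   # inbox: the spam rule no longer fires
```

Keeping up with such rewrites would require the user to edit the rules continually, which is the second deficiency noted above.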
Therefore, a need exists in the art for a different approach to reducing the amount of spam delivered to a recipient's in-box. One such approach involves the use of so-called proof of work postage stamps and client puzzles. These are discussed next.
B. Digital Postage, Proof of Work, Client Puzzles, etc.
Glassman et al. have described the Millicent protocol for inexpensive electronic commerce. See “The Millicent protocol for inexpensive electronic commerce,” World Wide Web Journal, Fourth International World Wide Web Conference Proceedings, pp. 603–618, O'Reilly, December 1995; see also http://www.w3.org/Conferences/WWW4/Papers/246/. Briefly, Millicent is a secure protocol for e-commerce over the Internet, designed to support purchases costing less than a penny. It is based on decentralized validation of electronic cash at the vendor's server, without any additional communication, expensive encryption, or off-line processing. Key aspects of Millicent are its use of brokers and of scrip. Brokers take care of account management, billing, connection maintenance, and establishing accounts with vendors. Scrip is digital cash that is valid only for a specific vendor. The vendor validates the scrip locally to prevent customer fraud, such as double spending.
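Local, vendor-specific validation of scrip can be sketched as follows. This is a simplified illustration, not Millicent's actual scrip format: it assumes an HMAC-SHA256 certificate keyed with a vendor-only secret in place of Millicent's own certificate construction, omits the broker entirely, and the class and field names are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Scrip:
    vendor: str
    value_cents: int
    scrip_id: str
    certificate: bytes  # MAC over the other fields (assumed construction)

class Vendor:
    """Issues scrip and validates it locally, with no broker round trip."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._secret = secrets.token_bytes(32)  # known only to this vendor
        self._spent: set[str] = set()           # IDs already redeemed

    def _certify(self, vendor: str, value_cents: int, scrip_id: str) -> bytes:
        body = f"{vendor}|{value_cents}|{scrip_id}".encode()
        return hmac.new(self._secret, body, hashlib.sha256).digest()

    def issue(self, value_cents: int) -> Scrip:
        scrip_id = secrets.token_hex(8)
        return Scrip(self.name, value_cents, scrip_id,
                     self._certify(self.name, value_cents, scrip_id))

    def redeem(self, scrip: Scrip) -> bool:
        if scrip.vendor != self.name:
            return False  # scrip is valid only at the vendor that issued it
        expected = self._certify(scrip.vendor, scrip.value_cents, scrip.scrip_id)
        if not hmac.compare_digest(expected, scrip.certificate):
            return False  # forged or altered scrip
        if scrip.scrip_id in self._spent:
            return False  # double spending
        self._spent.add(scrip.scrip_id)
        return True
```

Because the secret never leaves the vendor, the vendor can check a certificate and its own spent-ID set without contacting anyone else, which is the property the paragraph above emphasizes.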
Dwork and Naor have suggested that a way to discourage spam is to force senders of e-mail to pay for the privilege by performing a computation. Cynthia Dwork and Moni Naor, “Pricing via processing or combating junk mail,” Advances in Cryptology—CRYPTO '92, pp. 139–147. Independently, Adam Back later proposed Hash Cash for use in protecting mailing lists. See http://cypherspace.org/~adam/hashcash/. (Hashcash is a proof-of-work “coin” based on a mathematical puzzle that is hard to solve but whose solution is easy to verify.) There have also been proposals to use Hash Cash to deter junk e-mail. See http://www.camram.org. The Camram approach is described in further detail below. Others have proposed using computational puzzles to discourage attacks on servers. See http://www.rsasecurity.com/rsalabs/staff/bios/ajuels/publications/client-puzzles/. Captcha is a project that creates puzzles that only humans can solve, for the purpose of telling humans and computers apart over a network. (Captcha stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart.) Typically the puzzles involve having the user read a sequence of characters from a visually cluttered image. Further information can be found at www.captcha.net.
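The “hard to solve, easy to verify” property of a Hashcash coin can be illustrated with a minimal sketch modeled on the version-1 stamp layout `ver:bits:date:resource:ext:rand:counter`. Minting searches counters until the SHA-1 hash of the stamp begins with `bits` zero bits, which takes about 2^bits hash evaluations on average; verification takes one. Encoding the counter in hex is a simplification, and the function names are illustrative.

```python
import base64
import hashlib
import os
from datetime import date
from typing import Optional

def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a hash digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def mint(resource: str, bits: int, today: Optional[date] = None) -> str:
    """Try counters until SHA-1(stamp) starts with `bits` zero bits."""
    today = today or date.today()
    rand = base64.b64encode(os.urandom(9)).decode()  # 12 chars, no ':' or '='
    prefix = f"1:{bits}:{today:%y%m%d}:{resource}::{rand}:"
    counter = 0
    while True:
        stamp = prefix + format(counter, "x")  # hex counter (a simplification)
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
        counter += 1

def verify(stamp: str, resource: str, bits: int) -> bool:
    """Check a stamp with a single SHA-1 evaluation."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or not fields[1].isdigit():
        return False
    if fields[3] != resource or int(fields[1]) < bits:
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits
```

Binding the stamp to the recipient's address (`resource`) keeps one coin from being reused for other recipients; a real verifier would also reject stale dates and remember stamps it has already accepted.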
Camram, cited above, is an anti-spam system that uses an electronic peer-to-peer model of postage based on proof-of-work and digital signature techniques. The Camram system intentionally provides enough information to allow intermediate machines to filter spam. The basic idea can be summarized as follows: if an e-mail message does not have an electronic postage stamp, it will not be allowed into the recipient's mailbox. Whenever a sender transmits e-mail to an intended recipient, an electronic postage stamp is generated. Unlike physical postage, the sender does not spend money but instead spends time solving a puzzle, the solution to which becomes the postage stamp. The theory is that requiring postage on e-mail changes the economics of spam. A single postage stamp is not hard to create. Spammers, however, count on being able to send thousands, hundreds of thousands, or more messages very quickly; if they must calculate a postage stamp for every message, they are slowed down and must consume CPU resources. Making spam more expensive in this manner is intended to discourage spammers from operating. Another advantage of putting electronic postage on e-mail is that the postage can also serve as a key for filtering out spam. As discussed above, ordinary filtering techniques may not work well for a variety of reasons. By adding an easily detectable and verifiable postage stamp, users would be able to filter out e-mail that lacks one.
FIG. 1A depicts the normal relationship between a mail server and an ordinary POP-3 user. The user reads or creates messages via the message user interface (UI). When sending a message, the message is handed to a delivery agent that communicates with the mail server at the ISP. Receiving messages is basically the reverse process. When the user tells the message UI to grab a message, the delivery agent fetches e-mail via POP-3 and stores it locally in local mailboxes. The message UI then reads the local mailboxes and displays the messages.
FIG. 1B depicts a scenario in which Camram is employed. Outbound, the message is passed through Camram before being handed to the delivery agent. The Camram module creates all additional headers and makes any necessary alterations to the message before passing it to the delivery agent for delivery. When receiving a message, Camram again sits between the delivery agent and the message UI. In this case, Camram examines each message and puts it in the local mailbox if it passes any one of three challenges: 1) a hashcash coin, 2) a digital signature, or 3) the sender being on the white list. If the message fails all three, Camram puts the message in “Camram jail” and sends a response back to the message originator. If that response bounces, the jailed message that triggered it is deleted along with the bounce message.
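The inbound decision can be sketched as follows. The coin and signature checks are injected as callables because their details are not given here; the `X-Hashcash` header used in the usage lines, the auto-response text, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    sender: str
    headers: dict[str, str]
    body: str

@dataclass
class InboundFilter:
    """Hypothetical model of Camram's inbound triage."""
    has_valid_coin: Callable[[Message], bool]
    has_valid_signature: Callable[[Message], bool]
    whitelist: set[str]
    mailbox: list[Message] = field(default_factory=list)
    jail: list[Message] = field(default_factory=list)
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (to, text)

    def receive(self, msg: Message) -> str:
        # Passing any one of the three challenges admits the message.
        if (self.has_valid_coin(msg) or self.has_valid_signature(msg)
                or msg.sender in self.whitelist):
            self.mailbox.append(msg)
            return "mailbox"
        # Failed all three: jail the message and auto-respond to its sender.
        self.jail.append(msg)
        self.outbox.append((msg.sender, "Message held; please resend with postage."))
        return "jail"

    def on_bounce(self, original_sender: str) -> int:
        """The auto-response bounced: drop that sender's jailed messages."""
        before = len(self.jail)
        self.jail = [m for m in self.jail if m.sender != original_sender]
        return before - len(self.jail)

f = InboundFilter(has_valid_coin=lambda m: "X-Hashcash" in m.headers,
                  has_valid_signature=lambda m: False,
                  whitelist={"friend@example.com"})
print(f.receive(Message("friend@example.com", {}, "hi")))  # mailbox
print(f.receive(Message("spam@example.net", {}, "buy")))   # jail
```

The sketch does not store the bounce message itself; in the flow described above it is simply discarded together with the jailed message.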
An alternative solution has also been proposed: a proxy containing Camram functionality running on the user's machine, as shown in FIG. 1C. In this proxy model, the user's e-mail client would connect to the proxy on the user's machine, and the proxy would intercept all incoming and outgoing e-mail. For outbound e-mail, it would perform the appropriate hashcash and signature stamping. For inbound e-mail, it would perform the appropriate filtering and redirection.
It has also been suggested that Camram will not significantly affect operations at the enterprise or service provider level, and that, as Camram adoption increases, service providers will have an opportunity to further raise the cost of spamming for spammers and open relays by rejecting messages that lack Camram postage.
The above-cited co-pending application Ser. No. 10/291,260, filed on Nov. 8, 2002, entitled “Ticket Server For Spam Prevention And The Like,” describes a system for reducing undesirable network traffic, such as spam. In an exemplary application of that invention, a “ticket server” receives ticket requests and responds to each by issuing either a complete ticket or a Ticket Kit comprising a challenge. If a Ticket Kit is issued, the client can construct a valid ticket from a correct answer to the challenge and the data in the Ticket Kit. The challenge may be, for example, a CPU-bound task, a memory-bound task, a task that only a human can solve, or a monetary payment.
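The Ticket Kit round trip can be sketched as follows, covering only the Ticket Kit path. This is not the format used in the co-pending application: every name is hypothetical, the kit is assumed to be authenticated with an HMAC under a server key, and factoring a small semiprime stands in for a CPU-bound challenge (the client's trial division takes tens of thousands of steps, while the server checks the answer with one division).

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

def _is_prime(n: int) -> bool:
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def _random_prime(bits: int) -> int:
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # odd, full size
        if _is_prime(candidate):
            return candidate

@dataclass(frozen=True)
class TicketKit:
    kit_id: str
    challenge: int  # a semiprime the client must factor
    mac: bytes      # binds kit_id and challenge to this server

@dataclass(frozen=True)
class Ticket:
    kit: TicketKit
    answer: int     # a nontrivial factor of kit.challenge

class TicketServer:
    def __init__(self, factor_bits: int = 16) -> None:
        self._key = secrets.token_bytes(32)
        self._bits = factor_bits
        self._used: set[str] = set()

    def _mac(self, kit_id: str, challenge: int) -> bytes:
        msg = f"{kit_id}|{challenge}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def issue_kit(self) -> TicketKit:
        n = _random_prime(self._bits) * _random_prime(self._bits)
        kit_id = secrets.token_hex(8)
        return TicketKit(kit_id, n, self._mac(kit_id, n))

    def redeem(self, ticket: Ticket) -> bool:
        kit = ticket.kit
        if not hmac.compare_digest(kit.mac, self._mac(kit.kit_id, kit.challenge)):
            return False  # kit was not issued by this server
        if not 1 < ticket.answer < kit.challenge or kit.challenge % ticket.answer:
            return False  # wrong answer to the challenge
        if kit.kit_id in self._used:
            return False  # ticket already spent
        self._used.add(kit.kit_id)
        return True

def solve(kit: TicketKit) -> Ticket:
    """Client side: CPU-bound trial division, then build the ticket."""
    f = 3
    while kit.challenge % f:
        f += 2
    return Ticket(kit, f)
```

Authenticating the kit lets the server stay stateless until redemption, when it records the kit ID so that each ticket can be used only once.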
Assume that a sender S wishes to send an e-mail message M to a recipient R. If R can determine that M is almost certainly not spam, either by recognizing the sender's address as being that of a friend or by some analysis of M, then M may be sent and received in the normal way. If S is not confident that R will accept the message, S may be instructed to compute some moderately hard function F(M) and send the message together with the computed value, i.e., (M, F(M)), to R. R can then verify that the value accompanying M is indeed F(M). If the verification succeeds, R can accept M and optionally allow future e-mail from S without the F( ) computation. Otherwise, R can “bounce” (reject) M, indicating in the bounce message where S can obtain software that will compute F(M).
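The recipient-side protocol can be sketched as follows. F( ) is swapped here for a toy: a modular square root of plus or minus a hash of M, modulo the prime P = 2^127 − 1. Because P ≡ 3 (mod 4), computing the root takes one exponentiation (about 127 modular squarings) while checking it takes a single squaring. At this size the computation is nowhere near “moderately hard”; the sketch only shows the shape of the exchange, and the `Recipient` class and its names are hypothetical.

```python
import hashlib
from typing import Optional

P = 2**127 - 1  # prime with P % 4 == 3, so exactly one of a, -a is a square

def _digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % P

def F(message: bytes) -> int:
    """Sender side (toy): a square root of +h(M) or -h(M) modulo P."""
    a = _digest(message)
    for target in (a, (-a) % P):
        y = pow(target, (P + 1) // 4, P)  # a root exists iff y * y == target
        if y * y % P == target:
            return y
    raise AssertionError("unreachable: one of a, -a is a square mod P")

def verify(message: bytes, proof: int) -> bool:
    """Recipient side: one modular squaring."""
    a = _digest(message)
    return proof * proof % P in (a, (-a) % P)

class Recipient:
    def __init__(self, friends: set[str]) -> None:
        self.friends = set(friends)
        self.inbox: list[bytes] = []

    def receive(self, sender: str, message: bytes,
                proof: Optional[int] = None) -> str:
        if sender in self.friends:
            self.inbox.append(message)
            return "accepted"
        if proof is not None and verify(message, proof):
            self.inbox.append(message)
            self.friends.add(sender)  # optional: exempt S from F() from now on
            return "accepted"
        # A real bounce would say where S can obtain software that computes F(M).
        return "bounced"
```

A sender who is bounced computes `F(m)` and resends `(m, F(m))`; after one successful verification, later messages from that sender are accepted directly.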
The function F( ) is preferably chosen so that the sender S takes at least several seconds to perform the computation, but verification by the recipient R is at least a few thousand times faster. Thus, people who wish to send e-mail to a recipient for the first time are discouraged from doing so, because they are forced to spend several seconds of CPU time. For a spammer sending many millions of messages, this cost can become prohibitive.
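A back-of-the-envelope calculation makes the asymmetry concrete. The inputs are illustrative assumptions, not figures from the text: 5 seconds per computation (“several seconds”), 10 million messages (“many millions”), and verification 3000 times faster (“a few thousand times”).

```python
SECONDS_PER_DAY = 86_400

def stamp_cost(messages: int, seconds_per_stamp: float,
               verify_speedup: float = 3000.0) -> tuple[float, float]:
    """Return (sender CPU-days spent computing F, recipient seconds per check)."""
    sender_cpu_days = messages * seconds_per_stamp / SECONDS_PER_DAY
    seconds_per_check = seconds_per_stamp / verify_speedup
    return sender_cpu_days, seconds_per_check

days, check = stamp_cost(10_000_000, 5.0)
print(f"10 million messages: {days:.1f} CPU-days")  # 578.7 CPU-days
print(f"recipient check: {check * 1000:.2f} ms")    # 1.67 ms
```

Under these assumptions, a legitimate first-time sender pays 5 seconds once, while the spammer needs roughly 1.6 CPU-years for one mailing, and each recipient spends under 2 ms checking a message.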
One challenge with the system described above is that computation speed varies widely across devices: consider a 2.2 GHz PC versus a 33 MHz PDA. If a computation takes a few seconds on the PC, it might take minutes on the PDA, which is unfortunate for PDA users. The computation could instead be done by a more powerful machine on behalf of the PDA, but we would prefer to avoid this complication where possible.
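The size of the gap can be estimated under a deliberately crude assumption: running time scales with clock rate alone (ignoring differences in architecture, caches, and memory), and F( ) takes 3 seconds on the PC.

```latex
\frac{2.2\,\text{GHz}}{33\,\text{MHz}} = \frac{2200}{33} \approx 66.7,
\qquad
3\,\text{s} \times \frac{2200}{33} = 200\,\text{s} \approx 3.3\,\text{min}
```

A few seconds on the PC thus becomes a few minutes on the PDA, in line with the estimate above.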