1. Technical Field
This invention generally relates to computer systems, and more specifically relates to mechanisms and methods for public distributed computing.
2. Background Art
Public distributed computing is a relatively new phenomenon in which a large number of computer systems each work on small tasks that together make up a much larger task. Because vast numbers of computers connected to the Internet have spare computing capacity, there is an opportunity to put that spare capacity to work on public distributed computing tasks.
In a typical public distributed computing environment, a server computer system has work it would like to distribute to other client computer systems. The server computer system may recruit client computer systems in any suitable way, including direct e-mail, web page advertisements, or other advertisements that direct a user to go to a given web site or to e-mail a given e-mail address if the user is interested in participating in the project by allowing the user's machine to work on these tasks during idle times. When a user decides to participate, the user downloads and installs client code and permits it to execute. The user's client computer system requests work (a work unit) from the server. The server then sends a work unit to the client. The client code executes against the data in the work unit, which performs the desired work. When the processing of the work unit is done, the results are returned to the server. Note that the client code is typically reused by the client in processing many different work units.
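The exchange described above can be illustrated with a minimal sketch. It assumes an in-memory server and a single client; the names WorkUnit, Server, request_work, return_result, and run_client are hypothetical and chosen only for illustration, not taken from any actual project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class WorkUnit:
    name: str        # work unit name assigned by the server
    data: List[int]  # payload the client code executes against


class Server:
    """Hypothetical in-memory server that hands out work units."""

    def __init__(self, units: List[WorkUnit]) -> None:
        self.pending: List[WorkUnit] = list(units)
        self.results: Dict[str, int] = {}

    def request_work(self) -> Optional[WorkUnit]:
        # Hand out the next work unit, or None when no work remains.
        return self.pending.pop(0) if self.pending else None

    def return_result(self, name: str, result: int) -> None:
        # Store the result under the work unit name.
        self.results[name] = result


def run_client(server: Server, client_code: Callable[[List[int]], int]) -> int:
    """Reuse the same client code for every work unit; return the count done."""
    done = 0
    while (unit := server.request_work()) is not None:
        server.return_result(unit.name, client_code(unit.data))
        done += 1
    return done
```

In this sketch the built-in sum stands in for the downloaded client code: run_client(server, sum) processes every pending work unit with the same function, mirroring the reuse of client code across many work units.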
One problem with public distributed computing is the ease with which malicious results can be submitted, owing to the lack of security. The security needed for a public distributed computing environment is quite different from the security needed for financial transactions, network authentication, etc. In a financial transaction, for example, it is paramount that the user's identity be verified before performing any transactions on behalf of the user. Note, however, that in the public distributed environment, the server does not care about the actual identity of any particular user. The server only cares that the work is performed by the actual code provided by the server to any of the particular client computer systems, and that the result returned to the server comes from normal (untampered-with) operation of that client code.
In current public distributed computing environments, the user identity (user id) is never validated. When the user “signs up” to join a particular public distributed computing project, any number of schemes may be employed. For example, the user may provide a “handle” (a fanciful name) or an e-mail address, or may be assigned an arbitrary number. These or any number of similar means may be employed, individually or in combination. In this document the user id is assumed to represent an individual, although projects may use the notion of identity to represent a machine or the pairing of a user and a machine. None of this is critical and, as will be seen, neither is validation. What is critical is that there is an identity which forms a basis for certain exchanges of information between the server and what can be treated as a single individual with one or more machines, whatever the actuality.
Authentication that is currently performed relies upon the user identity and work unit name. When a client requests work, a work unit is sent to the client with a work unit name. When the results are received from the client, the name of the work unit in the results is checked to assure the name matches the work unit sent. If so, the results are accepted. If not, the results are rejected.
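The name-based check described above might look like the following sketch. The class NameCheckServer and its methods are hypothetical, and the sketch assumes one outstanding work unit per user id, which the text does not specify.

```python
from typing import Dict, Tuple


class NameCheckServer:
    """Hypothetical server that authenticates results by work unit name only."""

    def __init__(self) -> None:
        # Assumption: each user id holds at most one outstanding work unit.
        self.outstanding: Dict[str, str] = {}
        self.accepted: Dict[Tuple[str, str], object] = {}

    def send_work(self, user_id: str, unit_name: str) -> str:
        # Remember which work unit name was sent to this user id.
        self.outstanding[user_id] = unit_name
        return unit_name

    def receive_result(self, user_id: str, unit_name: str, result: object) -> bool:
        # Accept only if the name in the results matches the work unit sent.
        if self.outstanding.get(user_id) != unit_name:
            return False
        del self.outstanding[user_id]
        self.accepted[(user_id, unit_name)] = result
        return True
```

Note that receive_result never inspects the result value itself. Any value returned under the correct work unit name is accepted, so the check confirms only that the name matches, not that the work was performed.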
Hackers have exploited this lack of security for their own devious ends. For example, it is often considered to be a status symbol for a user to have a high score that indicates his or her computer system has performed a lot of work on a particular project. These hackers have discovered various means for fooling the server into thinking the returned results are valid when they are not. For example, the same result could be submitted by several different identities that are all attributed to the user. In the alternative, the user could alter the client code to make the code exit prematurely, thereby performing the computation very quickly and returning a result that is not correct. If the name of the work unit in the result is correct, the server may accept it without knowing that the work was not actually performed and without knowing that the result is incorrect.
The prior art has recognized this problem and has devised some ways to detect incorrect results. One such way is to send out work units with known answers. The answers returned can then be checked against the known answers to determine whether the user performed the work. Another known way is to send out the same work to several different users. For example, if the same task is sent to five different users, and four of the five return the same result, it is a good assumption that the fifth result is in error. Note that an error may be caused by a hardware or other system failure, as well as by intentionally incorrect results returned by users who have malicious intent. Both of these prior art ways of checking answers require computing the entire answer multiple times, thereby reducing the computing power available to work on the overall problem.
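Both prior art checks can be sketched briefly. The function names check_known_answer and majority_vote, and the quorum parameter, are assumptions introduced for illustration; the text states only that agreement among four of five users is treated as sufficient.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def check_known_answer(returned: Hashable, known: Hashable) -> bool:
    """Known-answer check: the server already holds the correct result."""
    return returned == known


def majority_vote(
    results: Dict[str, Hashable], quorum: int
) -> Tuple[Optional[Hashable], List[str]]:
    """Redundancy check over results keyed by user id.

    Returns (accepted_value, dissenting_user_ids). If no value is returned
    by at least `quorum` users, nothing is accepted and (None, []) is returned.
    """
    if not results:
        return None, []
    value, count = Counter(results.values()).most_common(1)[0]
    if count < quorum:
        return None, []
    # Users whose result disagrees with the accepted value are presumed in error.
    dissenters = sorted(u for u, r in results.items() if r != value)
    return value, dissenters
```

For the five-user example in the text, majority_vote({"u1": 42, "u2": 42, "u3": 42, "u4": 42, "u5": 7}, quorum=4) accepts 42 and flags "u5". Every vote cast requires a full computation of the same work unit, which is the overhead noted above.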
The check performed by sending the same work to several different users may fail against well-organized, cooperating hackers. If, by chance or by some clever scheme, four hackers determine that they all have the same work unit, and they contrive to return the same arbitrary answer, their bogus answer will win the “vote” and be accepted. If the hackers rely on chance and most computation is honest, this is not a major problem. But, to the extent possible, it is desirable to go beyond statistical assurances, especially if hackers manage to obtain the same work unit repeatedly by exploiting some weakness (e.g., work units being handed out in a predictable order). The goal of the hackers may or may not be to ruin the calculation. Regardless, it could be uncomfortable to be on the receiving end of an attack in which the presumption that hackers cannot conspire to share particular work units turns out to be false.
Meanwhile, the server's computational resources are badly outgunned by the available horsepower of the collective clients. Accordingly, added cycles to do validation of any kind could affect the critical path of the project, namely the checking in and checking out of work units. Therefore, the server does not have the luxury of doing extensive checks of the work going out or coming back in. Besides slowing down the underlying solution, added checks may cause participation rates to drop if users perceive the check-in and check-out process as slow, unreliable, or both. Any added check must therefore be carefully chosen.
Without a more efficient way to check the results returned from clients in a public distributed computing environment, current methods will incur excessive overhead by computing each result multiple times, incur excessive overhead by accepting additional copies of results, or fail to detect incorrect results.