The interconnection of relatively inexpensive microcomputers via networks, such as the Internet, presents opportunities to provide computing power that can rival that of very costly supercomputers. Known as grid computing, the harnessing of such computing power typically involves a master computer that assigns portions of a computing task to a plurality of discrete client computers via a network.
One of the better-known grid computing applications is the SETI@home project (http://setiathome.ssl.berkeley.edu) sponsored by the Search for Extraterrestrial Intelligence with support from The Planetary Society, 65 North Catalina Avenue, Pasadena, Calif. 91106-2301 USA (http://www.planetary.org). SETI@home is an effort that consumes immense amounts of computing power. In brief, each client in the grid analyzes a small portion of a huge volume of radio telescope data to mine for extraterrestrial radio communications or other evidence of extraterrestrial life. The radio telescope data is, by and large, simply radio-frequency background noise generated by the universe, and therefore the task of discerning an extraterrestrial broadcast within that data is an enormous undertaking. The undertaking is perceived to have low odds of success and little obvious commercial value, thereby making the use of a supercomputer to perform this task cost-prohibitive. The SETI@home project is thus perceived to be an ideal task for grid computing. To participate, individuals with personal computers connected to the Internet go to the SETI Web site and download a special screensaver. The screensaver volunteers the individual computer to be a client in a grid of thousands of client computers. SETI's system assigns portions of the data to be processed by each individual client computer.
SETI@home is, however, but one example of the potential for grid computing. In general, grid computing can offer computing power to individuals and institutions that would not otherwise have access to supercomputers.
One difficulty common to grid computing is the management of each client machine. Numerous problems can arise when trying to manage any particular computing task, and those problems are exacerbated as more and more machines participate in the task. For example, in the SETI@home project, each client machine is typically owned and operated by an individual, who may at any given time choose to “drop out” of participating in the grid computing application. Even where those individuals choose to remain, problems with any individual client, or network problems between the manager and a client, will frustrate the performance of the larger computing task. The manager must thus keep track of the performance of each client and accommodate failures in order to properly complete the task.
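The bookkeeping described above can be illustrated with a minimal sketch. It is not drawn from SETI@home or any particular grid system; the `Manager` class, its method names, and the heartbeat-with-timeout policy are all assumptions chosen for illustration. The manager records when each client was last heard from and returns a silent client's portion of the task to the pending queue so that it can be assigned elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Hypothetical manager that tracks clients and reassigns lost work."""

    timeout: float  # assumed policy: seconds of silence before a client is presumed failed
    pending: list = field(default_factory=list)    # work units not yet assigned
    assigned: dict = field(default_factory=dict)   # client id -> work unit in progress
    last_seen: dict = field(default_factory=dict)  # client id -> time of last contact

    def assign(self, client_id, now):
        """Hand the next pending unit to a client, or None if nothing is pending."""
        if not self.pending:
            return None
        unit = self.pending.pop(0)
        self.assigned[client_id] = unit
        self.last_seen[client_id] = now
        return unit

    def heartbeat(self, client_id, now):
        """Record that a client holding work is still alive."""
        if client_id in self.assigned:
            self.last_seen[client_id] = now

    def complete(self, client_id):
        """Mark a client's unit as finished and stop tracking the client."""
        self.last_seen.pop(client_id, None)
        return self.assigned.pop(client_id, None)

    def reap(self, now):
        """Requeue the work of every client silent for longer than the timeout."""
        failed = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in failed:
            self.pending.append(self.assigned.pop(c))
            del self.last_seen[c]
        return failed


manager = Manager(timeout=30.0, pending=["unit-a", "unit-b"])
manager.assign("client-1", now=0.0)
manager.assign("client-2", now=0.0)
manager.heartbeat("client-1", now=25.0)  # client-2 never reports again
dropped = manager.reap(now=40.0)         # client-2 is presumed failed
```

Times are passed in explicitly rather than read from a clock so that the behavior is easy to follow; a real manager would also have to handle duplicate results from a client that was merely slow rather than failed.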
It is expected that certain problems of grid computing can be overcome with the Open Grid Services Architecture (“OGSA”), which promises to provide a common standard that will make the implementation of software applications via grid computing relatively straightforward. Thus, manager and client machines that are OGSA compliant will be able to use the OGSA layer to handle, in a standardized fashion, at least some of the connectivity issues between the manager and the clients.
However, even with the OGSA, problems remain. Each client in a grid is inherently unreliable, due to either client or network failure, making performance of the task less reliable than simply running the task on a supercomputer. Problems are further exacerbated by the fact that there can be a delay before the master detects the failure of any given client. Still further problems arise upon detection of the failure of a particular client, as it may be necessary to restart the entire task if that failed client happened to be performing some critical portion of the task.
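The detection delay mentioned above can be made concrete with a small calculation. The sketch below assumes one simple scheme that is not taken from the text: a client sends a heartbeat every `heartbeat_interval` seconds starting at time 0, and the master polls every `poll_interval` seconds, flagging a client once its last heartbeat is more than `timeout` seconds old. The function name and all parameters are hypothetical.

```python
import math


def detection_time(fail_at, heartbeat_interval, timeout, poll_interval):
    """Return the time at which a polling master first notices a failed client.

    Assumes heartbeats at 0, h, 2h, ... and master polls at 0, p, 2p, ...;
    a client is flagged when (poll time - last heartbeat) > timeout.
    """
    # Last heartbeat the client managed to send at or before it failed.
    last_beat = math.floor(fail_at / heartbeat_interval) * heartbeat_interval
    # The client is considered alive up to and including this moment.
    threshold = last_beat + timeout
    # First poll that falls strictly after the threshold.
    first_poll = math.floor(threshold / poll_interval) + 1
    return first_poll * poll_interval


# A client that fails at t=12 with 10 s heartbeats, a 30 s timeout,
# and a master that polls every 5 s.
detected_at = detection_time(fail_at=12, heartbeat_interval=10,
                             timeout=30, poll_interval=5)
delay = detected_at - 12
```

Under these assumptions the failure at t=12 goes unnoticed until t=45, a delay of 33 seconds during which the failed client's portion of the task makes no progress. Shortening the timeout reduces the delay but makes it more likely that a slow client is wrongly treated as failed.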