1. Field of the Invention
This invention relates to peer-to-peer networking, and more particularly to submitting and performing computational tasks in a distributed heterogeneous networked environment.
2. Description of the Related Art
The Internet has three valuable fundamental assets (information, bandwidth, and computing resources), all of which are vastly underutilized, partly due to the traditional client-server computing model. No single search engine or portal can locate and catalog the ever-increasing amount of information on the Web in a timely way. Moreover, a huge amount of information is transient and not subject to capture by techniques such as Web crawling. For example, research has estimated that the world produces two exabytes, or about 2×10^18 bytes, of information every year, but publishes only about 300 terabytes, or about 3×10^12 bytes. In other words, for every megabyte of information produced, only about one byte is published. Moreover, Google claims that it searches only about 1.3×10^8 web pages. Thus, finding useful information in real time is increasingly difficult.
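The published-to-produced ratio quoted above can be checked with a short calculation. This is only a sanity check of the stated estimates, assuming a decimal megabyte of 10^6 bytes:

```latex
\[
\frac{3\times10^{12}\ \text{bytes published}}{2\times10^{18}\ \text{bytes produced}}
  = 1.5\times10^{-6},
\qquad
1.5\times10^{-6}\times10^{6}\ \text{bytes}
  = 1.5\ \text{bytes published per megabyte produced}.
\]
```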
Although miles of new fiber have been installed, the new bandwidth gets little use if everyone goes to one site for content and to another site for auctions. Instead, hot spots just get hotter while cold pipes remain cold. This is partly why most people still experience congestion on the Internet even though a single fiber's bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months.
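The two figures in the preceding sentence are mutually consistent, as the following rough calculation suggests. It assumes steady exponential growth with a fixed 16-month doubling period, which is a simplification:

```latex
\[
\log_2\!\left(10^{6}\right) = 6\log_2 10 \approx 19.9\ \text{doublings},
\qquad
19.9 \times 16\ \text{months} \approx 319\ \text{months} \approx 26.6\ \text{years}.
\]
```

Under that assumption, a factor of 10^6 corresponds to roughly 26 to 27 years of growth from 1975.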
New processors and storage devices continue to break records in speed and capacity, supporting more powerful end devices throughout the network. However, computation continues to accumulate around data centers, which have to increase their workloads at a crippling pace, thus putting immense pressure on space and power consumption.
Finally, computer users in general are accustomed to computer systems that are deterministic and synchronous in nature, and think of such a structure as the norm. For example, when a browser issues a URL (Uniform Resource Locator) request for a Web page, the output is typically expected to appear shortly afterwards. It is also typically expected that everyone around the world will be able to retrieve the same page from the same Web server using the same URL.
The term peer-to-peer networking or computing (often referred to as P2P) may be applied to a wide range of technologies that greatly increase the utilization of information, bandwidth, and computing resources in the Internet. Frequently, these P2P technologies adopt a network-based computing style that neither excludes nor inherently depends on centralized control points. Apart from improving the performance of information discovery, content delivery, and information processing, such a style also can enhance the overall reliability and fault-tolerance of computing systems.
FIGS. 1A and 1B are examples illustrating the peer-to-peer model. FIG. 1A shows two peer devices 104A and 104B that are currently connected. Either of the two peer devices 104 may serve as a client of or a server to the other device. FIG. 1B shows several peer devices 104 connected over the network 106 in a peer group. In the peer group, any of the peer devices 104 may serve as a client of or a server to any of the other devices.
Parallel computation has been an essential component of scientific computing for many years. Traditionally, the most popular type of parallel computation has been fine-grained parallelization, which requires substantial inter-node communication utilizing protocols such as Message Passing Interface (MPI) or Parallel Virtual Machine (PVM). Recently, however, there has been a growing demand for efficient mechanisms for carrying out computations which exhibit coarse-grained parallelism. The most common application of such mechanisms is distributed computing for large-scale computations. In some of these computations, numerous similar, but independent, tasks are performed to solve a large problem. In others, ensemble averages are utilized, where a simulation is run under a variety of initial conditions and the individual results are then combined to form the final result.
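The ensemble-average pattern described above can be illustrated with a minimal Python sketch. It is not taken from any particular framework: the names `simulate` and `ensemble_average`, the toy random-walk model, and the use of a local thread pool in place of networked nodes are all assumptions made for illustration. The point is only that each task runs without communicating with the others, and that the results are combined in a single final step.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulate(seed: int, steps: int = 1000) -> float:
    """One independent task: a toy 1-D random walk (hypothetical model)."""
    rng = random.Random(seed)  # the seed stands in for the "initial condition"
    position = 0.0
    for _ in range(steps):
        position += rng.choice((-1.0, 1.0))
    return position * position  # squared displacement of this run


def ensemble_average(seeds, steps: int = 1000, max_workers: int = 4) -> float:
    """Run every task independently, then combine the results once at the end."""
    # A local thread pool stands in for remote workers; tasks share no state.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda s: simulate(s, steps), seeds))
    return mean(results)


if __name__ == "__main__":
    print(ensemble_average(range(32)))
```

Because the tasks exchange no messages while running, the same structure could in principle be spread over loosely connected machines, in contrast to fine-grained MPI or PVM programs that depend on frequent inter-node communication.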
Distributed computing has traditionally been implemented using a small network of computers. While this solution works satisfactorily for many applications, it fails to take advantage of the large capacity in existing desktop computing power and network connectivity. More recently, distributed computing frameworks have been designed to help take advantage of the plethora of processors available over the Internet, many of which are not used a great deal of the time (e.g., personal computers). Existing grid computing mechanisms such as SunGRID and LSF may enable users to run an application over several computers in a network. Typically, there are restrictions on the set of computers that may participate in the computation. In prior art mechanisms for submitting tasks to a network of computers, typically the tasks cannot be run in different operating environments. The protocols used for transmission of data in prior art mechanisms may be restrictive; for example, prior art mechanisms typically do not allow tasks to be run on computers across firewalls, and may severely restrict the types of connection between computers participating in computations. Typically, the computers participating in the computations need to share a common storage area using mount points, etc. Some prior art mechanisms may use NFS mount points to share data, which requires the NFS protocol. While NFS may work well within the boundaries of a small network or firewall, the NFS protocol typically does not allow crossing firewalls for security reasons, nor is it supported by most operating environments. Single entry points are typically used in prior art mechanisms to submit tasks; tasks cannot be submitted by any peer in the network. To submit tasks to a cluster of computers, users typically use standard protocols such as MPI or PVM that enable the users to submit their tasks to a single entry point, which distributes the tasks over nodes in a homogeneous networked environment.
In the SETI@Home project, data from astronomical measurements is farmed out over the Internet to many processors for processing, and the results, when completed, are returned to a centralized server and post-processed, in an attempt to aid in the detection of alien species. However, the SETI@Home framework has several disadvantages. First, it is only applicable to a single application. While conceivably the SETI@Home project could be modified or re-created to handle an application other than the search for extraterrestrial life, the framework cannot handle more than one application at a time. Second, it utilizes a centralized server to distribute and post-process tasks over the network. This can create reliability and efficiency issues if the centralized server is not working properly or is bogged down, or if the network connections to the centralized server are lost.