1. Field of the Invention
The present invention relates to distributed computing in a network having multiple computing nodes. More particularly, the present invention relates to providing distributed computing services by sharing the tasks associated with those services among multiple computing nodes, thereby providing a virtualization of distributed computing over a network of heterogeneous resources.
2. Description of the Related Art
Distributed computing has evolved to technologies that enable multiple devices to share in providing a given service. For example, grid computing has been proposed as a model for solving substantially large computational problems using large numbers of computers arranged as clusters embedded in a distributed infrastructure. Grid computing has been described as differentiated from distributed computing through an increased focus on resource sharing, coordination, manageability, and high performance. The focus on resource sharing has been referred to as “the grid problem”, which has been defined as the set of problems associated with resource sharing among a set of individuals or groups.
A fundamental problem of distributed systems is the assumption that each peer computing node is substantially the same size. For example, a RAID (Redundant Array of Inexpensive Disks) storage device is implemented using a plurality of identically sized disks: a write operation to a disk at a prescribed disk block location can be easily repeated on the remaining disks of the RAID storage device by performing the same write operation to the same prescribed disk block location. Hence, existing systems do not contemplate the problem that a given peer computing node may run out of resources while attempting to perform its own tasks. In particular, consider a distributed system implemented using computing nodes (which may share resources such as computing power, storage capacity, bandwidth, etc.), where each computing node is responsible not only for storing its own data, but also for backing up data for other nodes. A problem arises if a substantially larger node having a substantially larger amount of data (i.e., at least an order of magnitude larger) joins the network, because the larger node will overwhelm the capacity of the smaller nodes of the distributed system. Consequently, a smaller node would be forced either to not back up the data of the larger node, resulting in a loss of data if the larger node is unavailable, or to no longer store the data that it is responsible for, resulting in a loss of that data. Hence, the smaller nodes are incapable of both storing their own data and backing up the data of the larger node.
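The capacity problem described above can be illustrated with a minimal sketch. The ring-style backup arrangement (each node backs up the data of the node listed before it), the `Node` type, the `overloaded_nodes` function, and all capacities are hypothetical values chosen for illustration only; they are not taken from any particular existing system.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_gb: int   # total storage the node can offer (assumed value)
    own_data_gb: int   # data the node is itself responsible for (assumed value)


def overloaded_nodes(nodes):
    """Return names of nodes that cannot hold their own data plus the
    backup copy of the preceding node's data (hypothetical ring layout)."""
    overloaded = []
    for i, node in enumerate(nodes):
        predecessor = nodes[i - 1]  # index -1 wraps around to the last node
        required = node.own_data_gb + predecessor.own_data_gb
        if required > node.capacity_gb:
            overloaded.append(node.name)
    return overloaded


# Three equally sized peers: every node can store its data and one backup.
small = [Node("a", 100, 40), Node("b", 100, 40), Node("c", 100, 40)]
print(overloaded_nodes(small))            # []

# A node with an order of magnitude more data joins the ring.
large = Node("big", 2000, 1000)
print(overloaded_nodes(small + [large]))  # ['a']
```

In this sketch, node "a" would need 1040 GB to hold its own 40 GB plus the backup of the larger node, which exceeds its assumed 100 GB capacity, so it must give up one of the two responsibilities.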
In addition, attempts to partition a substantially larger computing node into multiple virtual computing nodes, each having a size that matches the existing computing nodes, do not solve the problem of losing data. A random distribution of the virtual computing nodes among the existing computing nodes may still result in one virtual computing node backing up the data of a second virtual computing node, wasting resources by replicating data on the same physical device, namely the larger computing node. Further, a loss of the substantially larger computing node will result in a loss of all of its virtual computing nodes; hence, the distributed network can encounter a loss of data if it relies on the virtual nodes backing up each other.
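The weakness of the virtual-node approach can likewise be sketched as follows. The mappings `host_of` and `backup_of`, the node names, and both functions are hypothetical and serve only to illustrate one possible placement in which a virtual node ends up backing up a sibling on the same physical device.

```python
def colocated_backups(backup_of, host_of):
    """Return (primary, backup) pairs of nodes that reside on the same
    physical host, so the backup adds no real redundancy."""
    return [(p, b) for p, b in backup_of.items() if host_of[p] == host_of[b]]


def lost_on_failure(backup_of, host_of, failed_host):
    """Return nodes whose data has no surviving copy when failed_host is lost,
    i.e. both the primary and its backup were on that host."""
    return [
        p for p, b in backup_of.items()
        if host_of[p] == failed_host and host_of[b] == failed_host
    ]


# Hypothetical layout: the large machine "big" is split into three virtual
# nodes (v1..v3) alongside two small physical nodes (s1, s2).
host_of = {"v1": "big", "v2": "big", "v3": "big", "s1": "small1", "s2": "small2"}

# One possible placement: each key is backed up by its value.
backup_of = {"v1": "v2", "v2": "s1", "v3": "s2", "s1": "v3", "s2": "v1"}

print(colocated_backups(backup_of, host_of))       # [('v1', 'v2')]
print(lost_on_failure(backup_of, host_of, "big"))  # ['v1']
```

Under this assumed placement, the data of virtual node v1 is replicated only onto v2, which shares the same physical machine, so a failure of that machine leaves no surviving copy.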
Consequently, newer, high-performance machines cannot be added to the network, since the newer machines must have the same capacity as the existing machines to prevent overwhelming the existing machines. Alternatively, the newer machines must be configured to limit their resource utilization to the capacity of the existing machines, preventing the additional capacity of the newer machines from being utilized.