1. Field of the Invention
The present invention relates to distributed computing for large data sets on clusters of computers, and more particularly to the optimization of fault tolerance in Map/Reduce computing.
2. Description of the Related Art
Distributed computing clusters have become common in the field of high-availability and high-performance computing. Specifically, distributed computing clusters have become common because cluster-based systems exhibit three important and fundamental characteristics or properties: reliability, availability and serviceability. Each feature will be understood to be of paramount importance when designing a robust clustered system. Generally, a clustered system consists of multiple application server instances grouped together in a server farm of one or more server computing nodes connected over high-speed network communicative linkages. Further, each application server instance in the application cluster can enjoy access to memory, possibly disk space and the facilities of a host operating system.
Among the many challenges faced by those who manage the capacity and performance of a clustered system is the allocation of network resources for consumption by a particular application or workload. Network resources in a cluster can be managed through agents known as workload managers. The workload managers can optimally assign different network resources within endpoint containers to handle selected workloads in an application. In many cases, workload managers can adjust the assignment of network resources based upon performance metrics measured through systems management components in the clustered system.
Clustered systems provide a natural infrastructure for use in modern Map/Reduce computing, a widely understood parallel programming technique for solving computational problems, namely descriptions of computations to be performed by one or more computing resources to produce zero or more results. Of note, Map/Reduce computing can occur in “cloud” computing environments utilizing clustered systems. More particularly, Map/Reduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a “cloud” or “cluster”. Computational processing can occur on data stored either in a file system (unstructured) or within a database (structured). For the uninitiated, cloud computing refers to an Internet-based computing paradigm in which shared resources, software and information are provided to computers and other devices on-demand, much like electricity is provided to consumers over an electricity grid. Access to the resources of the “cloud” is governed by points of entry to the “cloud” that manage the relationship between the resource consumer and the “cloud” according to the terms of a service level agreement (“SLA”) at a cost tracked on behalf of the consumer.
As is well known, Map/Reduce has two main components: a “Map” step and a “Reduce” step. In the “Map” step, the master node accepts input, chops the input into smaller sub-problems, and distributes those smaller sub-problems to correspondingly different worker nodes. (A worker node may do this again in turn, leading to a multi-level tree structure.) Each worker node in turn processes its smaller sub-problem and passes the answer back to its master node. In many cases, each worker node processes multiple sub-problems. Thereafter, in the “Reduce” step, the master node takes the answers to all the sub-problems and combines them to produce the output, namely the answer to the problem it was originally trying to solve.
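By way of illustration only, the two steps described above can be sketched as a word count executed in a single process. The function names (split_input, map_worker, reduce_master, word_count) and the choice of word counting as the problem are assumptions made for this sketch; a real cluster would dispatch each sub-problem to a separate worker node rather than call the worker function in a loop.

```python
from collections import Counter


def split_input(text, num_workers):
    """Master: chop the input into roughly equal sub-problems (lists of words)."""
    words = text.split()
    chunk = max(1, -(-len(words) // num_workers))  # ceiling division
    return [words[i:i + chunk] for i in range(0, len(words), chunk)]


def map_worker(words):
    """Worker: process one sub-problem and return a partial answer."""
    return Counter(words)


def reduce_master(partials):
    """Master: combine the answers to all sub-problems into the final output."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(text, num_workers=3):
    """Run the Map step and then the Reduce step (sequentially, for illustration)."""
    sub_problems = split_input(text, num_workers)
    partials = [map_worker(sub) for sub in sub_problems]  # parallel on a real cluster
    return reduce_master(partials)
```

Because each call to map_worker depends only on its own sub-problem, the list comprehension in word_count could be replaced by parallel dispatch without changing the result.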
One advantage of Map/Reduce is that it allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice the degree of parallelism is limited by either or both of the data source and the number of central processing units (CPUs) near that data. Similarly, a set of ‘reducers’ can perform the reduction phase; all that is required is that all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to algorithms that are more sequential, Map/Reduce can be applied to significantly larger datasets than those which “commodity” servers can handle: a large server farm can use Map/Reduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data are still available.
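The requirement that all map outputs sharing a key reach the same reducer can be met, for example, by partitioning on a deterministic hash of the key. The following is a minimal sketch under that assumption; the names partition, shuffle and reduce_bucket are hypothetical, the use of CRC-32 as the hash is an illustrative choice, and summation stands in for whatever reduction a given problem requires.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Choose a reducer index from a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group (key, value) pairs so that each key lands on exactly one reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_bucket(bucket):
    """Reducer: combine all values that share a key (here, by summing them)."""
    return {key: sum(values) for key, values in bucket.items()}
```

Since the partition depends only on the key, a reducer that fails can be rescheduled on another node and handed the same bucket, provided the map outputs are still available.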
Even so, Map/Reduce computing over a distributed set of nodes in the “cloud” remains vulnerable to individual node failure where each node performs critical processing. In this regard, for some computational problems such as Monte Carlo simulations, the failure of a few nodes can be inconsequential. However, for more precise computational problems such as counting, nodal failure can produce an unacceptable result. Therefore, at present the state of each node can be determined by the repetitive pinging of each node in a cluster. Failed nodes can be replaced by new nodes performing the same tasks as those assigned to the failed nodes. Of course, frequent pinging of the nodes by way of a small polling interval can place an unacceptable degree of ping traffic upon the network.
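The trade-off described above can be made concrete with a simplified sketch of one polling round. Everything named here is an assumption for illustration: the is_alive callback stands in for a real network ping, spare_nodes is a hypothetical pool of replacement nodes, and pings_per_second assumes one ping per node per polling interval.

```python
def poll_once(assignments, is_alive, spare_nodes):
    """Ping every node once; move the tasks of each failed node to a spare node.

    assignments maps node name -> list of tasks and is modified in place.
    Returns the number of pings sent in this round.
    """
    pings = 0
    for node in list(assignments):  # copy the keys: the dict changes below
        pings += 1
        if not is_alive(node):
            if not spare_nodes:
                raise RuntimeError("no spare node available")
            tasks = assignments.pop(node)
            assignments[spare_nodes.pop(0)] = tasks
    return pings


def pings_per_second(num_nodes, polling_interval_s):
    """Ping traffic generated when every node is pinged once per interval."""
    return num_nodes / polling_interval_s
```

Under these assumptions, halving the polling interval halves the time a failure can go unnoticed but doubles the ping traffic: 1000 nodes polled every 0.5 seconds generate 2000 pings per second.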