With rapid advances in network technologies, distributed computing has become an increasingly popular computing approach because it allows computational resources (for example, memory, processing time, and input/output) to be shared among many different users, systems, or any combination thereof. One example is “cloud computing”, which involves applying the resources of several computers in a network to a single problem at the same time. Cloud computing is the Internet-based (“cloud”) development and use of computer technology (“computing”). Conceptually, infrastructure details are abstracted from the users and/or systems, which no longer need knowledge of, expertise in, or control over the technology infrastructure “in the cloud” that supports them. It typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet.
A distributed system (also called a cluster) contains a set of resources interconnected by a network. A resource manager controls the assignment of available resources to distributed applications running on at least one cluster. FIG. 1 shows a high-level diagram of the resource manager controlling the assignment of available resources to distributed applications running on the cluster. As shown in FIG. 1, the major components involved are the resource manager (RM), the application manager (AM), the RM agent, and the task executor running on the RM agent's resource, such as a computer. The resource manager keeps track of live RM agents and available resources. It allocates available resources to applications and tasks based on the resource requirements specified by the application manager. The AM coordinates execution of all tasks within the application life cycle, asks for containers to run tasks, and sends a “Resource Request” to the resource manager. The resource request can specify required resources such as memory, CPU, etc. The user provides the resource requirements for the task executors (tasks) of the distributed application. The RM agent sends periodic updates to the RM about the resources available on its host, starts the task-executor process on the host based on the resources allocated by the resource manager, and monitors the resource usage of the task executor. The task executor is responsible for executing different types of application tasks, and each task can have different resource (e.g., RAM, CPU) demands.
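The interaction between the RM agent updates, the resource request, and the allocation decision can be illustrated with a minimal Python sketch. It is not taken from FIG. 1: the class names, the field names, the host names, and the first-fit placement rule are all assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRequest:
    """Resources the application manager requests for one task (hypothetical shape)."""
    task_id: str
    memory_mb: int
    cpu_cores: int


@dataclass
class AgentState:
    """Free resources most recently reported by one RM agent (hypothetical shape)."""
    free_memory_mb: int
    free_cpu_cores: int


@dataclass
class ResourceManager:
    """Tracks live RM agents and hands out resources on their hosts."""
    agents: dict = field(default_factory=dict)

    def heartbeat(self, host, free_memory_mb, free_cpu_cores):
        # Periodic update from an RM agent about resources available on its host.
        self.agents[host] = AgentState(free_memory_mb, free_cpu_cores)

    def allocate(self, request):
        # Assumed first-fit policy: pick the first host that can satisfy the request.
        for host, state in self.agents.items():
            if (state.free_memory_mb >= request.memory_mb
                    and state.free_cpu_cores >= request.cpu_cores):
                state.free_memory_mb -= request.memory_mb
                state.free_cpu_cores -= request.cpu_cores
                return host
        return None  # no host currently has enough free resources


rm = ResourceManager()
rm.heartbeat("host-a", free_memory_mb=4096, free_cpu_cores=2)
rm.heartbeat("host-b", free_memory_mb=8192, free_cpu_cores=4)

first = rm.allocate(ResourceRequest("task-1", memory_mb=6144, cpu_cores=2))
second = rm.allocate(ResourceRequest("task-2", memory_mb=6144, cpu_cores=2))
print(first, second)
```

In this sketch the first request is placed on `host-b`, and the second request cannot be placed anywhere because the remaining free memory on each host is below the requested amount.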
FIG. 2 shows the distributed execution flow for resource allocation as available in the prior art. As shown in FIG. 2, resource managers have a mechanism to kill rogue tasks that consume more resources than allocated. To avoid being killed, tasks request a high amount of resources (CPU, memory, etc.), so that actual resource usage is always less than the requested amount. The resource manager calculates available resources based on the resources allocated for execution of a particular task and not on the resources actually used during execution. This type of calculation results in under-utilization of cluster resources. Further, an administrator must manually analyze and identify whether resources are under-utilized, then change the configuration for the next run to optimize resource usage, which is not practical for large clusters.
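The gap between allocation-based accounting and actual usage can be shown with a small Python sketch. The figures below (a 16 GB host and two tasks) are invented for illustration only, and the function names are hypothetical; the sketch merely assumes that memory is the tracked resource.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    allocated_mb: int  # amount requested and granted
    used_mb: int       # amount actually consumed at run time


def should_kill(task):
    # A rogue task is one that consumes more than it was allocated.
    return task.used_mb > task.allocated_mb


def available_by_allocation(capacity_mb, tasks):
    # Prior-art style accounting: free capacity is capacity minus allocations.
    return capacity_mb - sum(t.allocated_mb for t in tasks)


def idle_but_reserved(tasks):
    # Memory that is reserved for tasks but not actually in use.
    return sum(max(t.allocated_mb - t.used_mb, 0) for t in tasks)


capacity_mb = 16384
tasks = [
    RunningTask("task-1", allocated_mb=8192, used_mb=2048),
    RunningTask("task-2", allocated_mb=8192, used_mb=3072),
]

free_mb = available_by_allocation(capacity_mb, tasks)
idle_mb = idle_but_reserved(tasks)
utilization = sum(t.used_mb for t in tasks) / capacity_mb
print(free_mb, idle_mb, utilization)
```

With these assumed numbers, neither task is killed because both stay within their allocations, the host reports no free memory, and yet 11264 MB of the 16384 MB capacity sits idle, giving an actual utilization of 31.25%.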
In spite of the available mechanisms, the prior-art techniques have certain critical issues: applications that do not use their allocated resources cause resource under-utilization. Further, the administrator has to manually monitor resource utilization statistics from historical data and tune the client resource requests. This monitoring is tedious and complex in a large cluster with a wide variety of applications, which increases the cost of operations staff. Also, resource under-utilization results in deploying more nodes in the cluster with higher-configuration hardware, which leads to higher cost for vendors as they need to invest more in hardware. Effective utilization of resources is thus not taken into account.