In distributed computer or information technology (IT) systems, the distribution of network and process resources, for example the location of resources, storage replication, load balancing and functionality, is transparent to users of these systems. This transparency is created by using system resources in conjunction with network resources, e.g. bandwidth. These resources need to be effectively managed to provide the level of service that users of the system require. Effective management includes allocating sufficient resources to handle service demand patterns that are typically “bursty” and unpredictable. At the same time, these resources should not be over-provisioned, because avoiding over-provisioning increases utilization and consequently the cost efficiency of the IT system.
Resource management in large-scale, distributed IT systems faces a number of challenges. Current service models, for example grid and utility computing, increase the complexity of resource management by creating highly “bursty” and unpredictable resource demand patterns. In fact, these demand patterns are very difficult to anticipate or to characterize in advance. A critical component of the successful operation of these service models, however, is the ability to meet service level agreements (SLAs) for application performance. Therefore, resource management approaches were developed in an attempt to meet SLAs in an unpredictable resource demand environment. Conventional resource management approaches allocate system resources according to the statistical expectations of application resource demands, producing a theoretical bound on the probability of an SLA breach. By contrast, a dynamic resource manager monitors the performance of the IT system and the utilization of system resources, adjusting the allocation of resources when system operation is deemed to be off-target, i.e. not meeting the prescribed SLAs.
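The statistical approach described above can be illustrated with a minimal sketch. The text does not say which bound conventional approaches use; the sketch assumes Cantelli's one-sided inequality, which needs only the mean and standard deviation of demand, and it assumes that an SLA breach means demand exceeding the allocation. The function and parameter names are hypothetical.

```python
import math


def cantelli_breach_bound(mean_demand, std_demand, allocation):
    """Upper bound on P(demand >= allocation) from Cantelli's inequality."""
    if allocation <= mean_demand:
        return 1.0  # the inequality gives no useful bound at or below the mean
    if std_demand == 0:
        return 0.0  # demand is constant and lies below the allocation
    k = (allocation - mean_demand) / std_demand
    return 1.0 / (1.0 + k * k)


def allocation_for_breach_bound(mean_demand, std_demand, max_breach_prob):
    """Allocation mean + k*std whose Cantelli bound equals max_breach_prob."""
    if not 0.0 < max_breach_prob < 1.0:
        raise ValueError("max_breach_prob must lie strictly between 0 and 1")
    # Solve 1 / (1 + k^2) = max_breach_prob for k.
    k = math.sqrt(1.0 / max_breach_prob - 1.0)
    return mean_demand + k * std_demand
```

Because the bound holds for any demand distribution with the given mean and variance, it tends to be conservative, which is one way a static allocation of this kind can end up over-provisioned.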
In general, a resource manager acts upon the available controls that are used to apply scheduling methods to regulate and order the use of resources by the various applications. For example, a process scheduling function is used to proportion the processing resource, i.e. the central processing unit (CPU), among the various processes being executed by that processing resource. The resource manager proposes changes to this proportioning upon a determination that certain performance objectives are not being fulfilled.
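As a rough illustration of this kind of control, the sketch below shows how a resource manager might propose new CPU proportions when some applications miss their objectives. It is not a method taken from any particular system: the use of response time as the performance objective, the fixed `step` increment and the renormalization are all assumptions made for the example, and all names are hypothetical.

```python
def propose_cpu_shares(shares, measured_ms, target_ms, step=0.1):
    """Propose new CPU fractions given measured and target response times.

    shares:      dict mapping application name to its current CPU fraction
    measured_ms: dict mapping application name to measured response time
    target_ms:   dict mapping application name to its response time objective
    """
    proposed = {}
    for app, share in shares.items():
        if measured_ms[app] > target_ms[app]:
            # Off-target: ask for a larger proportion of the CPU.
            proposed[app] = share + step
        else:
            proposed[app] = share
    # Renormalize so the proposed fractions again sum to 1; applications
    # that meet their objectives implicitly give up part of their share.
    total = sum(proposed.values())
    return {app: share / total for app, share in proposed.items()}
```

For example, with two applications at 0.5 each, where only the first misses its target, the proposal raises the first share above 0.5 and lowers the second below 0.5 while the fractions still sum to 1.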
Dynamic resource managers control various types of system resources, for example processing resources, e.g. CPU cycles or processing power, main memory, disk space and network resources, e.g. communication bandwidth and network buffers. Conventional approaches managed processing resources and networking resources independently, generally ignoring the complex dependency between availability and utilization of each one of these types of resources. In fact, prevailing approaches to performance-based resource management in distributed IT systems control processing resources and assume that network capacity is over-provisioned. This assumption only holds true in systems where the deployed applications are very computation intensive and have limited communication requirements. In those types of systems, network capacity could be ignored since it would never represent a potential resource bottleneck. As such, network resources and protocols are largely independent of the resource managers used in on-demand, distributed computing systems.
However, as distributed IT systems expand in size and geographic scale, make increased use of public and wide area networks and cope with unpredictable demands created by new application models such as grid computing, utility computing and multimedia stream processing, network resource management has become an important part of system management. Computing systems in use today combine public and wide-area networks with on-demand allocation of resources, e.g. for multimedia streams. This combination results in the transfer of large amounts of data over substantial distances, and the communication or network resources used to transfer these data are not over-provisioned and can instead become a bottleneck. Therefore, network resources contribute a degree of unpredictability to the effective management of system resources, and the management of this unpredictability has become critical to overall system management.
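A simple way to see why the over-provisioning assumption breaks down is to model an application whose requests each need both processing and a data transfer. The sketch below is a deliberately simplified, hypothetical model (a fixed number of bytes per request, no queueing or protocol overhead), not a description of any particular system; the names are assumptions for illustration.

```python
def achievable_throughput(cpu_requests_per_s, bandwidth_bytes_per_s, bytes_per_request):
    """Return (requests per second, limiting resource) under a min-of-two model."""
    # Requests per second that the network alone could sustain.
    network_requests_per_s = bandwidth_bytes_per_s / bytes_per_request
    if network_requests_per_s < cpu_requests_per_s:
        return network_requests_per_s, "network"
    return cpu_requests_per_s, "processing"
```

Under this model, adding CPU capacity to a data-intensive application that is already limited by the network does not raise its throughput, which is why managing processing resources alone is insufficient in such systems.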
Workload managers working in conjunction with load balancers need to provide a level of control over system and network resources that is fine enough to ensure the SLA guarantees, even in the presence of randomness, bursty usage patterns and public network limitations. Various algorithms have been proposed to address the desired level of control over system and network resources. One method uses sophisticated optimization techniques to quickly achieve the necessary bandwidth operating level that allows a system to reach a target processing level. The method relies on very low-overhead computations and at the same time provides a highly stable and robust approach to reaching the desired operating level. Such methods, however, fail to treat multiple resources simultaneously and to take into account complex dependencies among the resources.
Therefore, systems and methods are needed that control network resource allocation to achieve application performance objectives and to achieve the desired use of the computer resources. These systems and methods would explicitly take into account the complex dependency between availability and utilization of each one of the types of resources in the system, both processing resources and network resources.