Commodity cluster computing is increasingly used, especially in high-performance and technical computing. It is the use of large numbers of readily available computing components for parallel computing, in order to obtain the largest amount of useful computation at low cost. Commodity cluster computing uses multiple low-cost, low-performance commodity computers working in parallel instead of fewer high-performance, high-cost computers. Commodity computers are computer systems manufactured by multiple vendors, incorporating components based on open standards.
Clustered computer systems comprise multiple compute nodes interconnected through high-speed network connections. The compute nodes can be heterogeneous, with different types of processors, numbers of cores, and memory size, type and speed. Some compute nodes can have accelerator technologies such as Field Programmable Gate Arrays (FPGA), General-Purpose computing on Graphics Processing Units (GPGPU) and co-processors. For economic and technical reasons, most of these clustered computer systems access stored data through a shared parallel file system and therefore through network connections. Each compute node can read and write data at the speed of its network connections, so the global performance of a parallel application depends on the number of network connections, which is implicitly related to the number of compute nodes.
Applications are submitted to the clustered computer system through scheduling software, which monitors, orchestrates and manages the resources of the clustered computer system in an optimal manner. The resources of the clustered computer system are allocated based on policies, free resources and application requirements, so that the global utilization of the clustered computer system and/or user response times are optimized. The policies may include, for example, priorities, resource usage and resource allocation per user, per group or per application. The application requirements may include, for example, the number of cores, the number of compute nodes, the amount of memory, the total time or the location of the data.
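The allocation described above can be illustrated with a minimal sketch. It assumes a single policy (a numeric job priority) and two kinds of application requirements (cores and memory per compute node); the names `Node`, `Job` and `allocate` are hypothetical and do not correspond to any particular scheduling software.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cores: int
    free_memory_gb: int

@dataclass
class Job:
    name: str
    priority: int            # policy input: higher priority is placed first
    nodes: int               # number of compute nodes requested
    cores_per_node: int
    memory_gb_per_node: int

def allocate(jobs, nodes):
    """Greedy, priority-ordered allocation; returns {job name: [node names]}."""
    placement = {}
    for job in sorted(jobs, key=lambda j: -j.priority):
        # Compute nodes whose free resources satisfy the job's requirements.
        fits = [n for n in nodes
                if n.free_cores >= job.cores_per_node
                and n.free_memory_gb >= job.memory_gb_per_node]
        if len(fits) < job.nodes:
            continue  # not enough free resources; the job stays queued
        chosen = fits[:job.nodes]
        for n in chosen:
            n.free_cores -= job.cores_per_node
            n.free_memory_gb -= job.memory_gb_per_node
        placement[job.name] = [n.name for n in chosen]
    return placement
```

A greedy pass of this kind considers only the declared requirements of each job and the free capacity of each compute node; it does not observe how the applications actually consume hardware resources while they execute.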
Prior art solutions are not able to correlate, in real time, the actual hardware resource consumption rates with application performance characteristics or needs. Examples of hardware resource consumption rates include network usage, disk I/O, memory and cache usage, register usage, usage of floating-point and instruction units, usage of the PCI bus and the like. The scheduling of applications does not take into account the resource needs and the behavior of the applications executing within the clustered computer system.
Another key challenge for parallel applications is to optimize the usage of the network connections. The best performance is generally obtained by maximizing the network performance and therefore the number of network connections. The number of network connections is implicitly related to the number of compute nodes. In this configuration, the scheduling software must find the balance between the number of processes per compute node and the number of compute nodes while optimizing the whole workload. This is possible only by analyzing the compatibility of the applications with the available resources.
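The trade-off between processes per compute node and number of compute nodes can be made concrete with a small sketch. It assumes, as a simplification, one network connection of equal speed per compute node and an aggregate bandwidth equal to the number of nodes times the link speed; limits of the shared file system and network contention are ignored. The function name `layouts` and its parameters are hypothetical.

```python
def layouts(total_processes, cores_per_node, max_nodes, link_gbps):
    """Return a list of (nodes, processes_per_node, aggregate_gbps) layouts."""
    result = []
    for nodes in range(1, max_nodes + 1):
        if total_processes % nodes:
            continue  # keep the example to evenly divided layouts
        per_node = total_processes // nodes
        if per_node > cores_per_node:
            continue  # would oversubscribe the cores of a compute node
        # Assumption: each node contributes one link of link_gbps.
        result.append((nodes, per_node, nodes * link_gbps))
    return result

# 16 processes, 8 cores per node, at most 8 nodes, 10 Gb/s links
print(layouts(16, 8, 8, 10))
```

Under these assumptions, spreading the same 16 processes over more compute nodes raises the aggregate network bandwidth but leaves cores idle on each node, which is the balance the scheduling software has to strike across the whole workload.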
United States Patent Application 2014/0068627, which is hereby incorporated by reference, discloses a method for policy-based self-control and healing by adjusting workloads dynamically on a plurality of resources in the data center according to a set of policy rules. Thresholds or trends corresponding to target operational limits are used in some embodiments; they may be included in the set of policy rules. For example, a threshold may trigger the distribution or re-distribution of workloads in the data center dynamically when the threshold is approached or exceeded. This application does not disclose the use of real analytics tools using, for instance, time series, for hardware components and utilization.
“Intel Performance Counter Monitor—A better way to measure CPU utilization” at https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization discloses “CPU resource”-aware scheduling. A simple scheduler executes 1000 compute-intensive and 1000 memory-bandwidth-intensive jobs in a single thread. The challenge was the existence of a non-predictable memory-bandwidth-intensive background load on the system, a rather typical situation in modern multi-component systems with many third-party components. The scheduler detects when a large share of the memory bandwidth is currently in use by the background load and schedules compute-intensive jobs to execute at the same time as that memory-bandwidth-intensive background activity, while the memory-bandwidth-intensive jobs are scheduled in the intervals between bursts of background activity. Only a single thread in a single computer is scheduled, and only memory and compute use are monitored.
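The behavior of such a single-thread scheduler can be sketched as follows. This is an illustration under stated assumptions, not the code of the cited article: the background memory-bandwidth utilization is supplied as a list of hypothetical samples between 0 and 1 instead of being read from hardware counters, and the function name `schedule` and the `threshold` parameter are hypothetical.

```python
from collections import deque

def schedule(background_load, n_compute, n_memory, threshold=0.5):
    """Pick one job per utilization sample; returns the execution order."""
    compute = deque(f"compute-{i}" for i in range(n_compute))
    memory = deque(f"memory-{i}" for i in range(n_memory))
    order = []
    for load in background_load:
        if not compute and not memory:
            break  # all jobs have been scheduled
        busy = load >= threshold  # background activity is using the bandwidth
        if busy and compute:
            order.append(compute.popleft())   # avoid competing for bandwidth
        elif not busy and memory:
            order.append(memory.popleft())    # bandwidth is available now
        elif compute:
            order.append(compute.popleft())   # no memory jobs left
        else:
            order.append(memory.popleft())    # no compute jobs left
    return order

print(schedule([0.9, 0.1, 0.8, 0.2], n_compute=2, n_memory=2))
```

The sketch makes the limitation noted above visible: one job is chosen at a time for one thread on one computer, and the only monitored quantity is memory bandwidth.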