A computer system is generally designed so that each of its constituent components operates within a specified temperature range in a predefined environment. In practice, however, the environment in which the computers operate may not be able to maintain the temperature specification at all times. For instance, in a datacenter, the temperature and flow rate of the cooling air supplied to each computer may vary spatially and temporally. Such fluctuations in environmental conditions are likely to affect the operation of the computers.
Some conventional computer equipment already has power and thermal management capabilities. For example, the Central Processing Unit (CPU) typically has built-in temperature sensors. These sensors monitor the operating temperature of the CPU and activate and control a throttling mechanism when that temperature reaches a predefined threshold. Once activated, the throttling mechanism reduces the computational activity of the CPU and, hence, its temperature. As a result, the CPU is placed in a state of reduced computational performance.
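The threshold-triggered throttling described above, and the performance cost it imposes, can be illustrated with a minimal Python sketch. The first-order thermal model, the class and function names (CpuThermalModel, run_with_throttling), and every numeric value below are hypothetical assumptions chosen for illustration; they do not describe the throttling logic of any particular CPU.

```python
from dataclasses import dataclass


@dataclass
class CpuThermalModel:
    """Hypothetical first-order thermal model of a CPU (illustrative values only)."""
    ambient_c: float = 35.0        # cooling-air inlet temperature, deg C (assumed)
    temp_c: float = 35.0           # current die temperature, deg C
    heat_per_util_c: float = 50.0  # assumed steady-state rise above ambient at full activity
    time_constant: float = 5.0     # assumed thermal response time, in simulation steps

    def step(self, activity: float) -> float:
        """Move the die temperature one step toward the steady state for `activity` (0..1)."""
        target = self.ambient_c + self.heat_per_util_c * activity
        self.temp_c += (target - self.temp_c) / self.time_constant
        return self.temp_c


def run_with_throttling(model, threshold_c, steps, full=1.0, throttled=0.5):
    """Simulate simple threshold throttling; return (temperatures, delivered work).

    Activity drops to `throttled` whenever the sensed temperature is at or above
    `threshold_c` and returns to `full` otherwise. Work is the sum of activity
    levels, a crude stand-in for computational performance.
    """
    temps, work = [], 0.0
    for _ in range(steps):
        activity = throttled if model.temp_c >= threshold_c else full
        work += activity
        temps.append(model.step(activity))
    return temps, work


if __name__ == "__main__":
    # Same CPU and threshold, two different cooling-air temperatures.
    _, cool_work = run_with_throttling(CpuThermalModel(ambient_c=25.0, temp_c=25.0), 70.0, 100)
    _, hot_work = run_with_throttling(CpuThermalModel(ambient_c=35.0, temp_c=35.0), 70.0, 100)
    print(f"work with 25 C inlet: {cool_work}, with 35 C inlet: {hot_work}")
```

Under these assumptions both CPUs are held near the 70 °C threshold, but the one fed warmer air spends more steps throttled and therefore delivers less work, which is the environment-dependent performance uncertainty the following paragraph describes.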
A power and thermal management scheme such as the one described above helps keep the temperature of computer components within a specified range; however, this temperature stability comes at a cost. Throttling the computational activity of the equipment introduces uncertainty into the computational performance of the overall computer system. It is therefore desirable to provide a method and corresponding tools for estimating the overall computational performance of a set of computer equipment having multiple computational units with respect to (1) the operating environment in which the computer equipment operates and (2) the proximity of each piece of computer equipment to the others.
Furthermore, because cooling air temperature and flow rates may vary within an operating environment, it is also desirable to provide a method and corresponding tools to guide the placement of computational units within the environment. For example, the computational units can be located such that the hottest units are located where the cooling air is the coolest or otherwise where the cooling air is used most efficiently.
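The placement rule given as an example above (hottest units where the cooling air is coolest) can be sketched as a simple greedy pairing in Python. The function name, the use of expected heat dissipation in watts as the measure of how "hot" a unit is, and the per-slot inlet temperatures are all hypothetical assumptions for illustration; this is one elementary heuristic, not the method set forth and claimed below.

```python
def place_hottest_in_coolest(unit_heat_w, slot_inlet_c):
    """Map each unit to a slot, pairing the hottest units with the coolest inlets.

    unit_heat_w:  dict of unit name -> expected heat dissipation in W (assumed metric)
    slot_inlet_c: dict of slot name -> cooling-air inlet temperature in deg C
    Returns a dict of unit name -> slot name.
    """
    if len(unit_heat_w) > len(slot_inlet_c):
        raise ValueError("more units than available slots")
    # Units from most to least heat; slots from coolest to warmest inlet air.
    units = sorted(unit_heat_w, key=unit_heat_w.get, reverse=True)
    slots = sorted(slot_inlet_c, key=slot_inlet_c.get)
    # zip() pairs them in order; any surplus (warmest) slots stay empty.
    return dict(zip(units, slots))


if __name__ == "__main__":
    units = {"db-1": 450.0, "web-1": 200.0, "batch-1": 600.0}
    slots = {"rack-A/U10": 27.0, "rack-B/U20": 22.0, "rack-C/U05": 30.0}
    print(place_hottest_in_coolest(units, slots))
    # {'batch-1': 'rack-B/U20', 'db-1': 'rack-A/U10', 'web-1': 'rack-C/U05'}
```

When the numbers of units and slots are equal, this opposite-order pairing minimizes the sum of heat times inlet temperature over all one-to-one assignments (the rearrangement inequality). That sum is only a crude proxy; a practical tool would also account for airflow, recirculation between neighboring units, and the throttling behavior of each unit.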
U.S. Pat. No. 6,959,265 discloses user-centric measurement of quality of service (QoS) in a computer network, in which each workstation or information access point (IAP) has installed on it a QoS module that monitors the IAP's performance. Because the QoS indices are time-normalized, an IT administrator may aggregate or compare them across the network. The '265 patent provides a diagnostic tool for monitoring and analyzing the performance of applications on various workstations, including the aggregated number of threads and the aggregated number of handles launched by the application in question. Further, the system disclosed in the '265 patent can display various snapshots of system performance, including views of events, response time, and resource usage.
U.S. Published Application No. 20050027727-A1 discloses a distributed data-gathering and aggregation agent that aggregates operational metrics (e.g., performance metrics, process events, health monitor state, and server state) for a plurality of members as a single entry and for a plurality of entities as a single entity. As disclosed, a computer may operate in a networked environment using logical connections to one or more other, remote computers, such as a remote client computer. At least one of the remote computers may be a workstation, a server computer, a router, a peer device, or another common network node.
Related-art systems and methods do not include a system or method as set forth and claimed below. For example, none of the related art includes a software tool for estimating the aggregated performance of a set of computers in a given cooling environment while also providing the location layout of the individual computers that achieves the highest overall performance without changing the cooling environment. Further, related-art systems do not include a software tool that can also estimate the aggregated performance for a given acoustic noise requirement, or that can swap computational tasks between individual computers so that they are not constrained by component temperatures and distribute the workload in such a way that an optimized overall balance of computing-unit temperature and performance is achieved. Further, none of the related-art systems or methods includes a software tool that displays a dashboard showing the temperature and performance of each individual computing unit as well as of the system as a whole, and that proactively issues alerts for components with low levels of performance and critically high operating temperatures.