This disclosure relates generally to the field of modular refrigeration units (MRUs) for use in conjunction with mainframe computers or servers, and more particularly to health monitoring of an MRU during operation.
The power dissipation of integrated circuit chips, and of the modules containing the chips, continues to increase in order to achieve higher computer processor performance. This trend poses a cooling challenge at both the module and system level. Increased air flow rates are needed to effectively cool high-power modules and to limit the temperature of the air exhausted into a data center, as overheated computer equipment may cease to function properly. In many larger server applications, processors and their associated electronics (e.g., memory, disk drives, and power supplies) are packaged within a rack or frame. Heat produced by the computer components in the rack or frame may stress the operation of the server. This is especially true for large installations, such as server farms or large banks of closely spaced computer racks. In such installations, MRUs may be used to cool individual servers in the server room. An MRU is a refrigeration unit that is built into a server to cool computer components internal to that server, and may include one or two active refrigeration loops. An MRU is a critical component of server operation, which may be heavily disrupted if the MRU fails. Failure of an MRU may stress the server containing the MRU and, through heat buildup in the installation, possibly other servers as well.
As an MRU ages, its cooling capability may be reduced. An MRU maintains the temperature in the vicinity of the component being cooled by the MRU (referred to as THAT) at a particular desired temperature. To compensate for reduced cooling capability, the MRU may enter an overtemperature recovery (OTR) mode: if THAT remains above the desired temperature for an extended period of time during operation, the MRU enters OTR mode, in which the flow of coolant in the MRU is automatically increased by a set amount in order to lower THAT. However, MRUs may have a relatively high failure rate while operating in OTR mode, and automatically entering OTR mode does not always compensate for the reduced cooling capability. When OTR mode is unsuccessful, the MRU fails and must be replaced, and such a failure may disrupt operation of the server containing the MRU.
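The OTR trigger described above can be modeled as a small state machine. The following is a minimal, hypothetical Python sketch, not actual MRU control firmware: the class name OtrController, the parameters setpoint_c, dwell_limit_s, and flow_step, and the normalized coolant flow value are all illustrative assumptions, since the source gives no specific temperatures, durations, or flow increments. The source also does not describe how the MRU leaves OTR mode, so this sketch simply keeps OTR latched once entered.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OtrController:
    """Hypothetical model of the OTR trigger (all parameters are assumptions).

    setpoint_c:    desired THAT temperature, in degrees C (assumed value)
    dwell_limit_s: how long THAT may stay above setpoint before OTR (assumed)
    flow_step:     fixed coolant-flow increase applied on entering OTR (assumed)
    flow:          coolant flow, normalized so that 1.0 is nominal (assumed)
    """
    setpoint_c: float
    dwell_limit_s: float
    flow_step: float
    flow: float = 1.0
    in_otr: bool = False
    # Time at which THAT most recently rose above the setpoint, or None.
    _over_since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, t_s: float, that_c: float) -> None:
        """Process one THAT sample taken at time t_s (seconds)."""
        if that_c <= self.setpoint_c:
            # THAT is at or below setpoint: reset the overtemperature timer.
            self._over_since = None
            return
        if self._over_since is None:
            # Start timing a new overtemperature excursion.
            self._over_since = t_s
        if not self.in_otr and t_s - self._over_since >= self.dwell_limit_s:
            # THAT has been too high for too long: enter OTR and raise
            # coolant flow by the fixed step (OTR then stays latched).
            self.in_otr = True
            self.flow += self.flow_step


# Usage: THAT exceeds a 20 C setpoint from t=10 s onward; with a 60 s
# dwell limit, OTR is entered at t=70 s and flow rises by one step.
ctl = OtrController(setpoint_c=20.0, dwell_limit_s=60.0, flow_step=0.25)
for t, that in [(0, 19.5), (10, 21.0), (40, 21.5), (70, 22.0)]:
    ctl.update(t, that)
print(ctl.in_otr, ctl.flow)  # True 1.25
```

Resetting the timer whenever THAT returns to the setpoint is one assumed reading of "above the desired temperature for an extended period of time"; a real controller might instead use hysteresis or a moving average to avoid reacting to brief excursions.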