Computing technology has advanced such that today's computers can undertake complex computations at relatively high speeds. There is, of course, no single computer with sufficient computing resources to undertake all requested computing tasks at a given moment. Accordingly, computers have been configured to execute cooperatively and in parallel to perform large computational tasks. For example, data centers include a plurality of computing devices that are configured to execute cooperatively to achieve a particular task. Thus, multiple computers are frequently purposed for a single activity.
Problems arise, however, when not all of the computers that are purposed for the single activity are operational at the same time. For instance, one or more computers may require rebooting, may be reserved for higher-priority activities, may require upgrades or maintenance, etc. Managing and controlling these computers is difficult, particularly when numerous computers (e.g., thousands) are purposed for a specific activity.
One method to manage availability is to commission staff to monitor states of computing devices in a data center. People, however, are error-prone; individuals frequently administer computers in different, conflicting ways. In addition to this inconsistency in administration, commissioning people to perform such management tasks can be costly.
An alternative approach is to develop a software-based solution that monitors operational states of computers and manages their activities. These software-based solutions, however, tend to be rigid with respect to the goals of the data center's stakeholders and to the computers utilized in the data center. In other words, these solutions operate sub-optimally if goals change and/or if computers are added to or removed from the data center. Updating the software-based solution for such a dynamic system is costly and error-prone.