As an alternative to purchasing computer systems, a user can lease portions of a massive computer system, much as a traveler might lease a hotel room or an event planner might lease a hotel dining hall. This business model, introduced by Hewlett-Packard Company as a “Utility Data Center” (“UDC”), allows flexible access to computer resources without the burden of maintaining a computer system.
Of course, the owner of the computer system must maintain it. Not only must the owner provide maintenance, but the owner must do so in a way that ensures that contractual obligations are met. Since failures are inevitable in a large system, provisions must be made to move a user's workload to working hardware that meets the user's specifications.
Computer system maintenance can be automated to a large extent. An automated workload manager can test for or otherwise detect failures. For example, the workload manager can send out requests and check for responses. If a device does not respond, it may have failed or it may be inaccessible due to failure of another device (e.g., a server may be inaccessible because a switch port to which it is connected has failed). In either case, a failure can be noted, e.g., in a device database.
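By way of illustration only, the request-and-response check described above might be sketched as follows. The names `DeviceDatabase`, `poll_devices`, and `probe`, and the status strings "ok" and "failed", are hypothetical and are not taken from any particular workload manager. The `probe` callable is assumed to return True when a device answers a request. A device that does not answer is marked failed whether the fault lies in the device itself or in something upstream of it, such as a switch port.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class DeviceDatabase:
    """Hypothetical in-memory stand-in for a device database."""
    status: Dict[str, str] = field(default_factory=dict)

    def mark(self, device_id: str, state: str) -> None:
        # Record the most recently observed state for the device.
        self.status[device_id] = state


def poll_devices(device_ids: Iterable[str],
                 probe: Callable[[str], bool],
                 db: DeviceDatabase) -> DeviceDatabase:
    """Send a request to each device and note non-responders as failed."""
    for device_id in device_ids:
        try:
            responded = probe(device_id)
        except Exception:
            # A probe that raises is treated the same as no response.
            responded = False
        db.mark(device_id, "ok" if responded else "failed")
    return db


# Example: server2 is healthy but sits behind a failed switch port,
# so it does not respond and is noted as failed all the same.
unreachable = {"server2"}
db = poll_devices(["server1", "server2"],
                  lambda device_id: device_id not in unreachable,
                  DeviceDatabase())
```

In this sketch the database records only that a device did not respond; determining the root cause (the device or an intervening component) would require additional logic not shown here.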
If possible, the workload on a failed device can be migrated to an available device. In any event, a failed device will not be targeted for installation of a new workload or selected as the target of a software migration. In due course, hardware repair or replacement of devices marked “failed” can obviate the failure.
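The migration policy just described might be sketched, under simplifying assumptions, as follows. The function name `migrate_failed_workloads`, the status strings, and the rule that each device hosts at most one workload are all hypothetical and are chosen only to keep the example short. A device marked failed is never chosen as a target, and a workload stays where it is (and is reported) when no available device exists.

```python
from typing import Dict, List, Tuple


def migrate_failed_workloads(
    status: Dict[str, str],      # device id -> "ok" or "failed" (assumed labels)
    placement: Dict[str, str],   # workload id -> device id currently hosting it
) -> Tuple[Dict[str, str], List[str]]:
    """Move workloads off failed devices onto idle working devices, if any."""
    in_use = set(placement.values())
    # Only working devices that host nothing are eligible targets,
    # so a failed device can never be selected.
    free = [d for d in sorted(status) if status[d] == "ok" and d not in in_use]

    new_placement = dict(placement)
    stranded: List[str] = []
    for workload in sorted(placement):
        if status.get(placement[workload]) != "failed":
            continue  # workload is on a device not marked failed
        if free:
            new_placement[workload] = free.pop(0)
        else:
            # Migration is not possible; leave the workload in place and report it.
            stranded.append(workload)
    return new_placement, stranded


status = {"a": "failed", "b": "ok", "c": "ok", "d": "failed"}
placement = {"w1": "a", "w2": "b", "w3": "d"}
new_placement, stranded = migrate_failed_workloads(status, placement)
```

Here only device "c" is both working and idle, so one of the two affected workloads can be moved and the other must wait for repair or replacement of its device. A practical workload manager would also check that the target meets the user's specifications, which this sketch does not model.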
Herein, related art is described to facilitate understanding of the invention. Related art labeled “prior art” is admitted prior art; related art not labeled “prior art” is not admitted prior art.