1. Technical Field
This invention relates to a system and method for managing partitionable elements in a multiple element computer system. More specifically, the partitionable elements are managed in accordance with the terms of a service level agreement.
2. Description of the Prior Art
A multiprocessor computer system by definition contains multiple processors, also referred to herein as CPUs, that can execute multiple processes, or multiple threads within a single process, simultaneously in a manner known as parallel computing. In general, multiprocessor systems execute multiple processes or threads faster than conventional uniprocessor systems, such as personal computers (PCs), that execute programs sequentially. The actual performance advantage depends on a number of factors, including the degree to which parts of a multithreaded process and/or multiple distinct processes can be executed in parallel, and the architecture of the particular multiprocessor system.
The architecture of shared memory multiprocessor systems may be classified by how their memory is physically organized. In distributed shared memory (DSM) machines, the memory is divided into modules physically placed near one or more processors, typically on a processor node. Although all of the memory modules are globally accessible, a processor can access local memory on its node faster than remote memory on other nodes. Because the memory access time differs based on memory location, such systems are also called non-uniform memory access (NUMA) machines. In centralized shared memory machines, on the other hand, the memory is physically in one location. Centralized shared memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time for each of the processors. Both forms of memory organization typically use high-speed caches in conjunction with main memory to reduce execution time.
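As a simple illustration of this difference (a simplified model that ignores caches and contention; the symbols f, t_local, t_remote and t_mem are introduced here for illustration only), if a fraction f of a processor's memory accesses are served by local memory with access time t_local and the remainder by remote memory with access time t_remote, the average memory access time is:

```latex
\bar{t} = f\,t_{\mathrm{local}} + (1 - f)\,t_{\mathrm{remote}}, \qquad 0 \le f \le 1
```

On a NUMA machine t_local < t_remote, so the average access time falls as f rises, which is why placing data on the node that uses it matters. On a UMA machine t_local = t_remote = t_mem, and the average access time equals t_mem regardless of f.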
Processor nodes may be grouped to form a partition, which is a collection of one or more interconnected nodes that together form a computing environment for an operating system. Multiple partitions can exist within the same computer system, and each partition executes a single independent operating system image. A multiprocessor computer system may thus be structured as a collection of nodes or partitions, together with service processor hardware, a management console, and other infrastructure, that represents a single manageable and configurable environment. Accordingly, a single system can be split into multiple logical computer systems, or partitions, each of which executes its own operating system image.
In addition to multiprocessor computing systems in the form of partitioned nodes, there are also bladed multiprocessing computing systems. A bladed system is a collection of distributed computing resources, available over a local or wide area network, that appears as one large virtual computing system to an end user or application. Each computing resource is a server on a removable card that plugs into a shared infrastructure. The computing resources share a common chassis, power supply, service processor, fibre, storage, cooling, heat dissipation, keyboard, video, mouse, and connection to the local or wide area network. Each resource within the system may be configured to run a different operating system. Accordingly, a bladed multiprocessing system is an example of a scalable system with multiple partitionable resources adapted to communicate through common communication connections.
A partitioned multiprocessor computing environment and a bladed multiprocessor computing environment each comprise multiple compute elements. Each element includes, at minimum, a printed circuit board with one or more microprocessors, memory, logic to connect and control the processors and memory, an I/O controller, and a communication port. Current management of multiple compute element systems, including both bladed and partitioned computer systems, requires a specific compute element to be shut down when maintenance is required. Each element operates in one of two states, on or off; there is no intermediate state of operation. This lack of power management control at the element level creates a major problem when heat dissipation, cooling, or power consumption becomes an issue in a multiple element system.
One of the features present on today's laptop and personal computers is the ability to place the computer in a low power state of operation by reducing the processor frequency, turning off unneeded I/O, suspending, or hibernating. These low power states of operation are known in the art in relation to personal computers. In the Suspend state, the clock for the processor(s) is halted, which greatly reduces power to the processor and other accessories on the motherboard, but the memory remains intact. This is a state of low power consumption. When the operator wants to regain use of the computer, the operator must Resume full power and restart the clock to the processor, the motherboard, and the associated hardware accessories; because the memory remains intact, a full system restart is not needed. In addition to the Suspend state, the personal computer may also be placed in the low power Hibernate state, in which power to the computer is turned off after the contents of memory have been copied to storage media, and the memory is later Restored from that copy. Therefore, it is less time consuming to Suspend and Resume the computer, or to Hibernate and Restore it, than to terminate power to the computer and perform a full system restart at a later time. Furthermore, because of the mechanisms involved, reducing the clock speed of a processor is less time consuming than Suspending and Resuming a system, which in turn is less time consuming than Hibernating and Restoring it. Accordingly, the Suspend and Hibernate states on a laptop or personal computer are two examples of power management on a personal computer system.
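The ordering of these states can be sketched in code. The following Python example is a minimal illustration and not part of the described system: the names `PowerState` and `deepest_allowed_state` are hypothetical, and the wake-up latencies are supplied by the caller rather than taken from any real hardware. It encodes only the ordering stated above (reduced clock, then Suspend/Resume, then Hibernate/Restore) and selects the deepest state that can return to full power within a given time budget.

```python
from enum import IntEnum


class PowerState(IntEnum):
    """Hypothetical power states, ordered from shallowest to deepest.

    The ordering follows the text: reducing clock speed is quicker to undo
    than Suspend/Resume, which is quicker than Hibernate/Restore.
    """
    FULL_POWER = 0
    REDUCED_CLOCK = 1
    SUSPEND = 2    # processor clock halted, memory remains powered
    HIBERNATE = 3  # memory copied to storage, then power removed


def deepest_allowed_state(wake_latency: dict[PowerState, float],
                          max_wake_latency: float) -> PowerState:
    """Return the deepest state whose wake-up latency fits the budget.

    wake_latency maps each state to the time (in seconds) needed to return
    to full power; these values are caller-supplied assumptions.
    """
    ordered = sorted(wake_latency, key=int)
    # Reject tables that contradict the ordering: deeper must not wake faster.
    for shallow, deep in zip(ordered, ordered[1:]):
        if wake_latency[deep] < wake_latency[shallow]:
            raise ValueError(f"{deep.name} wakes faster than {shallow.name}")
    best = PowerState.FULL_POWER
    for state in ordered:
        if wake_latency[state] <= max_wake_latency:
            best = state  # latencies are non-decreasing, so keep the deepest fit
    return best
```

For example, with a budget that lies between the Suspend and Hibernate latencies, the function returns `PowerState.SUSPEND`. Validating the latency table first means a table that contradicts the ordering described above is rejected instead of being used silently.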
Finally, it has become common in the marketplace to use Service Level Agreements (SLAs) to define the relationship between a service provider and a customer utilizing the provider's computer resources. For example, an SLA commonly specifies criteria for the services to be provided, such as serving a certain number of web pages, supporting transactions, etc. Technology exists to reduce the power consumption of a computer system by turning off unused parts of the system such as I/O and drives, slowing down the processor(s), quiescing or suspending processor(s), hibernating processor(s), and even powering off parts of the system when they are not required. Newer systems are being developed and delivered to customers that have multiple processors, multiple cores within a processor package, multiple central electronic complexes (CECs), and/or multiple Blades. When these newer systems are fully loaded, all of their elements may need to be available to operate within the terms of the SLA. However, during off-peak times, some of the hardware in these systems may sit idle for extended periods while consuming electricity and generating heat. Accordingly, there is a need to apply power management to these newer systems while still delivering service to the customer within the terms of the SLA.
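The need described above can be made concrete with a small sketch. The following Python example is hypothetical and is not the claimed method: it assumes identical elements with a known capacity, a forecast demand expressed in the same units (for example, transactions per second), and an assumed safety margin (`headroom`) standing in for the SLA's performance terms. The names `Element`, `required_active_elements`, and `plan_power_states` are illustrative only. It keeps just enough elements active to cover the demand and marks the rest as candidates for a low power state.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical model of a compute element (blade, CEC, or partition node)."""
    name: str
    capacity: float      # work units per second the element can serve
    active: bool = True  # False means "candidate for a low power state"


def required_active_elements(demand: float, capacity: float,
                             headroom: float = 0.2) -> int:
    """Smallest number of identical elements covering demand plus a margin.

    headroom is an assumed SLA safety buffer, not a value from the text.
    At least one element is kept active so the service remains available.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return max(1, math.ceil(demand * (1.0 + headroom) / capacity))


def plan_power_states(elements: list[Element], demand: float,
                      headroom: float = 0.2) -> list[Element]:
    """Keep just enough elements active; mark the rest for low power.

    Assumes all elements share the first element's capacity. If more
    elements are needed than exist, every element stays active.
    """
    if not elements:
        return elements
    needed = required_active_elements(demand, elements[0].capacity, headroom)
    for i, element in enumerate(elements):
        element.active = i < needed
    return elements
```

With four elements of capacity 100, a demand of 150, and no headroom, the first two elements remain active and the other two are marked for a low power state. A real policy would also weigh the transition times of the available low power states against how quickly demand can rise.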