The present invention generally relates to a method and device for flexible temporary capacity upgrade of a computer system. Particularly, the present invention relates to a method and device for non-disruptive reduction of capacity provided by a computer system and to the enabling of the temporary upgrade for a predefined total of time within a predefined period of time.
The capacity, also called performance, of a computer system in terms of throughput and response times depends on its hardware and the software running on the hardware. The hardware capacity mainly depends on the performance of the one or more processors present in the computer system, the number of processors and the efficiency of their cooperation, the size and access rates of the main storage, and the I/O bandwidth and its latencies. Such a structure of a computer system, comprising the processors, the respective cache infrastructure, and the I/O connection infrastructure, may be referred to as the Central Electronic Complex (CEC).
In business, software licensing is often based on the overall capacity of such a Central Electronic Complex (CEC). The CEC capacity may be changed by adding or removing processors. Software can recognize the CEC performance by different methods: it may retrieve the capacity indicators directly from the hardware via special instructions, or it may simply measure the capacity itself by timing short spin loops. Depending on the specific license agreement, software may be limited to execute up to a predefined CEC capacity level, or royalties may be charged in accordance with the recognized actual CEC capacity.
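The two licensing models and the spin-loop measurement described above can be sketched as follows. This is a minimal illustration only: the function names, the iterations-per-second metric, and the fee arithmetic are assumptions made for this sketch and do not reflect any particular vendor's capacity indicators or license terms.

```python
import time


def spin_loop_capacity(iterations: int = 200_000) -> float:
    """Estimate relative capacity as loop iterations per second.

    Hypothetical metric: real software may instead read capacity
    indicators from the hardware via special instructions.
    """
    start = time.perf_counter()
    counter = 0
    for _ in range(iterations):
        counter += 1  # trivial work whose duration reflects processor speed
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


def may_execute(recognized_capacity: float, licensed_limit: float) -> bool:
    """Capped model: software runs only up to a predefined capacity level."""
    return recognized_capacity <= licensed_limit


def royalty(recognized_capacity: float, fee_per_unit: float) -> float:
    """Usage model: the fee scales with the recognized capacity."""
    return recognized_capacity * fee_per_unit
```

Under either model, the fee or the permission to run follows the capacity that the software recognizes, which is why enabling only the capacity actually needed reduces license cost.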
Under such circumstances, the customer seeks to minimize software license royalties by enabling only as much CEC capacity as the customer actually needs. This is especially important for capacity backup computer centers with much dormant capacity installed, just waiting to be enabled in case of a disaster, e.g., the total breakdown of a primary computer center. Such systems run at medium, low, or even very low CEC capacity, called the enabled capacity. The disabled (dormant) capacity may by far exceed the enabled capacity. For big installations, the CEC-capacity-driven software license fees for a computer center may amount to multiple millions of dollars a year, exceeding the cost of the dormant hardware.
In order to support such customer needs, special hardware offerings are provided that allow for concurrent upgrade of CEC capacity, i.e., an upgrade of CEC capacity without interrupting the normal operation of the computer system. In order to facilitate a concurrent upgrade, special interfaces have to be provided through which the hardware informs the software about increasing or decreasing CEC capacity. For disaster recovery, it is mainly important that the upgrade (addition of processors) can be handled concurrently by hardware and by software. However, to ensure proper disaster recovery, periodic disaster recovery tests must be performed to verify hardware, software, and the overall disaster recovery procedures. This makes concurrent reduction of CEC capacity important as well. Any reduction of CEC capacity, especially by removing processors (disabling of physical processors) from the active configuration, usually requires dedicated involvement of the operating system. Removal of processors without involvement of the operating system may impact transactions, cause failure of software subsystems, or even result in a failure of the complete software system (operating systems with middleware and applications).
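The need for operating system involvement in a capacity reduction can be sketched as a simple notification interface. The class and method names below are hypothetical and chosen for illustration; the sketch assumes that the operating system must move pending work off a processor before the hardware may disable it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OperatingSystem:
    # Pending work items per processor id (hypothetical structure).
    run_queues: Dict[int, List[str]] = field(default_factory=dict)

    def on_processor_added(self, cpu_id: int) -> None:
        self.run_queues.setdefault(cpu_id, [])

    def on_processor_removal(self, cpu_id: int) -> bool:
        """Move pending work elsewhere; return True if removal is safe."""
        remaining = [c for c in self.run_queues if c != cpu_id]
        if not remaining:
            return False  # refuse to give up the last processor
        pending = self.run_queues.pop(cpu_id, [])
        self.run_queues[remaining[0]].extend(pending)
        return True


@dataclass
class Cec:
    operating_system: OperatingSystem
    enabled: Set[int] = field(default_factory=set)

    def enable(self, cpu_id: int) -> None:
        self.enabled.add(cpu_id)
        self.operating_system.on_processor_added(cpu_id)

    def disable(self, cpu_id: int) -> bool:
        """Disable a processor only after the operating system agrees."""
        if cpu_id not in self.enabled:
            return False
        if not self.operating_system.on_processor_removal(cpu_id):
            return False
        self.enabled.remove(cpu_id)
        return True
```

In this sketch, a transaction queued on the processor being removed is moved to a remaining processor rather than lost, which is the kind of disruption the text warns about when the operating system is bypassed.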
As is known to a person skilled in the art, computer systems comprise hardware and software. The hardware is usually called the ‘Computer’, including all the hardware entities, while the software comprises the operating system, middleware, e.g., a database system, and application programs. The boundary between hardware and software is called the architecture level. The functionality furnished by the underlying hardware may therefore be called the architecture. Hence, the architecture must be provided by the hardware, and the binary software must comply with this architecture to be able to run on the respective hardware.
A particular piece of hardware, herein called ‘capacity virtualizer’, is able to ‘virtualize’ hardware resources. It may, e.g., be able to split the performance of a single physical processor into multiple logical processors. An example of such a capacity virtualizer is the IBM zSeries 900 Logical PARtition facility (LPAR). For background information, refer to U.S. Pat. No. 5,564,040 by Kubala, assigned to International Business Machines Corporation, Armonk, N.Y., US, filed 8 Nov. 1994, issued 8 Oct. 1996, “Method and Apparatus for Providing a Server Function in a Logically Partitioned Hardware Machine”, which describes aspects of logical partitioning. (The LPAR hypervisor is typically referred to as the “LPAR manager”. More detailed reference material can be found in the IBM zSeries 900 Processor Resource/Systems Manager™ (PR/SM™) Planning Guide, SB10-7033-00, March 2001.)
A capacity virtualizer is configured to provide multiple sets of virtual resources, which allow multiple independent operating systems to be hosted concurrently. A set of virtual resources can be seen as a virtualized part of the total computer system hardware, providing all types of resources required to run an operating system, e.g., processor, memory, and I/O. Consequently, it is herein called a ‘Virtual Computer’. Logical processors are provided to the Virtual Computers (actually to the operating system running on a Virtual Computer), and the capacity of the processors is controlled by the virtualizer per Virtual Computer. In this way, the virtualizer is able to provide many more logical processors in total than there are enabled physical processors.
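How a virtualizer can offer more logical processors than there are enabled physical processors, while still controlling capacity per Virtual Computer, can be illustrated with a weight-based sketch. The weights, names, and proportional-share rule are assumptions made for illustration and are not the dispatch algorithm of any specific product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualComputer:
    name: str
    logical_processors: int  # processors visible to the hosted operating system
    weight: int              # hypothetical relative share of physical capacity


def capacity_shares(enabled_physical: int,
                    virtual_computers: List[VirtualComputer]) -> Dict[str, float]:
    """Split enabled physical capacity among Virtual Computers by weight."""
    total_weight = sum(vc.weight for vc in virtual_computers)
    if total_weight <= 0:
        raise ValueError("at least one Virtual Computer needs a positive weight")
    shares = {}
    for vc in virtual_computers:
        entitled = enabled_physical * vc.weight / total_weight
        # A Virtual Computer cannot consume more physical processors
        # than it has logical processors to dispatch work on.
        shares[vc.name] = min(entitled, float(vc.logical_processors))
    return shares
```

For example, two Virtual Computers with four logical processors each can share two enabled physical processors: eight logical processors are visible in total, but the capacity delivered never exceeds the two physical processors that are enabled.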
Certain hardware, e.g., S/390 or zSeries enterprise servers by International Business Machines Corporation, Armonk, N.Y., US, uses Licensed Internal Code (LIC) executing dispatch algorithms for providing ‘logical’ processors from a pool of ‘physical’ processors; other hardware may use different approaches. LIC, sometimes also called ‘firmware’, is considered hardware, not software, because it runs below the architecture level and is part of the hardware shipment, like firmware in automobiles, washing machines, or television sets.
Secure control over the enabled and the disabled (dormant) capacity can be achieved using mechanisms based upon cryptography, e.g., as described in U.S. Pat. No. 5,982,899 by Jürgen Probst, assigned to International Business Machines Corporation, Armonk, N.Y., US, filed 11 Aug. 1995, issued 9 Nov. 1999, “Method for Verifying the Configuration of the Computer System.”
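One generic way to protect a capacity setting cryptographically is to bind it to the machine with a keyed signature. The sketch below uses an HMAC for illustration only; it is not the method of the cited patent, and the record layout, the serial number format, and the function names are assumptions.

```python
import hashlib
import hmac


def sign_capacity_record(secret: bytes, machine_serial: str,
                         enabled_processors: int) -> str:
    """Produce a keyed signature over a hypothetical capacity record."""
    message = f"{machine_serial}:{enabled_processors}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_capacity_record(secret: bytes, machine_serial: str,
                           enabled_processors: int, signature: str) -> bool:
    """Accept the record only if the signature matches the claimed capacity."""
    expected = sign_capacity_record(secret, machine_serial, enabled_processors)
    return hmac.compare_digest(expected, signature)
```

A record signed for four enabled processors fails verification if it is altered to claim eight, so dormant capacity cannot be enabled by editing the record alone.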
Using the above capabilities to increase the processor capacity of a computer concurrently and non-disruptively is state of the art (e.g., IBM zSeries). In most cases this is sufficient, since computers are usually upgraded rather than downgraded, e.g., by increasing the number of enabled physical processors. However, as mentioned above, backup computer centers must periodically test their disaster recovery procedures. This requires not only a concurrent (non-disruptive) upgrade of processor capacity but also a concurrent reduction of processor capacity after the test is completed.