1. Field of the Invention
The present invention relates to logical-partitioned (LPAR) servers, and more particularly to systems and methods for effecting serialization in logical-partitioned systems in an effective and efficient manner.
2. Description of the Related Art
Multiprocessor computer systems are well known in the art, and provide for increased processing capability by allowing processing paths to be divided among several different system processors. More recently, symmetric multiprocessor (SMP) systems have been partitioned to behave as multiple independent computer systems. For example, a single system having eight processors might be configured to treat each of the eight processors (or multiple groups of one or more processors) as a separate system for processing purposes. Each of these “virtual” systems would have its own copy of an operating system, and may then be independently assigned tasks, or may operate together as a processing cluster, which provides for both high speed processing and improved reliability.
Most major computer companies developed partitioned systems as it became clear that there was benefit to consolidating multiple systems into a single system. For example, IBM started partitioning its S/370 mainframe systems in the 1970s. Since then, logical partitioning on IBM mainframes has evolved from a predominantly physical partitioning scheme, based on hardware boundaries, to one that allows for virtual and shared resources with dynamic load balancing. In 1999, IBM implemented LPAR support on the AS/400 platform, and in 2000, IBM announced the ability to run the LINUX operating system in an LPAR on its zSeries server.
In 2001, IBM introduced its pSeries 690 server, which also utilized logical partitioning. The architectural design of the pSeries 690 brought logical partitioning to the UNIX world, being capable of creating up to 16 partitions inside a single server, with greater flexibility and resource selection.
Partly as a result of these advancements, servers now exist to provide the performance, scalability, and reliability required in “mission critical environments.” These servers run corporate applications, such as enterprise resource planning (ERP), business intelligence (BI), and high performance e-business infrastructures. Proper operation of these systems can be critical to the operation of an organization. It is therefore of the highest importance that they operate efficiently and as error-free as possible, and that problem analysis and recovery from system errors be rapid.
In normal operation, a partitioned system operates in parallel; that is, the operations performed by the partitions can occur simultaneously as the partitions share the operational resources of the server. With everything functioning properly, the various partitions, which may be running different operating systems (e.g., partition 1 might run AIX by IBM while partition 5 runs LINUX by Red Hat), perform their functions simultaneously.
There are certain critical functions, however, that require serialization of the system for a short period of time. Serialization is the forcing of operations to occur in a serial, rather than parallel, fashion, even when the operations could be performed in parallel. Serialization is typically mandatory when the correctness of the computation depends upon or might depend upon the exact order of computation, or when an operation requires uninterrupted use of otherwise shared hardware resources (e.g., registers) for a brief time period.
One example of such a condition involves handling machine-check interrupts that result from hardware errors. A “machine check” is an interrupt process that is initiated by a processor during operation. That is, in the normal course of executing instructions, a processor may cause a machine check interrupt (by executing errant instructions) or experience one (by executing ordinary instructions that access a piece of hardware that is in an errant state). For example, a processor will generate a machine-check interrupt when it experiences an internal cache parity error, when it reads a memory location containing an uncorrectable error, or when it reads an I/O device experiencing an error condition. The machine-check interrupt is non-maskable and requires the immediate attention of the processor. The processor responds by interrupting the current instruction stream (thread), saving the address and the machine state of the interrupted thread, and executing the machine-check interrupt handler inside a “hypervisor.” A hypervisor is system firmware that, among other things, controls the coordination between the processors and the hardware analysis system, such as the machine-check interrupt handlers.
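The interrupt sequence just described (save the interrupted thread's address and machine state, run the handler, resume) can be sketched in a few lines of Python. This is an illustrative model only, not firmware: `MachineState`, `Processor`, `take_machine_check`, and `hypervisor_handler` are hypothetical names, and a real processor saves its context in hardware registers rather than in a Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineState:
    """Hypothetical snapshot of an interrupted thread's context."""
    instruction_address: int
    registers: tuple


class Processor:
    def __init__(self, state):
        self.state = state

    def take_machine_check(self, handler):
        # Interrupt the current instruction stream: save the address and
        # machine state of the interrupted thread.
        saved = self.state
        # Run the machine-check handler (in the hypervisor); it is free to
        # overwrite processor state while it analyzes the error.
        error_log = handler(self)
        # Restore the saved context so the interrupted thread can resume.
        self.state = saved
        return error_log


def hypervisor_handler(cpu):
    # Hypothetical handler: clobbers the registers and reports a cause.
    cpu.state = MachineState(instruction_address=0x0, registers=(0, 0, 0, 0))
    return "internal cache parity error"


cpu = Processor(MachineState(instruction_address=0x4000, registers=(1, 2, 3, 4)))
log = cpu.take_machine_check(hypervisor_handler)
print(log, hex(cpu.state.instruction_address))  # internal cache parity error 0x4000
```

The handler may overwrite processor state freely because the interrupted thread's context is saved before it runs and restored afterward.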
The hypervisor provides a machine check analysis process used by the machine check interrupt handler to identify the encountered error. The machine check analysis process involves “walking through the hardware,” checking the function of registers, buffers, and the like, many of which are shared by all partitions during normal operations. The data resulting from this analysis is sent to various logging registers. For the machine check handler to be able to analyze the problem, the error status registers of the shared hardware must not be disturbed while the machine check analysis is in progress, and the logging registers must be used only by the processor running the machine check analysis. To assure this exclusive use of these registers during the machine check, the system is serialized to prevent a second (or third, fourth, etc.) processor that has also taken a machine check interrupt from trying to invoke the machine check analysis while the first processor is using it. This is typically accomplished using a known global “software lock,” as described in more detail below.
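A minimal sketch of this serialization, assuming Python threads stand in for processors and a `threading.Lock` stands in for the global software lock. The names `global_mc_lock`, `machine_check_analysis`, and `machine_check_handler` are hypothetical. The counters record how many processors are inside the analysis at once, which the lock keeps at one even when four processors take machine checks concurrently.

```python
import threading
import time

# Hypothetical global software lock guarding the shared error-status and
# logging registers used by the machine check analysis.
global_mc_lock = threading.Lock()

counter_lock = threading.Lock()  # protects the two counters below
active = 0                       # processors currently inside the analysis
max_active = 0                   # highest concurrency observed
error_logs = []


def machine_check_analysis(cpu_id):
    """Stand-in for 'walking through the hardware'."""
    global active, max_active
    with counter_lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(0.01)  # pretend to read shared error-status registers
    error_logs.append(f"cpu{cpu_id}: error isolated")
    with counter_lock:
        active -= 1


def machine_check_handler(cpu_id):
    # Serialize: only one processor may run the analysis at a time.
    with global_mc_lock:
        machine_check_analysis(cpu_id)


threads = [threading.Thread(target=machine_check_handler, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active, len(error_logs))  # 1 4
```

Without the `with global_mc_lock:` line, `max_active` would typically exceed one, which is exactly the overlapping use of shared registers that the serialization is meant to prevent.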
If a second processor takes a machine check interrupt while the first processor is performing the machine check analysis, the second processor must wait for the first to finish the analysis and unlock the global software lock. Completion of the machine check includes reporting the results of the analysis in an error log to the OS of the partition that initiated the machine check interrupt, and waiting for that OS to acknowledge capture of the error log. If the partition OS does not send the acknowledgement, the lock remains locked indefinitely. As more and more partitions' processors are put into the wait state, waiting for the global software lock to be unlocked so that they can run their respective machine checks, they are unable to function. This can eventually bring the entire system to a halt, an unacceptable outcome for a mission critical system or any other system on which large numbers of users depend.
FIGS. 1-3 illustrate a simplified example of the locking process involved in a prior art system. FIG. 1 is a block diagram illustrating the normal operation of a prior art partitioned system. Referring to FIG. 1, a server 100 is partitioned into sixteen partitions 101-116. It is understood that sixteen partitions are illustrated for the purposes of example only, and that any number of partitions may be used. Operating systems OS1 through OS16 are used by partitions 101-116, respectively. OS1-OS16 may all be the same operating system, or various combinations of different operating systems. A hardware analysis system 130 of the hypervisor 132 is used to perform a check of the system (e.g., machine check analysis) when an error occurs. A single pathway or “corridor” 125 is made available so that, at any given time, only one processor from one of the partitions can access the hardware analysis system 130. For illustrative purposes, corridor 125 is shown conceptually as a pivoting pathway in the shape of an arrow. This is done to illustrate the concept only and is not intended to depict the actual routing between server 100 and the hardware analysis system 130. The actual configuration is well known to one of ordinary skill in the art and is not discussed further herein.
A global lock 120 (e.g., a software lock) is provided to effect the serialization required during a machine check, as described in more detail below. In FIG. 1, global lock 120 is shown illustratively in an unlocked position, indicating that the system 100 is operating properly and in an unserialized state.
FIG. 2 is a block diagram illustrating the system of FIG. 1 when partition 101 has encountered a fault condition. Referring to FIG. 2, if operating system OS1 of partition 101 experiences a fault condition, OS1 “takes” a machine check and appropriates corridor 125 so that it can access hardware analysis system 130. This is illustrated by showing corridor 125 pivoted to point to OS1 of partition 101.
So that no other partitions can use the system resources required for the machine check while it is occurring (i.e., to serialize the system), global lock 120 is locked as shown in FIG. 2. While the lock is in this position, none of the other partitions has access to corridor 125, and they cannot perform machine check analysis. If another partition's OS, e.g., OS5 of partition 105, experiences a fault and also needs to perform a machine check analysis, it must wait until OS1 has completed its machine check analysis. While in this waiting state, the waiting partition cannot perform any functions; it is paused, waiting for its turn to run the machine check analysis. Global lock 120 remains locked until it receives a command from OS1 (in this example) indicating that the machine check is complete, whereupon the lock can be unlocked for use by others.
The above-described system operates adequately as long as OS1 is able to issue the command to unlock global lock 120. However, certain circumstances may prevent OS1 from doing so. For example, if OS1 encounters an error condition while trying to send the acknowledgement to the hypervisor, and that error causes OS1 to circulate in a loop, it will circulate through the loop indefinitely and the command to unlock global lock 120 will never be issued. As additional operating systems experience machine checks, they are placed in waiting states, unable to perform their “mission critical” tasks; if this continues, eventually the entire system will “hang” and become inoperable.
FIG. 3 is a flowchart illustrating the operation of the system of FIGS. 1 and 2 during a series of sequentially occurring machine checks. At step 302, a first machine check occurs. At step 304, a determination is made as to whether the global lock is available, i.e., is in the unlocked state. Since, in this example, this is the first machine check occurrence, the determination is affirmative and the process proceeds to step 306, where the global lock is taken to lock all other operating systems/partitions out of the machine check analysis process and thereby serialize the system while the first machine check analysis is in progress. At step 308, the machine check analysis is performed. At step 310, the registers are restored to their status at the time of the interrupt.
At step 312, the machine check interrupt handler of the system passes control back to the operating system. This is essentially a signal to the operating system that the hardware analysis system has performed its analysis, has fixed a recoverable error or isolated the faulty hardware device, and that the system is ready to return to its parallel operating state. At step 314, the operating system captures the error log into non-volatile hard disk storage. At step 316, the operating system sends an acknowledgement to the hypervisor indicating that the error log has been captured, and the hypervisor then issues the command to unlock the global lock.
If a second machine check occurs (step 303) before the operating system that initiated the first machine check has unlocked the global lock, then when the second machine check reaches the query of step 304 (“Is global lock available?”), the response will be negative, and the process loops back to repeat the query of step 304 until the global lock becomes available. During this time, the partition and operating system that initiated the second machine check are paused and not operating. As mentioned above, if the partition/operating system that initiated the first machine check is unable to unlock the global lock, or simply fails to do so, the partition/operating system that initiated the second machine check will remain paused indefinitely.
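The flow of FIG. 3 can be modeled, under simplifying assumptions, in a short Python sketch. It is hypothetical: `global_lock` plays the role of the global lock, `handle_machine_check` is an illustrative name, `os_acknowledges` is a callback standing in for the partition OS's acknowledgement at step 316, and steps 310 through 314 are reduced to comments. The prior-art wait at step 304 has no exit; the optional `max_polls` bound exists only so the demonstration terminates and can report that the second machine check never obtained the lock.

```python
import threading
import time

# Hypothetical stand-in for the global lock (not reentrant).
global_lock = threading.Lock()


def handle_machine_check(os_acknowledges, max_polls=None):
    """Model of FIG. 3; returns the analysis result, or None if the lock never became available."""
    polls = 0
    # Step 304: "Is global lock available?" Loop until it is.
    while not global_lock.acquire(blocking=False):
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return None  # demo-only escape; the prior-art loop spins forever
        time.sleep(0.001)
    # Step 306: lock taken, so the system is serialized.
    result = "machine check analysis results"  # step 308
    # Step 310: restore registers; step 312: return control to the OS;
    # step 314: the OS captures the error log (none of these are modeled).
    # Step 316: the unlock happens only if the OS acknowledges.
    if os_acknowledges():
        global_lock.release()
    return result


ok = handle_machine_check(lambda: True)                     # normal case
hung = handle_machine_check(lambda: False)                  # OS never acknowledges
blocked = handle_machine_check(lambda: True, max_polls=50)  # step 303: second machine check
print(ok is not None, hung is not None, blocked, global_lock.locked())
# True True None True
```

Because the unlock at step 316 depends entirely on the partition OS, a single unresponsive OS leaves `global_lock` held and every later machine check stuck at step 304.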
Accordingly, a system and method are needed that will allow partitions in a partitioned system to access machine check analysis even when one or more other partitions experience a problem.