The invention is generally related to computers and computer software. In particular, the invention is generally related to initiating a reset of a computer processor via a software-based mechanism.
Computer technology continues to advance at a rapid pace, with significant developments being made in both software and in the underlying hardware upon which such software executes. One significant advance in computer technology is the development of multi-processor computers, where multiple computer processors are interfaced with one another to permit multiple operations to be performed concurrently, thus improving the overall performance of such computers. Also, a number of multi-processor computer designs rely on logical partitioning to allocate computer resources to further enhance the performance of multiple concurrent tasks.
With logical partitioning, a single physical computer is permitted to operate essentially like multiple and independent "virtual" computers (referred to as logical partitions), with the various resources in the physical computer (e.g., processors, memory, input/output devices) allocated among the various logical partitions. Each logical partition executes a separate operating system, and from the perspective of users and of the software executing on the logical partition, operates as a fully independent computer.
A shared resource, often referred to as a "hypervisor" or partition manager, manages the logical partitions and facilitates the allocation of resources to different logical partitions. As a component of this function, a partition manager maintains separate virtual memory address spaces for the various logical partitions so that the memory utilized by each logical partition is fully independent of the other logical partitions. One or more address translation tables are typically used by a partition manager to map addresses from each virtual address space to different addresses in the physical, or real, address space of the computer. Then, whenever a logical partition attempts to access a particular virtual address, the partition manager translates the virtual address to a real address so that the shared memory can be accessed directly by the logical partition.
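By way of illustration only, the per-partition address translation described above might be sketched as follows. The page size, the class and method names, and the use of a single dictionary keyed by partition and virtual page are assumptions made for this sketch and are not taken from any particular partition manager.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, for illustration only


class UnmappedAddress(Exception):
    """Raised when no translation entry covers the requested virtual page."""


@dataclass
class TranslationTable:
    # Hypothetical layout: (partition_id, virtual_page) -> real_page.
    entries: dict = field(default_factory=dict)

    def map_page(self, partition_id, virtual_page, real_page):
        self.entries[(partition_id, virtual_page)] = real_page

    def translate(self, partition_id, virtual_address):
        # Split the virtual address into a page number and an offset.
        virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
        key = (partition_id, virtual_page)
        if key not in self.entries:
            raise UnmappedAddress(f"partition {partition_id}, page {virtual_page}")
        return self.entries[key] * PAGE_SIZE + offset


table = TranslationTable()
# The same virtual page in two partitions maps to different real pages.
table.map_page(partition_id=0, virtual_page=1, real_page=7)
table.map_page(partition_id=1, virtual_page=1, real_page=9)

real_a = table.translate(0, PAGE_SIZE + 0x10)
real_b = table.translate(1, PAGE_SIZE + 0x10)
```

In this sketch the two partitions use an identical virtual address yet reach distinct real addresses, which is the independence property the partition manager is described as maintaining.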
A primary benefit of multi-processor computers, and in particular of those implementing partitioned environments, is the ability to maintain at least partial operational capability in response to partial system failures. For example, while most computers, and in particular most multi-processor computers, are relatively reliable, the processors in such computers can "hang" from time to time and cease to operate in responsive and predictable manners, e.g., due to software design flaws, or "bugs", that cause such processors to operate continuously in endless loops. In a partitioned environment in particular, hanging a processor allocated to a particular logical partition often results in that partition becoming at least partially inoperative and non-responsive. However, other logical partitions that do not rely on the hung processor are typically not affected by the failure.
While it may be acceptable in some situations to permit a computer to simply be powered off and on to recover from a hung processor, in many situations it is more desirable to provide the ability for a hung processor to be reset, or restored to a known state, in such a manner that the entire computer does not need to be shut down. Also, in a multi-processor computer, and in particular one that implements a partitioned environment, it is often desirable for such a reset operation to not affect other processors and/or other logical partitions operating in the computer so that the other processors and/or logical partitions can still perform useful operations while the hung processor is reset.
In many multi-processor computers, and in particular in those implementing partitioned environments, a software-based reset mechanism is typically supported to permit one processor to initiate a reset of another processor. Typically, a software-based reset mechanism relies on the use of interrupts, often referred to as inter-processor interrupts (IPIs), to cause a hung processor to reset and restore itself to a known state. An IPI, like all interrupts, causes a processor to cease all current operations and immediately jump to dedicated program code, referred to as an "interrupt handler", to handle the interrupt.
An IPI is typically handled as an "external" interrupt insofar as an IPI is initiated externally from the processor that receives the interrupt. Most processors, however, support the ability to selectively enable or disable external interrupts so that such interrupts will be ignored, typically when a processor is executing relatively critical program code that should not be terminated prior to completion. The ability to disable external interrupts, however, introduces the possibility that a processor may hang while external interrupts are disabled, and thus be incapable of being reset through an IPI. Should this occur, the only manner of resetting the processor would likely be a hardware reset, which would typically necessitate a full restart of the computer, and a consequent temporary inaccessibility of the computer.
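The failure mode described above may be illustrated with a minimal model. The class below is hypothetical and greatly simplified: it assumes a single enable flag for external interrupts and treats a masked IPI as simply having no effect.

```python
class Processor:
    """Toy model of a processor whose external interrupts can be masked."""

    def __init__(self):
        self.external_interrupts_enabled = True
        self.state = "running"

    def hang(self):
        # Models the processor entering an endless loop.
        self.state = "hung"

    def deliver_ipi_reset(self):
        """Return True if the IPI was taken and the processor reset itself."""
        if not self.external_interrupts_enabled:
            return False  # masked: the IPI has no effect on the processor
        self.state = "reset"
        return True


# A processor that hangs inside a critical section cannot be reset by IPI.
masked = Processor()
masked.external_interrupts_enabled = False
masked.hang()
masked_result = masked.deliver_ipi_reset()

# A processor that hangs with external interrupts enabled can be.
unmasked = Processor()
unmasked.hang()
unmasked_result = unmasked.deliver_ipi_reset()
```

In this model the masked processor remains hung after the IPI is sent, which corresponds to the situation in which only a hardware reset would remain available.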
Therefore, a significant need exists for an alternate software-based reset mechanism for a processor that permits the processor to be reset in a wider range of situations, and in particular, for a software-based reset mechanism for a processor that cannot be defeated as a result of the disabling of interrupts on the processor.
The invention addresses these and other problems associated with the prior art by providing an apparatus, program product, and method that utilize a memory access interrupt to effect a reset of a processor in a multi-processor environment. Specifically, one processor (referred to herein as a source processor) is permitted to initiate a reset of another processor (referred to herein as a target processor) simply by generating both a reset request and a memory access interrupt for the target processor. The target processor is then specifically configured to detect the presence of a pending reset request during handling of the memory access interrupt, such that the target processor will perform a reset operation responsive to detection of such a request.
Detection of a reset request is typically implemented within an interrupt handler that is executed by a target processor in response to a memory access interrupt. As a result, for those situations in which a memory access interrupt is generated for a reason other than to initiate a reset of the target processor, the target processor can handle the interrupt in an appropriate manner, and often with little additional overhead associated with determining whether a reset operation should be performed as a result of the interrupt.
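A handler of the kind just described might, purely as a sketch, take the following form. The names `TargetProcessor`, `reset_requested`, and `handle_memory_access_interrupt` are hypothetical, and the per-processor flag is only one assumed way of representing a pending reset request.

```python
class TargetProcessor:
    """Toy target processor with a pending-reset flag checked by its handler."""

    def __init__(self):
        self.reset_requested = False  # assumed representation of a reset request
        self.resets = 0
        self.faults_handled = 0

    def reset(self):
        # Restore the processor to a known state and clear the request.
        self.reset_requested = False
        self.resets += 1

    def handle_memory_access_interrupt(self, faulting_address):
        # The added overhead for an ordinary interrupt is a single flag check.
        if self.reset_requested:
            self.reset()
            return "reset"
        # Otherwise handle the interrupt in the usual manner
        # (modeled here as simply counting the fault).
        self.faults_handled += 1
        return "handled"


cpu = TargetProcessor()
first = cpu.handle_memory_access_interrupt(0x1000)   # no request pending
cpu.reset_requested = True
second = cpu.handle_memory_access_interrupt(0x1000)  # request pending
```

The first interrupt is handled normally, while the second, taken with a request pending, results in a reset.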
A memory access interrupt may be considered to include any type of interrupt that is generated responsive to a memory access attempt by the target processor. Particularly given the general necessity for a processor to always be capable of accessing memory, a memory access interrupt is often further characterized as being incapable of being disabled during the operation of the target processor. As a consequence, unlike external interrupts such as IPIs and the like which are capable of being disabled in some instances, a reset operation can be initiated on a target processor via a memory access interrupt irrespective of whether other interrupts are disabled on the processor.
While other alternative memory access interrupt implementations may also be utilized consistent with the invention, one particularly useful implementation relies on a type of memory access interrupt that is generated in response to an attempt by a target processor to access a virtual memory address in a virtual memory address space that is not mapped by any entry in an address translation table. Generation of a memory access interrupt then typically requires only that one or more entries in the address translation table be invalidated to ensure that a subsequent access to the virtual memory address space will attempt to access an unmapped virtual memory address.
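The invalidation-based approach may be sketched as a single-threaded simulation. Everything named below is hypothetical, including the `on_iteration` callback, which stands in for a source processor acting concurrently. The sketch also assumes that the reset request is posted before the entries are invalidated, so that the request is already visible when the target takes the resulting interrupt.

```python
class MemoryAccessInterrupt(Exception):
    """Models the interrupt raised on access to an unmapped virtual page."""


class Target:
    def __init__(self, mapped_pages):
        self.translation = dict(mapped_pages)  # virtual page -> real page
        self.reset_requested = False
        self.was_reset = False

    def access(self, virtual_page):
        if virtual_page not in self.translation:
            raise MemoryAccessInterrupt(virtual_page)
        return self.translation[virtual_page]

    def run_hung_loop(self, max_iterations, on_iteration=None):
        """Spin on page 0 as a hung processor might; return the iteration
        at which a reset occurred, or None if no reset occurred."""
        for i in range(max_iterations):
            if on_iteration is not None:
                on_iteration(i)  # stands in for concurrent source activity
            try:
                self.access(0)
            except MemoryAccessInterrupt:
                # Interrupt handler: reset only if a request is pending.
                if self.reset_requested:
                    self.reset_requested = False
                    self.was_reset = True
                    return i
                raise
        return None


def request_reset(target):
    """Source-processor side: post the request, then invalidate the entries."""
    target.reset_requested = True
    target.translation.clear()


target = Target({0: 42})
reset_at = target.run_hung_loop(
    10, on_iteration=lambda i: request_reset(target) if i == 3 else None
)
```

Here the loop runs unimpeded until the entries are invalidated, after which the very next access is unmapped and the handler performs the reset.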
Therefore, consistent with one aspect of the invention, a processor may be reset by generating a reset request for the processor, generating a memory access interrupt on the processor, and resetting the processor during handling of the memory access interrupt by the processor responsive to detection of the reset request.
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the Drawings, and to the accompanying descriptive matter, in which there are described exemplary embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a computer consistent with the invention.
FIG. 2 is a block diagram of the primary software components and resources in the computer of FIG. 1.
FIG. 3 is a block diagram of an address translation table in FIG. 2.
FIG. 4 is a flowchart illustrating the program flow of a reset processor routine executed by a source processor in the computer of FIGS. 1 and 2.
FIG. 5 is a flowchart illustrating the program flow of a partition manager interrupt handler executed by a target processor in the computer of FIGS. 1 and 2, in response to a memory access interrupt.