1. Technical Field
The present invention is directed to an apparatus, method, and computer program product for stopping processors without using non-maskable interrupts. More specifically, the present invention is directed to an apparatus, method and computer program product for stopping processors in a multiprocessor system that have interrupts disabled without using non-maskable interrupts.
2. Description of Related Art
In multiprocessor systems, there are times when a processor in the multiprocessor system experiences an event, or the debugger is entered, yet the software is unable to stop all of the other processors. An "event" is any occurrence that causes one of the processors to enter a debugger, e.g. an error, an exception, an interrupt, or the like. The inability to stop the other processors is usually due to those processors looping in a portion of code that is in a disabled state, i.e. code that has disabled interrupts on the processor.
This situation has typically been handled by the use of interrupts. An interrupt is a signal informing a program that an event has occurred. When a program receives an interrupt signal, it takes a specified action (which can be to ignore the signal). Interrupt signals can cause a program to suspend itself temporarily to service the interrupt.
Interrupt signals can come from a variety of sources. For example, every keystroke on a keyboard generates an interrupt signal. Interrupts can also be generated by other devices, such as a printer, to indicate that some event has occurred. These are called hardware interrupts.
Interrupt signals initiated by programs are called software interrupts. A software interrupt is also called a trap or an exception. Each type of interrupt, whether hardware or software, is associated with an interrupt handler, i.e. a routine that takes control when the interrupt occurs. For example, when a key is pressed on a keyboard, this action triggers a specific interrupt handler. The complete list of interrupts and associated interrupt handlers is stored in a table called the interrupt vector table.
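By way of illustration only, the relationship between interrupts, interrupt handlers, and the interrupt vector table may be sketched as a simple lookup table. The vector numbers and handler names below are hypothetical and are not taken from any particular architecture.

```python
from typing import Callable, Dict, List

# Hypothetical vector numbers, chosen only for this illustration.
KEYBOARD_VECTOR = 1
BREAKPOINT_VECTOR = 3

log: List[str] = []  # records which handlers ran, in order


def keyboard_handler() -> None:
    log.append("keyboard: key press serviced")


def breakpoint_handler() -> None:
    log.append("breakpoint: entering debugger")


# The interrupt vector table maps each interrupt number to its handler.
vector_table: Dict[int, Callable[[], None]] = {
    KEYBOARD_VECTOR: keyboard_handler,
    BREAKPOINT_VECTOR: breakpoint_handler,
}


def dispatch(vector: int) -> None:
    """Look up the handler registered for `vector` and run it.

    Vectors with no registered handler are ignored in this sketch.
    """
    handler = vector_table.get(vector)
    if handler is not None:
        handler()


dispatch(KEYBOARD_VECTOR)
dispatch(BREAKPOINT_VECTOR)
```

After the two calls to `dispatch`, `log` holds one entry from each handler, in the order in which the interrupts were raised.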
The particular interrupt used to address the problem described above with regard to multiprocessor systems is called a non-maskable interrupt. A non-maskable interrupt (NMI) is a high-priority interrupt that cannot be disabled by another interrupt. NMIs are used to report malfunctions such as parity, bus and math coprocessor errors. The NMI is non-maskable in that this interrupt cannot be disabled by software and cannot be ignored by the system.
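The difference between a maskable interrupt and an NMI may be illustrated with a minimal model of a processor, given here as a sketch under simplifying assumptions. The class and method names are hypothetical, and a real processor would typically hold a masked interrupt pending rather than drop it.

```python
from typing import List


class Processor:
    """Toy processor that tracks whether maskable interrupts are enabled."""

    def __init__(self) -> None:
        self.interrupts_enabled = True
        self.delivered: List[str] = []

    def disable_interrupts(self) -> None:
        # Models code running in a "disabled state".
        self.interrupts_enabled = False

    def raise_interrupt(self, name: str, maskable: bool = True) -> bool:
        """Try to deliver an interrupt; return True if it was delivered."""
        if maskable and not self.interrupts_enabled:
            return False  # masked: the processor keeps running its loop
        self.delivered.append(name)
        return True


cpu = Processor()
cpu.disable_interrupts()
stop_request_delivered = cpu.raise_interrupt("stop-request")    # maskable
nmi_delivered = cpu.raise_interrupt("nmi", maskable=False)      # non-maskable
```

In this model the ordinary stop request is not delivered to the processor running with interrupts disabled, whereas the NMI is delivered regardless.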
Not all multiprocessor systems support NMI and thus, not all multiprocessor systems are capable of using NMI to stop processors that are looping in disabled code when an event on one of the processors occurs. Moreover, even on multiprocessor systems that support an NMI mechanism, asserting the NMI from software in a recoverable way in order to stop other processors in the system is not a simple task. Thus, it would be beneficial to have an apparatus, method and computer program product for stopping processors in a multiprocessor system without using NMI.
The present invention provides an apparatus, method and computer program product for stopping processors in a multiprocessor system without using non-maskable interrupts. With the apparatus, method and computer program product of the present invention, at system initialization time, the operating system (OS) kernel is copied to a new physical location in memory. When a processor enters the debugger due to the occurrence of an event, such as encountering a breakpoint, a trigger, a watchpoint, the occurrence of an error, or the like, the debugger switches its virtual-to-physical address mapping to point to the new copy of the OS kernel. The original copy of the OS kernel is then modified by inserting breakpoints, e.g., interrupts, in a repeating pattern in the text of the original copy of the OS kernel, with the exception of the breakpoint handler text in the original copy of the OS kernel.
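The copying and modification of the OS kernel text described above may be sketched as follows. This is an illustrative model only: the kernel text is represented as a byte array, and the breakpoint value, the spacing of the repeating pattern (`stride`), and the location of the breakpoint handler text are all assumptions made for the example.

```python
BREAKPOINT = 0xCC  # placeholder value; the real encoding is architecture dependent


def insert_breakpoints(text: bytearray, handler: range, stride: int) -> None:
    """Overwrite every `stride`-th location of `text` with a breakpoint.

    Locations inside `handler` (the breakpoint handler text) are left
    untouched so that the handler itself remains executable.
    """
    for addr in range(0, len(text), stride):
        if addr not in handler:
            text[addr] = BREAKPOINT


# Toy "kernel text" of 16 locations holding the values 0..15.
kernel = bytearray(range(16))
handler_text = range(8, 12)   # assumed location of the breakpoint handler

# At system initialization time: copy the kernel to a new location.
kernel_copy = bytes(kernel)

# When the debugger is entered: patch the original copy only.
insert_breakpoints(kernel, handler_text, stride=2)
```

After the call, every second location of `kernel` outside locations 8 through 11 holds the breakpoint value, while `kernel_copy` still holds the unmodified text.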
The debugger then performs architecture dependent actions to flush the executing processors' data caches so that the modifications are present in memory and therefore, visible to the instruction stream of the other processors. The debugger then performs architecture dependent actions to broadcast instruction cache invalidate operations to thereby force the processors to refetch instructions from the OS kernel.
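The reason both the data cache flush and the instruction cache invalidate are needed may be illustrated with a simplified cache model. The model below is an assumption made for illustration: each processor has a write-back data cache and a separate instruction cache, and the class and method names are hypothetical. The actual operations are architecture dependent.

```python
from typing import Dict


class Cpu:
    """Toy processor with a write-back data cache and an instruction cache."""

    def __init__(self, memory: Dict[int, str]) -> None:
        self.memory = memory
        self.dcache: Dict[int, str] = {}  # stores not yet written to memory
        self.icache: Dict[int, str] = {}  # instructions already fetched

    def store(self, addr: int, value: str) -> None:
        self.dcache[addr] = value         # held in the data cache only

    def flush_dcache(self) -> None:
        self.memory.update(self.dcache)   # make the stores visible in memory
        self.dcache.clear()

    def invalidate_icache(self) -> None:
        self.icache.clear()               # force the next fetch to go to memory

    def fetch(self, addr: int) -> str:
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]


memory = {0: "loop"}
debugger_cpu = Cpu(memory)
looping_cpu = Cpu(memory)

looping_cpu.fetch(0)                      # the looping processor caches "loop"
debugger_cpu.store(0, "breakpoint")       # the debugger patches the kernel text
before_flush = looping_cpu.fetch(0)       # still the old instruction

debugger_cpu.flush_dcache()
after_flush = looping_cpu.fetch(0)        # still old: served from the icache

looping_cpu.invalidate_icache()           # stands in for the broadcast invalidate
after_invalidate = looping_cpu.fetch(0)   # now refetched from memory
```

In this model the looping processor sees the inserted breakpoint only after both steps have been performed.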
When the remaining processors fetch the OS kernel instructions, the instructions are fetched from the modified OS kernel. Thus, the processors encounter the inserted breakpoints and enter a breakpoint handler. The breakpoint handler then, by virtue of the switched virtual-to-physical address mapping, redirects the processor to the new copy of the OS kernel and handles the breakpoint in a normal fashion, e.g. causes the processor to enter the debugger. The debugger, at this point, now has control over all of the processors in the multiprocessor system. Thus, the debugger is now able to diagnose what the other processors in the multiprocessor system were doing at the time the event occurred. Once the event is handled, i.e. the debugger is exited, the original OS kernel may be recovered from the new copy, the virtual-to-physical mapping may be reset to its original state, and the system may be recovered.
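The capture of a looping processor and the subsequent recovery of the original OS kernel may be sketched as follows, again as an illustrative model under stated assumptions. The breakpoint value, the pattern spacing, and the function names are hypothetical, and the processor is modeled as simply stepping through the kernel text in a loop.

```python
from typing import Optional

BREAKPOINT = 0xCC  # placeholder value; the real encoding is architecture dependent


def run_until_breakpoint(text: bytearray, pc: int, max_steps: int) -> Optional[int]:
    """Step a looping processor through `text`, starting at `pc`.

    Returns the location at which a breakpoint was fetched, or None if
    no breakpoint was fetched within `max_steps` steps.
    """
    for _ in range(max_steps):
        if text[pc] == BREAKPOINT:
            return pc                      # the processor enters the handler
        pc = (pc + 1) % len(text)          # keep looping through the text
    return None


def restore_kernel(text: bytearray, pristine_copy: bytes) -> None:
    """Recover the original kernel text from the unmodified copy."""
    text[:] = pristine_copy


original = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")
pristine = bytes(original)                 # copy made at initialization time

trapped_before = run_until_breakpoint(original, pc=1, max_steps=100)

for addr in range(0, len(original), 4):    # insert the repeating pattern
    original[addr] = BREAKPOINT
trapped_after = run_until_breakpoint(original, pc=1, max_steps=100)

restore_kernel(original, pristine)         # the debugger is exited
trapped_restored = run_until_breakpoint(original, pc=1, max_steps=100)
```

Before patching, the looping processor never traps. After patching, it traps at location 4, the first patched location it reaches. After recovery, the kernel text again equals the copy and the processor no longer traps.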