1. Field of the Invention
This invention relates generally to computer system diagnostics and, more particularly, to diagnostic processing in massively parallel computing systems.
2. Description of the Related Art
Although it is less common in recent years, diagnostic programs or applications are often provided to purchasers of computer systems. The capabilities and ease of use of these diagnostic programs vary with the type and manufacturer of the computer. Diagnostic programs for desktop computers and workstations typically have a relatively easy-to-use interface, but are often limited in capability. On the other hand, diagnostic programs for mainframe computers and supercomputers may have much greater diagnostic capabilities, but are often also much more difficult to use. As a result, such programs are sometimes used only by service personnel hired by the manufacturer.
The development of diagnostic programs has become increasingly difficult as the use of multiprocessor computer systems has become more common, particularly scalable computer systems such as cache coherent non-uniform memory access (ccNUMA) systems. Operating systems of such systems are designed to insulate users from the functions they perform, such as task distribution and load balancing, while diagnostic systems must do the exact opposite, i.e., inform the user of what portion of the computer system is being used. Diagnostic systems must access physical memory and privileged kernel space, and certain machine instructions execute on a processor only in kernel mode. As a result, the ordinary operating system cannot be used to run full diagnostic software on scalable computing systems of the kind used to implement a ccNUMA computing system. If the diagnostic software is designed to be used by people who are not employees of the computer manufacturer, powerful diagnostic capabilities must be accompanied by a user interface that is easy to use.
It is an object of the present invention to provide user control of diagnostic processing results in a diagnostic environment for a highly scalable computer system.
It is another object of the present invention to provide efficient processing of diagnostic programs on a ccNUMA computing system.
It is an additional object of the present invention to provide a flexible diagnostic environment for a ccNUMA computing system that supports both simple and sophisticated user interfaces and diagnostic tasks initiated and executing in different ways.
To achieve these objects, the present invention provides a method of executing diagnostic programs on a multi-node computer system, including executing a shell process at an interface node to provide a user interface; storing diagnostic data, generated by execution of diagnostic programs under control of a diagnostic microkernel at each of the nodes, in memory locations accessible by the diagnostic microkernel and the shell process; and accessing the diagnostic data by the shell process to output data in response to instructions received via the user interface. When each node of the multi-node computer system includes a plurality of processors, a thread of the diagnostic microkernel is initialized in each of the processors in each node.
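The flow summarized above can be sketched, purely for illustration, as a small Python simulation. This is not the implementation of the invention: Python threads stand in for the microkernel threads initialized on each processor, a lock-protected dictionary stands in for the memory locations accessible to both the microkernel and the shell process, and all names (`ResultStore`, `diagnostic_thread`, `run_diagnostics`, `shell_report`) as well as the trivial pattern check are assumptions introduced here.

```python
import threading


class ResultStore:
    """Stand-in for memory locations readable by both microkernel and shell."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}  # (node, cpu) -> diagnostic data

    def store(self, node, cpu, data):
        with self._lock:
            self._results[(node, cpu)] = data

    def read(self, node):
        # Return {cpu: data} for one node.
        with self._lock:
            return {c: d for (n, c), d in self._results.items() if n == node}


def diagnostic_thread(store, node, cpu):
    # Hypothetical diagnostic: copy a test pattern and compare it.
    pattern = bytes([0xAA]) * 64
    ok = bytearray(pattern) == pattern
    store.store(node, cpu, "PASS" if ok else "FAIL")


def run_diagnostics(store, nodes, cpus_per_node):
    # One simulated microkernel thread per processor in each node.
    threads = [
        threading.Thread(target=diagnostic_thread, args=(store, n, c))
        for n in range(nodes)
        for c in range(cpus_per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shell_report(store, node):
    # The shell reads stored data only when the user asks for it.
    results = store.read(node)
    return [f"node {node} cpu {cpu}: {results[cpu]}" for cpu in sorted(results)]


if __name__ == "__main__":
    store = ResultStore()
    run_diagnostics(store, nodes=2, cpus_per_node=2)
    for line in shell_report(store, node=1):
        print(line)
```

In this sketch the diagnostic threads never write to the user interface directly; they only store data, and the shell decides what to output in response to a user request.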
Preferably, the shell process and diagnostic microkernel support both a message passing protocol and memory sharing for communication between the nodes. The diagnostic microkernel may be initialized at each of the nodes with instructions required for managing diagnostic processing and resources locally at the node, and communication between the nodes may be accomplished using shared memory. Alternatively, the diagnostic microkernel may be initialized at each of the nodes with all instructions and data required for managing diagnostic processing and resources locally at the node, with communication between the nodes accomplished using message passing.
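The two communication alternatives can be illustrated with a minimal Python sketch in which both are hidden behind the same `send`/`receive` interface. The class names, the `request_status` helper, and the message layout are hypothetical; a dictionary models a memory region visible to every node, and per-node queues model mailboxes used for message passing. Real inter-node hardware mechanisms are not modeled.

```python
import queue


class SharedMemoryChannel:
    """Nodes communicate by writing to and reading from a common region."""

    def __init__(self):
        self._region = {}  # destination node -> list of pending payloads

    def send(self, dest, payload):
        self._region.setdefault(dest, []).append(payload)

    def receive(self, node):
        # Take everything currently pending for this node.
        return self._region.pop(node, [])


class MessagePassingChannel:
    """Each node owns a private mailbox; nothing is shared between nodes."""

    def __init__(self, node_ids):
        self._mailboxes = {n: queue.Queue() for n in node_ids}

    def send(self, dest, payload):
        self._mailboxes[dest].put(payload)

    def receive(self, node):
        box = self._mailboxes[node]
        pending = []
        while True:
            try:
                pending.append(box.get_nowait())
            except queue.Empty:
                return pending


def request_status(channel, src, dest):
    # Caller code is identical for either transport.
    channel.send(dest, {"from": src, "op": "status"})
    return channel.receive(dest)


if __name__ == "__main__":
    for channel in (SharedMemoryChannel(), MessagePassingChannel([0, 1])):
        print(type(channel).__name__, request_status(channel, src=0, dest=1))
```

Keeping the interface identical is one way a shell process and microkernel could support both mechanisms; it is a design choice of this sketch rather than something the text prescribes.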
Diagnostic programs executing in a diagnostic operating environment according to the present invention preferably produce formatted data for output in response to instructions from the user interface, and the shell process echoes the formatted data via the user interface under control of the user. This enables diagnostic threads to execute in a plurality of the nodes, where the threads perform identical diagnostic operations at each of the nodes, without overwhelming the user interface with the results of the diagnostic processing.
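As a hedged illustration of why buffering formatted output keeps the user interface manageable, the following Python sketch runs the same hypothetical diagnostic on every node, holds the formatted lines in a buffer, and lets the shell echo only a summary or the nodes the user selects. The function names and the `node N: RESULT` line format are assumptions made for this example.

```python
from collections import Counter


def run_on_all_nodes(node_ids, diagnostic):
    # Each node produces one formatted line, which is buffered, not printed.
    return {n: f"node {n}: {diagnostic(n)}" for n in node_ids}


def summarize(buffered):
    # Collapse identical results into counts so the user sees one line.
    counts = Counter(line.split(": ", 1)[1] for line in buffered.values())
    return ", ".join(f"{result}={count}" for result, count in sorted(counts.items()))


def echo(buffered, selected_nodes):
    # Echo only the formatted lines the user asked for.
    return [buffered[n] for n in selected_nodes if n in buffered]


if __name__ == "__main__":
    # Hypothetical diagnostic in which node 3 reports a failure.
    buffered = run_on_all_nodes(range(8), lambda n: "FAIL" if n == 3 else "PASS")
    print(summarize(buffered))
    for line in echo(buffered, [3]):
        print(line)
```

With eight nodes, the user first sees a single summary line and can then request the detailed output for the failing node alone, rather than receiving eight interleaved reports.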
These, together with other objects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.