1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to communication error reporting mechanisms in multiprocessor computer systems.
2. Description of the Relevant Art
Multiprocessing computer systems include two or more processors which may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operation is performed in response to an instruction executed by the processor.
A popular architecture in commercial multiprocessing computer systems is the symmetric multiprocessor (SMP) architecture. Typically, an SMP computer system comprises multiple processors connected through a cache hierarchy to a shared bus. Additionally connected to the bus is a memory, which is shared among the processors in the system. Access to any particular memory location within the memory occurs in a similar amount of time as access to any other particular memory location. Since each location in the memory may be accessed in a uniform manner, this structure is often referred to as a uniform memory architecture (UMA).
Processors are often configured with internal caches, and one or more caches are typically included in the cache hierarchy between the processors and the shared bus in an SMP computer system. Multiple copies of data residing at a particular main memory address may be stored in these caches. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared bus computer systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches which are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory. For shared bus systems, a snoop bus protocol is typically employed. Each coherent transaction performed upon the shared bus is examined (or "snooped") against data in the caches. If a copy of the affected data is found, the state of the cache line containing the data may be updated in response to the coherent transaction.
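The write-invalidate variant described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the protocol of any particular system. It assumes write-through caches (so main memory always holds the latest value), a single shared bus, and cache lines one address wide. The class names `SharedBus` and `Cache` are hypothetical.

```python
"""Behavioral sketch of a write-invalidate snoop protocol on a shared bus.

Assumptions (not from the source): write-through caches, one shared bus,
and a cache line that covers exactly one address.
"""


class SharedBus:
    def __init__(self, memory):
        self.memory = memory          # address -> value (main memory)
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, origin, addr, value):
        # Write through to main memory, then let every other cache snoop.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not origin:
                cache.snoop_write(addr)


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}               # address -> value (valid copies only)
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:    # miss: fetch the current copy from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr):
        # A coherent write by another cache invalidates our stale copy.
        self.lines.pop(addr, None)


memory = {0x40: 1}
bus = SharedBus(memory)
c0, c1 = Cache("cpu0", bus), Cache("cpu1", bus)
c0.read(0x40)
c1.read(0x40)                 # both caches now hold the old value
c0.write(0x40, 2)             # cpu1 snoops the write and drops its copy
print(0x40 in c1.lines, c1.read(0x40))   # False 2
```

After the invalidation, the next read by `cpu1` misses and obtains the updated value from main memory, matching the second alternative described in the text.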
Unfortunately, shared bus architectures suffer from several drawbacks which limit their usefulness in multiprocessing computer systems. A bus is capable of a peak bandwidth (i.e., a maximum number of bytes per second which may be transferred across the bus). As additional processors are attached to the bus, the bandwidth required to supply the processors with data and instructions may exceed the peak bus bandwidth. Since some processors are then forced to wait for available bus bandwidth, performance of the computer system suffers when the bandwidth requirements of the processors exceed available bus bandwidth.
Additionally, adding more processors to a shared bus increases the capacitive loading on the bus and may even require the physical length of the bus to be increased. The increased capacitive loading and extended bus length increase the delay in propagating a signal across the bus. Due to the increased propagation delay, transactions may take longer to perform. Therefore, the peak bandwidth of the bus may decrease as more processors are added.
These problems are further magnified by the continued increase in operating frequency and performance of processors. The increased performance enabled by the higher frequencies and more advanced processor microarchitectures results in higher bandwidth requirements than previous processor generations, even for the same number of processors. Therefore, buses which previously provided sufficient bandwidth for a multiprocessing computer system may be insufficient for a similar computer system employing the higher performance processors.
Another approach for implementing multiprocessing computer systems is a scalable shared memory (SSM) architecture (also referred to as a distributed shared memory architecture). An SSM architecture includes multiple nodes within which processors and memory reside. The multiple nodes communicate via a network coupled therebetween. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address. Coherency activities may be generated via examination of the directories.
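The role of a directory can be sketched in the same spirit. In this hypothetical model, each address maps to the set of node IDs holding a cached copy. When one node writes, the home node consults that set and generates invalidation messages only for the other recorded sharers, instead of broadcasting to every node as a snoop protocol would. The `Directory` class and the message tuples are illustrative assumptions, not a format defined by the source.

```python
"""Minimal sketch of a directory that tracks which nodes cache each address.

Hypothetical model: one directory entry per address holding a set of sharer
node IDs; a write from one node yields invalidations for the other sharers.
"""
from collections import defaultdict


class Directory:
    def __init__(self):
        self.sharers = defaultdict(set)   # address -> node IDs holding a copy

    def record_read(self, node, addr):
        self.sharers[addr].add(node)

    def record_write(self, node, addr):
        """Return the invalidations the home node must send for this write."""
        to_invalidate = sorted(self.sharers[addr] - {node})
        self.sharers[addr] = {node}        # the writer becomes the only holder
        return [("invalidate", target, addr) for target in to_invalidate]


directory = Directory()
for n in (0, 2, 3):
    directory.record_read(n, 0x1000)
print(directory.record_write(2, 0x1000))
# [('invalidate', 0, 4096), ('invalidate', 3, 4096)]
```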
SSM systems are scalable, overcoming the limitations of the shared bus architecture. Since many of the processor accesses are completed within a node, nodes typically place much lower bandwidth requirements upon the network than a shared bus architecture must provide upon its shared bus. The nodes may operate at high clock frequency and bandwidth, accessing the network when needed. Additional nodes may be added to the network without affecting the local bandwidth of the nodes. Instead, only the network bandwidth is affected.
In a typical SSM system, a global domain is created by way of the SSM protocol, which makes all the memory attached to the global domain appear as one shared memory accessible to all of its processors. A global domain typically runs a single kernel. Hardware provides conventional MMU (memory management unit) protection, and the kernel manages mappings (e.g. reloading of key registers on context switches) to allow user programs to co-exist without trusting one another. Since the nodes of a global domain share memory and may cache data, a software error in one node may propagate into a fatal error which crashes the entire system. Similarly, a fatal hardware error in one node will typically cause the entire global domain to crash.
Accordingly, in another approach to multiprocessing computer systems, clustering may be employed to provide greater fault protection. Unlike SSM approaches, the memory of one node in a cluster system is not freely accessible by processors of other cluster nodes. Likewise, the I/O of one node is typically not freely accessible by processors of other nodes. While memory is not freely shared between nodes of a cluster, a cluster allows nodes to communicate with each other in a protected way using an interconnection network which may be initialized by the operating system. Normally, each node of a cluster runs a separate kernel. Nodes connected in a cluster should not be able to spread local faults, both hardware and software, that would crash other nodes.
Cluster systems are often built on communication mechanisms which are less reliable than, for instance, SMP buses, since they must connect computers in separate chassis which may be separated by substantial distances. Because of this, cluster operations may incur errors, and application programs must be informed of these errors so that they can take appropriate recovery steps.
An ideal error reporting mechanism would be completely accurate and easy to use. Currently-used technology has various limitations in this area. For instance, interfaces which do not provide process-virtualized error information, but log errors on a controller- or system-wide basis, may cause processes which were not responsible for an error to incur error recovery overhead. On the other hand, interfaces which report error information directly to an initiating processor in the form of a processor fault or trap are less easy to use, since many programming languages do not cleanly support the handling of asynchronous errors.
It is accordingly desirable that a cluster communication interconnect be able to tolerate communication errors, and that it be able to report those errors to the software responsible for them. For maximum efficiency, it is desirable that the interconnect be able to provide error information directly to an application process, rather than to the operating system.
In one approach to communication error reporting in a cluster system, a number of cluster error status registers are embedded in each communications interface. Each of these registers is associated with a particular processor in the multiprocessor computer system. When a cluster operation initiated by one of the processors incurs an error, the interface notes that error in the cluster error status register associated with that processor. Applications may read their cluster error status register whenever they wish to check the status of previously performed cluster operations. The per-processor cluster error status registers are saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process.
Systems employing such approaches to communication error reporting suffer from various drawbacks. For example, in a system which contains multiple cluster interfaces, an application which wants to ascertain the status of its operations may need to read multiple cluster error status registers, one from each cluster interface. This increases the time needed to perform a complete messaging operation. In addition, the operating system must save and restore multiple cluster error status registers for each process during a context switch. This increases context switch time and thus adds to the general overhead imposed by the operating system.
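The prior-art scheme and the costs just described can be illustrated as follows. This sketch is hypothetical: `MAX_PROCESSORS`, the register layout, and the function names are assumptions chosen for illustration. Each interface carries a register for every processor it could ever serve, an error check reads one register per interface, and a context switch saves and restores one register per interface.

```python
"""Sketch of the prior-art scheme: each cluster interface holds one error
status register per possible processor (hypothetical sizes and names).
"""

MAX_PROCESSORS = 64   # assumption: sized for the largest supported machine


class ClusterInterface:
    def __init__(self):
        # One register per processor that could ever be installed.
        self.error_status = [0] * MAX_PROCESSORS

    def report_error(self, cpu, code):
        self.error_status[cpu] = code


def check_errors(interfaces, cpu):
    """An application must read one register from every interface."""
    return [iface.error_status[cpu] for iface in interfaces]


def context_switch(interfaces, cpu, outgoing, incoming):
    """The OS saves and restores one register per interface for this CPU."""
    outgoing["saved"] = [iface.error_status[cpu] for iface in interfaces]
    restored = incoming.get("saved", [0] * len(interfaces))
    for iface, value in zip(interfaces, restored):
        iface.error_status[cpu] = value


interfaces = [ClusterInterface() for _ in range(4)]
interfaces[2].report_error(cpu=5, code=0x3)
print(check_errors(interfaces, cpu=5))   # [0, 0, 3, 0]: four reads for one check
```

In this model, both the error check and the context switch grow linearly with the number of interfaces, and every interface pays for `MAX_PROCESSORS` registers regardless of the machine it is installed in.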
Another drawback to such systems is that the cluster interface must contain cluster error status registers for all processors which could possibly be part of any machine in which it is installed. This adds to the cost of the interface, which is a particular drawback when trying to develop a high-volume, low cost implementation which is usable in multiple types of systems.
It is thus desirable to provide a fast and reliable error communication mechanism in a multiprocessing computer system which allows for efficient and scalable implementations of user and kernel-level communication protocols.
The problems outlined above may in large part be solved by a communication error reporting mechanism in accordance with the present invention. In one embodiment, a multiprocessing computer system includes a plurality of processing nodes, each including one or more processors, a memory, and a system interface. The plurality of processing nodes may be interconnected through a global interconnect network which supports cluster communications. The system interface of an initiating node may launch a request to a remote node's memory or I/O. The computer system implements a communication error reporting mechanism wherein errors associated with remote transactions may be reported back to the particular processor which initiated the transaction. Each processor includes an error status register that is large enough to hold a transaction error code. The protocol associated with a local bus of each node (i.e., a bus interconnecting the processors of a node to the node's system interface) includes acknowledgement messages for transactions when they have completed. In the event a transaction which is transmitted by a system interface upon the global interconnect network on behalf of a particular processor incurs an error, the system interface sets an error flag in the acknowledgement message and provides an associated error code. If the acknowledgement message denotes an error, the error code is written into the processor's error status register for later retrieval by software. In various embodiments, a system interface may acknowledge a transaction to a given processor early (even if that transaction has not completed globally) if a subsequent transaction from the same processor is pending in the interface.
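The acknowledgement path summarized above can be modeled in a few lines. This is a minimal sketch, assuming a simple `Ack` record that carries an error flag and an error code. The field names, the `ERR_REMOTE_TIMEOUT` code, and the class names are hypothetical, and early acknowledgement is not modeled.

```python
"""Behavioral sketch of acknowledgement-based error reporting.

Hypothetical names; error codes and message fields are illustrative only.
"""
from dataclasses import dataclass

ERR_NONE = 0
ERR_REMOTE_TIMEOUT = 0x2   # illustrative code, not defined by the source


@dataclass(frozen=True)
class Ack:
    txn_id: int
    cpu: int
    error: bool = False            # error flag set by the system interface
    error_code: int = ERR_NONE


class Processor:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.error_status = ERR_NONE   # per-processor error status register

    def receive_ack(self, ack):
        # Only an acknowledgement that denotes an error updates the register.
        if ack.error:
            self.error_status = ack.error_code

    def read_error_status(self):
        # Modeled as a processor-internal read; no local-bus cycle is issued.
        return self.error_status


class SystemInterface:
    def complete(self, txn_id, cpu, error_code=ERR_NONE):
        """Build the local-bus acknowledgement for a finished transaction."""
        failed = error_code != ERR_NONE
        return Ack(txn_id, cpu, error=failed, error_code=error_code)


cpu0 = Processor(0)
si = SystemInterface()
cpu0.receive_ack(si.complete(txn_id=1, cpu=0))             # succeeds
cpu0.receive_ack(si.complete(txn_id=2, cpu=0, error_code=ERR_REMOTE_TIMEOUT))
print(hex(cpu0.read_error_status()))   # 0x2
```

Because the error code returns on the same acknowledgement that completes the transaction, software can check status by reading a register rather than by handling a fault or trap.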
Advantageously, the per-processor error status registers may be saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process. Improved scaling may be attained in embodiments employing multiple system interfaces since only a single error status register needs to be read on an error check or context switch. Additionally, a processor may perform a read to its associated error status register without executing a cycle upon the local bus. Still further, errors may be reported without processor faults or traps.
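The context-switch advantage can be sketched with hypothetical operating system structures (`Cpu`, `Process`, `context_switch`). Under the assumption that each processor has exactly one error status register, the operating system performs a single save and a single restore per switch, independent of the number of system interfaces in the machine.

```python
"""Sketch: virtualizing a single per-processor error status register across
context switches (hypothetical OS structures).
"""
from dataclasses import dataclass


@dataclass
class Cpu:
    error_status: int = 0          # the one register the OS saves/restores


@dataclass
class Process:
    pid: int
    saved_error_status: int = 0    # per-process copy kept by the OS


def context_switch(cpu, outgoing, incoming):
    # One save and one restore, regardless of how many interfaces exist.
    outgoing.saved_error_status = cpu.error_status
    cpu.error_status = incoming.saved_error_status


cpu = Cpu()
a, b = Process(pid=1), Process(pid=2)
cpu.error_status = 0x5          # an error was reported while process 1 ran
context_switch(cpu, a, b)       # process 2 starts with its own clean status
assert cpu.error_status == 0
context_switch(cpu, b, a)       # process 1 sees its own error again
print(cpu.error_status)         # 5
```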