Not applicable.
1. Field of the Invention
The present invention generally relates to a computer system with multiple processors. More particularly, the invention relates to a distributed shared memory multiprocessing computer system that supports a high performance, scalable and efficient input/output (“I/O”) port protocol to connect to I/O devices.
2. Background of the Invention
Distributed computer systems typically comprise multiple computers connected to each other by a communications network. In some distributed computer systems, networked computers can access shared data. Such systems are sometimes known as parallel computers. If a large number of computers are networked, the distributed system is considered to be “massively” parallel. One advantage of a massively parallel computer is that it can solve complex computational problems in a reasonable amount of time.
In such systems, the memories of the computers are collectively known as a Distributed Shared Memory (“DSM”). A central problem is ensuring that the data stored in the DSM is accessed in a coherent manner. Coherency, in part, means that only one processor can modify any part of the data at any one time; otherwise, the state of the system would be nondeterministic.
Recently, DSM systems have been built as a cluster of Symmetric Multiprocessors (“SMP”). In SMP systems, shared memory can be implemented efficiently in hardware since the processors are symmetric (e.g., identical in construction and in operation) and operate on a single, shared processor bus. Symmetric multiprocessor systems have good price/performance ratios with four or eight processors. However, because the specially designed bus becomes a bottleneck for message passing between the processors, it is difficult to scale the size of an SMP system beyond twelve or sixteen processors.
It is desired to construct large-scale DSM systems using processors connected by a network. The goal is to allow processors to efficiently share the memories so that data fetched by one program executed on a first processor from memory attached to a second processor is immediately available to all processors.
DSM systems function by using message passing to maintain the coherency of the shared memory distributed throughout the multiprocessing computer system. A message is composed of packets that contain identification information and data. Control of message routing is distributed throughout the system and each processor visited by a message traveling through the multiprocessing computer system controls the routing of the message through it. Message passing can reduce system performance since delays in transmission of message packets can slow down program execution. Delays in transmission can occur because of high latency due to congestion in the network (i.e., many messages trying to go through the limited physical connections of the networks). This type of congestion can cause tremendous performance degradation that can result in high overall program execution times.
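The message structure and distributed routing described above can be illustrated with a minimal sketch. The ring topology, the field names, and the helper functions `split_message`, `next_hop` and `route` are assumptions made only for illustration; the text above does not specify a topology or a packet layout.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: int        # identification: originating processor
    destination: int   # identification: target processor
    sequence: int      # position of this packet within its message
    payload: bytes     # data carried by this packet


def split_message(source, destination, data, packet_size=4):
    """Break a message into packets that carry identification info and data."""
    return [
        Packet(source, destination, seq, data[offset:offset + packet_size])
        for seq, offset in enumerate(range(0, len(data), packet_size))
    ]


def next_hop(current, destination, node_count):
    """Routing decision made locally by the processor holding the packet.

    Assumes a hypothetical bidirectional ring of node_count processors.
    """
    if current == destination:
        return current
    forward = (destination - current) % node_count
    backward = (current - destination) % node_count
    step = 1 if forward <= backward else -1
    return (current + step) % node_count


def route(packet, node_count):
    """Return the list of processors a packet visits, source to destination."""
    path = [packet.source]
    while path[-1] != packet.destination:
        path.append(next_hop(path[-1], packet.destination, node_count))
    return path
```

In this sketch no central router exists: each call to `next_hop` stands for the decision taken by the processor the packet is currently visiting, so every additional hop on a congested path adds to the delivery latency.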
Each processor of a distributed shared memory computer system typically connects to an I/O bridge/Bus Interface ASIC (referred to as “I/O bridge ASIC”) that permits the processor to gain access to input or output devices. Such devices may be keyboards, monitors, disk drives, hard drives, CD-ROM, tape backup systems, and a host of other peripheral I/O devices. The processor typically implements an I/O port protocol that interfaces the processor to the external I/O device through the I/O bridge ASIC. The I/O port protocol performs many operations between the processor and external I/O devices across the I/O bridge ASIC. These operations include direct memory access (“DMA”) read streams, DMA write streams, processor access to I/O devices, I/O device interrupt handling, coherence for I/O translation lookaside buffers (“TLB”), and peer-to-peer I/O communication between two different I/O devices.
Although prior art I/O port protocols used between processors and their I/O bridge ASICs have been suitable for single processor computer systems or twelve to sixteen node single bus SMP systems, these I/O port protocols lacked the ability to allow efficient and fast I/O port operations for a scalable DSM multiprocessing computer system. DSM computer systems that used the computer system's internal bus protocol could not take advantage of the memory and cache coherence protocols because of implementation differences between the internal bus protocol and the coherence protocol. Thus, an I/O access required translation between the two protocols, resulting in complex translation hardware, increased implementation cost and reduced computer system performance. Therefore, it is desired to implement an I/O port protocol compatible with a DSM computer system memory and cache coherence protocol that permits I/O port operations to take place in the DSM computer system efficiently, quickly and easily while maintaining the coherency of the data accessed by I/O port devices.
The problems noted above are solved in large part by a distributed multiprocessing computer system that includes a plurality of processors each coupled to an I/O bridge ASIC implementing an I/O port protocol. One or more I/O devices are coupled to the I/O bridge ASIC, each I/O device capable of accessing machine resources in the computer system by transmitting and receiving message packets. Machine resources in the computer system include data blocks, registers and interrupt queues. Each processor in the computer system is coupled to a memory module capable of storing data blocks shared between the processors. Coherence of the shared data blocks in this shared memory system is maintained using a directory based coherence protocol. Coherence of data blocks transferred during I/O device access to machine resources is maintained using the same coherence protocol as for the memory system. Data blocks transferred during an I/O device read or write access may be buffered by the I/O bridge ASIC only if the I/O bridge ASIC has exclusive copies of the data blocks.
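The rule that the I/O bridge ASIC may buffer a data block only while it holds an exclusive copy can be sketched against a simplified directory. The `Directory` and `IOBridge` classes, their method names, and the two-state (shared/exclusive) model are hypothetical simplifications chosen for illustration; the actual directory states and message types are not given in this section, and delivery of invalidations to the affected nodes is omitted.

```python
class Directory:
    """Simplified directory: tracks, per block, an exclusive owner or sharers."""

    def __init__(self):
        self.owner = {}     # block address -> node holding the exclusive copy
        self.sharers = {}   # block address -> set of nodes holding shared copies

    def read_shared(self, block, node):
        """Grant a shared copy; a previous exclusive owner is demoted to sharer."""
        previous_owner = self.owner.pop(block, None)
        holders = self.sharers.setdefault(block, set())
        if previous_owner is not None:
            holders.add(previous_owner)
        holders.add(node)

    def read_exclusive(self, block, node):
        """Grant an exclusive copy and return the nodes that must be invalidated."""
        invalidated = set(self.sharers.pop(block, set()))
        previous_owner = self.owner.get(block)
        if previous_owner is not None and previous_owner != node:
            invalidated.add(previous_owner)
        invalidated.discard(node)
        self.owner[block] = node
        return invalidated

    def is_exclusive(self, block, node):
        return self.owner.get(block) == node


class IOBridge:
    """I/O bridge that refuses to buffer blocks it does not own exclusively."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.buffer = {}

    def buffer_block(self, block, data):
        if not self.directory.is_exclusive(block, self.name):
            return False    # not exclusive: buffering would risk stale data
        self.buffer[block] = data
        return True
```

Because the bridge uses the same directory as the processors in this sketch, no separate translation step sits between I/O accesses and the memory coherence protocol.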
The I/O bridge ASIC includes a DMA device that supports both in-order and out-of-order DMA read and write streams of data blocks. An in-order stream of reads of data blocks, performed by the DMA device with a coherence memory barrier between successive reads, ensures a certain level of memory consistency such that the DMA device receives coherent data blocks that do not have to be written back to the memory module.
In the distributed multiprocessing computer system, I/O devices can generate interrupts by writing to an interrupt queue in a destination processor. The write of the interrupt queue in the destination processor is implemented by sending message packets containing an interrupt through the bridge logic device and intermediate processors to the interrupt queue in the destination processor.
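The interrupt delivery described above can be sketched as a write that travels hop by hop to the destination processor's interrupt queue. The `Processor` class, the `forwarded` counter, the dictionary packet layout, and the caller-supplied `path` are assumptions for illustration only; the section does not define the packet format or how the path is chosen.

```python
from collections import deque


class Processor:
    def __init__(self, node_id):
        self.node_id = node_id
        self.interrupt_queue = deque()   # written by incoming interrupt packets
        self.forwarded = 0               # packets passed through this node


def deliver_interrupt(processors, path, vector):
    """Send an interrupt packet along path; the last node is the destination.

    processors maps node id -> Processor. path is a hypothetical route that
    starts at the processor attached to the I/O bridge.
    """
    packet = {"type": "interrupt", "destination": path[-1], "vector": vector}
    for node_id in path:
        cpu = processors[node_id]
        if node_id == packet["destination"]:
            # The write to the interrupt queue completes at the destination.
            cpu.interrupt_queue.append(packet["vector"])
        else:
            # Intermediate processors only forward the packet.
            cpu.forwarded += 1
    return packet
```

For example, with processors 0 through 3 and `path = [0, 1, 2]`, the vector is appended to the interrupt queue of processor 2 while processors 0 and 1 merely forward the packet.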