1. Field of the Invention
This invention relates to computer architectures and, more specifically, to distributed, shared memory multiprocessor computer systems.
2. Background Information
Distributed shared memory computer systems, such as symmetric multiprocessor (SMP) systems, support high-performance application processing. Conventional SMP systems include a plurality of processors coupled together by a bus. One characteristic of SMP systems is that memory space is typically shared among all of the processors. That is, each processor accesses programs in the shared memory, and processors communicate with each other via that memory (e.g., through messages and status information left in shared address spaces). In some SMP systems, the processors may also be able to exchange signals directly. One or more operating systems are typically stored in the shared memory. These operating systems control the distribution of processes or threads among the various processors. The operating system kernels may execute on any processor, and may even execute in parallel. By allowing many different processors to execute different processes or threads simultaneously, the execution speed of a given application may be greatly increased.
FIG. 1 is a block diagram of a conventional SMP system 100. System 100 includes a plurality of processors 102a-e, each connected to a system bus 104. A memory 106 and an input/output (I/O) bridge 108 are also connected to the system bus 104. The I/O bridge 108 is also coupled to one or more I/O busses 110a-c. The I/O bridge 108 basically provides a "bridging" function between the system bus 104 and the I/O busses 110a-c. Various I/O devices 112, such as disk drives, data collection devices, keyboards, CD-ROM drives, etc., may be attached to the I/O busses 110a-c. Each processor 102a-e can access memory 106 and/or various input/output devices 112 via the system bus 104. Each processor 102a-e has at least one level of cache memory 114a-e that is private to the respective processor 102a-e.
The cache memories 114a-e typically contain an image of data from memory 106 that is being utilized by the respective processor 102a-e. Since the cache memories of two processors (e.g., caches 114b and 114e) may contain overlapping or identical images of data from main memory 106, if one processor (e.g., processor 102b) were to alter the data in its cache (e.g., cache 114b), the data in the other cache (e.g., cache 114e) would become invalid or stale. To prevent the other processor (e.g., processor 102e) from acting on invalid or stale data, SMP systems, such as system 100, typically include some type of cache coherency protocol.
In general, cache coherency protocols cause other processors to be notified when an update (e.g., a write) is about to take place at some processor's cache. Other processors, to the extent they also have copies of this same data in their caches, may then invalidate their copies of the data. Alternatively, the write may be broadcast to the processors, which then update the copies of the data in their local caches. Protocols or algorithms, some of which may be relatively complex, are often used to determine which entries in a cache should be overwritten when more data than can be stored in the cache is received.
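The invalidation approach described above can be illustrated with a minimal sketch. It is not taken from any particular protocol: the class names `Bus` and `Cache`, the write-through policy, and the address `0x40` are all assumptions chosen to keep the example short.

```python
class Bus:
    """Hypothetical system bus: holds shared memory and knows every cache."""

    def __init__(self):
        self.memory = {}   # address -> value (stands in for memory 106)
        self.caches = []   # all private caches attached to the bus

    def invalidate_others(self, addr, writer):
        # Notify every cache except the writer; each drops its stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)


class Cache:
    """Hypothetical private cache of one processor."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        # On a miss, fetch the value from shared memory.
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Notify the other caches before the update takes place.
        self.bus.invalidate_others(addr, self)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through, assumed for simplicity


bus = Bus()
bus.memory[0x40] = 1
cache_b, cache_e = Cache(bus), Cache(bus)
cache_b.read(0x40)
cache_e.read(0x40)          # both caches now hold the same image
cache_b.write(0x40, 7)      # cache_e's copy is invalidated
print(0x40 in cache_e.lines, cache_e.read(0x40))
```

After the write, the second cache no longer holds the line and must re-fetch it, so it never acts on the stale value.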
I/O bridge 108 may also include one or more cache memories (not shown) of its own. The bridge cache is used to store data received via system bus 104 from memory 106 and/or the processor caches 114 that is intended for one or more of the I/O devices 112. That is, bridge 108 forwards the data from its cache onto one or more of the I/O busses 110. Data may also be received from an I/O device 112 and stored at the bridge cache before being driven onto system bus 104 for receipt by a processor 102 or memory 106. Generally, the data stored in the cache of I/O bridge 108 is not coherent with the system 100. In small computer systems, it is reasonable for an I/O bridge not to maintain cache coherence for read transactions because those transactions (fetching data from the cache coherent domain) are implicitly ordered and the data is consumed immediately by the device. However, in large computer systems with distributed memory, I/O devices, such as devices 112, are not guaranteed to receive coherent data.
U.S. Pat. No. 5,884,100 to Normoyle et al. discloses a single central processing unit (CPU) chip in which an I/O system is disposed on (i.e., built right onto) the core or package of the CPU chip. That is, Normoyle discloses an I/O system that is part of the CPU chipset. Because the I/O system in the Normoyle patent is located in such close proximity to the CPU, and there is only one CPU, the Normoyle patent is purportedly able to keep the I/O system coherent with the CPU.
In symmetrical multiprocessor computer systems, however, it would be difficult to incorporate the I/O system onto the processor chipset. For example, the Normoyle patent provides no suggestion as to how its I/O system might interface with other CPUs or with other I/O systems. Thus, a need exists for providing cache coherency in the I/O domain of a symmetrical multiprocessor system.
However, imposing cache coherency on the I/O domain of a symmetrical multiprocessor computer system may create other problems that could degrade the system's performance. For example, some cache coherency protocols, if applied to the I/O bridge, may result in two or more I/O devices that are competing for the same data becoming "livelocked". In other words, neither I/O device is able to access the data. As a result, both devices are "starved" of data and are unable to make any progress in their respective processes or application programs. Accordingly, a need exists, not just for providing cache coherency in the I/O domain, but also for ensuring continued, high-level operation of the symmetrical multiprocessor system.
Briefly, the invention relates to a system and method for avoiding "livelock" and "starvation" among two or more input/output (I/O) devices competing for the same data in a symmetrical multiprocessor (SMP) computer system. The SMP computer system includes a plurality of interconnected processors having corresponding caches, one or more memories that are shared by the processors, and a plurality of I/O bridges to which the I/O devices are coupled. Each I/O bridge includes one or more upstream buffers and one or more downstream buffers. An up engine is coupled to the upstream buffer and controls the flow of information, including requests for data, from the I/O devices to the processors and shared memory. A down engine is coupled to the downstream buffer, and controls the flow of information from the processors and shared memory to the I/O devices. A cache coherency protocol is executed in the I/O bridge in order to keep the data in the downstream buffer coherent with the processor caches and shared memory. As part of the cache coherency protocol, the I/O bridge obtains "exclusive" (not shared) ownership of all data fetched from the processor caches and the shared memory, and invalidates and releases any data in the downstream buffer that is requested by a processor or by some other I/O bridge.
To prevent two I/O devices from becoming "livelocked" in response to competing requests for the same data, each I/O bridge further includes at least one non-coherent memory device which is also coupled to and thus under the control of the down engine. Before invalidating data requested by a competing device or entity, the down engine at the I/O bridge receiving the request first copies that data to the bridge's non-coherent memory device. The down engine then takes the largest amount of the copied data that it "knows" to be coherent (despite the request for that data by a processor or other I/O bridge) and releases only that amount to the I/O device which originally requested the data from the bridge. In the illustrative embodiment, this "known" coherent amount of data corresponds to one I/O bus cycle. The remaining data that was copied into the non-coherent memory device is then discarded. In this way, the I/O device that originally requested the data is guaranteed to make at least some forward progress despite data collisions, and yet data coherency is still maintained within the I/O domain of the SMP computer system.
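The copy, release, and discard sequence described above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: the class `IOBridge`, its method names, the 64-byte line size, and the 8 bytes per I/O bus cycle are hypothetical and do not come from the specification. Two bridges repeatedly take the same line from each other; the flag `use_non_coherent_copy` switches the non-coherent memory device on or off so the two behaviors can be compared.

```python
LINE_BYTES = 64        # assumed size of one block of fetched data
BUS_CYCLE_BYTES = 8    # assumed amount delivered in one I/O bus cycle


class IOBridge:
    """Hypothetical model of the down engine and its two storage areas."""

    def __init__(self, use_non_coherent_copy=True):
        self.use_non_coherent_copy = use_non_coherent_copy
        self.downstream = {}       # coherent downstream buffer: addr -> bytes
        self.non_coherent = {}     # non-coherent memory device: addr -> bytes
        self.progress = {}         # addr -> bytes already released to the device
        self.released = bytearray()  # everything the I/O device has received

    def fill(self, addr, line):
        # Data arrives after the bridge obtained exclusive ownership of it.
        self.downstream[addr] = bytes(line)
        self.progress.setdefault(addr, 0)

    def on_competing_request(self, addr):
        # A processor or another bridge asks for data held by this bridge.
        if addr not in self.downstream:
            return
        if self.use_non_coherent_copy:
            # Step 1: copy to the non-coherent device before invalidating.
            self.non_coherent[addr] = self.downstream[addr]
        # Step 2: invalidate and release the coherent copy.
        del self.downstream[addr]
        if self.use_non_coherent_copy:
            # Step 3: release one bus cycle of data; discard the remainder.
            start = self.progress[addr]
            copy = self.non_coherent.pop(addr)
            chunk = copy[start:start + BUS_CYCLE_BYTES]
            self.released += chunk
            self.progress[addr] = start + len(chunk)

    def done(self, addr):
        return self.progress.get(addr, 0) >= LINE_BYTES


def rounds_until_done(use_copy, max_rounds=100):
    """Each round, both bridges fetch the line and immediately lose it."""
    line = bytes(range(LINE_BYTES))
    a, b = IOBridge(use_copy), IOBridge(use_copy)
    for n in range(1, max_rounds + 1):
        a.fill(0x1000, line)
        a.on_competing_request(0x1000)   # bridge b takes the line from a
        b.fill(0x1000, line)
        b.on_competing_request(0x1000)   # bridge a takes the line back
        if a.done(0x1000) and b.done(0x1000):
            return n
    return None   # no forward progress within max_rounds


print(rounds_until_done(use_copy=True))
print(rounds_until_done(use_copy=False))
```

In this model each collision still delivers one bus cycle of data, so both devices finish after a bounded number of rounds; with the copy disabled, neither device ever receives any data, which is the livelock the summary describes.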
In another embodiment of the invention, the I/O bridge includes a single, dual-property buffer configured to store both coherent and non-coherent data. Each entry of the dual-property buffer includes a tag that specifies whether the respective entry contains coherent or non-coherent data. As data is entered into a buffer entry in response to a request for exclusive ownership of that data, the I/O bridge sets the respective tag to indicate that the data is coherent. If the data is subsequently requested by a competing device or entity, the I/O bridge changes the respective tag from coherent to non-coherent. For buffer entries whose tag indicates that the data is non-coherent, the I/O bridge preferably releases to the target I/O device only that amount "known" to be coherent.
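A minimal sketch of such a tagged buffer follows. The names `Tag`, `Entry`, and `DualPropertyBuffer` are hypothetical, and the sketch assumes that the amount known to be coherent is one I/O bus cycle of 8 bytes, borrowing the figure given for the illustrative embodiment rather than anything stated for this one.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    COHERENT = "coherent"
    NON_COHERENT = "non-coherent"


@dataclass
class Entry:
    data: bytes
    tag: Tag = Tag.COHERENT


class DualPropertyBuffer:
    """Hypothetical single buffer holding coherent and non-coherent entries."""

    def __init__(self, bus_cycle_bytes=8):
        self.entries = {}                      # addr -> Entry
        self.bus_cycle_bytes = bus_cycle_bytes  # assumed known-coherent amount

    def fill(self, addr, data):
        # Data entered after a request for exclusive ownership: tag coherent.
        self.entries[addr] = Entry(bytes(data), Tag.COHERENT)

    def on_competing_request(self, addr):
        # A competing device or entity wants the data: flip the tag in place.
        entry = self.entries.get(addr)
        if entry is not None:
            entry.tag = Tag.NON_COHERENT

    def release(self, addr):
        # Hand data to the target I/O device and free the entry.
        entry = self.entries.pop(addr)
        if entry.tag is Tag.COHERENT:
            return entry.data
        # Non-coherent entry: release only the amount assumed coherent.
        return entry.data[:self.bus_cycle_bytes]


buf = DualPropertyBuffer()
buf.fill(0x1000, bytes(range(64)))
buf.on_competing_request(0x1000)
print(len(buf.release(0x1000)))
```

Compared with the separate non-coherent memory device, the tag avoids copying the data; only the tag changes when a competing request arrives.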