A computer system 10 is depicted in FIG. 1. The computer system 10 includes a number of processors 11-1, 11-2, . . . , 11-n, a cache memory 13-1, 13-2, . . . , 13-n associated with each processor, a shared memory 14, and a number of I/O bridges 18-1, 18-2, . . . , 18-n interconnected by a system bus 16. The operation and function of these devices are discussed below.
The shared memory 14 has an array of storage locations for storing data. Each storage location is of a fixed size, e.g., eight bits, and is assigned a unique identifier called an address. These addresses specify the data storage locations which are the subject of data access (i.e., read and write) operations. The shared memory 14 receives data access commands, i.e., data read and data write commands, where each command contains an address specifying the particular storage locations to be accessed by the respective command.
Illustratively, the storage locations are further organized (for reasons described below) into contiguous, non-overlapping, fixed length sequences called data lines. For example, the storage locations may be organized into thirty-two byte long data lines. Each data line has a unique line address similar to the above-described storage location addresses for accessing the entire data line.
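The relationship between a storage location address and its data line can be sketched as follows. This is only an illustration: it assumes that line addresses are numbered consecutively from zero and that a line address equals the storage location address divided by the line size, neither of which is stated above. The names LINE_SIZE, line_address, line_offset and line_span are hypothetical.

```python
LINE_SIZE = 32  # storage locations (bytes) per data line, per the example above


def line_address(address: int) -> int:
    """Return the (assumed) line address of the data line containing `address`."""
    return address // LINE_SIZE


def line_offset(address: int) -> int:
    """Return the position of `address` within its data line."""
    return address % LINE_SIZE


def line_span(address: int) -> range:
    """Return the storage location addresses covered by the containing data line."""
    base = line_address(address) * LINE_SIZE
    return range(base, base + LINE_SIZE)
```

Under these assumptions, storage locations 64 through 95 all share line address 2, so a single line access covers any of them.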
The system bus 16 is for transferring data, addresses and commands between the different devices connected thereto, i.e., the processors 11-1, 11-2, . . . , 11-n, the cache memories 13-1, 13-2, . . . , 13-n, the shared memory 14 and the I/O bridges 18-1, 18-2, . . . , 18-n. As shown in FIG. 1, the system bus 16 comprises a data bus 16-1, for transferring data, a command bus 16-2, for transferring commands and addresses, and an arbitration bus 16-3, for use in allocating the system bus 16 to the other devices of the computer system 10. Illustratively, only one device may communicate via the system bus 16 at one time. The computer system 10 has an elaborate arbitration protocol for providing each device an opportunity to transfer data or commands on the system bus 16. The transfer of data and commands on the system bus 16 is discussed in greater detail below.
Each processor 11-1, 11-2, . . . , 11-n is for executing instructions. In the course of executing the instructions, the processors 11-1 to 11-n generate a number of data access commands. In addition, the instructions themselves are stored as data in the shared memory 14.
Illustratively, each processor 11-1, 11-2, . . . , 11-n is provided with a cache memory 13-1, 13-2, . . . , 13-n. The cache memories 13-1 to 13-n are small high speed memories for maintaining a duplicate copy of particular data in the slower shared memory 14 that the processors 11-1 to 11-n frequently access. Despite their relatively small size, the cache memories 13-1 to 13-n drastically reduce the number of data accesses to the slower shared memory 14. This is because processor data accesses exhibit temporal and spatial locality of reference properties. Temporal locality of reference refers to the tendency of processors to repeatedly access the same data. This occurs because of program flow control instructions such as loops, branches and subroutines, which tend to cause the processors 11-1 to 11-n to execute the same instructions repeatedly. By maintaining a local copy of the most recently accessed data, the cache memory 13-1 is able to satisfy repeated requests to the same data without accessing the shared memory 14 or using the system bus 16. Spatial locality of reference refers to the tendency of processors to access data stored at addresses near the most recently accessed data. This second phenomenon results from the sequential nature of program instruction flow. In order to exploit the spatial locality of reference phenomenon, the cache memories 13-1 to 13-n maintain a copy of the entire data line containing any recently accessed data. Thus, the likelihood increases that the cache memories 13-1 to 13-n may satisfy processor requests for data not previously accessed, provided such data has an address near previously accessed data.
The cache memories 13-1 to 13-n illustratively work as follows. Suppose a processor, e.g., the processor 11-1, issues a command to access particular data at a particular address. The processor 11-1 first attempts to access the desired data in the cache memory 13-1. The cache memory 13-1 determines if it stores a data line corresponding to the particular data accessed by processor 11-1. (Herein, a particular data line is said to "correspond" to a particular accessed data, if the data line, or its counterpart copy in the shared memory, includes at least part of the accessed data.) If so, a read or write hit (depending on whether the access instruction was a read or write) is said to occur and the data access is satisfied using the cache memory 13-1.
If the data is not present in the cache memory 13-1, a read or write miss is said to occur. In the event of a read or write miss, the cache memory 13-1 issues a read command for reading the corresponding data line from the shared memory 14. The corresponding data line is transferred from the shared memory 14 to the cache memory 13-1 via the system bus 16. Thereafter, the data access is satisfied using the duplicate copy of the data in the cache memory 13-1.
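The hit and miss behavior described above can be sketched with a minimal model. The sketch assumes thirty-two byte data lines and a cache of unlimited capacity (so no line is ever evicted), and it models only reads. The class names SharedMemory and Cache and the counters hits, misses and line_reads are hypothetical and serve only to make the behavior observable.

```python
LINE_SIZE = 32  # bytes per data line (example size from the text)


class SharedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.line_reads = 0  # line transfers over the system bus

    def read_line(self, line_addr):
        self.line_reads += 1
        start = line_addr * LINE_SIZE
        return bytearray(self.data[start:start + LINE_SIZE])


class Cache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # line address -> local copy of the data line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_addr, offset = divmod(address, LINE_SIZE)
        if line_addr in self.lines:
            self.hits += 1  # read hit: satisfied locally
        else:
            self.misses += 1  # read miss: fetch the whole line first
            self.lines[line_addr] = self.memory.read_line(line_addr)
        return self.lines[line_addr][offset]
```

In this model, a read of address 40 followed by reads of addresses 41 and 40 costs a single line transfer: the first access misses and the next two hit, which illustrates both spatial and temporal locality.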
The cache memories 13-1, 13-2, . . . , 13-n must operate in a manner which maintains the consistency of the data in the shared memory 14. In other words, if data in a cache memory, e.g., the cache memory 13-2, is modified, its counterpart in the shared memory 14 must invariably be modified. When a processor 11-1 to 11-n modifies data in its corresponding cache memory 13-1 to 13-n, the cache memories 13-1 to 13-n can immediately update the counterpart data in the shared memory 14. This manner of maintaining the consistency of the data is referred to as "write through".
Advantageously, to further reduce the demands on the system bus 16 and the number of shared memory 14 data accesses, the cache memories 13-1 to 13-n defer updating the stale version of the data in the shared memory 14 until a later time. This manner of maintaining the consistency of the data is referred to as "write back". For example, the cache memory may defer updating the shared memory 14 until the cache memory runs out of storage space.
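The difference in bus traffic between the two policies can be illustrated with a small sketch. It assumes thirty-two byte data lines, counts one bus transfer per write reaching the shared memory, and leaves the moment of write back to the caller. The names SharedMemory, WriteThroughCache, WriteBackCache, bus_writes and write_back are hypothetical.

```python
LINE_SIZE = 32  # bytes per data line (example size from the text)


class SharedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.bus_writes = 0  # write transfers that reached the shared memory


class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.copy = {}  # address -> locally cached value

    def write(self, address, value):
        self.copy[address] = value
        self.memory.data[address] = value  # update the counterpart at once
        self.memory.bus_writes += 1


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # line address -> local copy of the data line
        self.dirty = set()  # lines modified but not yet written back

    def write(self, address, value):
        line, offset = divmod(address, LINE_SIZE)
        if line not in self.lines:
            start = line * LINE_SIZE
            self.lines[line] = bytearray(self.memory.data[start:start + LINE_SIZE])
        self.lines[line][offset] = value
        self.dirty.add(line)  # shared memory copy is now stale

    def write_back(self):
        for line in sorted(self.dirty):
            start = line * LINE_SIZE
            self.memory.data[start:start + LINE_SIZE] = self.lines[line]
            self.memory.bus_writes += 1
        self.dirty.clear()
```

Ten successive writes to the same address produce ten bus transfers under write through, but only one under write back once write_back is called. Until then, the shared memory holds the stale value.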
As shown in FIG. 1, the computer system has one or more I/O bridges 18-1, 18-2, . . . , 18-n which may access the shared memory 14. Each of the I/O bridges 18-1, 18-2, . . . , 18-n may be further connected to one or more I/O devices 22, such as disk drives, Ethernet interfaces, FDDI interfaces, etc., via a corresponding I/O expansion bus 20-1, 20-2, . . . , 20-n.
The purpose of the I/O bridges 18-1 to 18-n is to "decouple" or isolate the I/O expansion busses 20-1 to 20-n from the system bus 16. Typically, the system bus 16 has a different data transmission protocol and a different data transfer speed than the I/O expansion busses 20-1 to 20-n. For example, data may be transmitted on the system bus 16 in thirty-two byte packets at a speed of 33 MHz. On the other hand, data is illustratively transferred on an I/O expansion bus, e.g., the I/O expansion bus 20-1, in four byte groups at 8 MHz.
Each I/O bridge, e.g., the I/O bridge 18-1, can receive data packets from, e.g., the shared memory 14, via the system bus 16. The I/O bridge 18-1 retrieves the data from the packets and then transfers the "depacketized" data to an I/O device 22 connected thereto via a corresponding I/O expansion bus 20-1. Conversely, each I/O bridge, e.g., the I/O bridge 18-1, can receive data from an I/O device 22 via the associated I/O expansion bus 20-1. The I/O bridge 18-1 organizes this data into packets and then transfers the data, in packets, to the shared memory 14 via the system bus 16.
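The repackaging performed by an I/O bridge can be sketched as follows, using the illustrative sizes given above (thirty-two byte packets on the system bus, four byte groups on the I/O expansion bus). The sketch ignores addresses, commands and timing and treats the data as a plain byte stream. The function names packetize and depacketize are hypothetical.

```python
PACKET_SIZE = 32  # bytes per system bus packet (example size from the text)
GROUP_SIZE = 4    # bytes per I/O expansion bus group (example size from the text)


def packetize(groups):
    """Collect four byte groups from an I/O device into system bus packets."""
    stream = b"".join(groups)
    return [stream[i:i + PACKET_SIZE] for i in range(0, len(stream), PACKET_SIZE)]


def depacketize(packets):
    """Split system bus packets into four byte groups for an I/O device."""
    stream = b"".join(packets)
    return [stream[i:i + GROUP_SIZE] for i in range(0, len(stream), GROUP_SIZE)]
```

With these sizes, sixteen groups from an I/O device fill exactly two packets, and depacketizing those packets recovers the original sixteen groups.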
Devices which access data from other devices (i.e., the processors 11-1 to 11-n, the associated cache memories 13-1 to 13-n, and the I/O bridges 18-1 to 18-n) are referred to as "masters." All masters must access data in a fashion that maintains the consistency of the data in the shared memory 14. For example, a first cache memory 13-1 might attempt to access a data line in the shared memory 14 at the same time that a second cache memory 13-2 contains a modified copy of that data line. In order to maintain the consistency of the data in the shared memory 14, the first cache memory 13-1 must obtain a copy of the data as modified by the second cache memory 13-2, not the stale counterpart copy in the shared memory 14. To that end, the masters utilize an "ownership" protocol for maintaining data consistency. According to an illustrative ownership protocol, before a master can access data, the master must successfully claim ownership in the data line corresponding to the desired data. A master which does not own a data line corresponding to particular data may not access the particular data.
When a first master desires to access particular data, the first master issues a command for claiming ownership in the corresponding data line. (This command may simply be a command to access, i.e., read from or write to a particular line address). The first master then waits a specified period to determine if its claim is successful. Each master "snoops" or monitors the system bus 16 for ownership claiming commands. If a second master currently owns, but has not modified, the data line in which the first master has claimed ownership, the second master illustratively relinquishes ownership in that data line entirely by considering that data invalid. (Alternatively, the second master relinquishes exclusive ownership of that data by considering that data line as shared and by notifying the first master that the data line must be shared with another master.) If the first master does not receive any indication that another master owns the data line during the specified period, the first master successfully claims ownership in the data line and may access the data contained therein.
If the second master detects an ownership claim to a data line which the second master has modified (but not yet written back to the shared memory 14), the second master asserts an intervention acknowledge signal(s) to the first master. This forces the first master to re-issue its ownership claiming command. Meanwhile, the second master updates the copy of the data line in the shared memory 14 by writing back the modified data line to the shared memory 14. Afterwards, the second master relinquishes ownership of that data line. The first master can then successfully claim ownership in the data line.
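The ownership claiming sequence of the two preceding paragraphs can be sketched as a small model. Several simplifications are assumed: each data line is in one of three states per master, the write back completes within the snoop step rather than concurrently, the shared alternative is omitted, and the waiting period is not modeled. The names State, Master, snoop and claim_ownership are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = 0   # line not owned by this master
    OWNED = 1     # owned, not modified
    MODIFIED = 2  # owned, modified, not yet written back


class Master:
    def __init__(self, name):
        self.name = name
        self.state = {}  # line address -> State
        self.data = {}   # line address -> local copy

    def snoop(self, line, memory):
        """React to another master's claim; return True to signal intervention."""
        state = self.state.get(line, State.INVALID)
        if state is State.MODIFIED:
            memory[line] = self.data[line]    # write back the modified line
            self.state[line] = State.INVALID  # then relinquish ownership
            return True
        if state is State.OWNED:
            self.state[line] = State.INVALID  # relinquish an unmodified line
        return False


def claim_ownership(claimant, line, masters, memory):
    """Claim `line` for `claimant`; return the number of claims issued."""
    attempts = 0
    while True:
        attempts += 1
        intervened = False
        for master in masters:
            if master is not claimant and master.snoop(line, memory):
                intervened = True
        if not intervened:
            claimant.state[line] = State.OWNED
            claimant.data[line] = memory[line]
            return attempts
```

If a second master holds a modified copy, the first claim draws an intervention and the re-issued claim succeeds, so claim_ownership returns 2 and the claimant receives the modified data rather than the stale copy.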
From the perspective of the I/O bridges 18-1 to 18-n, the ownership claiming process is more complicated. Before an I/O bridge, e.g., the I/O bridge 18-1, can receive data from an I/O device 22 via the respective I/O expansion bus 20-1, destined to particular destination addresses in the shared memory 14, the I/O bridge 18-1 must own the data line corresponding thereto. A conventional I/O bridge only attempts to claim ownership in a data line when absolutely necessary. That is, such an I/O bridge attempts to claim ownership in a data line only upon receiving a request from an I/O device 22 to transfer data corresponding to a data line not currently owned by the I/O bridge. This can unnecessarily slow down the transfer of data on the I/O expansion bus because ownership claiming commands are subject to command and bus arbitration latency on the system bus 16. Furthermore, if another master owns the data line in which the I/O bridge attempts to claim ownership, the I/O bridge must further wait until the other master relinquishes ownership of the data line.
U.S. patent application Ser. No. 08/071,721 entitled "Memory Consistent Pre-Ownership Method and System for Transferring Data Between an I/O Device and a Main Memory" provides a novel solution to this problem. According to this incorporated application, when an I/O device 22 transmits a command to write data into the shared memory 14, the I/O bridge 18-1 issues a command for claiming ownership in the corresponding data line as before. However, the I/O bridge 18-1 can also issue a second command for claiming ownership in the very next data line, i.e., having the immediately following or immediately preceding line address, even while receiving data corresponding to a currently owned data line. Stated another way, the I/O bridge 18-1 can preliminarily attempt to claim ownership in a sequence of one or more of the very next data lines before the I/O device 22 indicates that it intends to transfer data corresponding thereto. This is possible because of the predictable, sequential nature of I/O device data access to the shared memory 14. That is, an I/O device 22 typically transfers a block of data (e.g., 1 to 4 kbytes) to sequentially ascending or descending addresses of the shared memory 14. The I/O bridge 18-1 can easily determine (from the first group of consecutive data transfer commands issued by the I/O device 22) whether the data block is to be transferred into sequentially ascending or descending addresses. Thus, the I/O bridge 18-1 can accordingly "pre-issue" "pre-ownership" claims in the immediately sequentially following or preceding data lines.
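The direction detection and pre-ownership selection described above might be sketched as follows. This is not the method of the incorporated application, whose details are not given here. It assumes thirty-two byte data lines, infers the direction from strictly increasing or strictly decreasing addresses, and uses a hypothetical depth parameter for how many lines ahead to claim. The function names transfer_direction and lines_to_preclaim are likewise hypothetical.

```python
LINE_SIZE = 32  # bytes per data line (example size from the text)


def transfer_direction(addresses):
    """Return +1 for ascending, -1 for descending, 0 if not yet determinable."""
    if len(addresses) < 2:
        return 0
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if all(d > 0 for d in deltas):
        return 1
    if all(d < 0 for d in deltas):
        return -1
    return 0


def lines_to_preclaim(addresses, owned_lines, depth=1):
    """Return line addresses worth claiming ahead of the I/O device."""
    direction = transfer_direction(addresses)
    if direction == 0:
        return []
    current = addresses[-1] // LINE_SIZE
    candidates = [current + direction * k for k in range(1, depth + 1)]
    return [line for line in candidates if line >= 0 and line not in owned_lines]
```

For example, after transfers to addresses 64, 68 and 72 (all in line 2), the sketch proposes line 3 and, with a depth of two, line 4. After transfers to 72, 68 and 64 it proposes line 1.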
The computer system 10 is designed to maximize the throughput, i.e., the flow of data from input to output. This is achieved partly by increasing the processing speed of the processors 11-1 to 11-n. However, the most significant increase in speed lies in the manner of operating and allocating shared resources, such as the shared memory 14 and the system bus 16, to the devices, i.e., processors 11-1 to 11-n, cache memories 13-1 to 13-n, I/O bridges 18-1 to 18-n, etc. For instance, while many of these devices can process data independently, the system bus 16 is the only means for transferring data between these independently operating devices. Thus, the computer system 10 is designed to operate and allocate the system bus 16 in a fair and orderly manner which increases throughput.
As discussed above, one purpose of the cache memories 13-1 to 13-n is to ensure that the processors 11-1 to 11-n are not idly waiting for data accesses to be satisfied by the slow shared memory 14 (which operates at a fraction of the speed of the processors 11-1 to 11-n, e.g., 1/10 to 1/20 of that speed). However, the cache memories 13-1 to 13-n also reduce the number of data transfers on the system bus 16, thereby decreasing the demands to access the system bus 16.
The transfer of data and commands on the system bus 16 may be further optimized depending on how both data and commands are transferred on the system bus. FIG. 2 illustrates a timing diagram explaining two data and command transfer methods on the system bus 16 of FIG. 1. The signal "A" illustrates a first manner of transferring data on the system bus 16 called "simple bus". As shown, a master issues a read data command on the system bus 16 at time A₁ destined to the shared memory 14. The master then monitors the system bus 16 for the data returned by the shared memory 14. At a time A₂, the shared memory 14 transmits the requested data to the master. In order to prevent confusion of source and destination devices for each transferred data, no device may transmit during the period between the times A₁ and A₂. Rather, the system bus 16 is idle during this time.
This manner of transferring data and commands on the system bus 16 is disadvantageous. This is because the system bus 16 is idle for substantial time periods between the issuance of read commands and the transfer of data from the shared memory 14.
Signals "B1" and "B2" illustrate another transfer scheme on a split transaction system bus 16 which supports split read transactions. With such a bus, data read from the shared memory 14 may be transferred back to the requesting master via the data bus 16-1 independently of the transfer of commands on the command bus 16-2. For example, signal B1 shows the issuance of a number of split read transaction type read commands CMD1, CMD2, CMD3 on the command bus 16-2. Signal B2 shows the transfer of data DATA1, DATA2, DATA3 on the data bus 16-1 corresponding to the issued split read transaction type read commands CMD1, CMD2, CMD3, respectively. As can be seen, the issuance of read commands is no longer delayed by the return of data corresponding to each issued read command.
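The benefit of the split transaction scheme over the simple bus can be estimated with a rough cycle count. The sketch rests on assumptions not stated above: each command and each data transfer occupies one bus cycle, the shared memory needs a fixed number of cycles (latency) to produce the data, and the shared memory can work on several reads at once. The function names simple_bus_cycles and split_bus_cycles are hypothetical.

```python
def simple_bus_cycles(num_reads, latency):
    """Cycles to complete `num_reads` reads when the bus idles during each read."""
    cycle = 0
    for _ in range(num_reads):
        cycle += 1        # read command on the bus
        cycle += latency  # bus idle while the shared memory fetches the data
        cycle += 1        # data returned on the bus
    return cycle


def split_bus_cycles(num_reads, latency):
    """Cycles to complete `num_reads` reads with commands issued back to back."""
    data_bus_free = 0
    finish = 0
    for i in range(num_reads):
        issued = i + 1                   # command i occupies command bus cycle i
        ready = issued + latency         # earliest cycle the data is available
        start = max(ready, data_bus_free)
        data_bus_free = start + 1        # data transfer occupies one cycle
        finish = data_bus_free
    return finish
```

Under these assumptions, three reads with a latency of four cycles take 18 cycles on the simple bus and 8 cycles on the split transaction bus. A single read takes 6 cycles either way, so the gain comes entirely from overlapping outstanding reads.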
In the computer system 10 (FIG. 1), masters are permitted to simultaneously have more than one outstanding, unsatisfied read command for which the shared memory 14 has not yet returned the accessed data. For example, an I/O bridge 18-1 which issues pre-ownership claims as discussed above may issue a number of ownership claiming read commands (i.e., CMD1, CMD2) in consecutive cycles regardless of whether the shared memory 14 has returned data corresponding to any previously issued read commands.
Signal B2 shows the data being returned in the order in which the ownership claiming instructions were issued. Thus, if the I/O bridge 18-1 issued commands CMD1 and CMD2, the first returned data DATA1 would correspond to CMD1 and the second returned data DATA2 would correspond to CMD2. However, this is not always the case in actual operation. Generally, the order of returned data is arbitrary, depending on, among other things, the write back policy obeyed by the masters of the computer system 10. Consider the example where the I/O bridge 18-1 issues CMD1 and CMD2 on consecutive cycles to read two different data lines DATA1 and DATA2, respectively. However, the cache memory 13-1 currently owns, has modified, but has not yet updated, the data line DATA1. In such a case, the cache memory 13-1 illustratively detects CMD1 on the command bus 16-2 and asserts an intervention acknowledge signal. In the meantime, the shared memory 14 retrieves the second data line DATA2 and transfers this data to the I/O bridge 18-1. Subsequently, the cache memory 13-1 writes back the data line DATA1 which is received at the I/O bridge 18-1.
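The reordering in this example can be reproduced with a deliberately coarse model. It assumes that every data line subject to an intervention arrives after all data lines supplied directly by the shared memory 14, which is only one of many possible orderings. The function names return_order and pair_by_position are hypothetical.

```python
def return_order(requested_lines, modified_elsewhere):
    """Return the order in which the requested data lines arrive at the master.

    Lines supplied directly by the shared memory arrive first; lines that
    another master must first write back arrive afterwards (an assumption).
    """
    direct = [line for line in requested_lines if line not in modified_elsewhere]
    delayed = [line for line in requested_lines if line in modified_elsewhere]
    return direct + delayed


def pair_by_position(commands, arrivals):
    """Naively match the i-th command with the i-th arrival."""
    return dict(zip(commands, arrivals))
```

With DATA1 modified in the cache memory 13-1, return_order(["DATA1", "DATA2"], {"DATA1"}) yields DATA2 before DATA1. Matching by position would then wrongly associate CMD1 with DATA2 and CMD2 with DATA1.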
This presents a problem for the masters of the computer system 10 because a master with more than one outstanding read command cannot predict with certainty the order of the returned data. Stated more generally, each master, which can have a number of outstanding, unsatisfied requests to utilize a shared resource of the computer system 10 (FIG. 1), must be able to correlate each utilization or satisfaction with its corresponding request. This is particularly a problem when the requests are satisfied in an arbitrary order. An object of the present invention is to overcome this disadvantage.