A computer system 10 is depicted in FIG. 1. The computer system 10 includes a CPU or processor 12 and an associated cache memory 14, a main memory 16 and a memory bus 18. The function of each part is discussed in greater detail below.
The memory bus 18 transfers data and commands between the devices connected thereto, e.g., the CPU 12 or main memory 16. Illustratively, only one device may write data or commands via the memory bus 18 at one time. The computer system 10 has an elaborate bus arbitration protocol for providing each device an opportunity to write data or commands via the memory bus 18 in the case that more than one device desires to write data or commands on the memory bus 18 at the same time.
The CPU 12 executes instructions such as arithmetic or logical operations on data, program flow control instructions for ordering the execution of other instructions, and memory access commands. Memory access commands include commands for reading data and for writing data. For example, the CPU 12 can write data to the main memory 16 or cache memory 14 or read data from the main memory 16 or cache memory 14.
The main memory 16 is for storing data. The main memory 16 typically has a memory array of storage locations. Each storage location can store a data word of a fixed length, e.g., eight bit (or byte long) data words, therein for later retrieval. When the CPU 12 writes data to, or reads data from, the main memory 16, the CPU 12 issues a command via the memory bus 18 to access (i.e., write data in or read data from) a particular storage location in the memory array of the main memory 16. Each storage location has a unique address for referring to that particular storage location. Typically, the memory access commands include the address of the particular storage location in which data is to be written or from which data is to be read. In the case of a write, the CPU 12 also transfers the data to be stored in the main memory 16 via the memory bus 18. In the case of a read, the main memory 16 transfers the retrieved data to the CPU 12 via the memory bus 18.
The main memory 16 is typically formed by one or more dynamic random access memory integrated circuits (DRAMs). Such DRAM main memories 16 are relatively inexpensive.
On the other hand, a main memory 16 with DRAMs tends to be slow, utilizing at least approximately 80 nsec to store or retrieve data in a particular location of the memory array therein (i.e., data can be accessed at a 12.5 MHz rate). This is slower than the data transfer rate of the memory bus 18 (which can be 33 MHz) and much slower than the CPU 12 (which can have a 66 MHz clock). Thus, if the CPU 12 must access the main memory 16 periodically, e.g., to obtain program instructions, the CPU 12 will be idle for a substantial portion of the time while the CPU 12 waits to receive data from the main memory 16. This reduces the efficiency and operating speed of the computer 10.
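The timing relationship above can be checked with a short calculation. The following is only an illustrative sketch using the figures quoted in the text (80 nsec, 33 MHz, 66 MHz); the function and variable names are hypothetical and do not appear in the original description.

```python
# Illustrative figures taken from the text; all names here are hypothetical.
DRAM_ACCESS_NS = 80.0   # approximate minimum DRAM access time
BUS_MHZ = 33.0          # illustrative memory bus transfer rate
CPU_MHZ = 66.0          # illustrative CPU clock


def rate_mhz(period_ns: float) -> float:
    """Convert an access period in nanoseconds to an access rate in MHz."""
    return 1000.0 / period_ns


def cycles_per_access(clock_mhz: float, access_ns: float) -> float:
    """Number of clock cycles that elapse during one access."""
    return access_ns * clock_mhz / 1000.0


dram_rate = rate_mhz(DRAM_ACCESS_NS)                      # 12.5 MHz
cpu_cycles = cycles_per_access(CPU_MHZ, DRAM_ACCESS_NS)   # about 5.28 CPU cycles
bus_cycles = cycles_per_access(BUS_MHZ, DRAM_ACCESS_NS)   # about 2.64 bus cycles
```

Under these figures, a 66 MHz CPU would let more than five clock cycles pass during each DRAM access.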
Analysis of a large number of typical programs executed by the CPU 12 has shown that CPU accesses to the main memory 16 tend to be confined to a few localized areas of the main memory 16. This phenomenon is known as the property of locality of reference. The reason for this locality of reference property may be understood if one considers the program flow of a typical computer program. The CPU 12 typically executes a sequence of instructions stored in successively addressed memory locations until a subroutine or loop program flow control instruction is encountered. Such instructions cause the CPU 12 to repeat execution of specific sequences of instructions one or more times. Thus, loops and subroutines tend to localize the references or accesses to memory for fetching instructions.
In addition, memory references to non-instruction data tend to be localized, although to a lesser degree. Non-instruction data tends to be stored in tables, arrays and frequently accessed variables. Thus, the CPU 12 tends to access repeatedly the data stored in the same localities in memory.
If the most frequently accessed portions of program instruction and other data are placed in a small fast memory, the average access time for retrieving data will approach the speed of the small fast memory. This is because a large fraction of the requests will be for data stored in the small fast memory and only a smaller fraction of the memory access will be to the larger, slower main memory 16. The cache memory 14 is such a small, high speed memory.
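The effect of the hit fraction on average access time can be sketched as a weighted average. This is a simplified model, assuming a hit costs only the fast memory access time and a miss costs only the main memory access time; the 15 nsec fast-memory figure in the usage lines is a hypothetical value, not one given in the text.

```python
def average_access_ns(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Weighted average access time for a given fraction of hits.

    Simplifying assumption: a miss costs exactly one main memory access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


# Hypothetical 15 nsec fast memory against the 80 nsec DRAM of the text.
for rate in (0.5, 0.9, 0.99):
    print(rate, average_access_ns(rate, 15.0, 80.0))
```

As the hit fraction approaches one, the result approaches the access time of the small fast memory.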
Cache memories are typically formed by high speed static random access memory integrated circuits (SRAMs). The cache memory may also be part of the integrated circuit of the CPU 12. Cache memories 14 with SRAMs are relatively more expensive than the main memory 16 with DRAMs. Thus, it is not desirable to replace the entire main memory 16 with SRAMs. Rather, it is advantageous to supplement the main memory 16 with a relatively smaller cache memory 14 which operates as discussed below.
When the CPU 12 needs to read data from or write data to a particular addressed location of the main memory 16, the CPU 12 transmits a read or write command containing the address of the data in the main memory 16 to the cache memory 14. Upon receiving such a command, the cache memory 14 determines (as described in greater detail below) if the particular addressed data is stored therein. If so, a "read hit" or "write hit" is said to occur. The cache memory then simply retrieves the data and transfers it to the CPU 12 (in the case of a read) or stores the new data in an appropriate location in the cache memory 14 (in the case of a write).
If the cache memory 14 does not already contain the addressed data, a "read miss" or "write miss" is said to occur. In the case of a read or write miss, the cache memory issues a read command (including the particular address of the data) to the main memory 16 via the memory bus 18. In response to receiving the read command, the main memory 16 retrieves the data stored therein at the particular address and transfers this retrieved data via the memory bus 18 to the cache memory 14. The cache memory 14 stores the data transferred from the main memory 16 (as described below) and then continues as described above as if the data were already present in the cache memory 14.
It may be appreciated that the cache memory 14 merely stores a copy of the data in the main memory 16. However, the CPU 12 may repeatedly read or modify (i.e., overwrite) the copy of the data in the cache memory 14 at a much higher speed.
The cache memory 14 must operate in a manner that maintains the consistency of the data in the main memory 16. In other words, if data in the cache memory 14 is modified, its counterpart in the main memory 16 must also invariably be modified.
According to one manner of operating the cache memory 14 called "write through," the cache memory 14 issues a write command to the main memory 16 as soon as possible after the CPU 12 modifies the data in the cache memory 14. Thus, the data in the main memory 16 is contemporaneously updated with its copy in the cache memory 14.
According to another manner of operating the cache memory 14 called "write back," the cache memory 14 does not update the main memory 16 as soon as possible after the CPU 12 modifies the data in the cache memory 14. Rather, the cache memory 14 defers updating the counterpart data in the main memory 16 until a later time. For example, the cache memory 14 may defer updating the main memory 16 until the cache memory 14 runs out of storage space. In such a case, a new datum must be stored in a location of the cache memory 14 currently occupied by an old datum. Before the cache memory 14 overwrites the old datum, the cache memory 14 issues a command to the main memory 16 to update the counterpart of the old datum, but only if the old datum had been modified by the CPU 12 while residing in the cache memory 14.
Write back is generally more advantageous than write through. Typically, the CPU 12 modifies the data stored in a particular address several times. Write back defers updating the main memory 16 until the CPU 12 no longer needs the data or until absolutely necessary, thus avoiding many unnecessary updates to the main memory 16.
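The difference in main memory traffic between the two manners of operation can be illustrated by counting updates. The sketch below assumes, purely for illustration, a cache holding a single datum and a final flush at the end of the sequence; the function name and policy strings are hypothetical.

```python
def count_memory_writes(cpu_writes, policy):
    """Count main memory updates caused by a sequence of CPU writes.

    cpu_writes: addresses written by the CPU, in order.
    policy: "write_through" or "write_back".
    Assumption: the cache holds one datum at a time and is flushed at the end.
    """
    if policy == "write_through":
        # Every CPU write is propagated to the main memory.
        return len(cpu_writes)
    if policy != "write_back":
        raise ValueError("unknown policy: " + policy)

    resident = None     # address of the datum currently in the cache
    modified = False    # whether that datum was modified while resident
    updates = 0
    for address in cpu_writes:
        if address != resident:
            if modified:
                updates += 1    # old datum is written back before being replaced
            resident = address
        modified = True
    if modified:
        updates += 1            # final flush of the last modified datum
    return updates


writes = [7, 7, 7, 7, 9]
print(count_memory_writes(writes, "write_through"))  # 5
print(count_memory_writes(writes, "write_back"))     # 2
```

In this example, four successive modifications of the same address cost four main memory updates under write through but only one under write back.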
One may appreciate from the above discussion that the cache memory 14 must keep track of which storage locations therein contain valid data, which contain invalid data (e.g., are empty) and which contain modified data that must be written back to the main memory 16. Furthermore, the cache memory 14 must keep track of the address in the main memory 16 of the counterpart data of each copy of data stored in the cache memory 14. Referring to FIG. 2, one organization of the cache memory 14 for achieving these ends is depicted.
As shown in FIG. 2, several consecutive data line storage locations 31, 32, and 33 of the cache memory 14 each store a line of data. A data line is a fixed length block or sequence of data words. Illustratively, the entire address space of the main memory is organized into contiguous, non-overlapping data lines such that a p-th data line contains the data words residing in addresses p * (data line length) to (p+1) * (data line length) - 1 of the main memory 16. For example, suppose each data line has a length of thirty-two bytes, and the storage locations 31, 32, and 33 store the 91st, 0th, and 1st data lines, respectively. The data lines in storage locations 31, 32, and 33 contain copies of the bytes stored at the addresses 2912, 2913, . . . , 2943, copies of the bytes stored at the addresses 0, 1, . . . , 31, and copies of the bytes stored at addresses 32, 33, . . . , 63, respectively, in the main memory 16. The number p is also called the line address.
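The mapping between byte addresses and line addresses described above can be written out as follows. This is a minimal sketch assuming the illustrative thirty-two byte data line; the function names are hypothetical.

```python
LINE_LENGTH = 32  # bytes per data line, as in the example of the text


def line_address(byte_address: int) -> int:
    """Line address p of the data line containing the given byte address."""
    return byte_address // LINE_LENGTH


def line_range(p: int) -> tuple:
    """First and last byte addresses covered by the p-th data line."""
    return p * LINE_LENGTH, (p + 1) * LINE_LENGTH - 1


print(line_range(91))   # (2912, 2943)
print(line_range(0))    # (0, 31)
print(line_range(1))    # (32, 63)
```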
It is advantageous to transfer into the cache memory 14 the entire data line containing an accessed data word in the case of a read or write miss. This is because, by virtue of the locality of reference property, the CPU 12 is likely to access several data words in close proximity to one another, e.g., such as in the same data line. Thus, by moving the entire data line into the cache memory 14, the probability increases that data words subsequently accessed by the CPU 12 will already be in the cache memory 14.
Continuing with the above example, if each data line is thirty-two bytes long, then the cache memory 14 has a capacity for storing a certain number of data lines. For example, a 128K byte cache has a capacity for storing approximately 2-4K data lines which are thirty-two bytes long. As shown, associated with each data line, e.g., the data line stored at the location 31, is a tag field 31-1 for storing the status of the data line (e.g., valid, invalid, or modified), and a line address field 31-2 for storing the line address of the data line.
Initially, the tag field associated with each location in the cache memory stores an invalid status. When the CPU 12 issues a memory read or write command to the cache memory 14, the cache memory 14 compares the line address of the accessed data word to the line address of each data line stored in the cache memory 14 having a valid or modified status. If there are no matches, a read miss or a write miss occurs and the cache memory 14 retrieves the data line containing the accessed data word from the main memory 16 in the above described fashion. When this data line is transferred to the cache memory 14, it is stored in one of the data line storage locations (e.g., the storage location 31) of the cache memory 14. Preferably, the data line is stored in a data line storage location having an invalid status, if such a storage location is available. The line address of the retrieved data line is stored in the appropriate line address field (e.g., the line address field 31-2). In addition, the status stored in the tag field (e.g., the tag field 31-1) corresponding to the data line is changed from invalid to valid.
When the CPU 12 writes a data word to a particular addressed location, the data word received from the CPU 12 is stored in an appropriate location within the corresponding data line storage location in the cache memory 14. For example, suppose the storage location 31 stores the 91st data line of bytes residing at the addresses 2912, 2913, . . . , 2943. If the CPU 12 writes a data word to the address 2914, this data word is stored in the third byte 31-3 of the storage location 31. In addition, the cache memory 14 changes the status stored in the tag field 31-1 from valid to modified. At a later time, the cache memory 14 may examine the tag field of a data line to determine if the data line was modified and therefore must be written back to the main memory 16.
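The tag field, line address field and write back behavior described above can be modeled in a few lines. The following is only a sketch under stated assumptions: every storage location may hold any data line, and a round-robin choice of the location to overwrite is assumed because the text does not specify a replacement rule. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

LINE_LENGTH = 32  # bytes per data line, as in the example of the text


class Status(Enum):
    INVALID = "invalid"
    VALID = "valid"
    MODIFIED = "modified"


@dataclass
class Location:
    status: Status = Status.INVALID       # tag field (e.g., 31-1)
    line_address: int = -1                # line address field (e.g., 31-2)
    data: bytearray = field(default_factory=lambda: bytearray(LINE_LENGTH))


class Cache:
    def __init__(self, memory: bytearray, n_locations: int = 4):
        self.memory = memory
        self.locations = [Location() for _ in range(n_locations)]
        self.next_victim = 0  # assumption: round-robin replacement

    def find(self, p):
        """Return the location holding line p with valid or modified status."""
        for loc in self.locations:
            if loc.status is not Status.INVALID and loc.line_address == p:
                return loc
        return None

    def _write_back(self, loc):
        start = loc.line_address * LINE_LENGTH
        self.memory[start:start + LINE_LENGTH] = loc.data

    def _fill(self, p):
        """Handle a miss: prefer an invalid location, else overwrite one."""
        loc = next((l for l in self.locations if l.status is Status.INVALID), None)
        if loc is None:
            loc = self.locations[self.next_victim]
            self.next_victim = (self.next_victim + 1) % len(self.locations)
            if loc.status is Status.MODIFIED:
                self._write_back(loc)     # only modified lines are written back
        start = p * LINE_LENGTH
        loc.data[:] = self.memory[start:start + LINE_LENGTH]
        loc.line_address = p
        loc.status = Status.VALID
        return loc

    def read(self, address):
        p, offset = divmod(address, LINE_LENGTH)
        loc = self.find(p) or self._fill(p)
        return loc.data[offset]

    def write(self, address, value):
        p, offset = divmod(address, LINE_LENGTH)
        loc = self.find(p) or self._fill(p)
        loc.data[offset] = value
        loc.status = Status.MODIFIED      # valid -> modified
```

With a main memory modeled as `bytearray(4096)`, a call such as `cache.write(2914, 0xAB)` stores the value at offset 2 (the third byte) of the location holding line address 91 and marks it modified, while the main memory copy remains unchanged until that location is overwritten.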
In the computer system 10, more than one CPU 12 and cache memory 14 may also be connected to the memory bus 18 and have access to the main memory 16. This presents a problem for maintaining data consistency amongst all cache memories that may access a data line in the main memory 16. For instance, a first cache memory may retrieve a data line from the main memory 16 and modify it. Before the first cache memory writes this modified data line back to the main memory 16, a second cache memory may issue a command to retrieve the same data line from the main memory 16. The second cache memory must obtain the modified data line in the first cache memory and not the stale data line in the main memory 16.
In order to maintain the consistency of the data in the main memory 16 amongst all cache memories, the computer system 10 typically has an elaborate arbitration protocol for "claiming ownership" in data lines. A cache memory which successfully "claims ownership" in a data line has priority to modify the data therein. Otherwise, the cache memory is not permitted to modify a data line.
When a read or write miss occurs in a first cache memory, the first cache memory issues a command for claiming ownership in the data line in the main memory 16 via the memory bus 18. This command may be detected by a second cache memory. Typically, all cache memories which access the main memory 16 "snoop" or monitor the memory bus 18 for such commands issued by other cache memories. If the second cache memory currently stores the data line in which the first cache memory has claimed ownership, the second cache memory may do a number of things. If the status stored in the tag field associated with this data line is a valid status, the second cache memory may simply store an invalid status in the tag field of this data line, thereby conceding ownership to the first cache memory. Because the data line is marked invalid, any subsequent access to this data line by the second cache memory results in a read or write miss. This causes the second cache memory to reclaim ownership of the data line, thereby ensuring (as discussed below) that the data line incorporates any modifications made by the first cache memory and its associated CPU.
On the other hand, if the tag field associated with this data line stores a modified status, the second cache memory issues an intervention (delay) command to the first cache memory via the memory bus 18. The second cache memory then writes back the modified data line to the main memory 16 and changes the status stored in the tag field from modified to invalid. Thereafter, the first cache memory can retrieve the data line from the main memory 16. By virtue of these steps, the retrieved data line contains any modifications made by the second cache memory and associated CPU.
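The snooping behavior described above can be sketched with two cache models sharing a bus. This is a simplified illustration only: whole data lines are represented by single values, bus arbitration and the timing of the intervention command are not modeled, and the class and method names are hypothetical.

```python
class Bus:
    """Shared memory bus with the main memory and the attached caches."""

    def __init__(self):
        self.memory = {}    # line address -> data line (simplified to one value)
        self.caches = []

    def claim_ownership(self, requester, line_address):
        # Every other cache snoops the ownership claim.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(line_address)


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}     # line address -> {"status": ..., "data": ...}
        bus.caches.append(self)

    def snoop(self, line_address):
        """React to an ownership claim issued by another cache."""
        entry = self.lines.get(line_address)
        if entry is None or entry["status"] == "invalid":
            return
        if entry["status"] == "modified":
            # Intervention: the claimant waits while the line is written back.
            self.bus.memory[line_address] = entry["data"]
        entry["status"] = "invalid"     # concede ownership

    def access(self, line_address, new_data=None):
        """Read the line, or write it when new_data is given."""
        entry = self.lines.get(line_address)
        if entry is None or entry["status"] == "invalid":
            # Read or write miss: claim ownership, then fetch from main memory.
            self.bus.claim_ownership(self, line_address)
            entry = {"status": "valid", "data": self.bus.memory.get(line_address)}
            self.lines[line_address] = entry
        if new_data is not None:
            entry["data"] = new_data
            entry["status"] = "modified"
        return entry["data"]
```

For example, if one cache modifies a data line and another cache then accesses the same line address, the second access triggers a write back by the first cache, so the value fetched from the main memory is the modified one rather than the stale one.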
Cache memories 14 are not the only devices that access the main memory 16 in the computer system 10 of FIG. 1. As shown in FIG. 1, an I/O bridge 22 is connected to the memory bus 18. The I/O bridge 22 is also connected to an I/O bus 20. One or more I/O devices 24, 26 or 28, such as an Ethernet interface 24, an FDDI network interface 26 or a hard disk 28, may also be connected to the I/O bus 20.
The purpose of the I/O bridge 22 is to "decouple" or isolate the I/O bus 20 and the memory bus 18 from each other. Typically, these buses have different data transmission protocols and speeds. For instance, data is illustratively transferred on the memory bus 18 in sixteen byte packets at a speed of 33 MHz. On the other hand, data is illustratively transferred on the I/O bus 20 in four byte groups at 8 MHz. The I/O bridge 22 is capable of both receiving and transmitting data in either manner. Thus, the main memory 16 can transfer data packets via the memory bus 18 to the I/O bridge 22. The I/O bridge 22 can thereafter transfer the "depacketized" data to the appropriate destination I/O device 24, 26, or 28 via the I/O bus 20. Likewise, an I/O device 24, 26, or 28 can transfer the data via the I/O bus 20 to the I/O bridge 22. The I/O bridge 22 thereafter transfers this data in packets to the main memory 16 via the memory bus 18.
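The regrouping of data performed by the I/O bridge 22 can be illustrated as follows. This sketch considers only the sizes of the transfer units (sixteen byte packets and four byte groups, as in the text) and ignores bus speeds and protocols; the function names are hypothetical.

```python
PACKET_SIZE = 16  # bytes per memory bus packet (illustrative figure from the text)
GROUP_SIZE = 4    # bytes per I/O bus group (illustrative figure from the text)


def depacketize(packet: bytes, group_size: int = GROUP_SIZE) -> list:
    """Split one memory bus packet into I/O bus sized groups."""
    return [packet[i:i + group_size] for i in range(0, len(packet), group_size)]


def packetize(groups: list, packet_size: int = PACKET_SIZE) -> list:
    """Collect I/O bus groups into memory bus sized packets."""
    data = b"".join(groups)
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]


packet = bytes(range(16))
groups = depacketize(packet)      # four groups of four bytes each
assert packetize(groups) == [packet]
```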
The writing of data by an I/O device 24, 26, or 28 to the main memory 16 via the I/O bridge 22 must be performed in a fashion that maintains the consistency of the data in the main memory 16 and the cache memories 14. In a conventional computer system 10, this is achieved as follows. When an I/O device 24, 26, or 28 has data to write into a particular location or particular locations of the main memory 16, the I/O device 24, 26, or 28 issues a command to the I/O bridge 22. The I/O bridge 22 then attempts to claim ownership, as described above, in the data line having one or more data words stored at the same address or addresses to which the data of the I/O device 24, 26, or 28 is destined. For example, suppose the I/O device 24, 26 or 28 has a block of 256 bytes to be stored sequentially in the main memory 16 in the storage locations having the addresses 0-255. The I/O device 24, 26 or 28 issues a sequence of data transfer commands to the I/O bridge 22 indicating a desire to transfer data to the addresses 0-255. In response to receiving commands to transfer data to the addresses 0-31, the I/O bridge 22 attempts to claim ownership in the data line stored in the storage locations 0-31 (assuming each data line is thirty-two bytes long). If the I/O bridge 22 is successful in claiming ownership in this data line, the I/O device 24, 26, or 28 transfers the first thirty-two bytes of the 256 byte block to the buffer memory 21 of the I/O bridge 22 via the I/O bus 20. This data may thereafter be transferred from the I/O bridge 22 to the main memory 16 via the memory bus 18.
It is also possible, as discussed above, that a cache memory 14 has modified one of the data lines in which the I/O bridge 22 must claim ownership but has not yet written back the modified data line. In such a case, this cache memory 14 would then instruct the I/O bridge 22 to wait while it writes back the modified data line to the main memory 16.
The I/O device 24, 26 or 28 then issues commands to transfer data to the addresses 32-63. The above steps are then repeated for the data line having data words stored at addresses 32-63, etc. This process is repeated for each successive group of thirty-two bytes of the 256 byte data block and the data lines residing in the storage locations to which that I/O device data is destined, i.e., residing at addresses 64-95, 96-127, 128-159, 160-191, 192-223 and 224-255.
In this process, the conventional I/O bridge 22 only claims ownership in a data line when the I/O device 24, 26 or 28 is about to transfer a data word destined to an address at which a data word of a data line not currently owned by the I/O bridge 22 is stored. That is, the I/O bridge 22 does not claim ownership in the data line stored at the addresses 32-63 until the I/O device 24, 26 or 28 is about to transfer the data word to be stored at the address 32.
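The ordering of ownership claims and transfers in this conventional process can be sketched as a list of steps. The sketch assumes thirty-two byte data lines as in the example and records one coarse "transfer" step per data line rather than one per data word; the function name and step labels are hypothetical.

```python
LINE_LENGTH = 32  # bytes per data line, as in the example of the text


def conventional_write_steps(first_address: int, n_bytes: int) -> list:
    """List the steps of a conventional I/O device write, in order.

    Ownership of a data line is claimed only when the first data word
    destined to that line is about to be transferred.
    """
    steps = []
    owned_line = None
    for address in range(first_address, first_address + n_bytes):
        p = address // LINE_LENGTH
        if p != owned_line:
            steps.append(("claim", p))      # claim issued on the memory bus
            steps.append(("transfer", p))   # the bytes of line p then follow
            owned_line = p
    return steps


steps = conventional_write_steps(0, 256)
claims = [line for kind, line in steps if kind == "claim"]
print(claims)       # [0, 1, 2, 3, 4, 5, 6, 7]
print(steps[:3])    # [('claim', 0), ('transfer', 0), ('claim', 1)]
```

For the 256 byte block at addresses 0-255, the sketch yields eight ownership claims, each of which precedes the transfer of the corresponding thirty-two bytes.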
This prior art method is disadvantageous. Before data can be transferred to the I/O bridge 22, the I/O bridge must successfully claim ownership in the data line having at least one data word stored at the same addresses in the main memory 16 to which the data of the I/O device 24, 26, or 28 is destined. As discussed above, in order to claim ownership in a data line successfully, the I/O bridge must issue an ownership claim command over the memory bus 18. The ownership claiming latency, which includes the command and bus arbitration latency, slows down the I/O device 24, 26, or 28 to main memory 16 write speed. I/O device 24, 26, or 28 to main memory 16 writes are further slowed down if the I/O bridge 22 must first wait for a cache memory 14 to write back any modifications of the data line in which the I/O bridge 22 has attempted to claim ownership.
It is therefore an object of the present invention to provide a system and method for efficiently transferring data from an I/O device to the main memory which overcomes the disadvantages of the prior art.