1. Field of the Invention
This invention relates to computer systems and more particularly to store queues that provide temporary storage areas between a processor and a memory or I/O channel. The invention further relates to mechanisms for maintaining data coherency within computer systems.
2. Description of the Relevant Art
A store queue is a temporary data storage mechanism interposed between a processor and a memory or I/O channel. The store queue essentially decouples the processor from the memory or I/O channel by allowing the processor to write data directly into the store queue. Once the processor has completed its write operation, the processor is free to perform other tasks. The store queue itself is responsible for transferring the write data to the designated device of the memory or I/O channel. By decoupling the processor from the memory or I/O channel, adverse effects as a result of a possibly long write latency associated with the device being written may be reduced.
Both memory and I/O channels include a data bus through which data may be transferred to and from the processor. The width of the data bus is the word size of the memory or I/O channel. Within modern computer systems, a typical word size is 32 bits. Each word transferred on the memory or I/O channel has an associated address which indicates where the word is stored or to which I/O device the word is to be transferred.
Many microprocessors, including the particularly popular 80486 and Pentium processors, do not always write data having the same word size as the memory or I/O channel through which the data is transferred. For example, a processor may write 8-bit data or 16-bit data during a particular cycle, even though the memory or I/O channel through which the data is written has a width of 32 bits. This type of write operation is referred to as a partial write in that the data being transferred has a smaller bit width than the bit width of the memory or I/O channel. To maintain data coherency and effectuate proper data transfers during such situations, a set of byte enable signals associated with each addressed word is set to indicate the particular bytes being written. Partial writes often occur in a regular increasing or decreasing sequence with respect to the bytes of an addressed word being transferred (e.g., the lower order byte of a designated word may be written first, followed sequentially by the second order byte, then by the third order byte, and so on).
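By way of illustration only, the relationship between a partial write and its byte enable signals may be sketched as follows. The sketch assumes a 32-bit word of four byte lanes and represents the byte enables as an active-high 4-bit mask purely for readability; the function name byte_enable_mask and its parameters are hypothetical and do not correspond to any signal naming or polarity of a particular processor.

```python
WORD_BYTES = 4  # four 8-bit bytes per 32-bit word, as described above


def byte_enable_mask(offset: int, size: int) -> int:
    """Return a 4-bit mask in which bit i is set when byte lane i is written.

    offset is the lowest byte lane written (0 = lower order byte) and
    size is the number of consecutive bytes written in the cycle.
    """
    if size < 1 or offset < 0 or offset + size > WORD_BYTES:
        raise ValueError("a write must fit within a single addressed word")
    return ((1 << size) - 1) << offset
```

Under these assumptions, an 8-bit write to the lower order byte yields a mask with only bit 0 set, while a full word write sets all four bits.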
A store queue may also enhance system performance by combining multiple partial writes of the same word address into a single write to the memory or I/O channel. A store queue which implements a technique for combining partial writes is described within U.S. Pat. No. 4,750,154 to Lefsky, et al.
FIGS. 1A and 1B are provided to more clearly illustrate the operation of an exemplary store queue which combines multiple partial writes of a processor into a single write to an I/O channel. FIGS. 1A and 1B are further provided to illustrate several problems and limitations which may be associated with such a store queue.
Referring first to FIG. 1A, a block diagram is shown of a typical computer system 100 including a central processing unit (CPU) 102 coupled via a CPU local bus 104 to a system memory 105 and to a store queue 106. In its illustrated form, store queue 106 provides an interface between CPU local bus 104 and an I/O channel 107 formed in part by a secondary bus 108. An I/O peripheral device 110 is finally shown coupled to secondary bus 108.
For the system of FIG. 1A, CPU local bus 104 as well as secondary bus 108 each include a data bus having a width of 32 bits. Peripheral device 110 is a 32-bit peripheral, and is illustrative of, for example, a disk drive, a printer, or a local area network (LAN) device, among other things.
Store queue 106 is provided for receiving write data from CPU local bus 104 and for temporarily storing the data until it can be transferred through I/O channel 107 to peripheral device 110 via secondary bus 108. Store queue 106 includes a FIFO (first-in first-out) buffer 112 that temporarily stores the data. FIFO buffer 112 is arranged with a plurality of separately addressable word storage cells 122A-122H, wherein each word storage cell is capable of storing a word of data. For the system of FIG. 1A, each word consists of four 8-bit bytes. When microprocessor 102 executes a write cycle to transfer a full word of data (i.e., 32 bits) to peripheral device 110, the word is temporarily stored within one of the word storage cells 122A-122H of FIFO buffer 112 and is, in turn, passed on to peripheral device 110 via I/O channel 107. Store queue 106 thereby allows the microprocessor 102 to quickly execute (and be released from) a write cycle without being negatively impacted by a possibly large write latency which may be associated with peripheral device 110.
Store queue 106 is additionally configured to combine certain multiple partial write cycles of CPU local bus 104 into a single write cycle on secondary bus 108. This is best understood from the following example. Consider the illustrated situation in which microprocessor 102 executes a first partial write cycle to write a first 8-bit byte of data labeled Byte1 to peripheral device 110. For this example, the byte enable signals indicate that this data corresponds to the lower order byte of the addressed word. Thus, when the write cycle is initiated by microprocessor 102, the byte of data Byte1 is stored within a designated one of the word storage cells 122A-122H, such as word storage cell 122D. Since the byte enable signals indicate this is the lower order byte of the addressed word, the byte is stored within a lower order byte location Loc1 of word storage cell 122D. If microprocessor 102 thereafter executes another write cycle to write a byte of data Byte2 to the second order byte location of the same word address, this second byte of data is stored within a second byte location Loc2 of word storage cell 122D. Assuming that another sequential write to this word address is not executed by microprocessor 102 (i.e., microprocessor 102 could additionally write a third and a fourth byte of data to byte locations Loc3 and Loc4, respectively, of word storage cell 122D), store queue 106 thereafter transfers both Byte1 and Byte2 to peripheral device 110 during a single write cycle on secondary bus 108. By combining the write operations, the overall bandwidth of secondary bus 108 may be increased and the transfer of data into peripheral device 110 may be expedited.
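The combining of Byte1 and Byte2 within a common word storage cell may be modeled, as a minimal sketch only, in the following manner. The class WordCell, the function merge_partial_write, and the addresses used are hypothetical names and values introduced for illustration; they are not taken from the store queue of FIG. 1A, and the byte enables are again modeled as an active-high mask.

```python
from dataclasses import dataclass, field


@dataclass
class WordCell:
    """Hypothetical model of one word storage cell (four byte locations)."""
    address: int                       # word address the cell is holding
    enables: int = 0                   # bit i set when byte location i is valid
    data: list = field(default_factory=lambda: [0, 0, 0, 0])


def merge_partial_write(cell: WordCell, address: int, lane: int, value: int) -> bool:
    """Store one byte into the cell if it targets the same word address.

    Returns True when the byte was combined into the cell, False when the
    write addresses a different word and must be stored elsewhere.
    """
    if cell.address != address:
        return False
    cell.data[lane] = value & 0xFF
    cell.enables |= 1 << lane
    return True


# Byte1 to Loc1, then Byte2 to Loc2 of the same (assumed) word address.
cell = WordCell(address=0x100)
merge_partial_write(cell, 0x100, 0, 0xAA)
merge_partial_write(cell, 0x100, 1, 0xBB)
```

After the two calls, the cell holds both bytes with the two lowest enable bits set, so that a single write cycle on the secondary bus could carry both bytes.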
Despite the mentioned advantages, the store queue 106 described with reference to FIG. 1A may be associated with certain data coherency problems or may be associated with certain limitations in performance, as will be described below. First, the combining of several partial writes into a single write to peripheral device 110 may result in a data coherency problem if proper processing of the data by peripheral device 110 is in fact dependent upon the order of data written from microprocessor 102 on a per-byte basis. By combining the partial writes into a single write, the data is no longer ordered with respect to each byte. Thus, such byte order dependencies by peripheral device 110 must be strictly prohibited to avoid data incoherencies. This limits system flexibility.
The performance of store queue 106 may be further limited when consecutive partial writes are executed which would result in an invalid byte enable combination if the partial writes were combined. In systems such as model 80486 based systems, an invalid byte combination is defined as a non-contiguous combination of enabled bytes within a given addressed word. The transfer of data having an invalid byte combination on CPU local bus 104 or secondary bus 108 is prohibited by system definition. Thus, to avoid such an invalid transfer, when a pair of partial writes are consecutively received by store queue 106 that would result in an invalid byte enable combination if combined, the store queue 106 inhibits the combined storage of the invalid byte combination within a common word storage cell 122A-122H and instead stores the bytes within separate word storage cells. Store queue 106 thereafter causes the bytes to be written via separate cycles on secondary bus 108. This concept will be better understood with reference to FIG. 1B.
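The contiguity requirement described above may be expressed, as one possible sketch, by a test on a 4-bit byte enable mask. The function name is_contiguous and the active-high mask representation are assumptions made for illustration and are not drawn from any particular bus definition.

```python
def is_contiguous(mask: int) -> bool:
    """Return True when the set bits of mask form one unbroken run.

    An empty mask is treated as invalid, since no byte is being written.
    """
    if mask == 0:
        return False
    lowest = mask & -mask          # isolate the lowest set bit
    run = mask // lowest           # shift the run down to bit 0
    return run & (run + 1) == 0    # true only for values of the form 2**n - 1
```

Under this test, a mask enabling the two lower order bytes is valid, whereas a mask enabling the first, second, and fourth bytes is not, since the third byte interrupts the run.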
FIG. 1B illustrates a situation in which microprocessor 102 first executes a partial write cycle that causes Byte1 to be stored within byte location Loc1 of word storage cell 122D and then executes a second partial write cycle that causes Byte2 to be stored within byte location Loc2. It is noted that store queue 106 allows Byte1 and Byte2 to be stored within a common word storage cell since the two bytes are contiguous with respect to that word. If microprocessor 102 thereafter executes a partial write cycle to write a non-contiguous byte of data Byte4 which is associated with the highest order byte of the addressed word, store queue 106 detects the non-contiguity and therefore identifies the resulting combination as invalid. Thus, store queue 106 stores Byte4 within the next word storage cell 122E of FIFO buffer 112, and executes a separate write cycle on bus 108 to transfer Byte4 to peripheral device 110. Unfortunately, this decreases the effective length of the store queue and decreases system performance since multiple partial write cycles must be executed on bus 108.
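The situation of FIG. 1B may be approximated by the following simplified model, offered only as a sketch. It assumes that a new byte may be combined solely with the most recently allocated word storage cell, and it enumerates the valid (contiguous) enable masks in a lookup set; the names VALID_MASKS and enqueue, the dictionary layout, and the word address are hypothetical.

```python
from collections import deque

# All contiguous runs of 1 to 4 enabled bytes within a 4-byte word.
VALID_MASKS = {((1 << n) - 1) << s for n in range(1, 5) for s in range(0, 5 - n)}


def enqueue(queue: deque, address: int, lane: int, value: int) -> None:
    """Add a one-byte write, combining it with the newest cell when permitted."""
    bit = 1 << lane
    if queue:
        tail = queue[-1]
        # Combine only for the same word address and a still-valid enable mask.
        if tail["address"] == address and (tail["enables"] | bit) in VALID_MASKS:
            tail["data"][lane] = value
            tail["enables"] |= bit
            return
    data = [None, None, None, None]
    data[lane] = value
    queue.append({"address": address, "enables": bit, "data": data})


fifo = deque()
enqueue(fifo, 0x200, 0, 0x11)  # Byte1 -> Loc1
enqueue(fifo, 0x200, 1, 0x22)  # Byte2 -> Loc2, combined with Byte1
enqueue(fifo, 0x200, 3, 0x44)  # Byte4 -> non-contiguous, takes a new cell
```

In this model the three partial writes occupy two cells rather than one, which reflects the reduced effective queue length and the additional write cycle noted above.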