This application relates in general to multi-processor computer systems and in particular to a coherency ordering queue for memory operations, including write operations.
In prior art systems, a response queue is used by a processor to hold information that is being sent to memory via a memory controller, or to other processors after passing through the memory. Typically, each processor in a multi-processor system has its own associated response queue. To allow the response queue to function properly, a set structure is imposed on the ordering of the information in the queues. However, this structure can limit the flexibility of the system, and the nature of multi-processor systems is such that different types of write operations may be desirable in a response queue. Providing different types of write operations, however, would entail increasing the size of the header queue, which is the portion of the system that keeps track of what is in the response queue. The structure of a response queue is such that a queue can generally hold a number of write operations of data (“writes”), e.g., in 16 slots. Because a write operation necessarily occupies a number of slots (e.g., five slots) at a time, there are a limited number of operations that can be stored in the queue, for example, three writes of five slots each (e.g., four slots for data and one for the address), and one return short (i.e., a read to a processor which utilizes a register rather than cache or memory space, and thus uses one slot). Thus, all this structure yields a certain amount of systematic rigidity which might preclude the use of different types of transactions.
Some multiprocessor systems have coherent memory operations, which are operations sent to or from a processor that operate on memory and keep the processor caches in the system consistent with each other and with the memory. Coherency operations require that the processor be able to send/receive coherency messages to/from the memory controller. These messages are stored in a coherency queue that is different from the response queue. Coherency messages include coherency-shared (cache has the data shared), coherency-copy-out (cache will supply the data in a copy-out operation), and coherency-ok (cache check done, neither shared nor copy-out). To maintain coherency, these systems use a coherency ordering queue to maintain the order of the responses in the response queue and the coherency messages in the coherency queue. Note that the coherency queue may be merged into the coherency ordering queue, since the messages may be 1 or 2 bits in size. Further note that since a coherency message can be sent out substantially simultaneously with a write response, the coherency ordering queue must track these entries separately. Thus, the coherency ordering queue would be able to record that at time X both a response and a coherency message were sent out, while at time X+1 only a response was sent out, and at time X+2 only a coherency message was sent out. As write responses are placed into the response queue, markers are placed into the coherency ordering queue, and as write responses are sent out of the response queue, their associated markers are cleared from the coherency ordering queue.
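The per-time-slot tracking described above can be illustrated with a minimal sketch. All class and method names here are hypothetical, chosen only for illustration; each slot independently records whether a write response and/or a coherency message was sent at that time.

```python
# Minimal sketch of a coherency ordering queue; names are assumptions,
# not taken from any actual implementation.

class OrderingSlot:
    def __init__(self, has_write, coh_msg):
        self.has_write = has_write   # marker: a write response was sent at this time
        self.coh_msg = coh_msg       # None, "shared", "copy-out", or "ok"

class CoherencyOrderingQueue:
    def __init__(self):
        self.slots = []              # oldest entry first

    def record(self, has_write, coh_msg=None):
        # A write response and a coherency message may be sent substantially
        # simultaneously, so both are tracked separately in the same slot.
        self.slots.append(OrderingSlot(has_write, coh_msg))

    def clear_write(self, index):
        # As a write response leaves the response queue, its associated
        # marker is cleared from the ordering queue.
        self.slots[index].has_write = False
```

For the example in the text, time X would be recorded as `record(True, "ok")`, time X+1 as `record(True)`, and time X+2 as `record(False, "shared")`.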
The coherency message coherency-ok is a signal that the processor associated with the queue has checked the ownership of a particular memory location. Since the response queue may hold a prior (or earlier in time) write that involves the same memory location as a subsequent (or later in time) coherency signal stored in the coherency queue, all prior writes must be cleared before a coherency signal is cleared. Thus, subsequent writes can pass (or be cleared before) prior coherency signals, but subsequent coherency signals cannot pass prior writes.
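The asymmetric ordering rule above can be captured in a few lines. This is a hedged illustration only; the function name and the tuple representation of queue entries are assumptions made for the sketch.

```python
# Illustration of the ordering rule: a write may be cleared ahead of prior
# coherency signals, but a coherency signal may be cleared only once every
# prior write has been cleared. Entries are ("write", ...) or ("coh", msg).

def may_clear(queue, index):
    kind = queue[index][0]
    if kind == "write":
        return True                  # subsequent writes can pass prior coherency signals
    # a coherency signal must wait until no older write remains ahead of it
    return all(entry[0] != "write" for entry in queue[:index])
```

So in a queue holding an old coherency signal, a newer write, and a newest coherency signal, the write is immediately clearable but the newest coherency signal is not.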
As subsequent writes are cleared before prior coherency signals, holes in the coherency ordering queue may be created. In the example above, suppose the X+1 write response has been cleared. Since the coherency register at that slot was already empty, the queue would contain a blank entry at that time slot, as both the write and coherency registers at that slot are now empty. Such holes create great inefficiencies in queue usage, and may possibly result in queue spillage. For example, suppose the first (oldest) and last slots hold coherency messages while the middle slots hold write responses. Suppose all of the write responses are cleared. Then only the first and last slots hold messages, while the remainder of the queue is empty. However, the queue is full, as additional messages must be added to the end of the queue, which is occupied by a coherency message. Thus, additional messages cannot be added to the queue.
To eliminate such holes, the queue is searched after each write has been cleared, and upon finding a hole, the queue is collapsed. This entails shifting the contents of the queue down by one to fill in the hole. Note that the entire contents of the queue are not shifted; rather, only the portion that is upstream from the hole is shifted. This partial shift is known as a collapse. The problem with performing collapses is that the logic required to perform the collapse is both complex and expensive.
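The collapse operation can be sketched as follows, under the assumption that a fully empty slot (no write marker and no coherency message) is represented as `None`; the function name is hypothetical. Note that in hardware this partial shift requires multiplexing logic at every slot, which is the source of the complexity and expense noted above.

```python
# Sketch of the prior-art collapse: find the first hole and shift only the
# upstream (newer) portion of the queue down by one to fill it.

def collapse(queue):
    for i, slot in enumerate(queue):
        if slot is None:                 # found a hole
            queue[i:] = queue[i + 1:]    # shift upstream entries down; entries
            break                        # older than the hole are untouched
    return queue
```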
Furthermore, the rigidity of queues, when combined with the necessary operation rules, precludes efficient searching of the queue unless the collapsing function is used. Also, as the number of entries that the response queue can store increases, the coherency ordering queue also increases in the number of entries (or queue depth) that it can store. Note that queue width is the size of each entry, or the number of bits per entry. Thus, any increase in queue size also results in an increase in queue search time, as more entries have to be searched in order to find the next write for clearing.
It is therefore desirable to have a system that makes the use of different processors and variable write operations feasible.
It is therefore further desirable to have a system that allows for the efficient searching and collapsing of queues.
These and other objects, features and technical advantages are achieved by a system and method which provides for a more compact ordering queue by reducing the queue depth and expanding the queue width. Under the design contemplated, the reduced queue depth allows for a quicker search of the queue and for an expanded range of write operations as might be needed in multi-node systems where upgraded processors utilize different types of write operations.
Essentially, the processor agent chip or PAC in a multi-node system is capable of processing request packets and response packets from multiple processors. In doing so, the chip generally utilizes at least a tracker system for tracking coherent request packets sent to the processor, a coherent ordering queue for maintaining order between the response packets (both coherent and write), a response queue which stores response packets, and a header queue for identifying the contents of the response queue.
The coherency queue in the prior art is structured such that it has a large depth, which impedes searching. For example, the depth of the coherency queue in the prior art is 9 slots, because it involves 5 COH operands from the tracker and 4 response headers (WBs) from the header queue. Each WB, or writeback, operation is tracked with a marker or operand composed of 1 bit, while each COH, or coherency, operand is composed of 2 bits. Thus, the overall size in the prior art is a 9×3 ordering queue. Furthermore, it is the applicable rule set, that COHs cannot pass WBs but WBs can pass COHs which wait for a copy-out, that slows down the collapsing of the queue, as the procedure whereby WBs pass COHs can produce idle “holes” in the queue. Thus, the prior art structure limits the number of operations which can be utilized, and the system offers a cumbersomely sized queue and slow logic by which to search and collapse the queue during processing. As contemplated by the present invention, a compact, more versatile queue design is provided for overcoming these limitations of the prior art ordering queues. More specifically, the present invention overcomes these limitations by managing the ordering queue from the coherency bus, and by incrementing or decrementing a count of write operations instead of changing a bit from 1 to 0.
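The count-based approach can be sketched as follows. This is a hedged illustration of the increment/decrement idea only, not the actual hardware design; all names and the data layout are assumptions. Each coherency entry snapshots the number of writes outstanding when it arrives; clearing a write decrements counts instead of clearing a marker bit, so no hole is created and no collapse is needed.

```python
# Sketch of count-based ordering: writes are tracked as a count per
# coherency entry rather than as per-write marker bits.

class CountingOrderingQueue:
    def __init__(self):
        self.outstanding_writes = 0  # writes currently in the response queue
        self.coh_entries = []        # per coherency signal: [msg, writes ahead of it]

    def write_issued(self):
        self.outstanding_writes += 1          # increment instead of setting a bit

    def coherency_issued(self, msg):
        # a new coherency signal must wait behind every write already queued
        self.coh_entries.append([msg, self.outstanding_writes])

    def write_cleared(self):
        # writes clear oldest-first, so the cleared write precedes every
        # coherency entry whose count is still nonzero
        self.outstanding_writes -= 1
        for entry in self.coh_entries:
            if entry[1] > 0:
                entry[1] -= 1                 # decrement instead of clearing a bit

    def coherency_clearable(self):
        # the oldest coherency signal may clear once no prior write remains
        return bool(self.coh_entries) and self.coh_entries[0][1] == 0
```

Because entries are never left blank, the expensive shift-and-fill logic of the prior-art collapse is avoided, and the queue depth needed is bounded by the number of coherency entries rather than by coherency entries plus write markers.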
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.