Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both processors—the “brains” of a computer—and the memory that stores the information processed by a computer. In general, a processor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a processor.
A number of computer designs utilize multiple processors operating in parallel with one another to increase overall computing performance. In a symmetric multiprocessing (SMP) environment, for example, multiple processors share at least a portion of the same memory system to permit the processors to work together to perform more complex tasks. The multiple processors are typically coupled to one another and to the shared memory by a shared bus, often referred to as a system or processor bus, or other like interconnection network.
Many shared memories use multiple levels and arrangements of memory sources to increase system performance in a cost-effective manner. A shared memory, for example, may utilize a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high speed, limited capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. One or more memory controllers are then used to swap the information from segments of memory addresses, often known as “cache lines”, between the various memory levels in an attempt to maximize the frequency with which memory addresses requested by a memory requester such as a processor are stored in the fastest cache memory accessible by that requester. In a typical SMP environment, for example, each processor may have one or more dedicated cache memories that are accessible only by that processor (e.g., level one (L1) data and/or instruction caches, and/or a level two (L2) cache), as well as one or more levels of caches and other memories that are shared with other processors in the computer. Processors may also incorporate multiple processing cores and/or be disposed in modules housing multiple processors, and/or may rely upon a separate cache or memory controller to issue memory access requests to lower level memories.
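The movement of cache lines toward the fastest level can be sketched with a small model. This is an illustrative Python sketch, not taken from the source: the `CacheLevel` class, the `access` function, the least-recently-used (LRU) replacement policy, and the capacities are all assumptions, and the model handles reads only (a line evicted from the last cache level is simply dropped).

```python
from collections import OrderedDict


class CacheLevel:
    """One cache level with LRU replacement; capacity is counted in lines."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> data, least recently used first

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        return None  # data values are assumed to be non-None

    def fill(self, addr, data):
        """Insert a line; return the evicted (addr, data) pair, or None."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            return self.lines.popitem(last=False)
        return None


def access(levels, memory, addr):
    """Read a line; return (data, where it was found). levels[0] is the fastest."""
    for level in levels:
        data = level.lookup(addr)
        if data is not None:
            source = level.name
            break
    else:
        data, source = memory[addr], "memory"
    # Promote the line into the fastest level and push each victim one level down.
    victim = levels[0].fill(addr, data)
    for lower in levels[1:]:
        if victim is None:
            break
        victim = lower.fill(*victim)
    # A victim evicted from the last level is dropped (reads only, no write back).
    return data, source


l1, l2 = CacheLevel("L1", 1), CacheLevel("L2", 2)
memory = {0x40: "a", 0x80: "b"}
print(access([l1, l2], memory, 0x40))  # ('a', 'memory')
print(access([l1, l2], memory, 0x80))  # ('b', 'memory'); 0x40 moves from L1 to L2
print(access([l1, l2], memory, 0x40))  # ('a', 'L2')
```

The third access finds the line in the slower L2 cache rather than main memory, which is the effect the memory controllers aim for.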
In many designs, a chipset comprising one or more integrated circuit chips interfaces the processors with lower levels of memory and/or an input/output subsystem. Among the various features supported by a chipset is that of scheduling operations or transactions on the system or processor bus. Specifically, whenever a processor or other memory requester requires access to a particular cache line that is not locally cached, a memory access request or transaction is initiated on the system or processor bus to retrieve the requested data. Typically, the request is forwarded to a memory subsystem to retrieve the requested data from the main memory. In some instances, however, each processor “snoops” the requests issued by other processors on the bus, which may result in another processor initiating a cache-to-cache transfer to fulfill a memory access request if that other processor has a locally cached copy of the requested data, potentially averting the need to retrieve the data from the main memory altogether. Such systems also typically support the ability of a processor to “cast out” modified data stored in a local cache to update the copy of the data in the main memory, e.g., when the local cache for the processor is full and space is required for other data needed by the processor.
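The snoop-based read path and the cast out operation can be modeled in a few lines. This sketch is a simplification under stated assumptions: `Processor`, `read_line`, and `cast_out` are hypothetical names, caches are plain dictionaries keyed by line address, and coherence states (for example, whether a line is modified or shared) are omitted, so `cast_out` always writes the line back.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor with a private cache mapping line address -> data."""
    name: str
    cache: dict = field(default_factory=dict)


def read_line(requester, others, main_memory, addr):
    """Resolve a read request; return (data, source of the data)."""
    if addr in requester.cache:
        return requester.cache[addr], "local"
    # The request goes out on the shared bus, where every other processor snoops it.
    for peer in others:
        if addr in peer.cache:
            # A peer holding the line intervenes with a cache-to-cache transfer.
            data = peer.cache[addr]
            requester.cache[addr] = data
            return data, f"intervention:{peer.name}"
    # No other processor holds the line, so main memory supplies it.
    data = main_memory[addr]
    requester.cache[addr] = data
    return data, "memory"


def cast_out(proc, main_memory, addr):
    """Write a line back to main memory and discard the local copy.

    Assumes the line is modified; a real design would skip the write for clean lines.
    """
    main_memory[addr] = proc.cache.pop(addr)


memory = {0x40: "A", 0x80: "B"}
p0, p1 = Processor("P0"), Processor("P1")
p1.cache[0x40] = "A_mod"                  # P1 holds a modified copy of line 0x40
print(read_line(p0, [p1], memory, 0x40))  # ('A_mod', 'intervention:P1')
print(read_line(p0, [p1], memory, 0x80))  # ('B', 'memory')
```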
To maximize performance over a shared bus such as a system or processor bus, many conventional designs utilize separate data paths for data and for address and control information. One common architecture, for example, utilizes a data bus for handling data traffic and an address/command bus for handling address and command traffic. Transactions are initiated by communicating a command over the address/command bus, with the command typically including the address of a particular cache line being affected by the command.
Moreover, many designs utilize split transactions, whereby requests for data and responses to those requests are treated as separate operations. A principal benefit of split transactions, particularly with regard to requests to retrieve data from a lower level memory, is that a shared bus is permitted to handle other operations while waiting for the requested data to be returned from the lower level memory.
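A split transaction can be illustrated with a hypothetical bus model in which each read command receives a tag and its data response is logged later as a separate bus operation. The class name, the tag scheme, and the log format are assumptions made for illustration; actual buses define their own tagging and phase protocols.

```python
import itertools


class SplitTransactionBus:
    """Hypothetical bus where a read's request and response are separate operations."""

    def __init__(self):
        self._tags = itertools.count()
        self.pending = {}  # tag -> address of an outstanding (deferred) read
        self.log = []      # bus operations in the order they occur

    def issue_read(self, addr):
        tag = next(self._tags)
        self.pending[tag] = addr
        self.log.append(("cmd", "read", tag, addr))
        return tag  # the bus is free for other operations from here on

    def issue_write(self, addr):
        tag = next(self._tags)
        # Write data is already available, so it follows the command directly.
        self.log.append(("cmd", "write", tag, addr))
        self.log.append(("data", tag))
        return tag

    def complete_read(self, tag):
        """Called when lower-level memory finally returns the data for a read."""
        addr = self.pending.pop(tag)
        self.log.append(("data", tag))
        return addr


bus = SplitTransactionBus()
r = bus.issue_read(0x100)  # memory access begins; the read is deferred
bus.issue_write(0x200)     # the bus carries a write while the read waits
bus.complete_read(r)       # data for the earlier read arrives last
for op in bus.log:
    print(op)
```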
It has been found that the scheduling of data transfers on a data bus can have a significant impact on system performance in a shared bus architecture incorporating split transactions. In many designs, the principal types of data that may be transferred over a data bus can be separated into two groups, referred to respectively as in-order and out-of-order data transfers. “In-order” data transfers are principally associated with write operations and cache-to-cache transfer operations (which may also be referred to as explicit and implicit write back operations, respectively). These data transfers are considered “in-order” because the relevant data can typically be transferred over the data bus immediately after the operations are initiated over an address/command bus.
A write operation, which may also be referred to as a cast out operation, is typically initiated by a processor to update a copy of a cache line stored in main memory as a result of the processor needing to discard its own locally cached copy of that cache line when that copy has been modified by the processor. A cache-to-cache transfer operation, which may also be referred to as a local intervention operation, is typically initiated by a processor when that processor detects a read operation initiated by another processor that is directed to a cache line that is locally cached by that processor. As noted above, whenever such a read operation is detected, the processor having the local copy of the cache line “intervenes” in the request, and returns the requested data, thereby averting the comparatively slower access to the main memory.
“Out-of-order” data transfers are principally associated with read operations that are directed to cache lines that are not locally cached by any other processor. These read operations are sometimes referred to as “deferred” read operations, as the responses to these operations must be deferred for some indeterminate amount of time until the requested data can be retrieved from main memory. Often the amount of time required to retrieve the requested data cannot be ascertained in advance, and may vary widely, e.g., depending upon whether the requested data is currently stored in the main memory, or must first be paged into the main memory from mass storage. These data transfers are termed “out-of-order” since the read operations to which they respond are deferred while other operations (which may have been issued after such read operations) are processed on the shared bus.
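The distinction between the two groups can be made concrete by comparing the order in which commands are issued with the order in which their data becomes ready for the data bus. In this sketch the `Transaction` fields, the kind labels, and the fixed per-read latencies are hypothetical; in practice, as noted above, the latency of a deferred read generally cannot be known in advance.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    kind: str         # "write", "c2c" (cache-to-cache transfer) or "read" (deferred read)
    issue_time: int   # cycle at which the command appears on the address/command bus
    latency: int = 0  # hypothetical memory latency, used only for deferred reads

    @property
    def in_order(self) -> bool:
        # Writes and cache-to-cache transfers have their data ready when issued.
        return self.kind in ("write", "c2c")

    @property
    def data_ready(self) -> int:
        return self.issue_time if self.in_order else self.issue_time + self.latency


def data_order(txns):
    """Names of the transactions in the order their data becomes available."""
    ordered = sorted(txns, key=lambda t: (t.data_ready, t.issue_time))
    return [t.name for t in ordered]


txns = [
    Transaction("R1", "read", issue_time=0, latency=40),
    Transaction("R2", "read", issue_time=1, latency=5),
    Transaction("W1", "write", issue_time=2),
    Transaction("C1", "c2c", issue_time=3),
]
print([t.name for t in txns])  # issue order: ['R1', 'R2', 'W1', 'C1']
print(data_order(txns))        # data order:  ['W1', 'C1', 'R2', 'R1']
```

The two reads are issued first but deliver their data last, and in reverse order of issue, which is what makes them out-of-order transfers.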
As noted above, the manner in which data transfers are transmitted over a data bus can have a significant impact on system performance. In general, it is desirable to maximize data bus utilization to maximize data throughput and minimize the amount of time that the data bus is idle. It is also desirable, however, to minimize the latency, or delay, required to complete transactions initiated on a shared bus. With respect to read transactions, for example, the latency may be looked at from multiple standpoints, e.g., from the standpoint of average latency, from the standpoint of maximum latency, or from the standpoint of latency distribution. Data bus scheduling may also have an impact on the utilization of the address/command bus, as well as on other facilities in a system.
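The different standpoints can be expressed as simple metrics over a trace of completed transfers. The function names, the histogram bucket width, and the use of bus cycles as the time unit are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def latency_summary(latencies, bucket=10):
    """Summarize read latencies (in bus cycles) from three standpoints."""
    # Histogram keyed by the lower bound of each bucket, e.g. 10 covers 10..19.
    histogram = Counter((lat // bucket) * bucket for lat in latencies)
    return {
        "average": mean(latencies),
        "maximum": max(latencies),
        "distribution": dict(sorted(histogram.items())),
    }


def data_bus_utilization(busy_cycles, total_cycles):
    """Fraction of cycles in which the data bus carried a transfer."""
    return busy_cycles / total_cycles


print(latency_summary([12, 15, 31, 8, 94]))
print(data_bus_utilization(busy_cycles=750, total_cycles=1000))  # 0.75
```

A single long-latency read (94 cycles here) barely moves the average but dominates the maximum, which is why the standpoints can favor different scheduling choices.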
Conventional data bus scheduling algorithms have typically favored in-order data transfers such as those associated with write operations and cache-to-cache transfer operations, granting priority to any in-order data transfer over out-of-order data transfers when both types of data transfers are awaiting transfer over the data bus at the same time. Some algorithms also attempt to provide balance between both types of data transfers, e.g., by favoring in-order data transfers over out-of-order data transfers unless the number of pending out-of-order data transfers exceeds a threshold.
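Both conventional policies can be expressed as a single selection function. This is a minimal sketch under simplifying assumptions: each queue is first-in first-out, all listed transfers are already pending (no new arrivals during scheduling), and `pick_next`, `schedule`, and the `threshold` parameter are hypothetical names rather than the interface of any particular design.

```python
from collections import deque


def pick_next(in_order, out_of_order, threshold=None):
    """Choose the next data transfer for the data bus.

    threshold=None: strict priority to in-order transfers.
    threshold=k: favor in-order transfers unless more than k out-of-order
    transfers are pending, in which case an out-of-order transfer goes first.
    """
    if not in_order and not out_of_order:
        return None
    if not out_of_order:
        return in_order.popleft()
    if not in_order:
        return out_of_order.popleft()
    if threshold is not None and len(out_of_order) > threshold:
        return out_of_order.popleft()
    return in_order.popleft()


def schedule(in_order, out_of_order, threshold=None):
    """Drain both queues and return the resulting data bus order."""
    inq, outq = deque(in_order), deque(out_of_order)
    order = []
    while (transfer := pick_next(inq, outq, threshold)) is not None:
        order.append(transfer)
    return order


writes_and_c2c = ["W1", "W2", "C1"]
deferred_reads = ["R1", "R2", "R3"]
print(schedule(writes_and_c2c, deferred_reads))               # ['W1', 'W2', 'C1', 'R1', 'R2', 'R3']
print(schedule(writes_and_c2c, deferred_reads, threshold=2))  # ['R1', 'W1', 'W2', 'C1', 'R2', 'R3']
```

With a threshold of 2, the three pending reads exceed the threshold, so R1 is sent first; once only two remain, in-order transfers regain priority.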
To date, however, conventional data bus scheduling algorithms have not provided optimal system performance. As such, a need continues to exist for a data bus scheduling algorithm providing improved system performance.