The present invention relates generally to the field of computers and in particular to a method and apparatus for conditionally broadcasting memory barrier bus operations.
Computers and other electronic systems and devices perform computational tasks in a wide variety of applications. These systems and devices increasingly integrate functionality beyond straight computation, often by integrating a variety of independent, function-specific circuits or modules, such as processors, mathematical co-processors, video and graphics engines, DMA controllers, GPS receivers, dedicated compression or encryption circuits, and the like. High-bandwidth data transfer between such devices and memory, as well as between the devices themselves, is critical to achieving the desired level of performance. A data communication bus is a well-known structure providing a shared communication link between devices or modules in a processing system.
A common logical dichotomy of devices on a shared bus comprises “master” and “slave” devices. Master devices initiate bus transactions, and commonly arbitrate among themselves for access to the bus, and, in some systems, for a share of the bus bandwidth. Slave devices respond to data transfer bus transactions initiated by master devices, accepting data from the master device in response to write bus transactions and providing data to the master device in response to read bus transactions. Most slave devices execute data transfer operations in the order in which the corresponding bus transactions occur on the shared bus.
In many cases, system performance may be optimized by allowing data transfer operations—such as, for example, memory accesses—to be performed out of order. For example, a sequence of memory operations may be reordered to allow all operations to the same page in memory to be executed before a new page is opened. Processing systems that are allowed to re-order memory operations are generally referred to as “weakly-ordered” processing systems.
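By way of illustration only, the page-grouping reordering described above may be sketched as follows. The page size, the function name `reorder_by_page`, and the example addresses are hypothetical and are not part of any particular memory controller.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # hypothetical page size in bytes


def reorder_by_page(addresses: List[int], page_size: int = PAGE_SIZE) -> List[int]:
    """Group accesses so that all operations to one page run before a new page is opened.

    Pages are visited in order of first appearance, and the original order
    of accesses within each page is preserved.
    """
    first_seen: Dict[int, int] = {}
    for addr in addresses:
        first_seen.setdefault(addr // page_size, len(first_seen))
    # sorted() is stable, so accesses to the same page keep their relative order.
    return sorted(addresses, key=lambda a: first_seen[a // page_size])


# Accesses alternate between page 0 and page 1 in bus order.
bus_order = [0x0000, 0x1000, 0x0004, 0x1008, 0x0008]
exec_order = reorder_by_page(bus_order)
# exec_order == [0x0000, 0x0004, 0x0008, 0x1000, 0x1008]
```

In this sketch the five accesses open each page once rather than switching pages four times, at the cost of executing the operations out of bus transaction order.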
Conversely, processing systems that require memory operations to appear to be performed in the same order as their corresponding bus transactions are referred to as “strongly-ordered” processing systems. Note that slave devices in strongly-ordered systems may actually perform memory operations out of bus transaction order, so long as the memory state at any time appears to the processor(s) as if the memory operations had been performed in order. This characteristic is known as “global observability.” Simple slave devices that always execute data transfer operations in the order received are inherently globally observable. Other slave devices that may execute data transfer operations out of order “snoop” the data transfer operation addresses, and execute data transfer operations to the same address in bus transaction order. These types of slave devices are also globally observable. Slave devices that execute data transfer operations without regard to bus transaction order are not globally observable.
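As a simplified illustration of the address-snooping condition described above, the following sketch checks whether an execution order keeps operations to the same address in bus transaction order. The tuple layout `(kind, address, tag)` and the function name are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, str]  # (kind, address, tag): hypothetical layout


def preserves_same_address_order(bus_order: List[Op], exec_order: List[Op]) -> bool:
    """Return True if operations to each address execute in bus transaction order."""

    def per_address(seq: List[Op]) -> Dict[int, List[Op]]:
        by_addr: Dict[int, List[Op]] = {}
        for op in seq:
            by_addr.setdefault(op[1], []).append(op)
        return by_addr

    # Dict comparison ignores key order, so only the per-address sequences matter.
    return per_address(bus_order) == per_address(exec_order)


bus = [("W", 0x10, "a"), ("W", 0x20, "b"), ("R", 0x10, "c")]

# The write to 0x20 moves ahead, but the 0x10 write still precedes the 0x10 read.
ok = preserves_same_address_order(
    bus, [("W", 0x20, "b"), ("W", 0x10, "a"), ("R", 0x10, "c")]
)

# The 0x10 read is moved ahead of the 0x10 write, so stale data would be returned.
bad = preserves_same_address_order(
    bus, [("R", 0x10, "c"), ("W", 0x20, "b"), ("W", 0x10, "a")]
)
```

Here `ok` is true and `bad` is false: only the first execution order leaves the memory state looking as if the operations had been performed in bus order.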
In some cases, even in weakly-ordered processing systems, data transfer operation order must be enforced to ensure correct operation. For example, an application may require a processor to write data to memory before the processor reads from that memory location. Reordering these operations would result in incorrect data being returned in the read operation.
Various conventional techniques have been employed for executing ordered data transfer operations in a weakly-ordered processing system. One technique is simply to delay a particular data transfer bus transaction until all data transfer operations before it are executed. In the previous example, the processor may delay issuing a read request until it receives an indication that guarantees that the write operation data has been written to the memory location. Halting program execution to enforce data transfer operation ordering obviously has a negative effect on performance.
Another technique for executing ordered data transfer operations in a weakly-ordered processing system is to define an execution synchronization bus transaction, also known as a “memory barrier,” as part of the bus protocol. A memory barrier is a bus transaction that ensures that all data transfer bus transactions issued by a master device prior to issuance of the memory barrier are executed, or appear to have been executed, before any data transfer bus transaction issued by the master device after the memory barrier. The memory barrier itself does not involve any data transfer between master and slave devices. A memory barrier operation may be explicitly initiated by a master device. Alternatively, or additionally, a memory barrier operation may be generated by a bus controller in response to a strongly-ordered data transfer operation initiated by a master device. In the previous example, a memory barrier transaction could be issued by the processor before issuing the read bus transaction. The memory barrier would ensure that the write operation (as well as any other previously issued data transfer operation) is executed before the read operation is executed. Memory barriers are described in co-pending U.S. patent application, “Enforcing Strongly-Ordered Requests In A Weakly-Ordered Processing System,” Ser. No. 11/253,307, filed Oct. 19, 2005, and assigned to the assignee of the present application, which is incorporated herein by reference in its entirety.
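The ordering guarantee of a memory barrier may be modeled, purely for illustration, as follows. The `BARRIER` marker, the function name, and the `reorder` callback are hypothetical; the sketch models a single master and a slave that is free to permute operations only between barriers.

```python
from typing import Callable, Iterable, List, Tuple, Union

BARRIER = "BARRIER"  # hypothetical marker for a memory barrier transaction
Op = Tuple[str, int]  # (kind, address): hypothetical layout


def execute_weakly_ordered(
    transactions: List[Union[Op, str]],
    reorder: Callable[[List[Op]], Iterable[Op]],
) -> List[Op]:
    """Execute transactions, allowing `reorder` to permute only within barrier-delimited groups."""
    executed: List[Op] = []
    group: List[Op] = []
    for t in transactions:
        if t == BARRIER:
            # Everything issued before the barrier is executed before anything after it.
            executed.extend(reorder(group))
            group = []
        else:
            group.append(t)
    executed.extend(reorder(group))
    return executed


# Two writes, a barrier, then a read of the first written address.
txns = [("W", 0x10), ("W", 0x20), BARRIER, ("R", 0x10)]

# Model an aggressive slave that reverses every group it is allowed to reorder.
result = execute_weakly_ordered(txns, lambda ops: list(reversed(ops)))
# result == [("W", 0x20), ("W", 0x10), ("R", 0x10)]
```

Even though the two writes are swapped, the read cannot move ahead of the write to the same location, which is the property required in the write-then-read example above. The barrier itself carries no data and does not appear in the executed sequence.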
The memory barrier may be inefficient in processing systems with multiple slave devices. In such systems, to enforce an ordering constraint, a memory barrier transaction must be propagated to every slave device that can be accessed by the master device issuing a strongly-ordered data transfer bus transaction or memory barrier operation. An acknowledgment of the memory barrier must be received from each of the slave devices before the strongly-ordered data transfer bus transaction, or the bus transaction following a memory barrier operation, is issued. Thus, the delay imposed by the memory barrier is determined by the slowest slave device to respond. This may adversely affect performance, particularly where the slower slave devices perform data transfer operations in bus transaction order regardless of the memory barrier.
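The dependence of the barrier delay on the slowest slave device may be illustrated with a short calculation. The slave names and acknowledgment latencies below are invented for the example and do not describe any real device.

```python
from typing import Dict


def barrier_delay(ack_latency: Dict[str, int]) -> int:
    """Cycles until every targeted slave has acknowledged the barrier (0 if none targeted)."""
    return max(ack_latency.values(), default=0)


# Hypothetical acknowledgment latencies, in bus cycles.
latencies = {"sram": 2, "dram": 12, "flash": 40}

all_slaves = barrier_delay(latencies)
without_flash = barrier_delay({k: v for k, v in latencies.items() if k != "flash"})
# all_slaves == 40, without_flash == 12
```

In this example, if the slowest device already executes operations in bus transaction order, omitting it from the barrier would reduce the stall from 40 cycles to 12 without weakening the ordering guarantee.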
Co-pending U.S. patent application, “Minimizing Memory Barriers When Enforcing Strongly-Ordered Requests in a Weakly-Ordered Processing System,” Ser. No. 11/254,939, filed Oct. 20, 2005 and assigned to the assignee of the present application, which is incorporated herein by reference in its entirety, discloses a system and method of dynamically minimizing memory barriers. A status register associated with each slave device indicates, on a per-master basis, whether the slave currently has a pending (unexecuted) data transfer operation from each master device from which the slave may receive data transfer bus transactions. If a particular slave device indicates that it does not have any pending data transfer operations from a particular master device, a memory barrier from that master device need not be propagated to that slave device. In this manner, memory barriers are propagated only where necessary to enforce bus transaction ordering. That is, the memory barrier is directed only to slave devices that have pending (previously issued) data transfer operations from the master device requiring a strongly-ordered data transfer bus transaction or memory barrier operation.
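A minimal sketch of the per-master pending status described above follows. It is an illustrative model only, not the implementation of the referenced application; the class name `SlaveStatus`, its methods, and the master and slave names are assumptions introduced for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class SlaveStatus:
    """Hypothetical per-slave status register tracking pending operations per master."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pending: Dict[str, int] = defaultdict(int)

    def issued(self, master: str) -> None:
        """Record a data transfer bus transaction received from `master`."""
        self._pending[master] += 1

    def completed(self, master: str) -> None:
        """Record that one pending operation from `master` has been executed."""
        if self._pending[master] > 0:
            self._pending[master] -= 1

    def has_pending(self, master: str) -> bool:
        return self._pending.get(master, 0) > 0


def barrier_targets(master: str, slaves: Iterable[SlaveStatus]) -> List[str]:
    """Names of the slaves to which a barrier from `master` must be propagated."""
    return [s.name for s in slaves if s.has_pending(master)]


dram, flash = SlaveStatus("dram"), SlaveStatus("flash")
dram.issued("cpu")    # the CPU has an unexecuted operation at the DRAM
flash.issued("dma")   # only the DMA controller has work pending at the flash

before = barrier_targets("cpu", [dram, flash])  # ["dram"]
dram.completed("cpu")
after = barrier_targets("cpu", [dram, flash])   # []
```

In this model a barrier issued by the CPU reaches only the DRAM while the CPU's write is outstanding, and reaches no slave at all once that write has been executed.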
Most conventional systems include at least some slave devices that inherently provide global observability. With respect to such slave devices, there is no need to dynamically monitor whether the slave device has pending data transfer operations from a particular master to determine whether or not to direct a memory barrier transaction to the slave device.