1. Technical Field
Embodiments of the present invention relate generally to data processing system operation and more particularly to a method, system, apparatus, and article of manufacture for performing cacheline polling utilizing store and reserve and load when reservation lost instructions.
2. Description of the Related Art
At the advent of modern computing, information handling (e.g., computer) systems comprised a limited number of components including a single processor, system memory, and a small number of input/output (I/O) devices such as display devices, keyboards, and, in conjunction with the creation of graphical user interfaces, cursor control devices (e.g., mice, trackballs, or the like). As information handling systems have developed, however, the number of system components which interface with each other via communication and competition for shared system resources has increased dramatically. Modern, conventional information handling systems may therefore include a wide variety of system components (e.g., multiple processors using SMP, ASMP, NUMA, or similar configurations, co-processors, direct memory access controllers, and I/O devices, each of which may include additional processors, registers, and memory).
In order to coordinate the activity of system components in modern information handling systems, a number of techniques have been implemented. Interrupts, coupled with interrupt service routines or handlers, may be utilized by information handling system components to communicate and/or to indicate the occurrence of an event. Similarly, memory-mapped I/O and port or “port-mapped” I/O may be utilized to provide communication between system components (e.g., processors and I/O devices).
The coordination of activity among elements of an information handling system is of particular importance in the transfer of data between elements for the purposes of performing input/output (I/O) operations. For example, after an information-handling system processor has deposited data in a buffer intended for handling by an I/O device or another processor in a multiprocessor system, the data providing processor will typically notify the I/O device or data-receiving processor that the transfer of data to the buffer is complete. In a conventional information handling system, such notification is typically performed by writing a specific data value into a memory mapped input/output (MMIO) register within the I/O device or data-receiving processor. After a write operation to an associated MMIO register has been detected, the I/O device or data-receiving processor may retrieve data from the buffer via a direct memory access (DMA).
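The notification sequence described above can be illustrated with a minimal C sketch. It is a software simulation only: the `sim_device` structure, the `DOORBELL_READY` value, the `pending_len` field, and the function names are hypothetical and introduced purely for illustration. An ordinary struct field stands in for the MMIO register, and `memcpy` stands in for the DMA retrieval.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64
#define DOORBELL_READY 0x1u   /* hypothetical "buffer ready" data value */

/* Hypothetical device model. On real hardware, doorbell would be an
 * MMIO register inside the I/O device or data-receiving processor. */
struct sim_device {
    volatile uint32_t doorbell;   /* stands in for the MMIO register */
    size_t pending_len;           /* stands in for a transfer descriptor */
    uint8_t local[BUF_SIZE];      /* device-side copy of the data */
    size_t received;              /* bytes retrieved by the last transfer */
};

static uint8_t shared_buffer[BUF_SIZE];   /* buffer in system memory */

/* Data-providing processor: deposit data, then notify the device. */
static void producer_send(struct sim_device *dev, const void *data, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(shared_buffer, data, len);
    dev->pending_len = len;
    /* Real hardware would typically need a write barrier here so the
     * buffer contents are visible before the doorbell write. */
    dev->doorbell = DOORBELL_READY;
}

/* Device side: once the doorbell write is detected, retrieve the data.
 * Returns 1 if a transfer was performed, 0 if no notification is pending. */
static int device_service(struct sim_device *dev)
{
    if (dev->doorbell != DOORBELL_READY)
        return 0;
    memcpy(dev->local, shared_buffer, dev->pending_len);  /* simulated DMA */
    dev->received = dev->pending_len;
    dev->doorbell = 0;
    return 1;
}
```

In this sketch the device retrieves nothing until the doorbell has been written, which mirrors the ordering in the text: data is deposited first, notification follows, and retrieval happens only after the notification is detected.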
In some conventional information handling systems, the completion of DMA retrieval of data can be detected via MMIO register polling or via interrupts. Neither is an efficient mechanism for detecting the completion of the DMA, however: interrupt overhead is typically too great for relatively small buffers, and MMIO register polling consumes bus bandwidth which could otherwise be used for DMA transfers to increase overall system throughput.
In another conventional technique for detecting the completion of a DMA, known as “cacheline polling,” a predetermined “busy” indicator data value is written into a cacheable memory location, typically known as a buffer flag or semaphore, prior to notifying an I/O device (e.g., via MMIO) of a buffer's availability. The processor then polls the buffer flag for a predetermined “not busy” indicator data value to detect the completion of a corresponding DMA. Since the data is already modified in the processor's cache, cacheline polling does not generate any additional bus activity. After the completion of DMA data retrieval from the buffer, the I/O device or receiving processor writes a “not busy” completion data value to the buffer flag. The new buffer flag value can then be accessed by the data-providing processor via a normal cache coherency protocol, during which the “busy”-indicating buffer flag data in cache memory is invalidated or replaced by a new completion value.
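A minimal C sketch of conventional cacheline polling follows. It is a simulation under stated assumptions: a POSIX thread stands in for the I/O device or receiving processor, the flag values `FLAG_BUSY` and `FLAG_NOT_BUSY` and all function names are hypothetical, and a 64-byte cache line is assumed for the alignment of the buffer flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical indicator data values for the buffer flag. */
enum { FLAG_NOT_BUSY = 0, FLAG_BUSY = 1 };

/* Buffer flag (semaphore) in cacheable memory, placed on its own
 * cache line; the 64-byte line size is an assumption. */
static _Alignas(64) atomic_uint buffer_flag;

/* Stands in for the I/O device: after its (omitted) DMA retrieval,
 * it writes the "not busy" completion value to the buffer flag. */
static void *device_thread(void *arg)
{
    (void)arg;
    atomic_store_explicit(&buffer_flag, FLAG_NOT_BUSY, memory_order_release);
    return NULL;
}

/* Polling loop: repeated loads hit the local cache until the device's
 * write invalidates or updates the line. Returns the spin count. */
static unsigned long poll_until_not_busy(void)
{
    unsigned long spins = 0;
    while (atomic_load_explicit(&buffer_flag, memory_order_acquire) == FLAG_BUSY)
        spins++;
    return spins;
}

/* Data-providing processor: mark the buffer busy, notify the device
 * (here, by starting the thread), then poll for completion. */
static unsigned long transfer_and_poll(void)
{
    pthread_t dev;
    atomic_store(&buffer_flag, FLAG_BUSY);
    int rc = pthread_create(&dev, NULL, device_thread, NULL);
    assert(rc == 0);
    unsigned long spins = poll_until_not_busy();
    pthread_join(dev, NULL);
    return spins;
}
```

The value returned by `transfer_and_poll` is the number of loop iterations executed while waiting. It varies from run to run, and every one of those iterations is a polling instruction sequence the processor executed without doing useful work.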
From a system standpoint, cacheline polling is an efficient polling mechanism. However, in order to implement cacheline polling the data-providing processor executes a set of “polling” instructions repeatedly until the DMA transfer is complete and the buffer flag value is updated, thus wasting valuable system resources (e.g., processor cycles, bus cycles, electrical power, instruction or thread dispatch slots, or the like).