As computer systems advance, the demands placed on their input/output (I/O) capabilities increase. A typical computer system has a number of I/O devices, such as network interface controllers, universal serial bus controllers, video controllers, PCI devices, and PCI Express devices, that facilitate communication between users, computers, and networks. To support the wide range of operating environments in which I/O devices are required to function, developers often create software device drivers that provide specific support for each I/O device.
During execution of a device driver, it is common to update device registers present on the I/O devices. Current computer systems typically map the device registers of I/O devices into uncacheable (UC) memory pages. When a device driver needs to write to a device register, the microprocessor executing the device driver usually performs a write operation to a UC memory address. As an example, when writing to a specific device register, the microprocessor may write to the offset within a UC memory page that corresponds to that device register.
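The register write described above can be sketched in C as a volatile store to a base address plus an offset. This is a minimal illustration only: the register names, the offsets, and the helper names `reg_write32` and `reg_read32` are hypothetical and do not come from any particular device or driver interface. In a real driver, `page` would be the address at which the operating system has mapped the device's UC register page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offsets of two registers within one device page.
 * Real offsets are defined by the device's programming manual. */
enum {
    REG_DMA_DOORBELL = 0x00,
    REG_RX_TAIL      = 0x08
};

/* Write a 32-bit value to the register at byte `offset` within the page
 * that starts at `page`. The volatile access stops the compiler from
 * eliding or merging the store; it does not change how the hardware
 * treats the underlying memory type. */
static inline void reg_write32(volatile void *page, size_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)page + offset) = value;
}

/* Read back the 32-bit register at byte `offset` within the page. */
static inline uint32_t reg_read32(const volatile void *page, size_t offset)
{
    return *(const volatile uint32_t *)((const volatile uint8_t *)page + offset);
}
```

For experimentation without hardware, `page` can point at an ordinary zero-initialized `uint32_t` array standing in for the device page; a call such as `reg_write32(page, REG_RX_TAIL, 42u)` then stores 42 at byte offset 8 of that array.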
However, mapping device registers into UC memory may cause serialization of the microprocessor pipeline. During a UC write, the processor pipeline is potentially forced to stall until the write operation is complete. As a specific example, during packet reception, a network interface controller (NIC) device driver may write to as many as four device registers per packet: the direct memory access (DMA) engine doorbell, the NIC Rx tail update, the interrupt enable, and the Tx tail update device registers. Assuming a typical UC write incurs a pipeline stall of approximately 200 ns, the aforementioned four UC writes may incur as much as an 800 ns stall in the processor pipeline for each packet, which adversely affects processor performance.
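The per-packet cost in this example is simply the number of UC writes multiplied by the stall per write. The short sketch below makes that arithmetic explicit; the function name `uc_stall_per_packet` is hypothetical, and the model assumes that the stalls are fully serialized and do not overlap, which is the worst case described above.

```c
#include <stdint.h>

/* Worst-case pipeline stall per packet, assuming each UC write stalls the
 * pipeline for `stall_per_write` time units and the stalls do not overlap.
 * The result is in the same time unit as `stall_per_write`. */
static inline uint64_t uc_stall_per_packet(uint32_t writes_per_packet,
                                           uint64_t stall_per_write)
{
    return (uint64_t)writes_per_packet * stall_per_write;
}
```

With the figures from the example, `uc_stall_per_packet(4, 200)` evaluates to 800, in the same time unit as the per-write stall.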