The processing associated with an Input/Output (I/O) operation in an operating system (OS) in a computer system may logically be divided into two parts. A first part may include preparation and dispatch of a device level I/O request to a hardware I/O device in response to a read or write request from an application. A second part may include receiving and processing a response from the hardware I/O device and returning a completion indication to the application. The first part may be termed “request processing”, while the second part may be termed “response processing”.
The processing time in an OS usually depends on the number of OS internal layers the I/O request has to pass through. These internal layers are referred to as I/O stack layers. For example, a typical I/O request may traverse several internal layers of an OS during both request processing and response processing. Various data structures may be accessed at each layer. In addition, as the I/O request flows through the I/O stack layers, each layer maintains bookkeeping data structures, referred to as “metadata”, for tracking the I/O request. Once the I/O request is serviced by the I/O device, these I/O stack layers perform the completion processing and clean up or update the state of the I/O request in their metadata before notifying the requesting process about the completion of the I/O request.
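The flow of a request down through the I/O stack layers and of its completion back up can be illustrated with a minimal sketch. The `Layer` class, the `submit_and_complete` function, and the "file system" and "volume manager" layer names are assumptions made for illustration only; they do not describe any particular OS.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One hypothetical I/O stack layer with its own per-request metadata."""
    name: str
    metadata: dict = field(default_factory=dict)  # request id -> tracking state

    def on_request(self, req_id: int) -> None:
        # Request processing: record bookkeeping state for this request.
        self.metadata[req_id] = "in-flight"

    def on_completion(self, req_id: int) -> None:
        # Response processing: clean up the tracking entry.
        del self.metadata[req_id]


def submit_and_complete(layers, req_id):
    """Walk one request down the stack, then its completion back up."""
    trace = []
    for layer in layers:                 # request processing, top to bottom
        layer.on_request(req_id)
        trace.append(("request", layer.name))
    # (the hardware I/O device would service the request here)
    for layer in reversed(layers):       # response processing, bottom to top
        layer.on_completion(req_id)
        trace.append(("completion", layer.name))
    return trace


# Assumed layer names, chosen only to make the trace readable.
STACK = [Layer("file system"), Layer("volume manager"),
         Layer("device driver"), Layer("interface driver")]
```

In this model every layer touches its own metadata twice per request, once on the way down and once on the way up, which is why the number of layers influences the processing time.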
Generally, while processing the I/O request, the I/O stack layers do not touch the actual data buffer, but mostly process the metadata that each layer maintains for tracking the I/O request. For example, when a process makes an I/O request on a first processor, the associated I/O stack layers process the request on that processor and issue the request to an associated device adapter from that processor. However, the device adapter may be configured to interrupt a second processor on completing the I/O request, which can result in the I/O stack layers accessing their metadata from the second processor. As the I/O request originated from the first processor, the second processor may generate a fair amount of “cache-coherency traffic” on a central bus to bring the metadata from the cache associated with the first processor to the second processor. This can not only require more processor cycles for both the request processing and the response processing, but can also affect overall system performance due to the additional traffic created on the central bus.
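The effect can be illustrated with a toy model that counts how often a layer's metadata has to move between processor caches. The `MetadataLine` class and the `coherency_transfers` function are hypothetical, and the model assumes one cache line of metadata per layer; real coherency protocols are considerably more involved.

```python
class MetadataLine:
    """Toy model of one cache line holding a layer's metadata."""

    def __init__(self):
        self.owner = None  # CPU whose cache currently holds the line

    def access(self, cpu):
        """Return 1 if the line had to move from another CPU's cache, else 0."""
        moved = self.owner is not None and self.owner != cpu
        self.owner = cpu
        return int(moved)


def coherency_transfers(num_layers, request_cpu, interrupt_cpu):
    """Count cache-to-cache transfers for one I/O request."""
    lines = [MetadataLine() for _ in range(num_layers)]
    transfers = 0
    for line in lines:               # request processing on the originating CPU
        transfers += line.access(request_cpu)
    for line in reversed(lines):     # response processing on the interrupted CPU
        transfers += line.access(interrupt_cpu)
    return transfers
```

Under these assumptions, a four-layer stack incurs no transfers when the request and the interrupt are handled on the same processor, and one transfer per layer when they are handled on different processors.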
To avoid such additional cache-coherency traffic, the interrupt process may be bound to a processor where the I/O request originated. However, this can create significant load imbalance on the system by binding many processes to a specific processor to which the device adapter's interrupts are bound. Further, the interrupt process needs to be migrated to another processor when it starts performing I/O to a device whose interrupts are bound to that processor, which can result in additional overhead associated with moving processes across processors.
Current techniques use I/O forwarding to minimize such cache-coherency overhead. The I/O requests for a device are forwarded to the processor that is configured to be interrupted by the device when the I/O completes. The I/O forwarding is generally performed at the device driver level in the I/O stack layers, as the device adapter through which the I/O request would be issued is likely to be known at this I/O stack layer. This ensures that the I/O request initiation and completion code paths in the device driver and interface driver components of the I/O stack are executed on the same processor. Further, the I/O forwarding ensures that the metadata of these I/O stack layers is accessed from the same processor to which the interrupt is bound. Even though the I/O forwarding improves metadata locality in the I/O stack layers by forwarding the I/O request to the processor to which the interrupt is bound, it does not exploit recent developments in servers, such as cell locality in non-uniform memory access (NUMA) architectures, or I/O technologies such as multi-interrupt capable device adapters that can interrupt multiple processors.
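A minimal sketch of I/O forwarding at the device driver level follows. The adapter names, the `INTERRUPT_CPU` binding table, the upper layer names, and the helper functions are all hypothetical and serve only to show where the forwarding decision is made.

```python
# Hypothetical table: device adapter -> CPU its completion interrupt is bound to.
INTERRUPT_CPU = {"adapter0": 2, "adapter1": 5}

UPPER_LAYERS = ["file system", "volume manager"]       # assumed layer names
DRIVER_LAYERS = ["device driver", "interface driver"]  # layers named in the text


def request_path(origin_cpu, adapter):
    """Return (layer, cpu) pairs for request processing with I/O forwarding."""
    # Upper layers run on the CPU where the process issued the request.
    path = [(layer, origin_cpu) for layer in UPPER_LAYERS]
    # At the device driver level the adapter is known, so the request is
    # forwarded to the CPU that the adapter will interrupt on completion.
    target_cpu = INTERRUPT_CPU[adapter]
    path += [(layer, target_cpu) for layer in DRIVER_LAYERS]
    return path


def completion_cpu(adapter):
    """CPU on which the completion interrupt is handled."""
    return INTERRUPT_CPU[adapter]


def driver_layers_local(origin_cpu, adapter):
    """True if driver-level layers run request and completion on one CPU."""
    done_on = completion_cpu(adapter)
    return all(cpu == done_on
               for layer, cpu in request_path(origin_cpu, adapter)
               if layer in DRIVER_LAYERS)
```

In this sketch each adapter maps to exactly one interrupt CPU, so the forwarding target is fixed regardless of where the request originated. That single-target assumption reflects the limitation noted above with respect to NUMA cell locality and multi-interrupt capable adapters.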
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.