Computer systems include a number of components and elements. Often the components are coupled via a bus or interconnect. Previously, input/output (I/O) devices were coupled together through a conventional multi-drop parallel bus architecture referred to as Peripheral Component Interconnect (PCI). More recently, a new generation of I/O bus referred to as PCI-Express (PCIe) has been used to facilitate faster interconnection between devices utilizing a serial physical-layer communication protocol.
A PCIe architecture includes a layered protocol to communicate between devices. As an example, a physical layer, a link layer, and a transaction layer form a PCIe protocol stack. The PCIe link is built around dedicated unidirectional pairs of serial point-to-point connections referred to as a lane. A link between devices includes some number of lanes, such as one, two, sixteen, thirty-two, and so on. The current PCIe specification, base spec 1.1, is available at http://www.psig.com/specifications/pciexpress/.
Currently, PCIe links maintain coherency with respect to processor caches and system memory. For example, a read/write to an I/O device misses a cache, retrieves a referenced element, performs a requested operation, and then immediately evicts the element from the cache. In other words, an I/O write is checked against a processor cache, but the I/O access itself is not cache coherent. Furthermore, memory-mapped I/O (MMIO) accesses are uncacheable and are also not coherent. Therefore, I/O accesses are expensive for system operation and potentially decrease processing bandwidth.
In addition, when an I/O device operates on shared memory, the device typically acquires a system-wide lock, performs operations on the shared memory, and then releases the lock. Acquiring a lock in this manner potentially results in data serialization and expensive delays when multiple processing elements operate on shared data. Often microprocessors provide mechanisms for multiple threads to perform atomic operations to avoid the penalty associated with locks. Yet PCIe currently does not provide a direct ability to atomically operate on shared data.
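As an illustration only, the acquire, operate, release sequence described above can be sketched in software, with threads standing in for I/O devices. The names `system_lock`, `device_update`, and `run_devices` are hypothetical and do not come from any PCIe specification; the sketch merely shows how a single system-wide lock forces every update to shared data to be performed one at a time.

```python
import threading

# Hypothetical shared memory location and system-wide lock (illustrative only).
shared_counter = 0
system_lock = threading.Lock()


def device_update(iterations):
    """Model one device: acquire the lock, operate on shared data, release."""
    global shared_counter
    for _ in range(iterations):
        with system_lock:        # acquire the system-wide lock
            shared_counter += 1  # read-modify-write on the shared data
        # the lock is released when the 'with' block exits


def run_devices(num_devices=4, iterations=1000):
    """Run several modeled devices concurrently and return the final value."""
    global shared_counter
    shared_counter = 0
    threads = [
        threading.Thread(target=device_update, args=(iterations,))
        for _ in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_counter


if __name__ == "__main__":
    print(run_devices())
```

Because every update passes through the same lock, the modeled devices never update the shared value concurrently, which is the serialization cost noted above.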
Moreover, devices issue transactions in any order, which, in some instances, results in inefficient memory accesses, such as thrashing of pages of memory. For example, a first transaction is issued referencing a first location in a first page of memory, a second transaction referencing a second location in a second page of memory, and a third transaction referencing a third location in the first page of memory. Here, the first page is opened to service the first transaction, the first page is closed, the second page is opened to service the second transaction, the second page is closed, and then the first page has to be re-opened to service the third transaction.
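The page-thrashing example above can be illustrated with a minimal sketch that counts page-open events. It assumes a simplified memory model in which only one page is open at a time, and a hypothetical page size of 4096 bytes; the function name `count_page_opens` and the sample addresses are likewise illustrative assumptions, not part of any specification.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes (an assumption)


def count_page_opens(addresses, page_size=PAGE_SIZE):
    """Count page opens when only one page may be open at a time."""
    open_page = None
    opens = 0
    for addr in addresses:
        page = addr // page_size
        if page != open_page:
            # The currently open page (if any) is closed and a new one opened.
            opens += 1
            open_page = page
    return opens


# First and third transactions reference the first page; the second
# transaction references the second page.
issued = [0x0000, 0x1000, 0x0040]

# Servicing the transactions in issue order.
in_order = count_page_opens(issued)

# Servicing the same transactions grouped by page (stable sort on page number).
grouped = count_page_opens(sorted(issued, key=lambda a: a // PAGE_SIZE))

if __name__ == "__main__":
    print(in_order, grouped)
```

In this model, the issue order opens a page three times, whereas servicing the same three transactions grouped by page opens a page only twice.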
As devices/components become more complex and undertake heavier workloads, power management also becomes an increasing concern. Previously, PCIe-compliant devices have been capable of entering a plurality of power states. However, the power states include a single active state and a plurality of different levels of an “off” state, i.e., the device consumes different levels of power but is potentially not operable.