Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. In addition, computer systems have evolved to encompass numerous different functions, such as traditional computing systems, media storage systems, entertainment centers, audio playback, video playback, servers, etc.
As a result, the number of input/output devices to be included in computer systems has also grown exponentially. Often, to support functions that may place too much of a load on processors in the computer system, or that a processor architecture is not fundamentally designed for, an accelerator device may be included in the computer system. The most common example of an accelerator is a graphics accelerator, which provides processing power to perform graphics and display computations. However, an accelerator may include any logic to aid a processor in execution. Other examples may include a math accelerator, a matrix inversion accelerator, a video compression accelerator, a memory access accelerator, and a network accelerator.
Yet, when a single accelerator is included in a system, that specific accelerator is limited to its default intended use. Furthermore, these accelerators are often located “below” a chipset, i.e., off of a memory controller hub or interconnect controller hub through an Input/Output (I/O) bus, such as Peripheral Component Interconnect (PCI) or PCI Express. As a result, these accelerators are commonly initialized through predefined I/O bus protocols and initialization procedures. However, memory access latencies are much longer for a device sitting off an I/O bus than for a processor in a socket.