Traditional Integration of Co-Processors
As semiconductor manufacturing processes approach an era of 1 trillion transistors per die, design engineers face the question of how to make the most effective use of all the available transistors. One design approach is to implement specific computation-intensive functions in dedicated on-die hardware “acceleration” alongside one or more general-purpose CPU cores.
Acceleration is achieved with dedicated logic blocks designed to perform these computation-intensive functions. Migrating intensive computations to such blocks frees the general-purpose CPU core(s) from executing large numbers of instructions, thereby increasing the effectiveness and efficiency of the CPU core(s).
Although “acceleration” in the form of co-processors (such as graphics co-processors) is known in the art, such traditional co-processors are viewed by the operating system (OS) as a separate “device” (within a larger computing system) that is external to the CPU core(s) on which the OS runs. These co-processors are therefore accessed through special device driver software and do not operate out of the same virtual memory space as a CPU core. As such, traditional co-processors neither share nor contemplate the virtual-to-physical address translation scheme implemented on a general-purpose CPU core.
Moreover, large latencies are incurred when the OS offloads a task to a traditional co-processor. Specifically, because a CPU and a traditional co-processor are essentially separate, isolated subsystems, significant communication resources are expended when a task defined in an application running on a CPU core is passed from the application through the OS “kernel” to the driver that manages the co-processor. Such large latencies favor system designs in which the main OS invokes tasks on the co-processor relatively infrequently, each with a large associated block of data. In effect, traditional co-processors are used primarily in a coarse-grained rather than a fine-grained fashion.
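To make the coarse-grain argument concrete, the following minimal Python sketch models each offload with a simple linear cost: a fixed per-task overhead `latency_s` (standing in for the application-to-kernel-to-driver round trip) plus the task's data size divided by the accelerator's throughput, compared against running the same task on the CPU. The linear model, the function names, and the throughput and latency figures are illustrative assumptions, not measurements of any real system.

```python
import math


def offload_time(n_bytes, latency_s, acc_rate):
    """Assumed cost of a co-processor task: fixed offload overhead plus processing time."""
    return latency_s + n_bytes / acc_rate


def cpu_time(n_bytes, cpu_rate):
    """Assumed cost of running the same task directly on the CPU core."""
    return n_bytes / cpu_rate


def break_even_size(latency_s, cpu_rate, acc_rate):
    """Smallest task size (bytes) above which offloading beats the CPU under this model.

    Solves latency_s + n / acc_rate = n / cpu_rate for n.
    Returns math.inf if the accelerator is not faster per byte than the CPU.
    """
    per_byte_saving = 1.0 / cpu_rate - 1.0 / acc_rate
    if per_byte_saving <= 0:
        return math.inf
    return latency_s / per_byte_saving


if __name__ == "__main__":
    # Hypothetical rates: 1 GB/s on the CPU, 10 GB/s on the accelerator.
    cpu, acc = 1e9, 10e9
    # Hypothetical overheads: a driver-mediated offload vs. a tightly coupled one.
    for latency in (50e-6, 0.5e-6):
        n = break_even_size(latency, cpu, acc)
        print(f"latency {latency * 1e6:.1f} us -> break-even at ~{n:,.0f} bytes")
```

Because the break-even size equals the fixed overhead divided by the per-byte saving, it scales linearly with that overhead: under this model, cutting the offload latency by 100x lowers the smallest task worth offloading by the same factor, which is why a large fixed overhead pushes designs toward infrequent, data-heavy tasks.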
Because system designers now seek to introduce more acceleration into computing systems, and to use it in a finer-grained fashion, a new paradigm for integrating acceleration into computing systems is emerging.