As both the extent and complexity of computer processing have grown even in everyday modern life, there is a well-known, ever-increasing need for greater processing power. In many cases, even the increases in processing capability predicted by Moore's Law are insufficient.
One way to increase processing capacity is to distribute the load. “Cloud computing” is one known distribution scheme, in which local systems access shared processing resources such as servers remotely via a network, usually on demand. Although such an arrangement makes essentially unlimited resources available, network delays alone preclude its use for many computationally intensive, time-critical or synchronized tasks.
One approach to handling some such tasks is “parallel computing”, in which a task is decomposed into discrete sub-tasks that can be performed simultaneously by different processing systems. Certain processing tasks involve operations that can be performed by a system's main processor, but that are so specialized that an auxiliary processor—a coprocessor—may instead be more efficient, thereby leaving the more general operations to the main processor. Coprocessors are thus frequently included in systems to perform such operations as floating point arithmetic, encryption, string processing, I/O interfacing, and signal and graphics processing. Such coprocessors may be locally and/or remotely attached.
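By way of illustration only, the decomposition described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular system: the function names (split_range, sum_of_squares, parallel_sum_of_squares) and the summing task itself are hypothetical, and a local thread pool merely stands in for the different processing systems. In CPython, threads show the structure of the decomposition rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) sub-ranges."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` sub-ranges take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_of_squares(bounds):
    """Sub-task: sum the squares of the integers in [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Decompose the task, run the sub-tasks on a worker pool, and combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each sub-range is an independent sub-task with no shared state.
        partials = pool.map(sum_of_squares, split_range(n, workers))
        return sum(partials)
```

Because the sub-tasks share no state, the same decomposition could in principle be dispatched to separate processes, machines, or coprocessors instead of threads; only the pool would change.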
The specialization of coprocessors offers many obvious advantages (they are, after all, designed to perform certain tasks especially well), but it also creates challenges, especially when a main hardware platform must be able to access more than one coprocessor: the coprocessors may have different API protocols, may be distributed (that is, with some or all of them remote), may be unevenly loaded, and so on.
Heterogeneous and “exotic” hardware systems that leverage the specialized capabilities of coprocessors promise much higher performance and efficiency for the compute-intensive applications they target. However, it has in many cases proven difficult to “scale up”, that is, to use more than a single coprocessor simultaneously to increase efficiency and performance and to accelerate applications further, especially (but not exclusively) where portability across different vendors and system configurations is necessary or desirable. These difficulties are often a barrier to adopting additional hardware, because the software enablement effort required is high and continues to grow.
What is needed, therefore, is a system and operational method that make the use of one or more coprocessors more generally feasible and useful.