Adding separate, complex computational units to extend a computer's capabilities beyond those of its basic processing element is a technique common to both past and current computer architectures. Dedicated hardware coprocessors are commonly used to increase performance in areas such as floating point calculation, data input/output (I/O) and graphics. Historical examples of such coprocessor use include the Execute instruction in the International Business Machines (IBM) 360 mainframe, the x87 floating point escape codes in early versions of Intel x86 machines, and other escape codes commonly implemented in both minicomputers and microprocessors.
These implementations are mostly characterized by explicit addressing of a function of a coprocessing unit, a synchronous execution path, and an implicit pairwise relationship between the main processor and its coprocessor. A notable exception was the IBM 360 mainframe's use of channels to interface peripherals to the system. A channel acted as a coprocessor that executed input/output programs asynchronously and enabled the main processor to address coprocessors (i.e., the channels).
A block diagram of an example prior art system implementing coprocessors is shown in FIG. 1. The computer system, generally referenced 10, comprises central processing unit (CPU) 12, math coprocessor 14, I/O coprocessor 16, system bus 18 and random access memory (RAM) 19. CPU 12, executing a program residing in RAM 19, can offload floating point calculations to math coprocessor 14 via system bus 18. I/O processing, such as disk I/O, can also be offloaded from CPU 12 to I/O coprocessor 16.
Processor-to-processor communication is generally handled via hardware interrupts. A hardware interrupt causes the processor to save its state of execution via a context switch and begin executing an interrupt handler. An inefficiency of interrupts is that the processor suspends its current work for the duration of the interrupt. An additional issue affecting inter-processor communication is synchronization, which is handled by mechanisms such as locks. Locking is time consuming and increases traffic on the bus.
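As a rough illustration of lock-based synchronization, the following minimal sketch uses Python threads as stand-ins for processors that post messages to a shared buffer. The names `SharedMailbox`, `post` and `run_senders` are hypothetical and do not come from the described system. The sketch does not model bus traffic; it only counts lock acquisitions to show that every message pays for one.

```python
import threading


class SharedMailbox:
    """Shared message buffer guarded by a single lock (hypothetical structure)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.messages = []
        self.acquisitions = 0  # number of times the lock was taken

    def post(self, sender, payload):
        # Every access to the shared buffer must first acquire the lock.
        with self._lock:
            self.acquisitions += 1
            self.messages.append((sender, payload))


def run_senders(num_senders=4, posts_per_sender=1000):
    """Start several sender threads and return the mailbox once all finish."""
    mailbox = SharedMailbox()

    def sender(sender_id):
        for i in range(posts_per_sender):
            mailbox.post(sender_id, i)

    threads = [threading.Thread(target=sender, args=(n,)) for n in range(num_senders)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return mailbox
```

The lock keeps the shared buffer consistent, but the number of acquisitions grows in step with the number of messages, which is the kind of overhead described above.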
The widespread use of multi-core architectures in contemporary processors has raised new issues related to inter-processor communication. Since the relationship between main processors and coprocessors is many-to-many, some additional functionality has been added to the coprocessor access functions, such as isolation of access (accesses do not get intermixed) and a primitive form of serializability to allow isolation.
Some additional features allow coprocessor selection to be performed automatically, easing the scheduling burden, and asynchronous execution allows main processors and coprocessors to execute their tasks at different speeds. The architectural structure, however, has remained asymmetrical: the main processor issues an instruction to a coprocessor, which then executes the instruction, delivers the results and signals termination.
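The asymmetrical issue, execute, deliver and signal sequence can be sketched in software, assuming a thread stands in for the coprocessor and a queue stands in for the instruction path. All names here (`Request`, `coprocessor`, `run_demo`, the `add` and `mul` opcodes) are hypothetical and are used only for illustration.

```python
import queue
import threading


class Request:
    """One instruction issued by the main processor (hypothetical structure)."""

    def __init__(self, opcode, operands):
        self.opcode = opcode
        self.operands = operands
        self.result = None             # filled in by the coprocessor
        self.done = threading.Event()  # set by the coprocessor on termination


def coprocessor(requests):
    """Execute requests in arrival order until a None sentinel arrives."""
    operations = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    while True:
        request = requests.get()
        if request is None:
            break
        # Execute the instruction and deliver the result.
        request.result = operations[request.opcode](*request.operands)
        # Signal termination to the issuing processor.
        request.done.set()


def run_demo():
    requests = queue.Queue()
    worker = threading.Thread(target=coprocessor, args=(requests,))
    worker.start()

    request = Request("mul", (6.0, 7.0))
    requests.put(request)        # main processor issues the instruction
    local_work = sum(range(10))  # and continues with its own work meanwhile
    request.done.wait()          # later, it waits for the termination signal

    requests.put(None)           # shut the coprocessor thread down
    worker.join()
    return local_work, request.result
```

In this sketch only the main thread issues work and the worker only responds, which mirrors the one-directional relationship described above.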