The present invention relates, in general, to the field of heterogeneous computer systems. More particularly, the present invention relates to a system and method for computational unification of heterogeneous implicit and explicit processing elements.
Often the unification of various elements can lead to a new element that is superior to anything that can be accomplished with the primary elements alone. Such is the case in the field of computation. SRC Computers, LLC, assignee of the present invention, has discovered that the unification of both implicit and explicit processing elements can have many benefits. Unification is not simply the coexistence of the two processing forms in a single system; it also encompasses system aspects including scalability, data movement, interconnect, aggregation and programmability.
Unification refers to, but is not limited to, the generation of a set of one or more related executable programs that are executed on a heterogeneous processor system. This set of related executable programs for a heterogeneous system is generated from source code written for a single type of processor. For example, microprocessor source code for a computer application is submitted to the unification process and method, which generates unified source code for a heterogeneous system containing both microprocessor and FPGA-based processor elements. Microprocessor compilation tools take the generated unified microprocessor source code and create the microprocessor executable program, while the FPGA-based processor compilation tools take the generated unified FPGA-based processor source code and create the FPGA-based processor executable program. The two executable programs then execute cooperatively on the heterogeneous system.
Microprocessor clock rates (and therefore performance) can no longer be increased substantially because of the extreme heat generated at the highest clock rates. In order to provide at least the illusion of higher performance, microprocessor manufacturers turned to lowering clock rates and increasing the number of microprocessor cores on a single chip. This has yielded a less than linear improvement in execution performance: for example, 2 cores performing at 1.5 times the performance of 1 core, 4 cores performing at 3 times 1 core, and so on.
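The example scaling figures can be restated as parallel efficiency E(n), the speedup S(n) over a single core divided by the number of cores n. This framing is an editorial illustration that uses only the example numbers given, not a measurement of any particular chip:

```latex
E(n) = \frac{S(n)}{n}, \qquad
E(2) = \frac{1.5}{2} = 0.75, \qquad
E(4) = \frac{3}{4} = 0.75
```

In these examples each added core contributes roughly three quarters of the throughput of a single core, rather than the full amount that linear scaling would deliver.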
It is also more difficult to program a multi-core microprocessor than it is to program a single-core microprocessor. The naive approach of programming each core as if it were a single microprocessor does not perform well, because the cores compete for the shared resources on a multi-core chip. Developers must therefore turn to parallel programming using threads, OpenMP and other techniques, none of which is as easy as writing serial code for a single microprocessor.
In an attempt to improve overall system performance beyond the limit offered by multi-core microprocessors, many developers turned to a performance accelerator co-processor design paradigm. In this design approach, a processor element with good performance characteristics for a portion of an application program is coupled to a microprocessor through an existing input/output (I/O) bus interconnect. The microprocessor is in charge of application execution, drives data transfers, and determines when and how the accelerator co-processor works on its portion of the application's data. Examples of these accelerator co-processor elements include graphics processing units (GPUs), field programmable gate arrays (FPGAs) and application specific integrated circuits (ASICs). However, this type of system design rarely yields good overall application performance, for two reasons. First, the time consumed moving data between the microprocessor and its accelerator co-processor frequently negates any performance gained in the co-processor. Second, this type of system design does not scale, because the co-processor elements must work through the microprocessor in order to cooperate.
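The data-movement argument can be made concrete with a deliberately simplified timing model. The model and its symbols are editorial assumptions, not part of the original description: let T_c be the time the microprocessor alone needs for the offloaded portion, s the co-processor's speedup on that portion, and T_x the time to move that portion's data across the I/O bus, with transfers assumed not to overlap computation. Offloading pays off only when

```latex
T_{\text{offload}} = \frac{T_c}{s} + T_x < T_c
\quad\Longleftrightarrow\quad
T_x < T_c\left(1 - \frac{1}{s}\right)
```

For instance, with s = 10 and T_c = 1 s, the offload helps only if T_x < 0.9 s. Even an infinitely fast co-processor cannot help once T_x reaches T_c.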
These hybrid co-processor systems have the same programming difficulty inherent in multi-core microprocessors, as well as additional complexity introduced by the need to program different types of processor elements. Different types of processor elements have different programming models, different idioms for efficient code generation, and different programming languages.
To avoid the performance limitations of the accelerator co-processor design model, SRC Computers designed a high bandwidth, scalable system interconnect that supports any number and mix of heterogeneous processor elements. Because of this interconnect design, all processor elements, regardless of type, cooperate as peers (as opposed to the hierarchical co-processor model) in executing an application program. The SRC system design improves overall system execution performance well beyond that offered by multi-core microprocessors or accelerator co-processor designs.
However, existing systems enabling processor element peer cooperation have not heretofore been able to achieve system-wide computational unification, and current designs exhibit essentially the same programming complexity inherent in the accelerator co-processor model.