Developing a complex computer software application usually requires specific domain knowledge. Domain experts focus on the overall functional correctness of an application but often lack knowledge of system-level optimizations such as parallel programming for multi-core CPUs or using graphics processors for matrix calculations. This lack of knowledge may result in poor performance of the application or even in infeasibility, e.g., due to extreme memory consumption. Today, computer systems are often made up of several different processing units such as CPUs (central processing units), GPUs (graphics processing units), FPGAs (field programmable gate arrays) and other dedicated processing units. Such systems are referred to as heterogeneous systems. System experts are needed to fully exploit the performance of such a system. These experts know the best implementation on a specific system for a given subroutine of an application program, such as sorting or matrix multiplication. Often, system experts provide libraries containing efficient implementations of subroutines for a specific system. Such libraries may adapt the underlying algorithm (execution kernel) of a subroutine, depending on current input parameters and system settings, to optimize for specific performance goals such as high throughput or low power consumption. Domain experts use such system libraries when developing their software in order to benefit from system-level optimizations. The drawback, however, may be the required use of such libraries: not only does this hinder portability to other systems but, even more importantly, domain experts still need to be specialists in using such relatively low-level libraries and must know how to apply them efficiently to achieve a performance improvement.
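By way of illustration only, the following minimal sketch shows how a system library might select an execution kernel for a sorting subroutine depending on a current input parameter. All names (`adaptive_sort`, `select_kernel`, `SMALL_INPUT_THRESHOLD`) and the threshold value are hypothetical and are not taken from any particular library; a real system library would typically also consider system settings and performance goals.

```python
# Hypothetical sketch: a library routine that picks an execution kernel
# based on the input size. Names and threshold are assumptions.

SMALL_INPUT_THRESHOLD = 32  # assumed cut-over point, not a measured value


def insertion_sort(values):
    """Kernel assumed to suit very small inputs (low overhead)."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def library_sort(values):
    """Stand-in for a throughput-oriented kernel (here: Python's built-in sort)."""
    return sorted(values)


def select_kernel(input_size):
    """Choose the execution kernel from the current input parameter."""
    if input_size <= SMALL_INPUT_THRESHOLD:
        return insertion_sort
    return library_sort


def adaptive_sort(values):
    """Entry point a domain expert would call; kernel choice stays hidden."""
    kernel = select_kernel(len(values))
    return kernel(values)
```

In this sketch the caller only sees `adaptive_sort`, yet must still know that this library exists and when to call it, which reflects the drawback described above.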
Related published technologies include, e.g., U.S. Pat. No. 8,296,743 B2, which discloses a method for library-based compilation and dispatch to spread computations of a program across heterogeneous cores in a processing system. The source program contains a parallel-programming keyword, such as map-reduce, from a high-level, library-oriented parallel programming language.
U.S. Pat. No. 7,979,852 B2 discloses a system for automatically generating optimized codes which are operational on a predefined hardware platform. The system includes an analyzing device for defining optimization rules on the basis of performance tests and measures determined on the basis of standout sequences and static and dynamic parameters.
Thus, known solutions are able to dynamically change the execution kernel for a single function during runtime to optimize a given program for a new system setting. However, such systems still require the domain expert to use specific libraries appropriately during development. Furthermore, the focus on adapting only single kernel functions prevents the system from applying higher-level optimizations such as re-ordering or merging of kernels.
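As a purely illustrative sketch of what merging of kernels can mean, consider two element-wise kernels applied one after the other. The function names below are hypothetical. Executed separately, each kernel traverses the data and the first one produces an intermediate result; a merged kernel computes the same output in a single pass.

```python
# Hypothetical sketch of kernel merging (fusion) for two element-wise kernels.

def scale(values, factor):
    """First kernel: multiply every element by a factor."""
    return [v * factor for v in values]


def offset(values, delta):
    """Second kernel: add a constant to every element."""
    return [v + delta for v in values]


def scale_then_offset_separate(values, factor, delta):
    """Two kernel invocations: two passes and one intermediate list."""
    intermediate = scale(values, factor)
    return offset(intermediate, delta)


def scale_then_offset_merged(values, factor, delta):
    """Merged kernel: one pass, no intermediate list."""
    return [v * factor + delta for v in values]
```

A system that adapts only one kernel at a time can tune `scale` and `offset` individually but has no opportunity to replace the pair with the merged form.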
In short, known technologies perform the optimization on the basis of single instructions of a program code. Hence, there is a need for an optimization targeted at a sequence of several instructions which may run in a heterogeneous computing system environment.