The present invention relates generally to facilitating efficient use of the heterogeneous resources of a heterogeneous computer system.
Today, many computer systems are heterogeneous in the sense that they comprise multiple processing elements having different architectures, such as different hardware architectures. The different architectures are typically chosen to optimize the design of each processing element for a particular subset of tasks, in order to reduce the throughput times of such tasks when performed by the computer system. One example of a heterogeneous computer system includes a central processing unit (CPU) and a graphics processing unit (GPU), although other types of heterogeneous computer systems are also known.
In order to optimize the performance of a heterogeneous computer system, it is desirable to ensure that the resources of the computer system are effectively utilized during its operation. In particular, different types of computer program code may be most effectively executed, e.g. in terms of throughput times, on different types of processing elements, i.e. processing elements having different architectures. For example, computer program code that can be executed with a high degree of parallelism while requiring regular I/O, e.g. memory read/write operations during execution, is typically well suited for execution on a GPU. In contrast, computer program code requiring cache exploitation and/or exhibiting many conditional expressions such as branch instructions may be better suited for execution on a CPU, as large parts of the GPU architecture are insufficiently utilized, or not utilized at all, when executing such computer program code.
It is therefore desirable to provide some strategy for deciding which processing element of the heterogeneous computer system will be responsible for executing a particular computer program code module, e.g. a software module, to ensure (near-)optimal performance of the computer system.
Jean-Francois Dollinger et al. in “CPU+GPU Load Balance Guided by Execution Time Prediction”, as published in the Proceedings of the Fifth International Workshop on Polyhedral Compilation Techniques (IMPACT 2015), disclose a method to jointly use the CPU and GPU to execute a balanced parallel code automatically generated using polyhedral tools. To evenly distribute the load, the system is guided by predictions of loop nest execution times. This approach seeks to optimize the utilization of the CPU and GPU as a function of throughput time. However, such an approach is not guaranteed to minimize the throughput time of a particular module of computer program code, because at least part of the code may be executed on a processing element having inferior throughput characteristics for that particular code. Moreover, the success of the method relies heavily on the accuracy of the predictions of the loop nest execution times. Where these predictions are inaccurate, the throughput times of the computer program code module executed on such a heterogeneous hardware architecture are likely to be negatively affected.
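By way of illustration only, the sensitivity of such prediction-guided load balancing to prediction accuracy may be sketched as follows. The sketch is not the method of Dollinger et al.; it merely assumes a simple proportional split of loop iterations between a CPU and a GPU. The function names `split_iterations` and `makespan` and all per-iteration costs are hypothetical values chosen for the example.

```python
def split_iterations(total_iters, cpu_time_per_iter, gpu_time_per_iter):
    """Split iterations so that the predicted CPU and GPU finish times match.

    Hypothetical helper: each processing element receives a share of the
    work proportional to its predicted rate (iterations per time unit).
    """
    cpu_rate = 1.0 / cpu_time_per_iter
    gpu_rate = 1.0 / gpu_time_per_iter
    cpu_iters = round(total_iters * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_iters, total_iters - cpu_iters


def makespan(cpu_iters, gpu_iters, cpu_time_per_iter, gpu_time_per_iter):
    """Throughput time when both elements run concurrently: the slower one decides."""
    return max(cpu_iters * cpu_time_per_iter, gpu_iters * gpu_time_per_iter)


# Assumed per-iteration costs in arbitrary time units (illustrative only).
predicted = {"cpu": 4.0, "gpu": 1.0}
actual = {"cpu": 4.0, "gpu": 2.0}  # the GPU is twice as slow as predicted

# The split is computed from the predictions.
cpu_n, gpu_n = split_iterations(1000, predicted["cpu"], predicted["gpu"])

# Throughput time the predictions promise versus what is actually obtained.
predicted_time = makespan(cpu_n, gpu_n, predicted["cpu"], predicted["gpu"])
actual_time = makespan(cpu_n, gpu_n, actual["cpu"], actual["gpu"])

# Split that would have been chosen had the predictions been accurate.
ideal_cpu_n, ideal_gpu_n = split_iterations(1000, actual["cpu"], actual["gpu"])
ideal_time = makespan(ideal_cpu_n, ideal_gpu_n, actual["cpu"], actual["gpu"])

print(cpu_n, gpu_n, predicted_time, actual_time, ideal_time)
```

Under these assumed figures, the split computed from the inaccurate prediction assigns too many iterations to the GPU, so that the resulting throughput time exceeds both the predicted throughput time and the throughput time obtainable with an accurate prediction.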