It is becoming ever more common for computing systems to be constructed in a heterogeneous fashion, i.e. made up of different kinds of computational device, and for those systems to be programmed according to data parallel programming models, e.g. single program multiple data (SPMD) models or single instruction multiple thread (SIMT) models.
Implementations of SPMD/SIMT programming models such as Open Computing Language (OpenCL) have therefore been developed to enable programmers to take advantage of the increased processing power provided by such heterogeneous computing systems, whilst presenting the programmer with a programming framework which can be employed across different computing platforms.
Whilst such programming models advantageously present the programmer with a unified, and therefore simplified, programming view, it will be understood that various complexities associated with executing programs on such heterogeneous computing systems must then be handled by the supporting infrastructure provided for those programming models, such as the compiler.
One issue that may need to be handled relates to the multiple threads which may be executed in heterogeneous computing systems programmed in this manner. In particular, it is clearly desirable to avoid redundant processing by each of those threads, where the nature of the operations involved is such that it is not necessary for each individual thread to perform particular operations or maintain individual copies of variables.
Compilers have thus been developed which seek to automatically detect any scalar operations and factor them out of the parallel execution. One aspect of this process is the identification of uniform (also known as invariant) instructions and variables, i.e. those which can be determined to have the same value across multiple threads. Identification of such uniform instructions/variables means that only one copy of the relevant value needs to be kept for all threads, since all threads operate with respect to the same value. This optimisation can not only reduce memory allocation, but also improve performance, since less live variable context needs to be stored redundantly.
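By way of illustration only, the general idea of identifying uniform variables can be sketched as a simple forward propagation over straight-line code: any value derived from a per-thread source (here a hypothetical thread identifier `tid`) is marked as varying, and whatever remains is treated as uniform. The `Instr` type, the `find_uniform` function and the instruction names are assumptions made for this sketch; it ignores control flow and memory writes, and it is not taken from any of the documents listed below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One three-address style instruction: dest = op(operands)."""
    dest: str
    op: str          # kept for readability only; the analysis ignores it
    operands: tuple  # names of the values read by the instruction


def find_uniform(instrs, varying_sources):
    """Return the destinations that hold the same value in every thread.

    Assumption: any operand name not known to be varying (constants,
    kernel arguments) is uniform. Straight-line code only.
    """
    varying = set(varying_sources)
    changed = True
    # Iterate to a fixed point so the result does not depend on ordering.
    while changed:
        changed = False
        for ins in instrs:
            reads_varying = any(o in varying for o in ins.operands)
            if ins.dest not in varying and reads_varying:
                varying.add(ins.dest)
                changed = True
    return {ins.dest for ins in instrs if ins.dest not in varying}


# Hypothetical kernel body: alpha, beta, gamma, offset and buf are kernel
# arguments shared by all threads; tid differs per thread.
program = [
    Instr("scale", "mul", ("alpha", "beta")),   # arguments only
    Instr("idx", "add", ("tid", "offset")),     # depends on tid
    Instr("x", "load", ("buf", "idx")),         # address depends on tid
    Instr("y", "mul", ("x", "scale")),          # reads a varying value
    Instr("bias", "add", ("scale", "gamma")),   # uniform inputs only
]

print(sorted(find_uniform(program, {"tid"})))  # ['bias', 'scale']
```

In this sketch only `scale` and `bias` would need a single shared copy; `idx`, `x` and `y` would still be held per thread.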
Examples of the state of the art relating to such optimisation techniques can be found in the following documents:
Yunsup Lee et al., “Convergence and Scalarization for Data-Parallel Architectures”, in Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Feb. 23-27, 2013, Shenzhen, China;
Ralf Karrenberg and Sebastian Hack, “Improving Performance of OpenCL on CPUs”, in Proceedings of the 21st International Conference on Compiler Construction 2012, pp. 1-20;
Wilson Fung et al., “Dynamic warp formation and scheduling for efficient GPU control flow”, in Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 407-420, IEEE, 2007;
The OpenCL 1.2 Specification, revised on 14 Nov. 2012, available at: http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf; and
Bruno Coutinho et al., “Divergence Analysis and Optimizations”, in Parallel Architectures and Compilation Techniques (PACT), October 2011, pp. 320-329.
However, it has been found that current approaches to the identification of uniform variables tend in some instances to be overly conservative (in that some variables which are in fact uniform are not identified as such).
Accordingly, it would be desirable to provide an improved technique for the identification of uniform variables.