This invention relates to efficient utilisation of processor resources.
Processor designers are continually striving to improve processor performance, designing processor architectures that provide, for example, increased computational abilities, increased operating speeds, reduced power consumption, and/or reduced cost. With many previous processor architectures, it has become increasingly difficult to improve processor performance by increasing their operating frequency. As a result, many newer processor architectures have focused on parallel processing to improve performance.
One parallel processing technique employed in processor architectures is to provide multiple processing cores. This technique utilises multiple independent processors, referred to as cores, operating in parallel to execute programs. Typically, multiple processing cores share a common interface and may share other peripheral resources. Each core may have further parallel processing capabilities. For example, a core may have multithreading, superscalar, single instruction multiple data (SIMD), multiple instruction multiple data (MIMD), very long instruction word (VLIW) and/or other parallel processing capabilities.
Each core can be provided with different types of processing elements that are configured to process data in different ways. For example, a core may comprise a set of main processing elements that are optimised for processing scalar operations and another set of auxiliary processing elements (e.g. a SIMD pipeline) that are optimised for vector operations.
In SIMD processing, multiple processing elements execute the same instructions at the same time but with different data. Thus SIMD processing is particularly efficient for routines that repeat identical sequences of code multiple times for different datasets. One example of such a routine is adjusting the contrast of a digital image.
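By way of illustration only, the contrast-adjustment example can be sketched in Python as follows. The sketch models the SIMD behaviour in software: pixels are taken in groups of a fixed number of lanes, and the same operation is applied to every lane of a group. The lane count `LANES`, the function names, the `gain` and `pivot` parameters and the linear contrast formula are all assumptions made for this sketch; none of them is taken from the text above, and real SIMD hardware would perform each group in a single vector instruction rather than in a Python loop.

```python
LANES = 4  # hypothetical SIMD width, chosen only for illustration


def adjust_contrast_pixel(value, gain, pivot=128):
    """Scale one 8-bit pixel value about a pivot and clamp it to 0..255."""
    scaled = round(pivot + gain * (value - pivot))
    return max(0, min(255, scaled))


def adjust_contrast_simd(pixels, gain, lanes=LANES):
    """Apply the same contrast operation to pixels, one group of lanes at a time."""
    out = []
    for start in range(0, len(pixels), lanes):
        # One group stands in for one vector register of `lanes` elements.
        group = pixels[start:start + lanes]
        # Conceptually a single vector instruction: identical operation,
        # different data in each lane.
        out.extend(adjust_contrast_pixel(p, gain) for p in group)
    return out
```

For instance, calling `adjust_contrast_simd([0, 64, 128, 192, 255], 2.0)` processes the first four pixels as one group and the remaining pixel as a partially filled group, which mirrors how a routine of this kind maps onto a fixed-width SIMD pipeline.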
A computer can comprise a number of different types of processors (e.g. CPUs, GPUs, DSPs, FPGAs, etc.) with parallel processing capabilities. Frameworks, such as OpenCL and CUDA, allow programmers to write programs that are capable of being executed across the different types of processor. In some situations, under the structure provided by the framework, the processing capacity can be underutilised when executing programs. Thus, there is a need for better management of available processing capacity so that the programs can be more efficiently processed.