Energy efficiency is becoming an increasingly important differentiator for computing systems ranging from mobile phones to datacenters. Customers are willing to pay a premium for longer-lasting mobile device experiences, yet they also expect increasing performance from those same devices. At the other end of the scale, datacenters continue to scale up compute power but face thermal limits on what can be efficiently cooled. In addition, the public is increasingly conscious of energy usage and its environmental impact. Making efficient use of energy is therefore a high-priority design goal in many types of computing systems.
These technically opposing goals, delivering more performance while using less power, have led the industry to experiment with heterogeneous designs in which “big” compute cores are closely coupled with “little” compute cores within a single system or silicon chip, referred to herein as heterogeneous cores or heterogeneous processing. The big cores are designed to offer high performance in a larger power envelope, while the little cores are designed to offer lower performance in a smaller power envelope. The conventional wisdom is that an operating system's scheduler will then selectively schedule threads on the big or little cores depending upon the workload(s). During at least some times of the day, the operating system may be able to turn off the big core(s) entirely and rely on the power-sipping little cores.
Big and little cores may or may not share the same instruction set or features. For example, little cores may include a reduced instruction set or other differences that involve further decision making by the operating system in order to schedule processes on a compatible core. One traditional example is a system that includes a central processing unit (CPU) and a graphics processing unit (GPU) and allows the GPU to be used for computing tasks when it is idle or underutilized.
Existing solutions depend on modifying the operating system's kernel in order to “enlighten” the operating system to the presence of big and little cores, their respective performance and power characteristics, and the facilities in the system (e.g., CPU performance counters, cache miss/hit counters, bus activity counters, and so on) that the operating system can monitor to determine on which core(s) to schedule a particular thread. This approach has several drawbacks: 1) it involves modifying the kernel for all supported operating systems, 2) it requires the modified kernel to understand differences in big/little designs across potentially different architectures (e.g., supporting N different implementations), and 3) it tightly couples the release schedule of the operating system kernel to that of the underlying computer architecture. Changes to the computer architecture then involve waiting for the next scheduled operating system release (potentially several years or more) before the kernel can support new cores commercially (or vice versa).
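As a rough illustration of the kind of kernel-side policy described above, the following sketch chooses a core for a thread from monitored counters and per-core characteristics. It is not taken from any particular operating system: every name, field, and threshold in it (Core, ThreadStats, pick_core, the 0.6 utilization cutoff, the 0.3 cache-miss cutoff, and the feature labels) is a hypothetical assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    kind: str              # "big" or "little"
    features: frozenset    # instruction-set features the core supports (hypothetical labels)


@dataclass(frozen=True)
class ThreadStats:
    utilization: float            # fraction of recent time the thread was runnable, 0.0 to 1.0
    cache_miss_rate: float        # misses per access, as read from assumed cache counters
    required_features: frozenset  # features the thread's code needs


# Assumed thresholds; a real scheduler would have to tune these per architecture.
BUSY_THRESHOLD = 0.6
MEMORY_BOUND_THRESHOLD = 0.3


def pick_core(stats, cores):
    """Return the core a thread should run on, or None if no core is compatible."""
    # Only cores that support every feature the thread needs are candidates.
    compatible = [c for c in cores if stats.required_features <= c.features]
    if not compatible:
        return None
    # A busy thread that is not stalled on memory is assumed to benefit from a big core.
    wants_big = (stats.utilization >= BUSY_THRESHOLD
                 and stats.cache_miss_rate < MEMORY_BOUND_THRESHOLD)
    preferred = "big" if wants_big else "little"
    for core in compatible:
        if core.kind == preferred:
            return core
    # Fall back to any compatible core when the preferred kind is unavailable.
    return compatible[0]
```

Even in this simplified form, the thresholds and feature labels are specific to one big/little design, which hints at why such logic would need to be revisited in the kernel for each new architecture.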