A computing architecture referred to as “big.LITTLE” has recently been introduced by ARM® Holdings, which has its head office in Cambridge, England. In one example of a big.LITTLE system, a “big,” i.e., higher-performance and higher-power-consuming, Cortex-A15 processor is paired with a “LITTLE,” i.e., lower-performance and lower-power-consuming, Cortex-A7 processor. The system switches execution of a thread back and forth between the two processors based on the computational intensity of the thread. If the thread is computationally intensive, execution is switched to the Cortex-A15 processor, whereas when the thread is not computationally intensive, execution is switched to the Cortex-A7 processor. The goal is to achieve performance near that of the Cortex-A15 processor while consuming power somewhere between the typical power consumption of the Cortex-A7 and Cortex-A15 processors. This is particularly desirable in battery-powered platforms that demand a wide range of performance, such as smartphones.
An ARM white paper by Peter Greenhalgh entitled “Big.LITTLE Processing with ARM Cortex™-A15 & Cortex-A7,” published in September 2011, states that the Cortex-A15 and Cortex-A7 processors are architecturally identical and indicates that this is an important paradigm of big.LITTLE. More specifically, both processors fully implement the ARM v7A architecture. (For example, the Cortex-A7 implements the Virtualization and Large Physical Address Extensions of the ARM v7A architecture.) Consequently, both processors can execute all instructions of the architecture, although a given instruction may execute with different performance and power consumption on the two processors. The operating system decides when to switch between the two processors in an attempt to match the performance required by the currently executing application.
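The switching decision described above can be illustrated with a minimal sketch. It is not the policy of any actual operating system or of ARM: the load metric (a utilization value between 0 and 1), the two threshold values, and the names `SwitchPolicy`, `update`, and `run` are all hypothetical and introduced only for illustration. The sketch uses two separate thresholds (hysteresis), an assumption made here so that a load hovering near a single threshold does not cause repeated switches.

```python
from dataclasses import dataclass

BIG = "big"
LITTLE = "LITTLE"


@dataclass
class SwitchPolicy:
    """Hypothetical threshold policy; not taken from any real OS."""

    up_threshold: float = 0.80    # assumed: switch to big above this load
    down_threshold: float = 0.30  # assumed: switch to LITTLE below this load
    current: str = LITTLE         # assumed: start on the low-power processor

    def update(self, load: float) -> str:
        """Return the processor to run on, given the latest load sample."""
        if self.current == LITTLE and load > self.up_threshold:
            self.current = BIG
        elif self.current == BIG and load < self.down_threshold:
            self.current = LITTLE
        return self.current


def run(loads):
    """Apply a fresh policy to a sequence of load samples."""
    policy = SwitchPolicy()
    return [policy.update(load) for load in loads]


if __name__ == "__main__":
    # Illustrative load samples only; not measured data.
    for load, cpu in zip([0.1, 0.9, 0.5, 0.2], run([0.1, 0.9, 0.5, 0.2])):
        print(f"load={load:.1f} -> {cpu}")
```

In this sketch, a load of 0.5 keeps the thread on whichever processor it is already running on, because the value lies between the two assumed thresholds. Because both processors execute the same instruction set, the decision depends only on the load and never on which instructions the thread contains.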
One limitation of the big.LITTLE approach is that it requires full architectural compatibility between the two processors. This requirement may be costly, particularly when the architecture includes instructions that necessitate a significant number of transistors. For example, the minimum hardware required to implement single instruction multiple data (SIMD) instructions may be considerable, even if the LITTLE processor includes simplified hardware that serializes the processing of individual elements of data within a SIMD instruction. Generally, the appearance of these instructions in an application correlates highly with the need for high performance by the application. Consequently, it is unlikely that the simplified SIMD hardware in the LITTLE processor will be used for any significant time, since it likely will quickly fail to meet the performance requirements of the application and a switch to the big processor will occur. Thus, the transistors devoted to the simplified SIMD hardware in the LITTLE processor will be largely wasted.
Another limitation of the big.LITTLE approach is that it may require changes to the operating system so that it can decide when to switch between the processors and coordinate the switches. It may be difficult to persuade the developer of the operating system to include such specialized code tailored to a particular implementation, particularly when the operating system is proprietary.
Another drawback of the big.LITTLE approach is that the portion of the operating system that determines when to switch between big and LITTLE consumes bandwidth on the currently running processor and takes that bandwidth away from the application. That is, the switch code does not run in parallel with the application; it runs instead of the application.
Another drawback of the big.LITTLE approach is that there appear to be some applications for which it is very difficult to develop effective switch code. That is, it is difficult for the operating system to know when to make switches in a manner that neither consumes significantly more power than necessary (i.e., runs big too long) nor provides poor performance (i.e., runs LITTLE too long).