The power-performance of a central processing unit (CPU) can be improved, for example by reducing the power consumption of the CPU, increasing its throughput, or decreasing its latency, by adding hardware accelerators and mechanisms that exploit characteristics of the specific workload. Examples include value and branch prediction for workloads with highly predictable values and branches, out-of-order (OOO) execution for code with a high number of independent instructions, speculation mechanisms (such as Transactional Memory), memory renaming, etc. Some of these accelerators/mechanisms operate all of the time regardless of the nature of the workload, whereas others require explicit instructions from software, such as Restricted Transactional Memory (RTM) commands.
Applying all of the accelerators at all times causes high power consumption. At the same time, extending the accelerators so that they characterize the given workload themselves increases their complexity. Relying on the software developer or compiler to operate the accelerators directly via explicit instructions may not always be worthwhile, because the compiler lacks dynamic (run-time) information, because backward compatibility with legacy code must be maintained, and because the full set of optimization opportunities offered by the microarchitecture is not known during development.