In computer systems, a processor may have one or more cores, and each core may be tasked with running one or more threads. Thus, a multi-core processor may be tasked with running a large number of threads, and these threads may run at different speeds. When the threads are related to each other, as they are when they belong to a parallel application, imbalances in their execution speed, and therefore in their completion times, may lead to power inefficiencies. These issues may arise in single-core systems and/or in multi-core systems running parallel applications, including simultaneous multithreading (SMT) systems and chip-level multiprocessing (CMP) systems.
Consider the following situation. A core is running multiple threads, each of which handles a portion of a parallel workload. If one of the threads runs slower than the others, that thread will likely complete its assigned portion of the workload after the others, and the other threads may waste energy while waiting for it to complete. Similarly, if n cores exist (n being an integer greater than one) but m of them (m being a positive integer less than n) are idle because they are waiting for one or more other cores to complete, the waiting cores waste power. In some systems, cores whose threads complete their work ahead of other threads may be put to sleep and thus consume little or no power. However, putting a core to sleep and later waking it up consumes time and energy and adds complexity. In a tera-scale environment, tens or even hundreds of cores in a processor may run highly parallel workloads, so tens or even hundreds of cores may be left waiting for a single slow core to complete, multiplying the power inefficiency caused by workload imbalances between cores.
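To make the scaling argument concrete, the following is a minimal Python sketch of a deliberately simplified model that does not come from this disclosure. It assumes that each core finishes its share of the workload at its own time, then waits at a barrier for the slowest core while drawing a constant waiting power. The function names `idle_energy_at_barrier` and `one_slow_core`, the 2 W waiting power, and the 1.0 s / 1.5 s finish times are all hypothetical illustration values.

```python
from typing import Sequence


def idle_energy_at_barrier(finish_times: Sequence[float], idle_power_w: float) -> float:
    """Energy (joules) spent by cores waiting for the slowest core.

    Hypothetical model: every core draws a constant idle_power_w (watts)
    from its own finish time until the slowest core finishes.
    """
    if not finish_times:
        return 0.0
    t_max = max(finish_times)
    # Each core waits (t_max - t) seconds; the slowest core waits 0 s.
    return idle_power_w * sum(t_max - t for t in finish_times)


def one_slow_core(n_cores: int, fast_s: float, slow_s: float) -> list[float]:
    """Hypothetical scenario: n_cores - 1 cores finish at fast_s, one straggler at slow_s."""
    return [fast_s] * (n_cores - 1) + [slow_s]


if __name__ == "__main__":
    for n in (4, 100):
        energy = idle_energy_at_barrier(one_slow_core(n, 1.0, 1.5), idle_power_w=2.0)
        print(f"{n} cores: {energy:.1f} J wasted while waiting")
    # prints:
    # 4 cores: 3.0 J wasted while waiting
    # 100 cores: 99.0 J wasted while waiting
```

Under these assumptions, the energy wasted grows linearly with the number of waiting cores (m = n - 1 in this scenario), which is why a single straggler costs far more in a processor with hundreds of cores than in one with a handful.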