1. Field of the Invention
The invention relates generally to parallel processing in a multiple processing core environment. More particularly, embodiments of the invention relate to selectively moving threads executing in parallel to improve power efficiency in the multiple processing core environment.
2. Background Art
In computer systems, a processor may have one or more cores. A core may be tasked with running one or more threads. Thus, a multi-core processor may be tasked with running a large number of threads. These threads may run at different speeds. When the threads are related to each other, as they are when they are associated with a parallel application, imbalances in thread execution speed, and thus in thread completion time, may represent power inefficiencies. These inefficiencies may exist, for example, in a single multi-core processor system or in a multiple processor system (e.g., a simultaneous multithreading (SMT) system or a chip-level multiprocessing (CMP) system) running parallel applications.
If a core is running multiple threads, each of which is handling a portion of a parallel workload, and one of the threads is running slower than the other thread(s), then that thread will likely complete its assigned portion of the parallel workload after the other(s). The time the other thread(s) spend waiting is indicative of wasted energy. For example, if n cores exist (n being an integer greater than one), but m of those cores (m being a positive integer less than n) are idle because they are waiting for one or more other cores to complete, then processing power is being wasted by the m cores, which completed their respective tasks sooner than necessary. In some systems, threads that complete their work ahead of other threads may be put to sleep and thus may not consume power. However, putting a core to sleep and then waking it up consumes time and energy and introduces computing complexity. In a tera-scale environment, tens or even hundreds of cores in a processor may run highly parallel workloads. In this environment, tens or even hundreds of cores may be waiting for a slow core to complete, multiplying the power inefficiency caused by workload imbalances between cores.
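The n-core, m-idle example above can be illustrated with a minimal sketch. It assumes a simple barrier model in which every core must wait until the slowest core finishes. The function names and the completion times are hypothetical and chosen only for illustration; they are not part of any described embodiment.

```python
def idle_core_count(completion_times):
    """Number of cores (m) that finish before the slowest core and must wait."""
    slowest = max(completion_times)
    return sum(1 for t in completion_times if t < slowest)


def idle_core_time(completion_times):
    """Total core-time spent waiting for the slowest core to finish."""
    slowest = max(completion_times)
    return sum(slowest - t for t in completion_times)


# Hypothetical workload: n = 4 cores, three finish at t = 4.0, one at t = 10.0.
times = [4.0, 4.0, 4.0, 10.0]
print(idle_core_count(times))  # 3 cores are idle (m = 3)
print(idle_core_time(times))   # 18.0 units of core-time spent waiting
```

In this illustration, three of the four cores each wait 6.0 time units, so the waiting core-time grows with the number of cores that finish early, which is the multiplying effect described for tera-scale environments.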
As used herein, a “critical” thread is understood to mean a thread which is executing in parallel with another thread and which is expected to cause a processor executing the other thread to wait idly for a completion of a task of the critical thread. Previous technologies to improve power efficiency in a parallel processing environment have included reconfiguring the execution of a thread by a particular processing core, e.g., by manipulating one or more configurable attributes of the critical thread and/or the processing core executing the thread. However, only limited efficiency improvements can be gained by reconfiguring the execution of a particular thread while it continues to be executed by the same processing core.
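The definition of a critical thread can be sketched as follows. This is a minimal illustration that assumes each thread has a projected completion time and that all threads synchronize at a common point. The mapping `expected_finish` and the function `critical_threads` are hypothetical names, and how the projections are obtained is outside the scope of the sketch.

```python
def critical_threads(expected_finish):
    """Return ids of threads that other threads are expected to wait on.

    expected_finish maps a thread id to its projected completion time
    (an assumed input; the text does not specify how it is estimated).
    """
    if len(expected_finish) < 2:
        return []  # a thread with no parallel peer cannot make another wait
    latest = max(expected_finish.values())
    earliest = min(expected_finish.values())
    if latest == earliest:
        return []  # balanced workload: no thread makes another wait
    return sorted(tid for tid, t in expected_finish.items() if t == latest)


# Hypothetical projections: thread "t2" is expected to finish last.
print(critical_threads({"t0": 4.0, "t1": 4.0, "t2": 10.0}))  # ['t2']
```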