As semiconductor technology approaches practical limits on further increases in clock speed, architects are increasingly focusing on parallelism in processor architectures to obtain performance improvements. At the chip level, multiple processor cores are often disposed on the same chip, functioning in much the same manner as separate processor chips, or to some extent, as completely separate computers. In addition, even within cores, parallelism is employed through the use of multiple execution units that are specialized to handle certain types of operations. Hardware-based pipelining is also employed in many instances so that operations that may take multiple clock cycles to perform are broken up into stages, enabling other operations to be started prior to completion of earlier operations. Multithreading is also employed to enable multiple instruction streams to be processed in parallel, enabling more overall work to be performed in any given clock cycle.
In particular, multithreading generally aims to increase the utilization of a plurality of hardware threads by executing those hardware threads in parallel. For example, a plurality of hardware threads is often configured for a single process, and that plurality of hardware threads may be executed concurrently or time-division multiplexed to execute that process. The hardware threads are often further disposed across multiple processor cores and are thus usually configured to communicate messages with each other to complete the process.
However, in a system with multiple processor cores, multiple hardware threads often attempt to communicate with the same hardware thread. As such, performance bottlenecks can occur while that single hardware thread attempts to process the messages it has received from the other threads. To that end, one technique for improving inter-thread communication between hardware threads has been to select messages for processing based on message priority. However, this typically results in lower priority messages being shuffled down a queue in favor of higher priority messages. As a typical example, a first hardware thread often generates prioritized messages for a second hardware thread that may itself generate prioritized messages for the first hardware thread. The first and second hardware threads may be configured to execute some portion of a parallelized process, and in some circumstances, the first and second hardware threads may enter a loop in which the second hardware thread ends up processing messages from only the first hardware thread and the first hardware thread ends up processing messages from only the second hardware thread. As such, the first and second hardware threads may block out messages from other hardware threads and/or processes.
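The loop described above can be illustrated with a minimal software sketch. It is only a simulation under stated assumptions, not a model of any particular hardware: the `PriorityInbox` class, the `simulate` function, the thread names "A", "B", and "C", and the priority values 1 and 10 are all hypothetical names and numbers chosen for illustration.

```python
import heapq
from itertools import count


class PriorityInbox:
    """Hypothetical inbox that always delivers the highest-priority message first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def send(self, priority, sender, payload):
        # heapq is a min-heap, so the priority is negated to pop larger values first.
        heapq.heappush(self._heap, (-priority, next(self._seq), sender, payload))

    def receive(self):
        _, _, sender, payload = heapq.heappop(self._heap)
        return sender, payload

    def __len__(self):
        return len(self._heap)


def simulate(rounds):
    """Threads A and B exchange high-priority messages while C's messages wait."""
    inbox_a, inbox_b = PriorityInbox(), PriorityInbox()

    # A third thread C sends one low-priority message to each of A and B.
    inbox_a.send(1, "C", "request")
    inbox_b.send(1, "C", "request")

    # A starts the exchange with a high-priority message to B.
    inbox_b.send(10, "A", "ping")

    handled = []  # (receiver, sender) pairs in processing order
    for _ in range(rounds):
        sender, _ = inbox_b.receive()   # B handles its highest-priority message
        handled.append(("B", sender))
        inbox_a.send(10, "B", "pong")   # ...and replies to A at high priority

        sender, _ = inbox_a.receive()   # A handles its highest-priority message
        handled.append(("A", sender))
        inbox_b.send(10, "A", "ping")   # ...and replies to B at high priority

    return handled, len(inbox_a), len(inbox_b)
```

In this sketch, calling `simulate(5)` processes ten messages, none of which originate from "C": the low-priority message from "C" remains queued in each inbox for as long as the exchange between "A" and "B" continues, regardless of the number of rounds.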
In addition to the problems that may occur with hardware thread loops, lower priority messages are often discarded in favor of newer and higher priority messages. As such, the hardware threads that generated and sent those lower priority messages may experience even greater performance bottlenecks as a result of priority-based message management. This, in turn, often causes a performance bottleneck in the execution of the parallelized process and/or of additional hardware threads.
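The discarding of lower priority messages can likewise be sketched in software. The example below assumes a fixed-capacity inbox that evicts its lowest-priority entry when a higher-priority message arrives while it is full; the `BoundedPriorityInbox` class, its `capacity` parameter, and its `dropped` list are hypothetical and serve only to illustrate the behavior described above.

```python
import heapq
from itertools import count


class BoundedPriorityInbox:
    """Hypothetical fixed-capacity inbox that discards low-priority messages when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []      # min-heap on (priority, seq): lowest priority at the root
        self._seq = count()  # arrival order, used to break priority ties
        self.dropped = []    # senders whose messages were discarded

    def send(self, priority, sender):
        entry = (priority, next(self._seq), sender)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return
        if priority > self._heap[0][0]:
            # Inbox is full: evict the lowest-priority message for the newcomer.
            evicted = heapq.heapreplace(self._heap, entry)
            self.dropped.append(evicted[2])
        else:
            # The newcomer does not outrank anything queued, so it is discarded.
            self.dropped.append(sender)

    def pending_senders(self):
        return sorted(sender for _, _, sender in self._heap)
```

For example, with a capacity of two, sending a priority 1 message from "C", then a priority 5 message from "A", then a priority 9 message from "B" leaves only the messages from "A" and "B" queued, and the message from "C" is recorded in `dropped`. The sender "C" never receives service for that message and must wait or resend, which corresponds to the bottleneck described above.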
Therefore, there exists a need for a more efficient way to manage the inter-thread communication of hardware threads to process a workload.