The present invention relates to multi-threaded processing, and more specifically, to systems, methods and computer program products for instruction balancing within a multi-threaded processor through instruction uncertainty.
There are many goals in a multi-threaded pipeline design, including but not limited to running each individual thread as fast as possible and optimizing units of work per watt. Pipelining is one specific form of parallelism, in which the execution of several instructions can be interleaved on the same hardware. Regardless of the goal of the pipeline design, multiple branch instructions can be encountered in each thread of a program. When a thread encounters a branch, a prediction is made as to which direction the branch will take, and the thread is then executed along the predicted path. Branch prediction is typically implemented with a 2-bit saturating counter per branch prediction table entry, where the states are: strongly not taken (00), weakly not taken (01), weakly taken (10) and strongly taken (11). On a resolved taken branch the counter increments; however, upon reaching a state of “11”, the counter saturates at this value. On a resolved not taken branch the counter decrements; however, upon reaching a state of “00”, the counter saturates at this value. A branch is predicted as taken for states “10” and “11” and as not taken for states “00” and “01”. This saturating counter provides a direction for the branch; however, it states nothing about the accuracy of the prediction. For example, in a for loop, the branch is taken every time except the fall-through case. On the fall-through iteration the counter is degraded from “11” to “10”. The next time the for loop is encountered, the branch will be taken again. As such, in a state of “11” the branch is taken on all but the fall-through iteration, and in a state of “10” the branch will likewise be taken. A strong state therefore does not indicate a more confident prediction than a weak state. What such a scheme does not take into account in making the prediction is the accuracy of the branch prediction.
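The counter behavior described above can be illustrated with a minimal sketch. The class name TwoBitCounter and the helper run_loop are hypothetical names introduced here for illustration only; the loop model assumes a loop-closing branch that is taken on every iteration except the last.

```python
class TwoBitCounter:
    """Hypothetical 2-bit saturating counter.

    States: 0b00 strongly not taken, 0b01 weakly not taken,
            0b10 weakly taken,       0b11 strongly taken.
    """

    def __init__(self, state=0b10):
        self.state = state

    def predict(self):
        # States 10 and 11 predict taken; 00 and 01 predict not taken.
        return self.state >= 0b10

    def update(self, taken):
        # Increment on a resolved taken branch, decrement on not taken,
        # saturating at 11 and 00 respectively.
        if taken:
            self.state = min(self.state + 1, 0b11)
        else:
            self.state = max(self.state - 1, 0b00)


def run_loop(counter, iterations):
    """Run one pass of a for loop (assumed model: the branch is taken on
    all iterations but the final fall-through). Returns mispredictions."""
    mispredicts = 0
    for i in range(iterations):
        taken = i < iterations - 1
        if counter.predict() != taken:
            mispredicts += 1
        counter.update(taken)
    return mispredicts


counter = TwoBitCounter(state=0b11)   # start strongly taken
first = run_loop(counter, 10)         # loop entered in state 11
state_after_first = counter.state     # degraded to 10 by the fall-through
second = run_loop(counter, 10)        # loop entered in state 10
```

Under these assumptions, the loop entered in the strong state “11” and the loop entered in the weak state “10” each incur the same single misprediction on the fall-through iteration, which is consistent with the observation that the strong state is not a more confident prediction than the weak one.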
In a multi-threaded processing environment, multiple threads can simultaneously perform work. As such, multiple threads may be running in a pipeline, each thread encountering multiple branches and propagating the predicted state (strongly or weakly taken or not taken) through the pipeline so that, upon branch resolution, the prediction state within the branch table can be updated. Additionally, in a multi-threaded processor there are resources which are shared among the threads. These dynamically divided resources are allocated on a first-come, first-served basis. While dynamic resources allow for silicon area and power optimization, they have many performance limitations, such as thread hogging. For example, when one thread, X, is stalled, the other thread, Y, is able to use up a significant amount of the pipeline resources. Thread Y may be creating throw-away work (for example, down the wrong path of a yet-to-be-resolved branch) and taking resources from X which X could use in the near future for work which is not throw-away.
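The thread hogging scenario can be sketched with a toy model. Everything in it is an assumption made for illustration: the function name simulate_stall, the pool size, the request rate, and the simplification that entries are never released while thread X is stalled. It is not a model of any particular processor.

```python
def simulate_stall(total_entries, stall_cycles, y_requests_per_cycle):
    """Toy first-come, first-served allocation from a shared pool.

    Assumed model: thread X is stalled and requests nothing for
    stall_cycles cycles, while thread Y requests a fixed number of
    entries each cycle and holds everything it is granted.
    Returns (entries held per thread, entries still free).
    """
    free = total_entries
    held = {"X": 0, "Y": 0}
    for _ in range(stall_cycles):
        # Y is the only requester, so it is granted whatever is available.
        grant = min(y_requests_per_cycle, free)
        held["Y"] += grant
        free -= grant
    return held, free


# Hypothetical numbers: a 16-entry shared resource, X stalled for 8 cycles.
held, free = simulate_stall(total_entries=16, stall_cycles=8,
                            y_requests_per_cycle=2)
```

In this toy run, thread Y ends up holding the entire pool by the time thread X is ready to resume, so X finds no free entries even if some of Y's work is later thrown away.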