The present application relates generally to an improved data processing apparatus and method and more specifically to an apparatus and method for partial flush handling with multiple branches per group.
Present-day high-speed processors include the capability of simultaneous execution of instructions, speculative execution and loading of instructions, and simultaneous operation of various resources within a processor. In particular, it has been found desirable to manage execution of multiple threads within a processor, so that more than one execution thread may use the processor resources more effectively than a single thread typically uses them.
Prior processor designs have dealt with the problem of managing multiple threads via a hardware state switch from execution of one thread to execution of another thread. Such processors are known as hardware multi-threaded (HMT) processors and provide a hardware switch between execution of one thread and execution of another. An HMT processor overcomes the limitation of waiting on an idle thread by permitting the hardware to switch execution to a non-idle thread. The threads do not execute simultaneously; rather, when neither thread is idle, execution slices are allocated to each thread in turn. However, the execution management and resource switching (e.g., register swap out) in an HMT processor introduce overhead that makes the processor less efficient than running the threads on two single-threaded processors. In addition, because only one thread is executing at a given time, HMT does not allow threads to take full advantage of instruction parallelism by using multiple execution engines, which are usually not all busy at the same time.
Simultaneous multi-threaded (SMT) processors provide an even more efficient use of processor resources, as multiple threads may simultaneously use processor resources. Multiple threads execute concurrently in an SMT processor so that multiple processor execution units, such as floating point units, fixed point instruction units, load/store units, and others, can perform tasks for one (or more, depending on the capabilities of the execution units) of multiple threads simultaneously. Storage and register resources may also be allocated on a per-thread basis so that the complete internal state switch of an HMT processor is avoided.
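By way of illustration only, the contrast between HMT and SMT operation described above may be sketched with a toy cycle-count model. The model is not taken from any actual processor design. The unit names, the in-order issue rule, the single unit of each kind, the slice length, and the switch cost are all assumptions chosen for simplicity, and the function names `run_hmt` and `run_smt` are hypothetical.

```python
from collections import deque

# Assumed: one execution unit of each kind, each accepting one instruction per cycle.
UNITS = ("FPU", "FXU", "LSU")


def issue_in_order(thread, free_units):
    """Issue from the head of `thread` until the next instruction needs a busy unit."""
    issued = 0
    while thread and thread[0] in free_units:
        free_units.remove(thread.popleft())
        issued += 1
    return issued


def run_hmt(threads, slice_len=2, switch_cost=1):
    """HMT-style model: one thread owns all units per slice; a switch costs idle cycles."""
    queues = [deque(t) for t in threads]
    cycles = 0
    current = None  # index of the thread that last owned the units
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                continue  # skip a thread with no remaining work
            if current is not None and current != i:
                cycles += switch_cost  # stands in for state swap overhead
            current = i
            for _ in range(slice_len):
                if not q:
                    break
                issue_in_order(q, set(UNITS))  # units not used by this thread sit idle
                cycles += 1
    return cycles


def run_smt(threads):
    """SMT-style model: every cycle, all threads compete for the same pool of units."""
    queues = [deque(t) for t in threads]
    cycles = 0
    while any(queues):
        free = set(UNITS)
        for q in queues:  # fixed priority order, a simplification
            issue_in_order(q, free)
        cycles += 1
    return cycles


# Two threads that need different units, so they never conflict under SMT.
thread_a = ["FPU"] * 4
thread_b = ["LSU"] * 4
hmt_cycles = run_hmt([thread_a, thread_b])
smt_cycles = run_smt([thread_a, thread_b])
print(hmt_cycles, smt_cycles)
```

In this sketch the two threads use disjoint execution units, so the SMT model retires both in the time either would take alone, whereas the HMT model serializes them and additionally pays the assumed switch cost at each slice boundary. Threads that contend for the same unit would narrow the gap.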