A multi-threaded processor may fetch the instructions belonging to a thread and execute them. While executing instructions for a thread, the processor may execute an instruction that generates a reference to a memory location. Because of the latency of the memory access, the processor may have to wait until the access to the referenced memory location completes. Similarly, if an instruction takes multiple cycles to execute, a subsequent instruction that depends on it will have to wait. Rather than sit idle during such a stall, the processor may fetch instructions from a different thread and start executing them. In this way, the processor may keep its execution resources busy while the first thread waits. This type of parallelism may be referred to as thread level parallelism. Another way to improve performance is to exploit instruction level parallelism.
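The benefit of switching threads on a stall may be illustrated with a toy model. The following is a minimal sketch, not a description of any particular processor: it assumes a hypothetical single-issue core, represents each thread as a list of instruction latencies, assumes every instruction depends on the one before it in the same thread, and picks the lowest-numbered ready thread each cycle. The function name `run` and the example threads are illustrative only.

```python
def run(threads):
    """Return the cycle at which the last instruction completes.

    threads: list of threads, each a list of instruction latencies in cycles.
    Assumption: each instruction depends on the previous one in its own
    thread, so a thread cannot issue again until its last result is ready.
    """
    next_instr = [0] * len(threads)   # index of the next instruction per thread
    ready_at = [0] * len(threads)     # cycle at which each thread may issue again
    cycle = 0
    finish = 0
    while any(next_instr[t] < len(threads[t]) for t in range(len(threads))):
        # Issue at most one instruction per cycle, from the first ready thread.
        for t in range(len(threads)):
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                latency = threads[t][next_instr[t]]
                next_instr[t] += 1
                ready_at[t] = cycle + latency
                finish = max(finish, ready_at[t])
                break
        cycle += 1   # if no thread was ready, this cycle is simply idle
    return finish


thread_a = [3, 1]   # a 3-cycle memory access followed by a dependent 1-cycle op
thread_b = [1, 1]   # two 1-cycle ops

print(run([thread_a]) + run([thread_b]))  # one thread after the other: 6
print(run([thread_a, thread_b]))          # interleaved on stalls: 4
```

In this model the two idle cycles that thread A spends waiting on its memory access are filled with the instructions of thread B, so both threads finish in the time thread A alone would have taken.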
Instruction level parallelism may include determining the dependences among the instructions in a thread and issuing together those instructions that are independent of one another. The processor may also speculatively predict the dependences and execute the instructions in the thread based on those predictions. Such predictions may turn out to be inaccurate, in which case the processor has to discard the results of the instructions that were executed under the incorrect predictions and re-execute those instructions in the correct order.
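The dependence determination described above may be sketched as follows. This is a simplified illustration under stated assumptions rather than an actual issue mechanism: instructions are modeled as hypothetical `(destination, sources)` register tuples, only read-after-write dependences are tracked (write-after-read and write-after-write hazards are assumed to be removed by register renaming), and issue width is unlimited. The function name `issue_groups` is illustrative.

```python
def issue_groups(instrs):
    """Group instruction indices so that each group holds instructions whose
    source operands are all produced by earlier groups.

    instrs: list of (dest_register, [source_registers]) in program order.
    Assumption: only read-after-write dependences are considered.
    """
    writer_level = {}   # register -> group of the most recent writer
    groups = []
    for idx, (dest, srcs) in enumerate(instrs):
        level = 0
        for src in srcs:
            if src in writer_level:
                # Must issue at least one group after the producer of src.
                level = max(level, writer_level[src] + 1)
        writer_level[dest] = level
        while len(groups) <= level:
            groups.append([])
        groups[level].append(idx)
    return groups


program = [
    ("r1", ["r0"]),         # 0: r1 = load [r0]
    ("r2", ["r1", "r4"]),   # 1: r2 = r1 + r4   (depends on 0)
    ("r3", ["r5", "r6"]),   # 2: r3 = r5 * r6   (independent)
    ("r7", ["r2", "r3"]),   # 3: r7 = r2 + r3   (depends on 1 and 2)
]

print(issue_groups(program))  # [[0, 2], [1], [3]]
```

Instructions 0 and 2 share no registers and may issue together, while instructions 1 and 3 must wait for their producers. A processor that predicts dependences instead of determining them exactly would form such groups speculatively and would have to redo the affected instructions whenever a prediction proved wrong.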