Modern processors employed in computer systems use various techniques to improve their performance. One of these techniques is commonly referred to as “multithreading.” Multithreading allows multiple streams of instructions, commonly referred to as “threads,” to be executed. The threads may be independent programs or related execution streams of a single parallel program or both.
Processors may support three types of multithreading. The first is commonly referred to as “coarse-grained” or “block” multithreading, which may refer to rapid switching of threads on long-latency operations. The second is commonly referred to as “fine-grained multithreading,” which may refer to rapid switching of the threads on a cycle-by-cycle basis. The third is commonly referred to as “simultaneous multithreading,” which may refer to scheduling of instructions from multiple threads within a single cycle.
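The difference between the three types may be easier to see as issue schedules. The following is a minimal sketch, not a model of any particular processor: the function names, the round-robin thread selection, the `stall_cycles` set marking long-latency events, and the `issue_width` parameter are all assumptions introduced for illustration.

```python
from typing import List, Set, Tuple


def fine_grained(num_threads: int, cycles: int) -> List[int]:
    """Switch to the next thread on every cycle (round-robin is an assumption)."""
    return [cycle % num_threads for cycle in range(cycles)]


def coarse_grained(num_threads: int, cycles: int, stall_cycles: Set[int]) -> List[int]:
    """Stay on one thread and switch only after a long-latency event.

    stall_cycles is a hypothetical set of cycles on which the running
    thread hits a long-latency operation (e.g., a cache miss).
    """
    schedule = []
    current = 0
    for cycle in range(cycles):
        schedule.append(current)
        if cycle in stall_cycles:
            current = (current + 1) % num_threads  # switch for the next cycle
    return schedule


def simultaneous(num_threads: int, cycles: int, issue_width: int) -> List[Tuple[int, ...]]:
    """Fill several issue slots per cycle, drawing from multiple threads."""
    schedule = []
    for cycle in range(cycles):
        slots = tuple((cycle + slot) % num_threads for slot in range(issue_width))
        schedule.append(slots)
    return schedule


if __name__ == "__main__":
    print(fine_grained(2, 4))          # thread ID issued on each cycle
    print(coarse_grained(2, 6, {2}))   # one switch, after the stall on cycle 2
    print(simultaneous(2, 2, 2))       # tuple of thread IDs per cycle
```

Each list entry corresponds to one cycle. In the coarse-grained and fine-grained schedules a cycle holds a single thread ID, whereas in the simultaneous schedule a cycle holds one thread ID per issue slot.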
In modern processors, including simultaneous multithreading (SMT) processors, a condition commonly referred to as a “livelock” may occur. A livelock in the context of an SMT processor may occur when a thread cannot make forward progress because of a resource being locked. For example, in an SMT processor, instructions for multiple threads may be fetched and decoded. The decoded instructions may be forwarded, in an order determined by an algorithm (which may be out-of-order), to queues coupled to execution units, e.g., floating point units, fixed point units, load/store units. Each queue may be coupled to a particular execution unit. The queue may issue instructions from multiple threads to an execution unit in a manner where the instruction that has been stored in the queue the longest (referring to the instruction stored at the bottom of the queue) may be issued first to the execution unit. Some of the instructions stored in the queue may be “macroinstructions” that are made up of simple micro-operations, called “micro-ops.” These micro-ops may be stored as separate instructions and hence stored in separate entries in the queue. The execution unit, upon executing a first micro-op of a particular macroinstruction for a particular thread, e.g., thread T0, may expect to receive the following micro-op for that particular macroinstruction to be executed. If the execution unit does not receive the expected instruction from the issue queue, the execution unit may transmit an indication to that queue that the instruction was rejected and should be reissued at a later point in time. The queue may then store the rejected instruction in the entry in which it was previously located. The queue may subsequently issue the following stored instructions, which may include instructions of another thread, e.g., thread T1, for a designated number of cycles and then reissue the rejected instruction(s).
Since the queue is limited in the number of cycles during which it may issue subsequently stored instructions, the queue may not issue the particular instruction the execution unit is expecting. That is, the queue may start to reissue rejected instructions prior to issuing instructions located in entries towards the top of the queue. If the instruction the execution unit is expecting is located in one of these top entries, then the execution unit may never receive the expected instruction. Consequently, the execution unit may continually reject the received instructions which may include instructions of another thread, e.g., thread T1, and the queue may continually reload and later reissue these rejected instructions. Hence, the other thread, e.g., thread T1, may be unable to make forward progress. This condition may be referred to as “livelock.”
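The reject-and-reissue loop described above can be sketched as a small simulation. This is a deliberately simplified model under stated assumptions: one execution unit, one instruction issued per cycle, and a hypothetical `reissue_window` parameter that stands in for the designated number of cycles by counting how many entries the queue issues, starting from the bottom, before it goes back to reissue the rejected ones. The function name and the instruction labels are illustrative only.

```python
from typing import List, Optional


def cycles_until_accepted(
    queue: List[str],
    expected: str,
    reissue_window: int,
    max_cycles: int = 1000,
) -> Optional[int]:
    """Return the cycle on which `expected` is accepted, or None if never.

    queue[0] is the bottom (oldest) entry. Every instruction other than
    `expected` is rejected and stays in its original entry.
    """
    position = 0
    for cycle in range(max_cycles):
        if queue[position] == expected:
            return cycle  # the execution unit receives what it was waiting for
        # Rejected: the instruction remains in place and the queue moves on.
        position += 1
        if position >= reissue_window or position >= len(queue):
            position = 0  # window exhausted: start reissuing rejected entries
    return None  # no forward progress within max_cycles (livelock in this model)


if __name__ == "__main__":
    # Thread T1 instructions sit below the micro-op that thread T0 is waiting on.
    queue = ["T1.a", "T1.b", "T1.c", "T0.uop2"]
    print(cycles_until_accepted(queue, "T0.uop2", reissue_window=2))
    print(cycles_until_accepted(queue, "T0.uop2", reissue_window=4))
```

With a window of two entries, the queue in this model keeps cycling over the two bottom T1 instructions and never reaches the top entry, so neither thread advances. With a window that covers the whole queue, the expected micro-op is eventually issued and accepted.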
The current techniques for detecting livelock conditions usually involve a counter counting the number of cycles from the last instruction executed. If the number exceeds a threshold, then a livelock condition may be assumed. Typically, the threshold is extremely high, such as on the order of a million cycles, to ensure that a livelock condition is not incorrectly identified, for example, by mistaking the fetching of an instruction from memory after a cache miss for a livelock condition. Further, the current recovery methods for a livelock condition usually involve flushing the stored instructions and refetching the instruction causing the livelock condition. Because of the high threshold, these techniques detect livelock conditions too slowly. Further, flushing of instructions should be avoided if at all possible.
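The counter-based detection scheme described above can be illustrated with a short sketch. The class name, the `tick` method, and the per-cycle `instruction_completed` flag are assumptions made for illustration; only the idea of counting cycles since the last executed instruction and comparing against a threshold comes from the description.

```python
class CycleCountLivelockDetector:
    """Assume livelock once too many cycles pass without an executed instruction."""

    def __init__(self, threshold: int = 1_000_000) -> None:
        self.threshold = threshold  # on the order of a million cycles
        self.idle_cycles = 0        # cycles since the last executed instruction

    def tick(self, instruction_completed: bool) -> bool:
        """Advance one cycle; return True if a livelock is assumed."""
        if instruction_completed:
            self.idle_cycles = 0
        else:
            self.idle_cycles += 1
        return self.idle_cycles > self.threshold


if __name__ == "__main__":
    detector = CycleCountLivelockDetector()
    # A stall of a few hundred cycles (e.g., a cache miss) stays far below
    # the threshold, so it is not mistaken for a livelock.
    flagged = any(detector.tick(False) for _ in range(300))
    print(flagged)
```

The sketch also makes the drawback visible: a genuine livelock is reported only after the counter has passed the full threshold, so the higher the threshold, the later the detection.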
Therefore, there is a need in the art for effectively handling livelocks in a simultaneous multithreading (SMT) processor by detecting livelock conditions earlier than current detection techniques and by avoiding the flushing of instructions in a recovery action.