The invention relates to the field of computer science and specifically to the optimization of multithreaded applications for pipelined microprocessors. Microprocessors typically perform a number of different tasks to execute an instruction. The tasks for each microprocessor instruction generally must be performed in sequence. For example, a microprocessor must first read or fetch an instruction; interpret or decode the instruction; read or fetch the data needed to perform the instruction, if any; execute the instruction; and store the instruction results, if any.
Many microprocessors use an instruction pipeline to improve performance. In an instruction pipeline, each task for executing an instruction is performed by a different portion of the microprocessor hardware, referred to as a pipeline stage. The pipeline stages are connected in the sequence in which the microprocessor performs the tasks to execute instructions.
Pipeline stages are typically capable of operating relatively independently. As a result, earlier pipeline stages, which are the pipeline stages at the beginning of the pipeline, can start work on subsequent instructions while the later pipeline stages are still performing tasks for earlier instructions. Microprocessors including instruction pipelines with 20 or 30 stages are not uncommon. Specialized information processing devices, such as digital signal processors, graphics processing units, and ASICs, can include much longer instruction pipelines.
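The benefit of this overlap can be illustrated with a simple cycle-count model. The following is a minimal sketch for illustration only, assuming that every pipeline stage takes exactly one clock cycle and that no stalls occur; the function names are hypothetical and do not correspond to any particular microprocessor.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction completes every stage
    before the next instruction is allowed to start."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an idealized pipeline: the first instruction
    takes num_stages cycles to pass through, after which one further
    instruction completes on every cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    # Compare both models for 100 instructions on a 20-stage pipeline.
    print(unpipelined_cycles(100, 20), pipelined_cycles(100, 20))
```

Under these assumptions, the pipelined cycle count grows with the number of instructions plus the pipeline depth, rather than with their product.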
Ideally, every pipeline stage is constantly active and processing instructions, rather than idle. If a pipeline stage must wait for an instruction or data, the pipeline is said to have stalled. Frequent pipeline stalls decrease the performance of pipelined microprocessors.
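The cost of stalls can be approximated with a similarly simple model. The sketch below is illustrative only and assumes an in-order pipeline with one-cycle stages, in which every stall cycle incurred by one instruction also delays all later instructions; the function names and the per-instruction stall counts are hypothetical.

```python
from typing import Sequence


def cycles_with_stalls(stall_cycles: Sequence[int], num_stages: int) -> int:
    """Total cycles for an in-order pipeline.

    stall_cycles[i] is the number of cycles instruction i spends
    waiting for an instruction or data. In this simplified model each
    stall cycle is added directly to the ideal pipelined cycle count.
    """
    if not stall_cycles:
        return 0
    ideal = num_stages + len(stall_cycles) - 1
    return ideal + sum(stall_cycles)


def instructions_per_cycle(stall_cycles: Sequence[int], num_stages: int) -> float:
    """Average throughput implied by the cycle count above."""
    total = cycles_with_stalls(stall_cycles, num_stages)
    return len(stall_cycles) / total if total else 0.0
```

In this model, throughput falls as the total number of stall cycles grows, which reflects the performance loss described above.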
Threads of execution, or threads, are a common technique for splitting programs into two or more simultaneously running tasks. Multiple threads can often be executed in parallel by multiple microprocessors operating in parallel; by a single microprocessor with multiple execution cores or specialized multithreaded execution capabilities; or by time-multiplexing different threads, where a processor frequently switches execution between different threads.
Compilers, operating systems, and virtual machines can include additional instructions within the object code of a program to implement multiple threads. These additional instructions can handle features such as starting and stopping threads, switching between threads, preserving thread state information, thread scheduling and priority, and inter-thread communication. These features can be implemented using specialized features of the microprocessor or using general microprocessor features, such as timers, interrupts, and stack operations, together with programming conventions.
Typically, threads are executed on a time sharing basis. Thread switching is performed at predetermined time intervals based on thread priority and/or load balancing concerns.
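Time-sliced thread switching of this kind can be sketched as a round-robin schedule. The example below is a simplified illustration, assuming a fixed time slice and equal-priority threads; it ignores thread priority and load balancing, and the function name, thread names, and work units are hypothetical.

```python
from collections import deque
from typing import Dict, List


def round_robin_schedule(work_units: Dict[str, int], time_slice: int) -> List[str]:
    """Return the order in which threads are given the processor.

    work_units maps a thread name to the number of time units of work
    it still needs. Each turn, the thread at the head of the ready
    queue runs for at most time_slice units and is then moved to the
    back of the queue if it has work remaining.
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    ready = deque(
        (name, remaining) for name, remaining in work_units.items() if remaining > 0
    )
    order: List[str] = []
    while ready:
        name, remaining = ready.popleft()
        order.append(name)           # this thread runs for one slice
        remaining -= time_slice
        if remaining > 0:            # switch away at the end of the slice
            ready.append((name, remaining))
    return order
```

In this sketch, the switch always occurs at the end of a fixed interval, regardless of what the running thread is doing at that moment.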
It is desirable for a system and method to provide improved thread switching capabilities while minimizing the frequency and impact of pipeline stalls.