A processor may execute a hardware thread using local memory and execution units such as floating point units, integer units, vector units, and branch units. A single hardware thread executing on the processor cannot normally keep all execution units busy, as bottlenecks occasionally develop. Thus, in a process referred to herein as concurrent multithreading, some processors utilize two or more hardware threads to provide the ability to execute operations in parallel. By scheduling software threads to run on the hardware threads in a manner that minimizes competition between the hardware threads for the local memory and execution units of the processor, a developer may create code that executes efficiently. The process of creating code that executes efficiently for a given processor architecture may be referred to as processor optimization of the code.
A problem may occur when code that has been processor-optimized for a target processor architecture is run in a virtual machine environment on a different processor architecture. The number of processors and processor cores, the amount of local memory, and the type, number, and configuration of execution units may vary between processor architectures. Therefore, code that has been processor-optimized for a target processor architecture may fail to achieve the intended efficiency gains, and may instead run slower in a virtual machine environment on a different processor architecture. Such a decrease in performance may result in a degraded user experience, particularly for processor-intensive tasks such as graphics processing.