1. Technical Field
The present invention relates generally to information processing systems and, more specifically, to dynamically adapting a binary file to facilitate speculative precomputation.
2. Background Art
In order to increase performance of information processing systems, such as those that include microprocessors, both hardware and software techniques have been employed. On the hardware side, microprocessor design approaches to improve microprocessor performance have included increased clock speeds, pipelining, branch prediction, super-scalar execution, out-of-order execution, and caches. Many such approaches have led to increased transistor count, and have even, in some instances, resulted in transistor count increasing at a rate greater than the rate of improved performance.
Other performance enhancements rely on software techniques rather than on additional transistors. One software approach that has been employed to improve processor performance is known as “threading.” In software threading, an instruction stream is split into multiple instruction streams, or threads, that can be executed in parallel. In one approach, multiple processors in a multi-processor system may each act on one of the multiple threads simultaneously.
In another approach, known as time-slice multi-threading, a single processor switches between threads after a fixed period of time. In still another approach, a single processor switches between threads upon occurrence of a trigger event, such as a long-latency cache miss. The latter approach is known as switch-on-event multi-threading. While achieving performance gains in certain circumstances, these approaches do not achieve optimal overlap of many sources of inefficient resource usage, such as branch mispredictions and instruction dependencies.
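The difference between the two switching policies can be illustrated with a minimal sketch. It is a toy model, not part of the disclosed embodiments: the function name `issue_order`, the op labels "alu" and "miss", and the use of an instruction count as a stand-in for the fixed time period are all assumptions made for illustration.

```python
from collections import deque

def issue_order(threads, policy, quantum=2):
    """Return the thread id of each issued op, in issue order.

    threads: list of op lists; each op is "alu" or "miss" (hypothetical labels).
    policy:  "time_slice" switches threads after `quantum` ops (an op count
             stands in for a fixed period of time);
             "switch_on_event" switches threads right after a "miss" issues.
    """
    if policy not in ("time_slice", "switch_on_event"):
        raise ValueError("unknown policy: " + policy)

    # Round-robin queue of threads that still have ops to issue.
    ready = deque((tid, deque(ops)) for tid, ops in enumerate(threads) if ops)
    trace = []
    while ready:
        tid, ops = ready.popleft()
        issued = 0
        while ops:
            op = ops.popleft()
            trace.append(tid)
            issued += 1
            if policy == "time_slice" and issued == quantum:
                break  # the slice has expired
            if policy == "switch_on_event" and op == "miss":
                break  # trigger event: long-latency miss
        if ops:
            ready.append((tid, ops))  # unfinished thread goes to the back
    return trace

threads = [["alu", "miss", "alu"], ["alu", "alu", "alu"]]
print(issue_order(threads, "time_slice"))       # [0, 0, 1, 1, 0, 1]
print(issue_order(threads, "switch_on_event"))  # [0, 0, 1, 1, 1, 0]
```

In this sketch, time-slice switching interrupts thread 1 even though it never stalls, whereas switch-on-event switching leaves thread 0 only at its miss and then lets thread 1 run to completion. Neither policy issues ops from both threads at the same time.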
In a quest for further performance improvements, the concept of multi-threading has been enhanced in a software technique called simultaneous multi-threading (“SMT”). In SMT, multiple threads can execute simultaneously on a single processor without switching. In this approach, a single physical processor is made to appear as multiple logical processors to operating systems and user programs. That is, each logical processor maintains a complete set of the architecture state, but nearly all other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. The threads execute simultaneously and make better use of shared resources than time-slice multi-threading or switch-on-event multi-threading. Nonetheless, there is still a performance penalty to be paid on a cache miss, or other long-latency operation, that occurs during execution of the threads.
Embodiments of the method and apparatus disclosed herein address this and other concerns related to latencies and multi-threading.