Before the invention of caches, several machines implemented forms of dynamic scheduling in order to avoid stalling while waiting on a main memory access. The two most notable examples were the CDC 6600 with its scoreboard and the IBM 360/91 with its Tomasulo algorithm, both of which were introduced in the 1960's. Dynamic scheduling, which entails rearranging the order of instructions in hardware during execution of the program while maintaining the semantics of the original program order, was found to be extremely complex, expensive, hard to debug, and hard to test. Therefore, during the 1970's and 1980's, no other dynamically scheduled machines were produced at IBM. Dynamic scheduling was likewise abandoned at CDC, and no other manufacturer produced dynamically scheduled processors during that period.
Shortly after the introduction of the CDC 6600 and the IBM 360/91, computer systems using cache memory were developed. In those systems, as in modern computers, most memory accesses by a processor are satisfied by data in cache memory. Since the cache can be accessed much more quickly than main memory, the need for dynamic scheduling was reduced.
In recent years, processor cycle times have decreased greatly, and the capacity of memory chips has increased significantly. But the access time of memory chips has changed little. This has led to an increasing gap between cache access times and main memory access times.
For example, in the late 1970's, a VAX 11/780 would slow down by only 50% if its cache was turned off and it executed out of main memory. Today, main memory access times can exceed 100 cycles, and programs could slow down by more than 100 times if they fetched every instruction and data reference from main memory instead of cache. Even when instructions or data are only occasionally fetched from main memory, the small number of cache misses can still greatly slow down program execution because of the long memory access times.
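The effect of a small miss rate can be illustrated with the standard average-access-time calculation. The sketch below is only an illustration: the 1-cycle hit time, the 100-cycle miss penalty, and the sample miss rates are assumed values chosen to match the order of magnitude discussed above, and the function name is hypothetical.

```python
def average_access_cycles(hit_cycles, miss_penalty_cycles, miss_rate):
    """Average cycles per memory reference: hit time plus miss rate times miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed figures: 1-cycle cache hit, 100 extra cycles for a main memory access.
for miss_rate in (0.0, 0.01, 0.05, 1.0):
    cycles = average_access_cycles(1, 100, miss_rate)
    print(f"miss rate {miss_rate:.0%}: {cycles:.1f} cycles per reference")
```

Under these assumed numbers, a miss rate of only 1% doubles the average cost of a memory reference, and a miss rate of 100% (no cache at all) makes each reference roughly 100 times slower.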
In order to reduce processor stalling when a cache miss is encountered, some microprocessor manufacturers have reintroduced dynamic scheduling in their processors in recent years. A dynamically scheduled processor will try to find other instructions that do not depend on the data being fetched by the missing load, and execute these other instructions out of order and in parallel with the cache miss. Significantly higher performance can thus be obtained.
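The benefit of executing independent instructions during a cache miss can be sketched with a toy timing model. This is a deliberately idealized illustration, not a description of any particular processor: it assumes one instruction issues per cycle, an unlimited scheduling window, a memory system that allows several misses to be outstanding at once, and a program in which each register is written once and read only by later instructions. All names and latencies are hypothetical.

```python
# Each instruction is (destination register, source registers, latency in cycles).
PROGRAM = [
    ("r1", ["r0"], 100),  # load that misses the cache
    ("r2", ["r1"], 1),    # uses the first load's result
    ("r3", ["r0"], 100),  # independent load that also misses
    ("r4", ["r3"], 1),    # uses the second load's result
]


def run_in_order(program):
    """Issue in program order; an instruction waits until its sources are ready."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # earliest cycle at which the next instruction may issue
    finish = 0
    for dest, srcs, latency in program:
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
        cycle = start + 1
    return finish


def run_out_of_order(program):
    """Each cycle, issue the oldest pending instruction whose sources are ready."""
    produced = {dest for dest, _, _ in program}
    ready = {}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        for i, (dest, srcs, latency) in enumerate(pending):
            # Registers not written by the program are treated as ready from cycle 0.
            if all(s not in produced or ready.get(s, float("inf")) <= cycle
                   for s in srcs):
                ready[dest] = cycle + latency
                finish = max(finish, ready[dest])
                del pending[i]
                break
        cycle += 1
    return finish


print("in-order cycles:    ", run_in_order(PROGRAM))
print("out-of-order cycles:", run_out_of_order(PROGRAM))
```

In this model the in-order machine finishes at cycle 202 because the second load cannot start until the first load's dependent instruction has issued, while the out-of-order machine finishes at cycle 102 because the two misses overlap.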
Dynamically scheduled microarchitectures, like the earlier dynamically scheduled systems, are complex and have a large transistor count, a long design time, and long verification cycles. Therefore, there exists a need for a microarchitecture that reduces processor stalling when a cache miss is encountered without resorting to a high-complexity design.