The integrated circuit microprocessors presently in use can achieve speeds about one hundred times faster than the first-generation microprocessors, which became available about twenty-five years ago. Although some of this increase can be attributed to fundamental advances in integrated circuit processes, physical limitations will always restrict the maximum attainable processing speed. It is therefore now common for microprocessors to include local cache memories, instruction pipelines, instruction reordering, and other architectural techniques that can be used to achieve still higher speeds.
One primary limitation on the speed of microprocessor operation is cycle time. In general, short cycle times are favored by keeping the definition of individual instructions fairly simple and by keeping the interactions between instructions simple as well. Perhaps the most critical aspect of minimizing cycle time, however, is optimizing the implementation of the data buses used to transfer information to and from the central instruction execution unit(s).
For example, a load instruction causes the processor first to look in a local cache memory (i.e., local in the sense that it is located on the same integrated circuit as the instruction execution unit) for the data to be delivered to an internal register. If a miss occurs, so that the data is not available in the local cache, the processor must obtain the data by performing an off-chip transaction referred to as a cache fill operation. The fill operation updates the contents of the internal cache to match those of an external main memory or backup cache.
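The hit/miss/fill sequence just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any particular processor: the class `SimpleCache`, the constant `LINE_WORDS`, and the choice to fill a whole multi-word line on each miss are all hypothetical, and a Python list stands in for the external main memory or backup cache.

```python
# Minimal sketch: a load checks an on-chip cache first and performs a cache fill
# from external memory on a miss. All names here are hypothetical.

LINE_WORDS = 4  # assumed number of words per cache line


class SimpleCache:
    def __init__(self, external_memory):
        self.external = external_memory  # list standing in for off-chip main/backup memory
        self.lines = {}                  # line index -> cached copy of that line's words
        self.hits = 0
        self.misses = 0

    def fill(self, line_index):
        """Off-chip cache fill: copy one whole line so the cache matches external memory."""
        start = line_index * LINE_WORDS
        self.lines[line_index] = self.external[start:start + LINE_WORDS]

    def load(self, address):
        """Return the word at `address` (as if into a register), filling the line on a miss."""
        line_index, offset = divmod(address, LINE_WORDS)
        if line_index in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.fill(line_index)
        return self.lines[line_index][offset]


memory = list(range(100, 132))  # 32 words of "external memory"
cache = SimpleCache(memory)
registers = [cache.load(a) for a in (5, 6, 7, 20)]
print(registers)                 # [105, 106, 107, 120]
print(cache.hits, cache.misses)  # 2 2
```

The first load of address 5 misses and fills the whole line holding addresses 4 through 7, so the loads of 6 and 7 then hit; address 20 lies in a different line and misses again.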
One goal of high-speed Reduced Instruction Set Computing (RISC) architectures is typically to keep a subset of external memory available in the internal memory. Consequently, a typical cache miss may result in multiple data words being transferred from the external memory to the cache, for example over a data bus. As a result, a simple load operation may actually occupy the data bus for an extended number of cycles, depending on whether or not it hits in the cache.
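To make the bus-occupancy point concrete, the following hypothetical model counts how many data-bus cycles a single load holds the bus. It assumes, for illustration only, that a hit delivers one word in one bus cycle and that a miss moves every word of a line across the bus at a fixed number of bus cycles per word; the function `load_bus_cycles` and the parameter values are illustrative, not taken from the text.

```python
def load_bus_cycles(hit: bool, words_per_line: int, cycles_per_external_word: int) -> int:
    """Bus cycles occupied by one load under a simple, assumed model."""
    if hit:
        # A hit delivers one word from the internal cache in one bus cycle.
        return 1
    # A miss triggers a fill: every word of the line crosses the bus,
    # each taking `cycles_per_external_word` bus cycles.
    return words_per_line * cycles_per_external_word


# Illustrative (assumed) parameters: 4-word lines, 2 bus cycles per external word.
hit_cost = load_bus_cycles(hit=True, words_per_line=4, cycles_per_external_word=2)
miss_cost = load_bus_cycles(hit=False, words_per_line=4, cycles_per_external_word=2)
print(hit_cost, miss_cost)  # 1 8
```

Under these assumptions the same load instruction holds the bus eight times longer when it misses, which is why the scheduling of fill traffic matters.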
In order to maximize the overall execution speed of the processor, it is therefore necessary to optimize the use of the bus such that there are no unused cycles, even during cache fill operations.
A further challenge is that the bus must efficiently support both internal and external transactions.
For example, if the internal cache access time is approximately equal to the off-chip memory access time, no unused cycles, or "dead time," will exist on the data bus during a fill operation. A reasonably efficient situation also exists if the two access times are integral multiples of one another. For instance, if the internal cache runs at twice the speed of the external cache, an external transaction can be completed on every other cycle.
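The integral-multiple case can be pictured as a repeating slot pattern on the bus. The sketch below assumes that one bus cycle equals the internal cache access time, that each external access takes exactly `ratio` bus cycles, and that cycles not carrying fill data are available for internal transactions; `fill_slots` is a hypothetical helper.

```python
def fill_slots(ratio: int, n_cycles: int) -> list[str]:
    """Label each bus cycle 'E' (external fill word) or 'I' (free for an internal transaction).

    Assumes an integer `ratio` of external to internal access time, so every
    external word completes exactly on a bus-cycle boundary.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    # An external word completes at the end of every `ratio`-th bus cycle.
    return ["E" if (cycle + 1) % ratio == 0 else "I" for cycle in range(n_cycles)]


print("".join(fill_slots(1, 6)))  # EEEEEE
print("".join(fill_slots(2, 6)))  # IEIEIE
```

With equal access times every cycle carries fill data; at a 2:1 ratio fill words land on every other cycle, leaving a regular gap that, under the stated assumption, an internal transaction can use.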
In some instances, however, the optimized speeds of internal and external transactions may be neither equal nor integral multiples of one another. Without careful planning of the use of the common data bus, the caches may then actually produce many unused dead cycles on the bus, which in turn prevents the bus from achieving optimum performance. For example, if transactions with the external cache take one and one-half times as long (1.5×) as internal transactions, the data bus may end up idle approximately one third of the time. This is because two external transactions span three bus cycles, so the bus must wait an additional bus cycle for every two external transactions.
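The one-third figure can be checked with a small timing model. This sketch assumes the bus is clocked at the internal cache rate (one transfer per bus cycle), that external word k (k = 1, 2, ...) becomes ready at time k × ratio bus cycles, and that each word is transferred in the first whole bus cycle starting at or after it is ready; `fill_schedule` and `steady_state_idle` are hypothetical names. Exact fractions avoid floating-point error in the ceiling calculation.

```python
import math
from fractions import Fraction


def fill_schedule(ratio: Fraction, n_cycles: int) -> str:
    """One character per bus cycle: 'E' if a fill word is transferred, '.' if idle.

    Assumed model: bus cycle c covers [c, c + 1); word k is ready at k * ratio and
    is transferred in bus cycle ceil(k * ratio).
    """
    if ratio < 1:
        raise ValueError("external access must be at least one bus cycle")
    busy = set()
    k = 1
    while True:
        cycle = math.ceil(k * ratio)  # first cycle boundary at or after word k is ready
        if cycle >= n_cycles:
            break
        busy.add(cycle)
        k += 1
    return "".join("E" if c in busy else "." for c in range(n_cycles))


def steady_state_idle(ratio: Fraction) -> Fraction:
    """Idle fraction once the pattern repeats: in ratio.numerator bus cycles,
    ratio.denominator words arrive and each uses one cycle."""
    return Fraction(ratio.numerator - ratio.denominator, ratio.numerator)


schedule = fill_schedule(Fraction(3, 2), 13)
print(schedule)                             # ..EE.EE.EE.EE
print(steady_state_idle(Fraction(3, 2)))    # 1/3
```

Ignoring cycle 0, before any word could be ready, the pattern `.EE` repeats: one idle cycle for every two transfers, matching the one-third idle fraction described above.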