A conventional pipelined processor has multiple processor stages coupled in a pipeline. Data progresses from one processor stage to the next on each pipeline cycle. For example, a three-stage pipeline processor has a first fetch stage for fetching data or instruction information from a peripheral circuit outside the processor, a second execution stage for executing an instruction using the fetched information, and a third write (or “write-back”) stage for writing processed data to a peripheral circuit outside the processor. An example of a three-stage pipelined processor is the ARM7TDMI processor from ARM Ltd. of Cambridge, Great Britain.
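The stage-by-stage progression described above can be illustrated with a minimal sketch. The stage names follow the three-stage example in the text; the function name `simulate_pipeline` and the item labels are hypothetical, and the model assumes an idealised pipeline in which every stage completes in exactly one cycle with no stalls.

```python
# Idealised three-stage pipeline: every item advances one stage per cycle.
# Assumption: no stalls, hazards, or multi-cycle stages are modelled.
STAGES = ("fetch", "execute", "write")


def simulate_pipeline(items, stages=STAGES):
    """Return a list with one dict per cycle, mapping stage name to the
    item occupying that stage (or None if the stage is empty)."""
    slots = [None] * len(stages)
    pending = list(items)
    trace = []
    while pending or any(s is not None for s in slots):
        # Shift every item one stage forward; a new item enters the first stage.
        slots = [pending.pop(0) if pending else None] + slots[:-1]
        if all(s is None for s in slots):
            break  # the last item has left the pipeline
        trace.append(dict(zip(stages, slots)))
    return trace


if __name__ == "__main__":
    for cycle, state in enumerate(simulate_pipeline(["i0", "i1", "i2"]), start=1):
        print(cycle, state)
```

In this idealised model, n items pass through a three-stage pipeline in n + 2 cycles, because all three stages operate on different items at the same time once the pipeline is full.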
In certain applications the pipeline speed is limited not by the processor itself, but by the arrangement of peripheral circuits and the bus interface between the peripheral circuits and the processor. The term “bus interface” is commonly used to mean any of the bus or circuit elements between the processor and the peripheral circuits. The pipeline cannot be clocked at a rate higher than the arrangement of peripheral circuits and the bus interface can support. In particular, the peripheral circuits and the interfacing bus must handle accesses within the timing of the pipeline cycles. One situation in which the timing can be problematic is when the peripheral circuit is on a peripheral bus different from the native bus coupled directly to the processor. Another is when the peripheral circuit has a slow access response time, as many “slow” memories do. When both conditions apply, that is, a slow access response time and a different peripheral bus, the timing problem can severely limit the maximum pipeline speed that can be supported.
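The timing constraint can be made concrete with a simple model. The sketch below assumes that each access must complete within a single pipeline cycle, so the cycle period can be no shorter than the slower of the processor stage delay and the total access delay. The function name, the parameter names, and the numeric values are illustrative assumptions and are not taken from any particular device.

```python
def max_pipeline_clock_mhz(core_stage_ns, access_ns, bridge_ns=0.0):
    """Estimate the highest supportable pipeline clock rate in MHz.

    core_stage_ns: delay of the slowest processor stage (hypothetical value).
    access_ns:     access response time of the peripheral circuit.
    bridge_ns:     extra delay when the peripheral sits on a different bus
                   from the native bus (0 if it is on the native bus).
    Assumption: every access must fit inside one pipeline cycle.
    """
    period_ns = max(core_stage_ns, access_ns + bridge_ns)
    return 1000.0 / period_ns


if __name__ == "__main__":
    # Fast peripheral on the native bus: the processor is the limit.
    print(max_pipeline_clock_mhz(core_stage_ns=5.0, access_ns=4.0))
    # Slow memory behind a bus bridge: the bus interface is the limit.
    print(max_pipeline_clock_mhz(core_stage_ns=5.0, access_ns=20.0, bridge_ns=5.0))
```

With these assumed figures, a processor capable of 200 MHz would be held to 40 MHz by the slow memory on the separate bus, which is the combined worst case described above.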
A cache may be included on the native bus in an attempt to reduce data transfer overhead to the peripheral circuits and/or to other buses. However, a cache significantly complicates the design and timing problems for the bus interface. For example, conventional cache designs suspend an access to a peripheral circuit while the cache determines whether the access is cached. In the event of a cache-miss (i.e., an addressed location is not cached), the cache must then execute a late bus access to the peripheral circuit. Suspending the bus access necessarily increases the overall time required to perform the access, and further reduces the maximum pipeline speed that can be supported. Alternatively, the cache can initiate the bus access to the peripheral circuit without suspension, while simultaneously determining whether the access is cached. Such a configuration is referred to as a zero wait-state cache. In the event of a cache-hit (i.e., an addressed location is cached), the cache must be able to abort the initiated bus access to the peripheral circuit. However, not all buses can accommodate access aborts, which limits the applications for zero wait-state caches. For example, the protocol for the AMBA standard High-performance Bus (AHB) from ARM Ltd. does not support access aborts. Handling access aborts is also highly problematic if the peripheral circuit is on a different bus from the native bus to which the cache is coupled.
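The trade-off between the two cache arrangements can be sketched with a simple latency model. The function name, the parameter names, and the timing values below are hypothetical, and the model assumes that a cache lookup and a bus access each take a fixed time and that an aborted bus access adds no further delay.

```python
def access_time_ns(hit, lookup_ns, bus_ns, zero_wait_state,
                   bus_supports_abort=True):
    """Time to complete one access under a simplified cache model.

    Suspended (conventional) cache: the bus access starts only after the
    lookup reports a miss, so a miss costs lookup_ns + bus_ns.
    Zero wait-state cache: the bus access starts in parallel with the
    lookup, so a miss costs max(lookup_ns, bus_ns), but a hit requires
    the already-started bus access to be aborted.
    """
    if not zero_wait_state:
        return lookup_ns if hit else lookup_ns + bus_ns
    if not bus_supports_abort:
        # Assumption: without abort support, the speculative bus access
        # cannot be cancelled on a hit, so the scheme is not usable here.
        raise ValueError("zero wait-state cache needs a bus that supports aborts")
    return lookup_ns if hit else max(lookup_ns, bus_ns)


if __name__ == "__main__":
    # Miss penalty with a 5 ns lookup and a 20 ns bus access.
    print(access_time_ns(False, 5.0, 20.0, zero_wait_state=False))
    print(access_time_ns(False, 5.0, 20.0, zero_wait_state=True))
```

Under these assumptions the zero wait-state arrangement removes the lookup time from the miss path, at the cost of requiring a bus protocol that can cancel an access already in progress.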
Many processing applications demand that the processor core be operated at as high a speed as possible. However, in applications in which the above timing constraints limit the maximum pipeline speed that the peripheral circuits and the bus interface can support, the system as a whole cannot realise the full performance potential of the processor.