The present invention is directed to a method and system for bypassing a fill buffer provided on a microprocessor instruction pipeline.
Modern microprocessors include instruction pipelines in order to increase program execution speed. Instruction pipelines typically include a number of units, each unit operating in cooperation with the other units in the pipeline. One exemplary pipeline, such as the one found in Intel's Pentium® Pro microprocessor, includes an instruction fetch unit (IFU), an instruction decode unit (ID), an allocation unit (ALLOC), an instruction execution unit (EX) and a write back unit (WB). The instruction fetch unit fetches program instructions, the instruction decode unit translates the instructions into micro-ops (referred to hereinafter as uops), the allocation unit assigns a sequence number to each uop, the execution unit executes the uops, and the write back unit retires the executed uops. Instruction pipelines also include a trace cache unit, which acts as a static, high-speed RAM that collects uops from the instruction decode unit and provides them for execution much more quickly than if they were supplied from a dynamic memory. Because the trace cache unit exhibits a relatively high hit rate, it speeds up the flow of instructions to the execution unit of the instruction pipeline.
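The stage ordering described above can be illustrated with a minimal Python sketch in which each unit is a generator that hands its output to the next. This is an illustrative model only, not the processor's implementation: the `Uop` class, the function names, and the convention that `;` separates the uops of one instruction are all hypothetical, and the trace cache is omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Iterator, List, Optional


@dataclass
class Uop:
    text: str
    seq: Optional[int] = None  # assigned by ALLOC
    executed: bool = False     # set by EX
    retired: bool = False      # set by WB


def fetch(program: Iterable[str]) -> Iterator[str]:
    # IFU: hand program instructions downstream in program order
    yield from program


def decode(instrs: Iterable[str]) -> Iterator[Uop]:
    # ID: hypothetical encoding in which ';' separates the uops of one instruction
    for instr in instrs:
        for part in instr.split(";"):
            yield Uop(part.strip())


def allocate(uops: Iterable[Uop], seq: Iterator[int]) -> Iterator[Uop]:
    # ALLOC: stamp each uop with the next sequence number
    for u in uops:
        u.seq = next(seq)
        yield u


def execute(uops: Iterable[Uop]) -> Iterator[Uop]:
    # EX: placeholder; a real execution unit would compute results here
    for u in uops:
        u.executed = True
        yield u


def write_back(uops: Iterable[Uop]) -> List[Uop]:
    # WB: retire uops in the order they arrive (sequence order in this in-order sketch)
    retired = []
    for u in uops:
        u.retired = True
        retired.append(u)
    return retired


def run(program: Iterable[str]) -> List[Uop]:
    # Chain the stages IFU -> ID -> ALLOC -> EX -> WB
    return write_back(execute(allocate(decode(fetch(program)), count())))
```

For example, `run(["add r1, r2; mov r3, r1", "sub r4, r5"])` yields three retired uops carrying sequence numbers 0, 1 and 2.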
Certain instruction pipelines, such as the one in Intel's Pentium® Pro microprocessor, also include a fill buffer (FB) located between the instruction decode unit (or some other uop source) and the trace cache unit. The buffer is placed between the uop source and the trace cache because the source typically operates at a higher clock rate than the trace cache. Since the source supplies uops faster than the trace cache can accept them, the fill buffer temporarily stores these uops and provides them to the trace cache unit at a rate compatible with the trace cache's operating rate. Thus, a uop supplied from the instruction source is written into the fill buffer on a pulse of the faster, first clock (that of the instruction source) and is read out of the buffer on a pulse of the slower, second clock (that of the trace cache). A disadvantage of this temporary storage scheme is that the time uops spend stored in the fill buffer increases their latency along the instruction pipeline. As a result, the throughput rate of the instruction pipeline is reduced, which slows down the overall instruction execution rate of the microprocessor.
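To make the latency cost concrete, here is a minimal Python sketch of a FIFO fill buffer between two clock domains. It rests on assumptions that do not come from the text: time is counted in ticks of the faster source clock, the trace cache clock period is an integer multiple `ratio` of that tick, the trace cache accepts one uop per slow-clock edge, and a uop written on a tick cannot be read until a later tick. The function name `fill_buffer_latencies` is hypothetical.

```python
from collections import deque
from typing import List


def fill_buffer_latencies(arrivals: List[int], ratio: int) -> List[int]:
    """Return each uop's latency (in fast-clock ticks) through the fill buffer.

    arrivals: sorted fast-clock ticks at which uops are written (assumed input).
    ratio: slow (trace cache) clock period in fast ticks (assumption).
    """
    fifo = deque()
    latencies = []
    i = 0
    tick = 0
    while i < len(arrivals) or fifo:
        # Write side: enqueue every uop arriving on this fast tick
        while i < len(arrivals) and arrivals[i] == tick:
            fifo.append(arrivals[i])
            i += 1
        # Read side: on a slow edge, drain one uop written on an earlier tick
        if tick % ratio == 0 and fifo and fifo[0] < tick:
            written = fifo.popleft()
            latencies.append(tick - written)
        tick += 1
    return latencies
```

Under these assumptions, one uop per fast tick with `ratio=2` gives `fill_buffer_latencies([0, 1, 2, 3], 2) == [2, 3, 4, 5]`: every uop waits in the buffer, and the wait grows while arrivals outpace the slower drain.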
According to an embodiment of the present invention, the latency of uops being provided from an instruction source to a memory located downstream in an instruction pipeline is reduced.
According to the embodiment of the present invention, an instruction is written into a buffer located along a first instruction path of an instruction pipeline if a first condition is met, and the instruction is transmitted along a second instruction path of the instruction pipeline if a second condition is met. The latency of uops transmitted along the second instruction path is less than the latency of uops transmitted along the first instruction path.
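This section does not state what the first and second conditions are. Purely for illustration, the following Python sketch assumes a uop takes the second (bypass) path when the fill buffer is empty and the trace cache can accept a uop on the current tick, and is otherwise written into the buffer on the first path. It reuses the same hypothetical timing model: fast-clock ticks, a slow-clock period of `ratio` ticks, and one trace cache write per slow edge. The name `route_uops` and both conditions are assumptions, not the claimed method.

```python
from collections import deque
from typing import List, Tuple


def route_uops(arrivals: List[int], ratio: int) -> List[Tuple[str, int]]:
    """Return (path, latency) for each uop, in the order it reaches the trace cache."""
    fifo = deque()
    results = []
    i = 0
    tick = 0
    while i < len(arrivals) or fifo:
        # Assumption: the trace cache accepts one uop per slow-clock edge
        port_free = tick % ratio == 0
        # Buffered uops drain first, so program order is preserved
        if port_free and fifo and fifo[0] < tick:
            written = fifo.popleft()
            results.append(("buffer", tick - written))
            port_free = False
        while i < len(arrivals) and arrivals[i] == tick:
            if port_free and not fifo:
                # Hypothetical second condition: buffer empty and trace cache ready
                results.append(("bypass", 0))
                port_free = False
            else:
                # Hypothetical first condition: otherwise write into the fill buffer
                fifo.append(tick)
            i += 1
        tick += 1
    return results
```

Here `route_uops([0, 1, 2, 3], 2)` returns `[("bypass", 0), ("buffer", 1), ("buffer", 2), ("buffer", 3)]`, and sparse arrivals such as `[0, 4, 8]` bypass the buffer entirely. Allowing the bypass only when the buffer is empty is one simple way to keep uops in order, since no earlier uop can still be waiting in the buffer.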