Modern microprocessors operate internally much like an assembly line in an automobile factory. An assembly line includes various stages, each performing a different function needed to assemble a car. Similarly, microprocessors include several stages connected together to form what is commonly referred to as a pipeline. Each stage in the pipeline performs a different function needed to execute a software program instruction.
In the assembly line, multiple cars follow one another down the line and move through the line simultaneously, with each car being at a different stage of assembly. This aspect of the assembly line enables it to produce more cars per day than a factory that doesn't start assembling another car until the current car is fully assembled. Similarly, multiple instructions follow one another down the microprocessor pipeline simultaneously, with each instruction being executed in part by a different stage of the pipeline. Pipelined microprocessors are capable of executing more instructions per second than non-pipelined processors.
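The throughput advantage described above can be illustrated with a simple cycle count. The following is a minimal sketch under idealized assumptions that are not stated in the text: every stage takes exactly one clock cycle and the pipeline never stalls. The function names are hypothetical and chosen only for illustration.

```python
def cycles_non_pipelined(num_instructions, num_stages):
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions, num_stages):
    """The first instruction takes num_stages cycles to fill the pipeline;
    under the no-stall assumption, each later instruction completes one
    cycle after its predecessor."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, stages = 100, 5
    print(cycles_non_pipelined(n, stages))  # 500
    print(cycles_pipelined(n, stages))      # 104
```

Under these assumptions, 100 instructions on a five-stage pipeline complete in 104 cycles rather than 500. Real pipelines stall, for example while waiting on memory, so the actual gain is smaller.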
Two predominant instructions executed by microprocessors are load and store instructions. A load instruction loads data from memory into the microprocessor. A store instruction stores data from the microprocessor to memory. Load and store instructions may exist at different stages of the pipeline simultaneously as described above, and allowing them to do so benefits performance.
In addition, the transfers of data from or to memory required by load and store instructions typically take longer than the time required to perform instructions that do not transfer data to or from memory, such as an add instruction. This could be detrimental to performance if other instructions in the pipeline behind a load or store, which could otherwise complete, were required to wait in the pipeline until the load/store memory transfer completed. To avoid this problem, microprocessors employ data buffers, or data latches.
Some data buffers, commonly referred to as write buffers, are used to hold data until it can be written to memory on the microprocessor bus that connects the microprocessor to memory. Other data buffers, commonly referred to as store buffers, are used to hold data until it can be written to cache memory. Other data buffers, commonly referred to as fill buffers, or response buffers, are allocated for receiving data from memory on the processor bus to be provided to functional units within the microprocessor. Still other data buffers, commonly referred to as replay buffers, are used to temporarily hold data as it flows through various stages of the pipeline until it reaches a write buffer or store buffer, or to temporarily hold load data as it flows through various stages of the pipeline after having been delivered to a pipeline functional unit from a fill buffer.
Although it is desirable to buffer load/store data and allow multiple loads and/or stores to be pending in the pipeline simultaneously, the microprocessor must ensure data coherency and proper ordering of data transfers on the microprocessor bus. For example, if a load instruction to an address in memory follows a store instruction to the same address, the microprocessor must ensure that the load instruction receives the data of the store instruction rather than the data currently in memory at the address. That is, the contents of memory at the address are not the newest data, because the store instruction has newer data associated with the memory address that has not yet been written to memory. Hence, the microprocessor must either wait for the new data to be written to memory and then retrieve it from memory for the load instruction, or the microprocessor must internally supply the new data from the store instruction to the load instruction.
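The second option, in which the new data is supplied internally from the store to the load, can be sketched in software. This is a simplified behavioral model, not a description of any particular microprocessor: the class name StoreBufferModel, its methods, and the use of a dictionary to stand in for memory are all assumptions made for illustration.

```python
class StoreBufferModel:
    """Toy model of pending stores that have not yet been written to memory."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value (stands in for memory)
        self.pending = []      # list of (address, value), oldest first

    def store(self, address, value):
        # The store data is buffered; memory is not updated yet.
        self.pending.append((address, value))

    def load(self, address):
        # Search the pending stores from newest to oldest so the load
        # observes the most recent store to the same address.
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        # No pending store matches: memory holds the newest data.
        return self.memory.get(address, 0)

    def drain_one(self):
        # Write the oldest pending store to memory, preserving store order.
        if self.pending:
            address, value = self.pending.pop(0)
            self.memory[address] = value


if __name__ == "__main__":
    memory = {0x100: 1}
    model = StoreBufferModel(memory)
    model.store(0x100, 2)
    print(model.load(0x100))  # 2: supplied from the pending store
    print(memory[0x100])      # 1: memory still holds the old data
```

In this sketch the load returns 2 even though memory still holds 1, which is the behavior the example in the preceding paragraph requires.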
Regardless of which way the microprocessor chooses to provide the new data to the load instruction, one thing is clear: at some point after the load instruction enters the pipeline, the microprocessor must compare the load address with all store addresses pending in buffers in the pipeline ahead of the load in order to determine whether the load address matches any of the store addresses. Other situations besides the example of the load following a store described above require address comparison in order to ensure data coherency.
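The address comparison described above can be pictured as one comparator per pending store address, all operating on the same load address. The sketch below models that comparison in software, assuming full-address equality; the function names match_vector and has_conflict are hypothetical.

```python
def match_vector(load_address, pending_store_addresses):
    """Return one boolean per pending store address, modeling the outputs
    of a bank of comparators that all examine the same load address."""
    return [store_address == load_address
            for store_address in pending_store_addresses]


def has_conflict(load_address, pending_store_addresses):
    """True if the load address matches any pending store address."""
    return any(match_vector(load_address, pending_store_addresses))


if __name__ == "__main__":
    stores = [0x1000, 0x2000, 0x3000, 0x2000]
    print(match_vector(0x2000, stores))  # [False, True, False, True]
    print(has_conflict(0x4000, stores))  # False
```

In hardware the comparisons would be made in parallel rather than one after another as in this loop; the sketch shows only the logical result.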
In a modern microprocessor, it is not uncommon to have several tens of data buffers for handling load and store instructions simultaneously to improve performance. Each data buffer also includes an associated address latch, or buffer, for storing the associated load address or store address. As the number of data buffers and associated address latches increases, the number of address comparators needed to determine whether an address match has occurred must also increase in order to ensure data coherency. The size of the addresses is typically on the order of 32 bits or more. Consequently, the amount of area consumed on the microprocessor integrated circuit by the address latches and comparators may be significant. Additionally, the complexity of the control logic needed for ensuring data coherency based on the address comparator results increases exponentially as the number of comparators increases.
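A rough count of the storage and comparison hardware involved can be made as follows. This is a back-of-the-envelope sketch, assuming one full-width address latch per data buffer and one full-width comparator per latch for each address being checked; the function name, its parameters, and the lookup_ports parameter are hypothetical, and an actual design may compare fewer address bits.

```python
def address_tracking_cost(num_buffers, address_bits, lookup_ports=1):
    """Estimate latch and comparator counts for address matching.

    Assumes each data buffer has one address latch of address_bits bits,
    and each lookup port compares its address against every latch.
    """
    latch_bits = num_buffers * address_bits
    comparators = num_buffers * lookup_ports
    compared_bits = comparators * address_bits
    return {
        "latch_bits": latch_bits,
        "comparators": comparators,
        "compared_bits": compared_bits,
    }


if __name__ == "__main__":
    # 32 data buffers with 32-bit addresses and a single lookup port.
    print(address_tracking_cost(32, 32))
    # {'latch_bits': 1024, 'comparators': 32, 'compared_bits': 1024}
```

Under these assumptions, 32 buffers with 32-bit addresses require 1024 bits of address latches and 32 comparators that together examine 1024 bits per lookup, and both figures grow in proportion to the number of buffers.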
Therefore, what is needed is a solution to the problem created by the large number of address latches and address comparators used to ensure data coherency in microprocessors with large numbers of data buffers.