The present invention is directed, in general, to data processing systems and, more specifically, to a circuit and method for determining if an address is within the address range of a stack cache.
The demand for high performance computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. A number of different approaches have been taken to decrease instruction execution time, thereby increasing processor throughput. One way to increase processor throughput is to use a pipeline architecture in which the processor is divided into separate processing stages that form the pipeline. Instructions are broken down into elemental steps that are executed in different stages in an assembly line fashion.
A pipelined processor is capable of executing several different machine instructions concurrently. This is accomplished by breaking down the processing steps for each instruction into several discrete processing phases, each of which is executed by a separate pipeline stage. Hence, each instruction must pass sequentially through each pipeline stage in order to complete its execution. In general, a given instruction is processed by only one pipeline stage at a time, with one clock cycle being required for each stage. Since instructions use the pipeline stages in the same order and typically only stay in each stage for a single clock cycle, an N stage pipeline is capable of simultaneously processing N instructions. When filled with instructions, a processor with N pipeline stages completes one instruction each clock cycle.
The execution rate of a pipelined processor is theoretically N times faster than that of an equivalent non-pipelined processor. A non-pipelined processor is a processor that completes execution of one instruction before proceeding to the next instruction. Typically, pipeline overheads and other factors somewhat reduce the execution-rate advantage that a pipelined processor has over a non-pipelined processor.
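As a rough worked example of the "N times faster" figure, the sketch below counts cycles for an idealized pipeline (N cycles to fill, then one completion per clock) against N cycles per instruction without pipelining. It assumes one clock per stage and ignores the stalls and other overheads mentioned above; the function names are hypothetical.

```python
def pipelined_cycles(n_stages: int, n_instructions: int) -> int:
    """Ideal N-stage pipeline: N cycles to fill, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def non_pipelined_cycles(n_stages: int, n_instructions: int) -> int:
    """Each instruction uses all N steps before the next one starts."""
    return n_stages * n_instructions


def ideal_speedup(n_stages: int, n_instructions: int) -> float:
    """Ratio of non-pipelined to pipelined cycles; approaches n_stages as the count grows."""
    return non_pipelined_cycles(n_stages, n_instructions) / pipelined_cycles(n_stages, n_instructions)


if __name__ == "__main__":
    for k in (1, 10, 1000, 1_000_000):
        print(f"{k:>9} instructions, 5 stages: speedup {ideal_speedup(5, k):.3f}")
```

The speedup only approaches N once the pipeline stays full for many more instructions than it has stages.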
A multi-stage processor pipeline may consist of an instruction fetch stage, a decode stage, an operand fetch stage, and an execute stage, among others. In addition, the processor may have an instruction cache that stores program instructions for execution, a data cache that temporarily stores data operands that otherwise are stored in processor memory, and a register file that also temporarily stores data operands.
In a stack-based microprocessor system, the variables, arguments and processor status may be stored in a portion of memory called the stack frame. A stack frame base pointer contains the address of the root location of the current stack frame, while a stack frame index pointer locates the exact byte or word within the frame relative to that base, as shown in FIG. 5. FIG. 5 illustrates an exemplary stack frame in a portion of the memory stack. The processor accesses the stack frame frequently, using the index pointer or, sometimes, an absolute pointer. In the case of an index pointer, the absolute address is calculated by adding the base pointer and the index pointer.
In high performance computer systems, it is desirable to maintain a portion of the current stack frame in a very small but fast register cache, so that accesses that would otherwise go to the larger, slower L1 or L2 cache, or to the even slower off-chip main memory, complete more quickly. For example, a program can set up a stack frame base pointer register and use an index pointer to access a memory location. If a data processor implements a stack cache and ensures that the base of the cache corresponds to the base pointer, the memory value can be addressed simply by indexing, as if it were read from a register file. Normally, a valid status bit is used to indicate the validity of a given entry in the cache.
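The following minimal sketch, assuming a 32-bit address space and one cache entry per addressable location, models a stack cache whose entry i mirrors absolute address base + i, with one valid bit per entry. The class and method names are hypothetical, and the model is behavioral rather than a description of any particular hardware.

```python
from typing import List, Optional

ADDR_MASK = 0xFFFF_FFFF  # assumption: 32-bit address space


class StackCache:
    """Toy stack cache: entry i mirrors the memory word at absolute address base + i."""

    def __init__(self, n_entries: int, base: int) -> None:
        self.n_entries = n_entries
        self.base = base & ADDR_MASK                    # stack frame base pointer
        self.data: List[int] = [0] * n_entries
        self.valid: List[bool] = [False] * n_entries    # one valid status bit per entry

    def absolute_address(self, index: int) -> int:
        """Absolute address = base pointer + index pointer, wrapping at 32 bits."""
        return (self.base + index) & ADDR_MASK

    def fill(self, index: int, value: int) -> None:
        """Load an entry (e.g. from L1/L2 or main memory) and mark it valid."""
        self.data[index] = value
        self.valid[index] = True

    def read(self, index: int) -> Optional[int]:
        """Indexed read, like a register file; None models a miss."""
        if 0 <= index < self.n_entries and self.valid[index]:
            return self.data[index]
        return None
```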
However, in such designs, maintaining data coherence between the stack cache and the slower memory that it mirrors has proved to be a critical issue. Take the following program sequence as an example: 1) store a value to absolute location A; 2) read a memory location using base pointer B and index C. A high performance machine would normally fetch the memory location indexed by C from the stack cache before executing the store operation, because the store operation has to wait for its data operand from an execution stage, which is usually near the end of the pipeline. The processor can fetch the value indexed by index C from the stack cache quickly. However, it must also ensure that the store operation does not make that data stale, which happens if A = B + C. A typical approach employed by many designs is to compare the absolute address A with the range of addresses in the cache and, if there is a match, to invalidate the entry by resetting its valid status bit, thereby forcing a cache miss and maintaining the integrity of the cache data.
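A behavioral sketch of the invalidate-on-store approach follows, assuming a 32-bit address space and plain Python lists for the cache data and valid bits (all names hypothetical). The range test here is the straightforward modular subtraction. The sketch shows only that, after a store to A = B + C, the next indexed read of C misses and refetches the new value; replaying a read that was already issued ahead of the store is outside its scope.

```python
from typing import Dict, List

ADDR_MASK = 0xFFFF_FFFF  # assumption: 32-bit address space


def read_indexed(memory: Dict[int, int], cache: List[int], valid: List[bool],
                 base: int, index: int) -> int:
    """Serve base + index from the stack cache if valid; otherwise refill from memory."""
    if valid[index]:
        return cache[index]                       # fast path: stack cache hit
    value = memory.get((base + index) & ADDR_MASK, 0)
    cache[index], valid[index] = value, True      # refill and mark valid
    return value


def store_with_invalidate(memory: Dict[int, int], valid: List[bool],
                          base: int, addr: int, value: int) -> None:
    """Write memory[addr] and clear the valid bit of the stack-cache entry mirroring addr."""
    memory[addr] = value
    offset = (addr - base) & ADDR_MASK            # straightforward (slow) range check
    if offset < len(valid):
        valid[offset] = False                     # forces a miss on the next indexed read
```

For instance, with B = 0x2000 and C = 5, a read of index 5, a store of a new value to A = 0x2005, and a second read of index 5 return first the old value and then the new one; a store outside [B, B + 16) leaves the valid bits untouched.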
Invalidation of the stack cache valid bits has proved to be a critical timing issue in a high-speed design. In a typical implementation, the data processor subtracts the base address B from the absolute store address A and, if the high-order bits of the difference are zero, invalidates the entry selected by the low-order bits of the difference. For example, for a 16-entry cache in a 32-bit address space, the twenty-eight (28) most significant bits of the difference, D[31:4], must be zero if address A is cached in the stack cache. Unfortunately, this approach is slow because it must wait for the result of the relatively large and slow adder that performs the subtraction. In addition to its speed problems, the adder also requires a large amount of chip area.
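The prior-art check can be sketched as follows for a 16-entry cache (n = 4) in a 32-bit address space. The function name is hypothetical; in hardware, the full-width subtraction corresponds to the large, slow adder whose carry chain sets the critical path.

```python
from typing import Tuple

ADDR_BITS = 32                      # assumption: 32-bit address space
ADDR_MASK = (1 << ADDR_BITS) - 1


def subtract_range_check(a: int, b: int, n: int = 4) -> Tuple[bool, int]:
    """Prior-art style check: D = A - B; A is cached iff D[31:n] == 0.

    Returns (hit, entry_index); entry_index is D[n-1:0] and is meaningful only on a hit.
    """
    d = (a - b) & ADDR_MASK          # full-width subtraction: the slow carry chain
    hit = (d >> n) == 0              # the 32 - n high-order bits of D must all be zero
    return hit, d & ((1 << n) - 1)   # low-order bits select the entry to invalidate
```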
Therefore, there is a need in the art for an improved data processor that more rapidly invalidates an entry in the stack cache. In particular, there is a need in the art for improved address range checking circuitry for invalidating an entry in the stack cache.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide an address range checking circuit capable of determining if a target address, A[M:0], is within an address space having 2^N address locations beginning at a base address location, B[M:0], wherein the address range checking circuit does not require a large comparator circuit.
According to an advantageous embodiment of the present invention, the address range checking circuit comprises: 1) comparison circuitry capable of determining if the address segment A[N−1:0] is less than the address segment B[N−1:0] and generating on an output a first control signal having a first logic state indicating A[N−1:0] is less than B[N−1:0] and having a second logic state indicating A[N−1:0] is not less than B[N−1:0]; 2) first equivalence detection circuitry capable of determining if the address segment A[M:N] is equal to the address segment B[M:N] and generating on an output an A=B status signal having a first logic state indicating A[M:N] is equal to B[M:N] and having a second logic state indicating A[M:N] is not equal to B[M:N]; 3) second equivalence detection circuitry capable of determining if the address segment A[M:N] is equal to the address segment B[M:N] plus one and generating on an output an A=B+1 status signal having a first logic state indicating A[M:N] is equal to B[M:N] plus one and having a second logic state indicating A[M:N] is not equal to B[M:N] plus one; and 4) a multiplexer controlled by the first control signal generated by the comparison circuitry, the multiplexer having a first input coupled to the first equivalence detection circuitry output and a second input coupled to the second equivalence detection circuitry output, wherein the first control signal causes the multiplexer to output the A=B status signal when the first control signal is at the second logic state indicating A[N−1:0] is not less than B[N−1:0] and causes the multiplexer to output the A=B+1 status signal when the first control signal is at the first logic state indicating A[N−1:0] is less than B[N−1:0].
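The structure above can be modeled behaviorally to check that it agrees with the full-width subtraction it replaces. In this minimal sketch (names hypothetical), the low N bits drive the multiplexer select and the two equality tests on A[M:N] stand in for the equivalence detectors. B[M:N] + 1 is computed arithmetically here only for brevity, whereas the embodiments described below detect A = B + 1 with inverters, carry-save adders, exclusive-OR gates and an AND gate.

```python
def range_check(a: int, b: int, m: int, n: int) -> bool:
    """Behavioral model of the comparator/equivalence/multiplexer structure.

    True iff A[M:0] lies in the 2**N locations starting at B[M:0].
    """
    hi_width = m - n + 1
    hi_mask = (1 << hi_width) - 1
    lo_mask = (1 << n) - 1
    a_hi, a_lo = (a >> n) & hi_mask, a & lo_mask
    b_hi, b_lo = (b >> n) & hi_mask, b & lo_mask

    a_lt_b = a_lo < b_lo                             # N-bit comparison circuitry
    a_eq_b = a_hi == b_hi                            # first equivalence detector (A=B)
    a_eq_b_plus_1 = a_hi == ((b_hi + 1) & hi_mask)   # second equivalence detector (A=B+1)
    return a_eq_b_plus_1 if a_lt_b else a_eq_b       # multiplexer


def subtract_reference(a: int, b: int, m: int, n: int) -> bool:
    """Full-width subtraction check that the structure above is meant to replace."""
    return ((a - b) % (1 << (m + 1))) < (1 << n)
```

For example, with M = 31, N = 4 and B = 0x10000FFA, the target A = 0x10001003 has A[3:0] = 0x3 < B[3:0] = 0xA, so the multiplexer selects the A=B+1 signal; A[31:4] = 0x1000100 equals B[31:4] + 1 = 0x10000FF + 1, so A is in range (A − B = 9). For A = 0x1000100A the low bits are equal, the A=B signal is selected, and A[31:4] ≠ B[31:4], so A is out of range (A − B = 16).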
According to the principles of the present invention, the comparison circuitry in the address range checking circuit compares only a small number, N, of the least significant bits of the addresses A[M:0] and B[M:0], where N is much smaller than M. This allows the comparison circuitry to be very fast compared to a comparator circuit that compares all of the address bits in the addresses A[M:0] and B[M:0].
According to one embodiment of the present invention, the second equivalence detection circuitry comprises M−N+1 inverters, each of the M−N+1 inverters receiving and inverting one of the address bits in the address segment B[M:N].
According to another embodiment of the present invention, the second equivalence detection circuitry further comprises M−N+1 carry-save adders, each of the M−N+1 carry-save adders having a first input for receiving one of the address bits in the address segment A[M:N], A, a second input for receiving a corresponding one of the inverted B[M:N] address bits, B′, and a carry-in (CI) input for receiving a carry-in value equal to 1, and wherein each of the M−N+1 carry-save adders generates a sum (S) output and a carry-out (CO) output, such that each of the M−N+1 carry-save adders has the following truth table:

A  B′  CI | S  CO
0  0   1  | 1  0
0  1   1  | 0  1
1  0   1  | 0  1
1  1   1  | 1  1
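With the carry-in tied to 1, each carry-save adder cell reduces to S = XNOR(A, B′) and CO = A OR B′, since a full adder's sum is the XOR of its three inputs and its carry-out is their majority. The sketch below (function name hypothetical) models one cell and prints its truth table.

```python
from typing import Tuple


def csa_bit(a: int, b_prime: int, ci: int = 1) -> Tuple[int, int]:
    """One carry-save (full) adder cell; returns (S, CO)."""
    s = a ^ b_prime ^ ci                                # sum: XOR of the three inputs
    co = (a & b_prime) | (a & ci) | (b_prime & ci)      # carry-out: majority of the inputs
    return s, co


if __name__ == "__main__":
    print("A B' CI | S CO")
    for a in (0, 1):
        for b_prime in (0, 1):
            s, co = csa_bit(a, b_prime)
            print(f"{a} {b_prime}  1 | {s}  {co}")
```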
According to still another embodiment of the present invention, the second equivalence detection circuitry further comprises M−N+1 exclusive-OR gates, wherein each of the M−N most significant exclusive-OR gates has a first input coupled to the sum output of the Kth one of the M−N most significant carry-save adders and has a second input coupled to the carry-out output of the (K−1)th carry-save adder (the next less significant one), and wherein the least significant exclusive-OR gate has a first input coupled to the sum output of the least significant carry-save adder and a second input coupled to a Logic 0.
According to yet another embodiment of the present invention, the second equivalence detection circuitry further comprises an AND gate having M−N+1 inputs, each of the M−N+1 AND gate inputs coupled to one of the M−N+1 outputs of the M−N+1 exclusive-OR gates, wherein an output of the AND gate comprises the A=B+1 status signal.
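Putting the inverters, carry-save adders, exclusive-OR gates and AND gate together gives the gate-level sketch below (names hypothetical). The reasoning behind it: adding A, B′ and an all-ones word with carry-save adders yields sum and carry vectors S and C with S + 2C = A + B′ − 1, so A + B′ equals zero modulo 2^(M−N+1) exactly when S XOR (C shifted left by one) is all ones. Because B′ = −B − 1 in two's complement, A + B′ = 0 means A = B + 1, and no carry has to ripple across the word.

```python
def eq_b_plus_one(a_seg: int, b_seg: int, width: int) -> bool:
    """Gate-level model of the second equivalence detector for width = M-N+1 bits.

    Returns True iff a_seg == (b_seg + 1) mod 2**width, without carry propagation.
    """
    xor_outputs = []
    prev_carry = 0                              # Logic 0 feeds the least significant XOR gate
    for k in range(width):                      # k = 0 is the least significant bit position
        a_k = (a_seg >> k) & 1
        b_inv_k = 1 - ((b_seg >> k) & 1)        # inverter on B[k]
        s_k = a_k ^ b_inv_k ^ 1                 # carry-save adder sum with CI = 1
        co_k = a_k | b_inv_k                    # carry-save adder carry-out with CI = 1
        xor_outputs.append(s_k ^ prev_carry)    # XOR with carry-out of the (k-1)th adder
        prev_carry = co_k                       # carry-out of the top adder is dropped
    return all(xor_outputs)                     # (M-N+1)-input AND gate
```

Dropping the carry-out of the most significant adder matches the modulo-2^(M−N+1) comparison, so B[M:N] all ones correctly matches A[M:N] all zeros.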
As noted above, the address range checking circuit is fast because it does not require a large comparator circuit and instead relies on equivalence detection circuits. Accordingly, in an address range checking circuit according to the principles of the present invention, the value of M is much larger than N, so that only a small fraction of the address bits pass through the comparison circuitry.
In one embodiment of the present invention, M is at least 15 and N is less than 8.
In another embodiment of the present invention, M is at least 31 and N is less than 8.
In still another embodiment of the present invention, M is at least 31 and N is less than 6.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.