Cache circuits are important components which are frequently used in contemporary microprocessors and the like to increase system performance by reducing the potential amount of time needed to access information. Typically, a cache circuit includes various components, such as a tag memory which is commonly a random access memory ("RAM"). The tag RAM stores so-called tag information which corresponds to the cached data which is commonly stored in a separate cache data RAM. The tag information may include various characteristics corresponding to the cached data, such as the actual address where the cached data may be found in some other memory device (e.g., an external memory structure). Another component of a cache circuit is the hit detection circuit associated with the tag RAM. The hit detection circuit (of which there are N such circuits in an N-way set associative cache circuit) compares an incoming address with the actual address stored as part of the tag information. If the comparison matches, there is said to be a "hit" in the cache circuit, that is, the data sought at the incoming address may be retrieved directly from the cache data RAM rather than having to go to the original (i.e., often external) memory to retrieve that data; on the other hand, if the comparison does not match, there is said to be a "miss" in the cache circuit, that is, the data sought at the incoming address is not located, or for some other reason is not reliable, within the cache data RAM.
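By way of illustration only, the hit and miss determination described above can be sketched at a behavioral level in Python. This is a minimal sketch under stated assumptions, not a description of any particular circuit: the names `TagEntry`, `split_address`, and `lookup` are hypothetical, the division of the incoming address into tag, index, and offset fields (and the field widths) is assumed for the example, and the `valid` flag is one assumed way of modeling cached data that is "not reliable."

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TagEntry:
    """One tag RAM entry (hypothetical model): a valid flag and a stored tag."""
    valid: bool = False
    tag: int = 0


def split_address(address: int, offset_bits: int, index_bits: int) -> Tuple[int, int]:
    """Split an address into (tag, set index); the field widths are assumptions."""
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index


def lookup(tag_ram: List[List[TagEntry]], address: int,
           offset_bits: int = 4, index_bits: int = 2) -> Optional[int]:
    """Return the way that hits, or None on a miss.

    tag_ram[index] holds N entries, one per way; the loop stands in for
    the N hit detection circuits of an N-way set associative cache.
    """
    tag, index = split_address(address, offset_bits, index_bits)
    for way, entry in enumerate(tag_ram[index]):
        if entry.valid and entry.tag == tag:
            return way   # hit: data may be read from the cache data RAM
    return None          # miss: data must come from the original memory


# Usage: a 2-way cache with 4 sets, holding one valid line.
tag_ram = [[TagEntry() for _ in range(2)] for _ in range(4)]
tag, index = split_address(0x1234, 4, 2)
tag_ram[index][1] = TagEntry(valid=True, tag=tag)
print(lookup(tag_ram, 0x1234))  # way 1 (hit)
print(lookup(tag_ram, 0x2234))  # None (miss)
```

In hardware the N comparisons occur in parallel rather than in a loop; the loop is used here only to keep the sketch short.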
The hit detection circuit is typically part of the speed limiting path of the cache circuit as a whole. Therefore, various designs have arisen to reduce the time required for comparison by the hit detection circuit, as well as for the selection made in response to a hit determined by that circuit. For example, FIG. 1 illustrates a prior art configuration including a hit detection circuit designated generally at 10. Circuit 10 includes a gated clock signal connected to the gate of a p-channel transistor 14, which has a source connected to a system voltage level (e.g., V.sub.DD) and a drain connected to a match node 16. Match node 16 is connected to an integer number N+1 of single bit comparison circuits designated generally at 18, with two of those circuits, 18.sub.0 and 18.sub.N, shown for purposes of illustration. Each circuit 18 is constructed in a like manner and, therefore, circuit 18.sub.N is described in detail, it being understood that like reference numbers are used in each similar circuit with only a change in subscript to distinguish the different bit comparison circuits. Thus, turning to bit comparison circuit 18.sub.N, it includes an n-channel transistor 20.sub.N with its drain connected to match node 16 and its source connected to ground. The gate of transistor 20.sub.N is connected to the output of an exclusive OR ("XOR") gate 22.sub.N. A first input of XOR gate 22.sub.N receives an incoming address bit ADDRESS.sub.N, while a second input of XOR gate 22.sub.N receives a corresponding address bit from the tag information, designated TAG.sub.N. Match node 16 is also connected to the input of an inverter 24, and the output of inverter 24 is connected to a select circuit 26. Although not shown, note that the output of each XOR gate 22 is actually logically ANDed with the gated clock signal, and the output of this logical AND combination is connected to the gate of the corresponding n-channel transistor 20. As a result, when the gated clock signal is low (i.e., when precharge is occurring), this ANDed signal applies a low signal to the gate of each respective n-channel transistor 20.
The operation of the components of FIG. 1 is well known to a person skilled in the art and, therefore, is only briefly addressed here. Generally, the XOR gates 22 of all of the bit comparison circuits 18 operate together to compare an entire incoming address with an address from the tag RAM (not shown). This comparison indicates whether or not there is a hit in the cache circuit and, if a hit occurs, select circuit 26 selects the data, typically stored in a separate data RAM, which corresponds to the incoming address. Looking now more specifically to the circuit of FIG. 1, first the gated clock signal goes low during a precharge phase of operation. Consequently, the ANDing function described above with respect to the gated clock signal and the output of each XOR gate 22 (although not shown) forces the gate of each transistor 20 low during precharge. In addition, transistor 14 conducts and match node 16 is precharged to V.sub.DD. This signal is inverted by inverter 24 and, thus, the output signal, MATCH, is low during precharge. Thereafter, the gated clock signal goes high and each bit comparison circuit 18.sub.0 through 18.sub.N compares its two input bits. For example, looking to circuit 18.sub.N, XOR gate 22.sub.N determines whether ADDRESS.sub.N and TAG.sub.N are the same. If not (i.e., if there is a cache miss), XOR gate 22.sub.N causes transistor 20.sub.N to conduct, thereby discharging the precharge from match node 16 to ground. Consequently, the output of inverter 24 rises from low to high. Note further that if any one of the N+1 bit comparison circuits indicates a mismatch between its two inputs (i.e., again, if there is a cache miss), then it discharges match node 16 in a similar manner. Thus, only if each of those N+1 bit comparison circuits finds a match will match node 16 remain precharged. In other words, if the inputs for each of the XOR gates 22.sub.N through 22.sub.0 do match, each corresponding transistor 20.sub.N through 20.sub.0 remains off and match node 16 remains at V.sub.DD. Consequently, the output of inverter 24 remains low.
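The precharge and evaluate behavior just described can be illustrated with a simple logic-level model in Python. This is a sketch under stated assumptions rather than a circuit simulation: it ignores timing, transistor sizing, and charge storage on match node 16, and the function name `hit_detect` and its parameters are hypothetical names introduced only for this example.

```python
from typing import Tuple


def hit_detect(address: int, tag: int, width: int, clock_high: bool) -> Tuple[int, int]:
    """Return (match_node, match_output) as logic levels 0 or 1.

    Logic-level model of circuit 10 of FIG. 1 (no timing or charge effects):
      clock low  -> precharge: every transistor 20 is held off and
                    transistor 14 pulls match node 16 to V_DD
      clock high -> evaluate: any mismatched bit discharges match node 16
    """
    if not clock_high:
        # Precharge phase: node 16 is high, so inverter 24 drives MATCH low.
        return 1, 0

    clock = 1
    # Output of each XOR gate 22, ANDed with the gated clock, drives the
    # gate of the corresponding n-channel transistor 20.
    gates = [(((address >> i) & 1) ^ ((tag >> i) & 1)) & clock
             for i in range(width)]

    # Any conducting transistor 20 pulls match node 16 to ground.
    match_node = 0 if any(gates) else 1
    # Inverter 24 produces the MATCH output from match node 16.
    return match_node, 1 - match_node


# Usage: a 4-bit comparison.
print(hit_detect(0b1011, 0b1011, 4, clock_high=False))  # (1, 0) precharge
print(hit_detect(0b1011, 0b1011, 4, clock_high=True))   # (1, 0) hit
print(hit_detect(0b1011, 0b1010, 4, clock_high=True))   # (0, 1) miss
```

Note that in this model, as in the description above, the output of inverter 24 is low both during precharge and after a hit, and rises only in the case of a miss.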
While the above discussion therefore demonstrates that circuit 10 may validly evaluate matches and mismatches between an incoming address and a tag address, note that the configuration gives rise to various drawbacks. For example, the capacitive load imposed by select circuit 26 (as well as potential other loads) is typically considerable. As a result, inverter 24 must be constructed of a sufficient size to drive that load. In turn, because inverter 24 is sized in this manner, each transistor 20 of a corresponding bit comparison circuit 18 also must be sized large enough to drive the large inverter 24. Otherwise, the speed of the circuit is reduced, which may be an unacceptable performance penalty. On the other hand, there may be twenty or more bit comparison circuits and, therefore, it may be impractical to scale these components beyond a certain size. In addition to these drawbacks, note further the two output waveforms produced by the circuit of FIG. 1 when there is a hit or a miss. Specifically, when there is a hit, the output is originally precharged low, and then remains low in response to the hit. In contrast, when there is a miss, the output is again originally precharged low, but then transitions from low to high in response to the miss. The inventor of the present embodiments has recognized that, given these waveforms, domino logic select circuits cannot be used, as detailed below, and thus the types of circuitry within select circuit 26 may be limited to slower circuits, again thereby reducing the overall speed of the cache circuit.
In view of the above, there arises a need to address the drawbacks of prior art cache circuits and to provide higher speed self-timed cache circuits, systems, and methods for use with microprocessors and the like.