While conventional CMOS (Complementary Metal Oxide Semiconductor) integrated circuit processes may be used to create circuits which consume less power and occupy less space on a semiconductor substrate than similar circuits designed around a bipolar transistor fabrication process, bipolar devices, among other advantages, have the inherent ability to operate at higher speeds, drive larger capacitive loads, and switch faster than their MOS (Metal Oxide Semiconductor) counterparts. In an attempt to capture the advantages of both bipolar and MOS devices in one circuit, a BiCMOS (Bipolar Complementary Metal Oxide Semiconductor) process has been developed.
In a BiCMOS process, bipolar and MOS transistors are both created on a single semiconductor substrate so that a portion of the resulting circuit operates using bipolar transistors while another portion of the same circuit operates using MOS transistors. As a result, one is given the freedom to design and implement a BiCMOS circuit which, among other advantages, consumes very little power, occupies very little space, operates at very high speeds, and can drive large capacitive loads. A circuit such as this would have extensive applications in, for instance, battery-powered notebook computers where power consumption must be minimized in order to prolong battery life, size must be minimized to enhance portability, and processing speeds must be fast enough to handle advanced computational applications.
For demonstration purposes below, we will assume that a low voltage is that voltage which corresponds most closely with one particular logical state while a high voltage is that voltage which corresponds most closely with the opposite logical state in a binary scheme. For example, in a 5 volt CMOS system, a voltage greater than approximately 2.5 V may be considered a logical "1", and a voltage less than approximately 2.5 V may be considered a logical "0". Of course, this correspondence may be reversed such that a low voltage represents a logical "1" and a high voltage represents a logical "0". In an alternate system which operates with a 3 V supply, for example, a voltage greater than approximately 1.5 V may be considered a logical "1", and a voltage less than approximately 1.5 V may be considered a logical "0". Of course, this correspondence may again be reversed. In general, the lower supply voltage (which is simply ground in many applications) plus one-half the difference between the upper and lower supply voltages of any system may be considered the approximate boundary between high and low voltages, or between alternate logical states, for demonstration purposes herein.
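The boundary rule above reduces to a one-line formula. The Python sketch below is illustrative only; the function names `logic_threshold` and `to_logic` are hypothetical, and treating a voltage exactly at the boundary as low is an arbitrary assumption.

```python
def logic_threshold(v_upper: float, v_lower: float = 0.0) -> float:
    """Approximate high/low boundary: lower rail plus half the supply swing."""
    return v_lower + (v_upper - v_lower) / 2.0


def to_logic(v: float, v_upper: float, v_lower: float = 0.0,
             active_high: bool = True) -> int:
    """Map a node voltage to a logical state (hypothetical helper).

    active_high=True maps a high voltage to 1; False selects the reversed
    convention. A voltage exactly at the boundary is treated as low here.
    """
    is_high = v > logic_threshold(v_upper, v_lower)
    return int(is_high) if active_high else int(not is_high)


# The 5 V and 3 V systems from the text
print(logic_threshold(5.0))                   # 2.5
print(logic_threshold(3.0))                   # 1.5
print(to_logic(3.3, 5.0))                     # 1
print(to_logic(3.3, 5.0, active_high=False))  # 0
print(to_logic(1.0, 3.0))                     # 0
```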
A comparator is a circuit which compares two pieces of data and determines if they are equal. A comparator generally accepts two bit strings at its input and compares each bit position in the first bit string to the equivalent bit position in the second bit string. If the bits contained at each bit position in the first bit string are equivalent to each corresponding bit position in the second bit string, then the comparator detects a "hit" and produces the necessary output which communicates that hit to the rest of the device. If any bit in the first bit string is not identical to the corresponding bit in the second bit string, then the comparator detects a "miss" and produces the necessary output which communicates that miss to the rest of the device.
FIG. 1 illustrates a typical application of a cross-unit comparator in a system. A signal is generated at the input 20 which triggers two relatively independent circuits (or units), 21 and 22, to process information. The comparator circuit 25 is then used to determine whether the data produced by circuit 21 is the same as the data produced by circuit 22. While the two circuits 21 and 22 may be triggered simultaneously, the output data 27 of circuit 21 will likely not reach the comparator circuit 25 at the same time as the output data 28 of circuit 22. The output 27 of circuit 21 may reach the comparator circuit 25 well before, well after, or partially before and partially after the output 28 of circuit 22 reaches the comparator. There are many reasons for this.
One reason the output 27 of circuit 21 and the output 28 of circuit 22 may not reach the comparator circuit simultaneously is simply the inherent difference in the speeds of the two circuits, 21 and 22, resulting from their differing design and operation. Another reason may be that circuit 21 is controlled by a different clock, with different timings, than the clock controlling circuit 22. Or, perhaps, the length of the bus line carrying output data 27 from circuit 21 to the comparator circuit 25 is much different from the length of the bus line carrying output data 28 from circuit 22 to the comparator circuit 25. In such a case, RC delay may contribute significantly to differences in the arrival times of the data at the comparator circuit 25.
Typically, dynamic circuits are capable of operating at higher speeds than their static circuit counterparts. However, if the comparator circuit 25 is to be implemented as a dynamic circuit, enough delay margin must be added to the dynamic circuit so that it is not triggered until the comparator circuit 25 has fully received both inputs 27 and 28. If the proper delay margin is not added, the dynamic circuit may be tripped prematurely, which may cause an erroneous output at node 26 from which the dynamic circuit cannot recover. Therefore, it is necessary to determine the worst-case (slowest) arrival time of both signals 27 and 28 at the input of comparator circuit 25 and to design the dynamic circuit so that it triggers only after that arrival time plus a delay margin. Unfortunately, it is not always possible to determine or detect the arrival times of the signals 27 and 28 accurately.
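As a rough illustration of the margin problem, the sketch below computes the earliest time a dynamic comparator could safely evaluate, given per-bit arrival times of inputs 27 and 28 plus a guard band. All function names and picosecond values are hypothetical; the point is that the result is only as good as the arrival-time estimates fed into it.

```python
def dynamic_trigger_time(arrivals_a, arrivals_b, margin):
    """Earliest safe evaluate time (ps) for a hypothetical dynamic comparator.

    arrivals_a, arrivals_b: estimated per-bit arrival times of inputs 27 and 28.
    margin: extra guard band for noise, process variation, and aging.
    """
    latest = max(max(arrivals_a), max(arrivals_b))  # slowest bit of either input
    return latest + margin


def is_safe(trigger_time, actual_a, actual_b):
    """True if every bit has actually arrived by the time the circuit evaluates."""
    return trigger_time >= max(max(actual_a), max(actual_b))


# Design-time estimates (hypothetical): unit 22 is expected to be the slower one.
est_27 = [120, 150, 140]
est_28 = [200, 230, 210]
t_eval = dynamic_trigger_time(est_27, est_28, margin=50)
print(t_eval)  # 280

# Later in life, unit 21 has slowed (e.g. from stress); the margin no longer covers it.
aged_27 = [260, 300, 290]
print(is_safe(t_eval, est_27, est_28))   # True
print(is_safe(t_eval, aged_27, est_28))  # False
```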
For instance, it may become necessary to design the comparator circuit 25 before or concurrently with the circuits 21 or 22. In such a case, it would be impossible to complete comparator circuit 25 until both circuit blocks 21 and 22 have been completed and simulated in order to determine what the delay margin for comparator circuit 25 should be. Also, even if the circuits 21 and 22 are simulated, there is no guarantee that they will physically perform in the manner the simulation predicts. In addition, depending on the layout pattern ultimately used in the circuit, RC delay may not be properly accounted for in the delay margin of comparator circuit 25.
Furthermore, suppose comparator circuit 25 is initially designed as a dynamic circuit whose delay margin accounts for the arrival time of input data 28, which is initially known to arrive after input data 27. Over the lifetime of the device, circuit 21 may be stressed more heavily than circuit 22, causing circuit 21 to become slower than circuit 22 due to stress effects such as electromigration and hot electron injection. Alternately, circuit 22 may become much slower than was originally accounted for due to stress over the lifetime of the device. As a result, delay margins that were originally added to the comparator circuit 25 will become inadequate over time, leading to long-term device failure.
Delay margin must also be added to a dynamic circuit to account for noise on the input lines, which occurs soon after the input lines are switched. Noise can cause a dynamic circuit to trigger prematurely, so a delay margin is added to ensure the dynamic circuit does not read the input signal until the input line has settled. It is very difficult to predict the amount of noise that will appear on an input line without actually measuring it. Therefore, the amount of delay margin added to a dynamic circuit for predicted input noise must be very conservative.
For the above reasons, it becomes either impossible or impracticable to use a dynamic circuit for cross-unit comparator applications. If a dynamic circuit were used in such an application, it would require so much delay margin that the resulting circuit would end up slower than an equivalent static circuit. Static circuits, while generally slower than dynamic circuits, do not require delay margins because a static circuit can recover from input signal noise and delayed inputs. A static circuit simply switches its output in real time to follow any switching at its input. A static circuit need not be reset because its output always reflects the logical result of the signals applied to its input at any instant in time. For this reason, not only may a static circuit be faster than an equivalent dynamic circuit (once reasonable delays are added to the dynamic circuit), but the static circuit may also be designed only once for a particular cross-unit comparator application, and the same static circuit can be placed into any other similar comparator application with minimal redesign, regardless of the expected arrival times of the inputs in either application. Because the delay margin added to a dynamic comparator circuit in one application may be too much or too little for another application, one cannot simply "pick and place" the same dynamic circuit in both applications. The circuit must be redesigned to suit each individual application.
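The recovery argument can be illustrated with a toy, discrete-time model of a one-bit equality check. In this hypothetical sketch, the dynamic node is precharged to 1 (hit) and can only be discharged during evaluation, while the static output is simply recomputed from the current inputs at every step. The bit from unit 22 is assumed to arrive two steps late.

```python
def static_compare(samples):
    """Static gate: the output at each step reflects the inputs at that step."""
    return [int(a == b) for a, b in samples]


def dynamic_compare(samples, eval_start):
    """Dynamic gate: node precharged to 1; from eval_start on, any mismatch
    discharges it, and nothing restores it before the next precharge."""
    node = 1
    trace = []
    for t, (a, b) in enumerate(samples):
        if t >= eval_start and a != b:
            node = 0  # discharged; cannot recover within this evaluation
        trace.append(node)
    return trace


# Bit from unit 21 is 1 throughout; the bit from unit 22 is stale (0) until step 2.
samples = [(1, 0), (1, 0), (1, 1), (1, 1)]
print(static_compare(samples))                 # [0, 0, 1, 1]
print(dynamic_compare(samples, eval_start=0))  # [0, 0, 0, 0]  wrong final answer
print(dynamic_compare(samples, eval_start=2))  # [1, 1, 1, 1]  correct only with margin
```

The static trace ends at the correct value no matter when the late input settles, whereas the dynamic trace is correct only if evaluation is delayed past the late arrival.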
Finally, an ECL current sensing approach may be used to solve the problem associated with determining arrival times of inputs to a cross-unit comparator. However, such an approach is also not practicable because the outputs of ECL circuits are small signals. These small signals need to undergo ECL to CMOS translation in order for such circuits to be able to communicate with the CMOS logic used elsewhere in the device. Unfortunately, an ECL-CMOS translator adds significant delay to the circuit such that the ECL current sensing approach becomes too slow to be reasonably implemented in an advanced processor.
If the comparator circuit 25 of FIG. 1 is to be designed as a static circuit, the first stage of the circuit may be a standard XNOR gate 23. The XNOR gate 23 is actually a series of XNOR circuits arranged such that each XNOR circuit accepts one bit of data from input bit string 27 along with the corresponding bit of data from input bit string 28 and performs an XNOR function on the two bits to generate a result ("1" if the bits are the same, "0" if the bits are different). There are as many XNOR circuits in XNOR gate 23 as there are pairs of bits in the bit strings to be compared. So the output 29 of the XNOR gate is a single word in which each bit position represents the equivalent bit position in inputs 27 and 28, and is a 1 if the bit in the equivalent bit position in input bit string 27 is the same as the bit in the equivalent bit position in input bit string 28 and 0 if not. Therefore, when input bit string 27 is equal to input bit string 28, the resultant bit string 29 will consist solely of a string of 1's.
Block 24 in FIG. 1 is a miss detection circuit. The purpose of the miss detection circuit 24 is to determine if any of the bits in input bit string 29 are a 0. If any of the bits in input bit string 29 are a 0, this indicates that the equivalent bit positions in input bit strings 27 and 28 are not equal to each other, and hence, the bit string 27 is not equal to the bit string 28. If such is the case, miss detection circuit 24 will output a 1 indicating to the rest of the device that the two bit strings 27 and 28 from units 21 and 22 respectively do not match. Of course, an alternate interpretation is that miss detection circuit 24 determines whether or not all of the bits in input bit string 29 are 1. If all of the bits in input bit string 29 are 1, then bit string 27 is equal to bit string 28 and the miss detection circuit will transmit a 0 at output 26.
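The behaviour of XNOR gate 23 and miss detection circuit 24 can be sketched with integer bit operations. This Python model assumes 24 bit inputs and maps bit position 1 to the least significant bit; both choices, and the function names, are hypothetical.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1  # 24 ones


def xnor_stage(x, y):
    """Model of XNOR gate 23: bit i of the result is 1 iff bit i of x equals bit i of y."""
    return ~(x ^ y) & MASK


def miss_detect(word):
    """Model of miss detection circuit 24: 1 (miss) if any bit is 0, else 0 (hit)."""
    return int(word != MASK)


a = 0xABCDEF  # hypothetical bit string 27
print(hex(xnor_stage(a, a)), miss_detect(xnor_stage(a, a)))          # 0xffffff 0
print(hex(xnor_stage(a, a ^ 1)), miss_detect(xnor_stage(a, a ^ 1)))  # 0xfffffe 1
```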
A conventional, static miss detection circuit 24 for use in the system of FIG. 1 is shown in FIG. 2. This miss detection circuit is logically equivalent to a single, large NAND gate. This particular miss detection circuit has a 24 bit input labelled b1, b2, b3, ..., b23, b24. In the first stage of the miss detection circuit of FIG. 2, bits 1-4 are input to NAND gate 40, bits 5-8 are input to NAND gate 41, bits 9-12 are input to NAND gate 42, bits 13-16 are input to NAND gate 43, bits 17-20 are input to NAND gate 44, and bits 21-24 are input to NAND gate 45. In the second stage of the miss detection circuit of FIG. 2, the outputs of NAND gates 40, 41, and 42 are coupled to the inputs of NOR gate 46, while the outputs of NAND gates 43, 44, and 45 are coupled to the inputs of NOR gate 47. Finally, in the third stage of the miss detection circuit, the outputs of NOR gates 46 and 47 are coupled to the inputs of NAND gate 48. The output of NAND gate 48 appears on output node 26.
Each of the 24 bit inputs b1, b2, b3, ..., b23, b24 of FIG. 2 corresponds to the respective bit position in a 24 bit XNOR gate output 29 in FIG. 1. So, for instance, the bit value at bit position b3 in FIG. 2 will be a 1 if the bit value at the third bit position in the 24 bit input string 27 is equal to the bit value at the third bit position in the 24 bit input string 28, and a 0 if the bits in those respective bit positions of bit strings 27 and 28 are not equal to each other. NAND gates 40, 41, 42, 43, 44, and 45 will each output a 0 only if all four of their respective inputs are 1's. NOR gates 46 and 47 will each output a 1 only if all three of their respective inputs are 0's. NAND gate 48 will output a 0 at output node 26 only if both of its inputs are 1's.
The result is that if any of the 24 bits present at the input to this miss detection circuit is a 0, indicating that the two bit strings being compared by the comparator circuit are not equal to each other at the respective bit position, then the corresponding first-stage NAND gate will output a 1, the corresponding NOR gate will output a 0, and the final NAND gate 48 will output a 1 at output node 26, indicating a miss. On the other hand, if every one of the 24 bits is a 1, indicating that the two bit strings being compared by the comparator circuit are equal to each other at every bit position, then the six first-stage NAND gates 40-45 will all output 0's, the two NOR gates 46 and 47 will both output 1's, and the final NAND gate 48 will output a 0 at output node 26, indicating a hit.
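The three-stage structure of FIG. 2 can be checked against the single-NAND behaviour it replaces. The sketch below models each gate at the logic level only (no timing); the helper names are hypothetical, and list index 0 stands for b1.

```python
import random


def nand(*xs):
    return int(not all(xs))


def nor(*xs):
    return int(not any(xs))


def fig2_miss_detect(b):
    """Logic model of FIG. 2; b is a list of 24 bits (b[0] is b1). Returns node 26."""
    assert len(b) == 24
    first = [nand(*b[i:i + 4]) for i in range(0, 24, 4)]  # NAND gates 40-45
    n46 = nor(*first[0:3])  # NOR gate 46 (outputs of 40, 41, 42)
    n47 = nor(*first[3:6])  # NOR gate 47 (outputs of 43, 44, 45)
    return nand(n46, n47)   # NAND gate 48 drives output node 26


def big_nand(b):
    """Reference: one 24-input NAND, i.e. 1 (miss) unless every bit is 1."""
    return int(not all(b))


# All ones (hit), every single-zero pattern, and some random vectors.
patterns = [[1] * 24]
patterns += [[0 if j == i else 1 for j in range(24)] for i in range(24)]
rng = random.Random(0)
patterns += [[rng.randint(0, 1) for _ in range(24)] for _ in range(1000)]
assert all(fig2_miss_detect(p) == big_nand(p) for p in patterns)
print(fig2_miss_detect([1] * 24))  # 0 (hit)
```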
A single, large NAND gate cannot replace the multiple stage miss detection circuit of FIG. 2 because the 24 bit input to the miss detection circuit far exceeds the fan-in limit of a conventional static NAND gate. Conventional static NAND gates are used in the miss detection circuit of FIG. 2, and the circuit diagram for NAND gate 40 is shown in FIG. 3. In FIG. 3 it can be seen that PMOS transistors 60, 61, 62, and 63 are connected in parallel such that the sources of these transistors are tied to Vdd while their drains are coupled to base line 74. The gates of transistors 60, 61, 62, and 63 are controlled by the voltages corresponding to inputs b1, b2, b3, and b4, respectively. Base line 74 is coupled to the base of bipolar transistor 68, while the collector of bipolar transistor 68 is tied to Vdd and its emitter is coupled to output 73.
NMOS transistors 64, 65, 66, and 67 are connected in series with one another such that the drain of transistor 64 is coupled to base line 74 at one end of the series while the source of transistor 67 is tied to Vss at the other end of the series. The source of transistor 64 is coupled to the drain of transistor 65. The source of transistor 65 is coupled to the drain of transistor 66. The source of transistor 66 is coupled to the drain of transistor 67. The gates of NMOS transistors 64, 65, 66, and 67 are controlled by the voltages corresponding to inputs b1, b2, b3, and b4, respectively. In similar fashion, NMOS transistors 69, 70, 71, and 72 are connected in series with one another such that the drain of transistor 69 is coupled to the output 73 at one end of the series while the source of transistor 72 is tied to Vss at the other end of the series. The source of transistor 69 is coupled to the drain of transistor 70. The source of transistor 70 is coupled to the drain of transistor 71. The source of transistor 71 is coupled to the drain of transistor 72. The gates of NMOS transistors 69, 70, 71, and 72 are controlled by the voltages corresponding to inputs b1, b2, b3, and b4, respectively.
The BiCMOS NAND gate 40 of FIG. 3 is representative of all the other NAND gates 41, 42, 43, 44, 45, and 48 in FIG. 2. Each of the four inputs b1, b2, b3, and b4 controls the operation of three transistors. Transistors 60, 61, 62, and 63 are p-channel (PMOS) transistors, so in order for the supply voltage Vdd to be applied to the base of transistor 68, thereby pulling output 73 high, at least one of inputs b1, b2, b3, or b4 must be at a low voltage. On the other hand, if all of the inputs b1, b2, b3, and b4 are at a high voltage, supply voltage Vdd cannot reach the base of transistor 68 through any of transistors 60, 61, 62, or 63, and all of the n-channel (NMOS) transistors 64, 65, 66, 67, 69, 70, 71, and 72 will turn on, thereby pulling base line 74 of transistor 68 low (turning it off) and also pulling output node 73 low.
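The switching behaviour just described can be captured in a switch-level sketch of NAND gate 40. This hypothetical Python model treats each MOS transistor as an ideal switch and reports logic levels only; it ignores the base-emitter voltage drop of bipolar transistor 68 and all timing.

```python
from itertools import product


def bicmos_nand4(b1, b2, b3, b4):
    """Switch-level model of NAND gate 40 (FIG. 3); returns (base_74, out_73)."""
    inputs = (b1, b2, b3, b4)
    # PMOS 60-63 are in parallel between Vdd and base line 74; a PMOS conducts
    # when its gate is low, so any single low input pulls the base line up.
    base_pulled_up = any(x == 0 for x in inputs)
    # NMOS 64-67 are in series between base line 74 and Vss: all must conduct.
    base_pulled_down = all(x == 1 for x in inputs)
    # NMOS 69-72 are in series between output 73 and Vss.
    out_pulled_down = all(x == 1 for x in inputs)
    assert base_pulled_up != base_pulled_down  # complementary networks
    base_74 = 1 if base_pulled_up else 0
    # Bipolar transistor 68 (collector at Vdd, emitter at output 73) drives the
    # output high when its base is high; otherwise the series stack discharges it.
    out_73 = 0 if out_pulled_down else base_74
    return base_74, out_73


for bits in product((0, 1), repeat=4):
    print(bits, bicmos_nand4(*bits))
```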
There are several problems with this type of a logic circuit. First, note that output node 73 is pulled down by transistors 69, 70, 71, and 72 in series, and base line 74, coupled to the base node of transistor 68, is pulled down by transistors 64, 65, 66, and 67 in series. It can take a long time for a series connection of MOS transistors to charge or discharge a node. The more MOS transistors appearing in series between the node to be pulled down (discharged) and Vss, the slower the device switching speed of pulling down that node. Likewise, the more MOS transistors appearing in series between the node to be pulled up (charged) and Vdd, the slower the device switching speed of pulling up that node.
If one were to expand the circuit of FIG. 3 from a four input NAND gate to, for instance, a six input NAND gate, two additional p-channel transistors would appear in parallel with transistors 60, 61, 62, and 63 between Vdd and base line 74, two additional n-channel transistors would appear in series with transistors 64, 65, 66, and 67 between base line 74 and Vss, and two additional n-channel transistors would appear in series with transistors 69, 70, 71, and 72 between the output node 73 and Vss. Therefore, as the NAND circuit of FIG. 3 expands to accept more inputs, the number of MOS transistors appearing in series to pull down base line 74 and the output node 73 increases proportionately thereby slowing down the circuit's switching speed as discussed above. There comes a point when too many inputs incorporated into the circuit of FIG. 3 will either wholly frustrate the circuit's ability to sufficiently pull down the output node 73, or will make the device too slow to be practical. This is known as the fan-in limit and is typically about four inputs for a NAND circuit like the one depicted in FIG. 3.
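The slowdown can be made roughly quantitative with an Elmore delay estimate for a chain of series NMOS devices. This is a standard first-order RC approximation rather than something taken from the circuit description; each conducting transistor is modelled as a fixed resistor, and the on-resistance and capacitance values below are hypothetical placeholders.

```python
def series_stack_elmore(n, r_on, c_load, c_int):
    """Elmore time constant (s) for discharging a node through n series NMOS devices.

    Counting from Vss, internal node k (k = 1..n-1) sees k * r_on to ground and
    carries c_int; the node being discharged sees n * r_on and carries c_load.
    """
    tau = n * r_on * c_load
    tau += sum(k * r_on * c_int for k in range(1, n))
    return tau


# Hypothetical values: 2 kOhm per on-transistor, 50 fF load, 5 fF per internal node.
R_ON, C_LOAD, C_INT = 2e3, 50e-15, 5e-15
for n in range(1, 7):
    tau = series_stack_elmore(n, R_ON, C_LOAD, C_INT)
    # 0.69 * tau (ln 2 * tau) approximates the 50% delay of a single-pole RC response.
    print(f"{n} in series: tau = {tau * 1e12:.0f} ps, t50 ~ {0.69 * tau * 1e12:.0f} ps")
```

In this model the load term grows linearly with the number of series devices and the internal-node term grows quadratically, which is one way to see why widening the gate quickly becomes impractical.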
Even with only four inputs, NAND circuit 40 will still be too slow for advanced processing operations. Miss detection circuit 24 of FIG. 2 has three stages: six NAND gates in the first stage, two NOR gates in the second stage, and one NAND gate in the third stage. Since miss detection circuit 24 has multiple stages, execution of the second stage must be delayed until it is certain that all its inputs have been properly received from the first stage. Likewise, execution of the third stage must be delayed until it is certain that all its inputs have been properly received from the second stage. The only way to ensure that all the inputs to a stage have been received is to add enough delay margin to each stage so that even the worst-case (slowest) delay in propagation of data through a previous stage will be accounted for in a subsequent stage.
For NAND circuit 40, the worst-case delay in propagation of data through the gate occurs when inputs b1, b2, b3, and b4 simultaneously switch from low to high voltages. When this occurs, output node 73 will only switch from a high to a low voltage once all the charge on base line 74 has been drained through series NMOS transistors 64, 65, 66, and 67, thereby turning off bipolar transistor 68. Since these NMOS transistors 64, 65, 66, and 67 collectively add considerable resistance to the path between base line 74 and Vss, it takes a long time to drain all the charge from base line 74. Once bipolar transistor 68 is no longer driving output node 73, the charge on output node 73 must be drained through series NMOS transistors 69, 70, 71, and 72. Again, since the series connection of NMOS transistors adds resistance to the path to Vss, it takes a long time to drain all the charge from output node 73.
So each of NAND gates 40-45 may be expected to operate very slowly whenever all four of its inputs simultaneously switch from low to high. In a similar manner, NOR gates 46 and 47 may be expected to operate very slowly whenever all three of their respective inputs simultaneously switch. NAND gate 48 does not experience as much delay because it has only two inputs. The ultimate result is that the first and second stages of miss detection circuit 24 each contribute the equivalent of two gate delays to the overall miss detection circuit 24, while the final stage contributes only one gate delay (due to fewer inputs). So miss detection circuit 24 contributes a total of 5 gate delays to the comparator circuit 25 of FIG. 1.
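The gate-delay bookkeeping in this paragraph amounts to summing, over stages, the delay of each stage's slowest gate, since a stage cannot be trusted until all of its inputs have settled. The sketch below encodes the per-gate worst-case delays stated above, in units of one gate delay; the function name is hypothetical.

```python
def worst_case_delay(stages):
    """Each stage waits for its slowest gate, so the per-stage maxima add up."""
    return sum(max(gate_delays) for gate_delays in stages)


fig2 = [
    [2] * 6,  # NAND gates 40-45: four inputs each, about two gate delays worst case
    [2] * 2,  # NOR gates 46 and 47: three inputs each, about two gate delays
    [1],      # NAND gate 48: two inputs, about one gate delay
]
print(worst_case_delay(fig2))  # 5
```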