1. Field of the Invention
The present invention relates to a predecoder for decoding the address of a memory register in a register file and, more particularly, to a predecoder that converts n pairs of true/complement address inputs into 2^n memory select lines for selecting banks of memory registers of the register file while minimizing the number of pull-down FETs required in the decoder.
2. Description of the Prior Art
Recently, floating point processors have been designed which allow concurrent execution of floating point multiply, divide, add, and load or store instructions, thereby significantly increasing the processing efficiency of a floating point processor. For example, DeLano et al. describe in an article entitled "A High Speed Superscalar PA-RISC Processor", Proceedings of the Compcon Spring 1992 Digest of Papers, San Francisco, Calif., Feb. 24-28, 1992, a central processing unit comprising an integer processor and a floating point coprocessor which achieves exceptional performance and structural density. The floating point coprocessor consists of a register file, a floating point ALU, a floating point multiplier, and a floating point divide/square root unit and is integrated onto the same chip as the integer processor. The speed and density characteristics of such a circuit were exploited by implementing a system of dynamic, self-timed logic.
Self-timed logic or so-called "mousetrap" logic is distinguished by the generation of glitch free signals of the type described by Yetter in U.S. patent application Ser. No. 07/684,720, filed Apr. 12, 1991, now U.S. Pat. No. 5,208,490 and assigned to the same Assignee as the present invention. As described by Yetter in that patent application, "mousetrap" style logic circuits are timed by transitions in the data itself rather than clock edges. Such a self-timed system implements logic paths for encoding respective "vector logic states" which are specified by collectively conceptualizing the individual logic states or "vector components" on the logic paths. In particular, an "invalid" vector logic state is defined as the case when all vector components are at a logic low (a logic "0" or low electrical signal level). On the other hand, each of the "valid" vector logic states is specified via a variety of schemes such as one in which one and only one of the vector components of a vector logic state exhibits a logic high (a logic "1" or high electrical signal level). Encoding of the vector logic states can then be handled by defining a valid vector logic state by more than one logic path while still defining an invalid vector logic state when all logic paths exhibit a low logic level.
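The one-hot scheme described above can be illustrated with a minimal sketch. The function name `classify_vector_state` and the label "undefined" for states with more than one high component are assumptions made for illustration only; they are not taken from the Yetter patent.

```python
def classify_vector_state(components):
    """Classify a sequence of 0/1 vector components under a one-hot scheme.

    "invalid"   : every component is low (no data present yet)
    "valid"     : exactly one component is high
    "undefined" : more than one component is high (hypothetical label;
                  this case is not defined by the one-hot scheme)
    """
    highs = sum(1 for c in components if c)
    if highs == 0:
        return "invalid"
    if highs == 1:
        return "valid"
    return "undefined"


# A two-path vector: (0, 0) carries no data, (0, 1) and (1, 0) carry data.
states = [classify_vector_state(v) for v in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Because the only transition out of the invalid state is a single path going high, a downstream stage can treat that rising edge as both the data and its own timing signal.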
The present inventor set out to build a register file for a system of the type described by DeLano et al. for use in a floating point data path comprised of logic circuits requiring a periodic electrical precharge phase and an evaluate phase in order to maintain and properly perform the intended logic function. Since the register file of DeLano et al. has thirty-two 64-bit registers (four registers are reserved for floating point exception data) and 5 read ports and 3 write ports to allow concurrent execution of a multiply, an add and a load or store, it was the goal of the present inventor to design a register file which maximizes speed while minimizing the area required for implementing the 32 registers in the register file and, in particular, the address decoders used for accessing those registers.
In the register file of the type described by DeLano et al., thirty-two 64-bit registers are provided, where each register has 8 ports (3 write and 5 read). There is thus a total of 32*8=256 address decoders in the register file, where each decoder selects 1 of 32 registers. Because each port must select 1 of 32 registers, 5 pairs of true/complement address lines are needed for each port to uniquely specify which of the registers is to be read from or written to. Generally, the decoder outputs are used to select a register for reading or for writing. A register is written during a first clock and read on a precharged bus during a second clock. Thus, during the first clock, a glitch-free register write enable must be provided by the write address decoders to the register file. Such a system for use in conjunction with floating point exception flags is described, for example, by Mason et al. in U.S. patent application Ser. No. 07/899,202, filed Jun. 16, 1992, now U.S. Pat. No. 5,257,214 and assigned to the same Assignee as the present invention. As described therein, if the write enable is allowed to glitch, the register contents can be disturbed. Likewise, during the second clock, glitch-free register read signals must be provided by the read decoders to the register file. As in the case of the write enable, if the read enable is allowed to glitch, the precharged register output can be disturbed. Domino-style logic has been used to provide such glitch-free operation.
As illustrated in FIG. 1, a known 5-input AND gate domino decoder 100 may be used to select 1 of the 32 registers based on 5 read port address inputs. As shown, the 5-input AND gate decoder 100 consists of a PFET precharger 102 responsive to the input clock signal CK, 5 NFET transistors 104-112 in the NAND pull-down string, and an output inverter 114. Each of the NFET transistors 104-112 receives either the true or the complement line of one of the 5 address pairs ADDR0-ADDR4 and NADDR0-NADDR4 as illustrated, such that 10 wires must be routed through the decoder stack. Each address line is loaded by 16 of the 32 address decoders.
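The behavior of the unpredecoded arrangement of FIG. 1 can be sketched as a purely logical model. This is a sketch under stated assumptions: the function names are hypothetical, bit 0 is assumed to be the least significant address bit, and precharge timing and electrical effects are ignored. The line names ADDR0-ADDR4 and NADDR0-NADDR4 follow the text.

```python
N_BITS = 5  # 5 true/complement address pairs select 1 of 32 registers


def decoder_inputs(register):
    """Names of the 5 lines wired to the pull-down string of one decoder.

    Assumption: bit i of the register number picks ADDRi (bit set)
    or NADDRi (bit clear).
    """
    return [("ADDR" if (register >> i) & 1 else "NADDR") + str(i)
            for i in range(N_BITS)]


def select_lines(address):
    """Evaluate all 32 decoders for one address; returns a list of 0/1."""
    lines = {}
    for i in range(N_BITS):
        bit = (address >> i) & 1
        lines["ADDR%d" % i] = bit
        lines["NADDR%d" % i] = 1 - bit
    # A decoder output goes high only if every NFET in its string conducts.
    return [int(all(lines[name] for name in decoder_inputs(r)))
            for r in range(2 ** N_BITS)]


# Number of decoders that load one particular address line.
loads_on_addr0 = sum("ADDR0" in decoder_inputs(r) for r in range(2 ** N_BITS))
```

In this model exactly one of the 32 outputs is high for any address, and each of the 10 lines appears in 16 decoder strings, which matches the loading stated above.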
In the embodiment of FIG. 1, since no predecode is provided, the pull-down circuit has 5 NFETs 104-112, one for each true/complement address pair. Such long pull-down strings not only have more transistors but also require more area for each transistor: for the string to have the same effective width/length ratio as a single pull-down transistor, each FET needs to be roughly five times wider than a single transistor. Hence, the decoder 100 illustrated in FIG. 1 takes up a relatively large amount of chip space. A smaller and faster decoder is desired.
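The area penalty of a long series string can be estimated to first order as follows. The sketch assumes that series FETs of equal length behave like resistors in series, so each of n devices must be about n times wider to match a single pull-down; the function name is hypothetical and the result is in units of the single-transistor width.

```python
def relative_pulldown_width(n_series):
    """Total gate width of an n-FET series string, relative to one FET.

    First-order assumption: each of the n series devices is made
    n times wider so the string matches a single pull-down's W/L.
    """
    width_per_fet = n_series          # each device is ~n times wider
    return width_per_fet * n_series   # n devices in the string


five_high = relative_pulldown_width(5)   # string of FIG. 1
two_high = relative_pulldown_width(2)    # a shorter string, for comparison
```

Under this estimate the 5-high string of FIG. 1 needs about 25 units of transistor width, against 4 units for a 2-high string, which is why shortening the string is attractive.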
The domino-style logic used in the embodiment of FIG. 1 also is not free from charge sharing problems, especially for long pull-down strings. During the precharge phase, nodes internal to the NAND pull-down string may not be precharged high because the inputs are low and hence the intermediate nodes are isolated. This can allow logic low levels to be trapped on the intermediate nodes. Charge sharing can then occur when the topmost transistors in the pull-down string turn on but the bottommost transistor is off, so that the precharged node shares its charge with the discharged intermediate nodes. For long pull-down strings, internal nodes can be precharged to eliminate such charge sharing problems. However, the addition of these prechargers has obvious area disadvantages. It is thus desirable to shorten the pull-down string of the NAND decoder 100 so as to minimize the effects of charge sharing which would otherwise occur when domino-style logic is used.
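The size of the charge sharing disturbance can be estimated by charge conservation. The following is an idealized sketch: the capacitance and supply values are hypothetical, the internal nodes are assumed to start at 0 V, and transistor threshold drops and leakage are ignored.

```python
def shared_voltage(vdd, c_precharged, c_internal_nodes):
    """Voltage on the precharged node after sharing charge with internal nodes.

    Charge conservation: vdd * c_precharged is redistributed over the
    precharged node plus every internal node that becomes connected.
    """
    return vdd * c_precharged / (c_precharged + sum(c_internal_nodes))


# Hypothetical values: 10 fF precharged node, 2 fF per internal node.
VDD = 3.3
long_string = shared_voltage(VDD, 10e-15, [2e-15] * 4)   # 4 internal nodes
short_string = shared_voltage(VDD, 10e-15, [2e-15] * 1)  # 1 internal node
```

With these assumed values the node droops further for the string with four internal nodes than for the string with one, which illustrates why a shorter pull-down string reduces the risk that the output inverter sees a false low.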
Accordingly, an improved decoder for a register file is desired which can provide glitch-free read and write enables to the register file without the necessity of a long pull-down string of the type used in the prior art. The present invention has been designed to meet this need.