1. Field of the Invention
The present invention relates to data transfer, and more particularly to a system and method for transferring data through latches which reduces the number of latches, reduces power consumption and enables the latches to receive or transmit data only when an operation is to be performed.
2. Description of the Related Art
Interlocked pipelined complementary metal oxide semiconductor (IPCMOS) circuits and techniques are disclosed in U.S. Pat. No. 6,182,233, incorporated herein by reference. A paper describing the results of an implementation of these IPCMOS circuits on a test site is found in an article published in the ISSCC 2000 Digest of Technical Papers, Session 17, Logic and Systems, Paper WA 17.3, by Schuster et al. entitled “Asynchronous Interlocked Pipelined CMOS Circuits at 3.3-4.5 GHz”, incorporated herein by reference and hereinafter referred to as the ISSCC paper. In the ISSCC paper, asynchronous interlocked locally generated clocks drive a path through a 3 to 2 compressor tree of a Floating Point Multiplier (FPM) at frequencies as fast as 4.5 GHz in a 0.18 micron 1.5 Volt bulk CMOS technology. Power reductions greater than two times are estimated with these IPCMOS techniques.
In U.S. Pat. No. 6,182,233 referenced above, circuits and techniques are disclosed for asynchronously interlocking blocks in the forward and reverse directions that have extremely small overhead for handshaking. This makes very high performance possible.
Interlocked Pipelined CMOS circuits and techniques are also disclosed in commonly assigned U.S. application Ser. No. 09/746,647 to Cook et al., filed on Dec. 21, 2000 and entitled “Asynchronous Pipeline Control Interface” (hereinafter referred to as Cook et al.). Cook et al. is incorporated herein by reference. Cook et al. includes circuits and techniques for asynchronously interlocking blocks in the forward and reverse directions that have extremely small overhead for the handshaking. This makes very high performance possible.
In conventional synchronous approaches, a global clock activates all the latches simultaneously. Synchronous pipelines are typically subject to clock skew problems, which may cause undesirable delays in the pipelines.
Referring to FIG. 1A, a master/slave latch 10 is employed to prevent data from logic stage 11 from propagating through latch 10 before a logic stage 12 is ready to act on the data. Master/slave latch 10 includes a master latch 18 and a slave latch 20. Master latch 18 empties data into slave latch 20 in accordance with global clock signals. Switches 14 and 16 of latch 10 are enabled by global clock pulses C1 and C2, respectively, to transfer data (Data) across latch 10, as shown in the timing diagram of FIG. 1B. Unfortunately, the master/slave approach has to deal with clock skew and jitter, and it consumes more clock power because both the master and the slave latches must be driven.
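The behavior of the master/slave latch of FIG. 1A can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `MasterSlaveLatch` and its `step` method are hypothetical, it is assumed that pulse C1 loads the master latch and pulse C2 loads the slave latch, and it is assumed that C1 and C2 never overlap.

```python
class MasterSlaveLatch:
    """Behavioral model of a master/slave latch (hypothetical names).

    Assumption: C1 closes the switch into the master latch and C2 closes
    the switch from the master latch into the slave latch.
    """

    def __init__(self) -> None:
        self.master = 0  # value held by the master latch
        self.slave = 0   # value held by the slave latch (the latch output)

    def step(self, data: int, c1: bool, c2: bool) -> int:
        # Assumed non-overlapping phases: both switches closed at once
        # would let data race straight through the latch.
        if c1 and c2:
            raise ValueError("C1 and C2 must not overlap")
        if c1:
            self.master = data         # master captures the logic output
        if c2:
            self.slave = self.master   # master empties into the slave
        return self.slave


latch = MasterSlaveLatch()
out_after_c1 = latch.step(1, c1=True, c2=False)   # output still holds old value
out_after_c2 = latch.step(0, c1=False, c2=True)   # new input is ignored during C2
```

In this model the output changes only on C2, so a value presented during C1 cannot reach the following logic stage in the same phase. Both latches are clocked on every cycle whether or not there is new data, which is the clock power cost noted above.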
Referring to FIG. 2A, another approach is to split a logic stage into portions 22 (preferably split in half in accordance with delay, i.e., one half the delay for each portion 22) and place a latch 24 and a latch 26 such that latches 24 and 26 are split between the logic stages 22. Switches 14 and 16 of latches 24 and 26 are enabled by global clock pulses C1 and C2, respectively, to transfer data (Data(a) and Data(b)) across the latches, as shown in the timing diagram of FIG. 2B. This reduces the problem of dealing with clock skew and jitter, but since the number of latches is the same as in the master/slave approach of FIG. 1A, the clock power is not reduced. In fact, this approach consumes additional power, since inputs which are connected to the logic 22 receive data before the logic stages 22 attain their final values. This results in a higher logic switching factor. In addition, both the approaches of FIGS. 1A and 2A consume power whether or not there is an operation to perform, as a result of the continuously running synchronous (global) clock.
Therefore, a need exists for latch circuits and methods of operating the latch circuits which reduce the number of latches and/or clock loading, consume power only when there is an operation to perform, and achieve higher speed compared to existing approaches.
Circuits and methods for operating a latch structure are disclosed. The circuits include a plurality of stages, and each stage includes a first logic circuit, a latch coupled to a second logic circuit of an adjacent stage and a switch which connects the first logic circuit to the latch in a first state and disconnects the logic circuit from the latch in a second state. A local clock circuit controls the first and second states by providing a locally generated clock signal to activate the switch. The locally generated clock signals are generated by interlocking handshake signals from a local clock circuit of an adjacent stage.
A method for transferring data in an interlocked pipeline circuit having a plurality of stages includes providing, for each stage, a latch connected to an input of that stage and a switch for selectively coupling the input of the stage to an output of the previous stage. When the data is valid in a current stage, a valid signal is sent to a local clock circuit of a next stage of the plurality of stages. An acknowledge signal is sent from the local clock circuit of the next stage to a local clock circuit of the current stage responsive to the valid signal. A local clock signal is generated at the local clock circuit of the current stage of the plurality of stages based on the acknowledge signal and the valid signal. The switch of the current stage is enabled based on the local clock signal to permit data transfer to the latch of the current stage from the output of the previous stage.
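The valid/acknowledge handshake of the method above can be sketched as a simple token-flow simulation. This is an illustrative software model, not the disclosed circuit: the function `run_pipeline`, its arguments, and the `clock_pulses` counter are hypothetical names, each stage is assumed to hold one input latch, and the sink after the last stage is assumed to acknowledge immediately. A stage's local clock fires only when the previous stage signals valid data and the stage's own latch has been freed by an acknowledge from the next stage.

```python
from typing import Callable, List, Optional, Tuple


def run_pipeline(
    logics: List[Callable[[int], int]], inputs: List[int]
) -> Tuple[List[int], List[int]]:
    """Simulate an interlocked pipeline (hypothetical simplified model).

    latch[i] is the input latch of stage i; valid[i] means stage i holds
    data that the next stage has not yet acknowledged.
    """
    n = len(logics)
    latch: List[Optional[int]] = [None] * n
    valid = [False] * n
    clock_pulses = [0] * n          # local clock pulses generated per stage
    pending = list(inputs)
    outputs: List[int] = []

    while pending or any(valid):
        # Assumed sink: consumes the last stage's output and acknowledges.
        if valid[n - 1]:
            outputs.append(logics[n - 1](latch[n - 1]))
            valid[n - 1] = False
        # Walk downstream to upstream so data moves one stage per pass.
        for i in range(n - 1, 0, -1):
            if valid[i - 1] and not valid[i]:
                # Local clock of stage i closes its switch: the latch
                # captures the output of the previous stage's logic.
                latch[i] = logics[i - 1](latch[i - 1])
                valid[i] = True          # valid signal toward stage i + 1
                valid[i - 1] = False     # acknowledge back to stage i - 1
                clock_pulses[i] += 1
        # Assumed source: offers new data whenever stage 0 is free.
        if pending and not valid[0]:
            latch[0] = pending.pop(0)
            valid[0] = True
            clock_pulses[0] += 1
    return outputs, clock_pulses


stage_logic = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, pulses = run_pipeline(stage_logic, [1, 2, 3])
idle_results, idle_pulses = run_pipeline(stage_logic, [])
```

With three data items, each stage in this model generates exactly three local clock pulses, and with no input no pulses are generated at all, which reflects the goal of clocking a latch only when there is an operation to perform.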
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.