1. Field of the Invention
2. Description of the Prior Art
Many electronic systems, especially but not limited to digital electronic systems, are composed of a cascade of modules where each module receives input data from one or more previous modules in the cascade, processes the received data, and then sends output results to one or more following modules in the cascade. This architecture is sometimes referred to as a pipelined system or a pipelined circuit and can be applied at the system level, the subsystem level, the component level, or even at the subcomponent level, such as inside an integrated circuit. If all modules are required to work synchronously with all other modules, a clock signal must be distributed to all modules in the system. The normal method used to distribute the clock is to systematically split the signal using buffers to form a clock distribution tree. This ensures that each module receives a clock signal that is in phase with the clock signal received by all other modules, allowing all modules to perform their processing functions at the same time.
There are two major problems with this method of splitting and distributing the clock signal. First, it requires a large number of high-power clock distribution buffers. The large number of high-power buffers consumes a large amount of electrical power, generates a large amount of heat, takes up a significant amount of space, and increases system complexity and component count, which decreases reliability and increases cost. Second, the use of a clock distribution tree does not guarantee that all modules will receive an in-phase clock signal. In fact, it practically guarantees there will be at least some skew between the different clock signals arriving at the different modules. The skew is caused by the normal variations in length and parasitic resistance, capacitance, and inductance in the different conductors that distribute the clock signal, along with the normal delay variations in the different clock buffers. These variations exist even if all conductors and clock buffers are implemented on the same integrated circuit using a physically and electrically symmetric layout. Furthermore, the more modules in the system, the more levels will be required in the clock distribution tree, and the greater will be the skew. It is important to note that all of these major disadvantages exist whenever a pipelined architecture utilizes a synchronous clock, regardless of whether the architecture is applied at the system level, the subcomponent level, or at any level in between. Unfortunately, the vast majority of pipelined electronic systems require a clock for synchronization.
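The growth in tree depth and buffer count described above can be illustrated with a small sketch. It is not taken from any particular design: the function name `clock_tree_shape` and the assumption that every buffer drives a fixed `fanout` of loads are hypothetical simplifications.

```python
def clock_tree_shape(num_modules, fanout):
    """Estimate (levels, buffers) for a balanced clock distribution tree.

    Assumption: every buffer drives at most `fanout` loads, and the tree
    is built bottom-up until a single root buffer remains.
    """
    if num_modules < 1 or fanout < 2:
        raise ValueError("need at least one module and a fanout of 2 or more")
    levels = 0
    buffers = 0
    loads = num_modules
    while loads > 1:
        # Number of buffers needed to drive the current set of loads.
        loads = -(-loads // fanout)  # ceiling division
        buffers += loads
        levels += 1
    return levels, buffers


if __name__ == "__main__":
    for n in (16, 64, 65, 256):
        print(n, clock_tree_shape(n, fanout=4))
```

Under these assumptions, adding a single module beyond a power of the fanout adds a whole tree level, and each added level places another buffer, with its own delay variation, in every clock path.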
An alternative method, known as counter clock flow pipelining, or, alternatively, counter clock pipelining or counter flow pipelining, for distributing the clock signal in a pipelined system or circuit has been developed. In this alternative approach the clock is distributed using multiple clock distribution buffers that are connected in a daisy-chain arrangement instead of a tree arrangement. The clock is initially distributed to the last module or circuit in the cascade of modules. The clock signal is then routed through a buffer and distributed to the preceding module in the cascade. The clock is distributed to all modules or circuits in the pipeline using this daisy-chain technique, always being distributed to a following module or circuit in the cascade before being distributed to a preceding module or circuit in the cascade. It should be noted that the distribution of the clock signal, or the clock flow, occurs in the opposite direction of the flow of data through the cascade of data processing modules, thus the name “Counter Clock Flow Pipelining.”
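The daisy-chain arrangement can be sketched as a simple arrival-time model. The function name `counter_flow_arrival_times` and the delay values are hypothetical; each delay is assumed to lump together one clock buffer and the wire that follows it, expressed in integer picoseconds.

```python
def counter_flow_arrival_times(buffer_delays_ps):
    """Clock arrival time at each module of a counter clock flow pipeline.

    Index 0 is the first data processing module and the last index is the
    final module, which receives the incoming clock directly.
    buffer_delays_ps[i] is the assumed delay (buffer plus wire) between
    module i + 1 and module i.
    """
    num_modules = len(buffer_delays_ps) + 1
    arrivals = [0] * num_modules
    # Walk backwards through the pipeline, opposite to the data flow.
    for i in range(num_modules - 2, -1, -1):
        arrivals[i] = arrivals[i + 1] + buffer_delays_ps[i]
    return arrivals


if __name__ == "__main__":
    # Four modules, three chained buffers (illustrative values only).
    print(counter_flow_arrival_times([30, 20, 10]))
```

With these illustrative delays the last module sees the clock at time 0 and each preceding module sees it later, which is the counter flow property described above.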
There are four major advantages and one major disadvantage to using the counter clock flow method for distributing the clock signal in a pipelined circuit or system. The first advantage is that the clock buffers do not have to drive long clock lines that span large distances across a printed circuit board or an integrated circuit. Each buffer only has to drive the distance from one module to the next. Therefore, the clock buffer circuits do not need to be as powerful as do the clock buffer circuits used in a tree arrangement. In fact, the total amount of electrical power consumed by the counter clock flow clock distribution circuit is typically 30% less than the amount of electrical power consumed by the clock distribution circuit in a tree arrangement. Furthermore, each individual buffer circuit is physically smaller, thus providing a size advantage as well as a power consumption advantage, which can be critically important issues if all components are implemented on a single VLSI integrated circuit. Associated with the lower power consumption is also a reduction in the amount of heat generated by the clock distribution network. This can also be a significant factor for systems where all components are implemented on a single chip.
The second advantage of using counter clock flow pipelining is that it guarantees the correct ordering of the clock signals that arrive at adjacent modules in the pipeline. For correct operation of a pipelined system or circuit, if module A feeds data to module B, then the clock signal must arrive at module B either at exactly the same time as it arrives at module A or slightly before it arrives at module A. In a counter clock flow pipelined system or circuit, the clock signal is guaranteed to arrive at module A after it arrives at module B because of the finite delay through the clock buffer and the propagation delay along the clock wire. The more traditional clock fanout tree is an attempt to force the clock signal to arrive at all modules at exactly the same time. In practice, however, this is impossible because all clock buffers have slightly different delays, even when implemented on the same integrated circuit. Furthermore, the different wires that carry the different clock signals between the different buffers, or between the buffers and the processing modules, will also have slightly different delays. These variations are not a problem if the total delay from the clock input to module A is greater than the total delay from the clock input to module B. However, in the tree arrangement, this cannot be guaranteed.
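The ordering requirement between adjacent modules can be expressed as a simple check. This sketch models only the ordering (the clock must reach module B no later than module A); it does not model how much earlier is acceptable, which is not quantified here. The function name and the arrival times are hypothetical.

```python
def ordering_violations(arrivals_ps):
    """Return the (A, B) index pairs where module B is clocked after module A.

    arrivals_ps[i] is the clock arrival time at module i, where module i
    feeds data to module i + 1.
    """
    violations = []
    for a in range(len(arrivals_ps) - 1):
        b = a + 1
        if arrivals_ps[b] > arrivals_ps[a]:
            violations.append((a, b))
    return violations


if __name__ == "__main__":
    # Counter clock flow: arrival times decrease along the data path.
    print(ordering_violations([60, 30, 10, 0]))
    # Tree distribution with small, uncontrolled skews (illustrative values).
    print(ordering_violations([100, 103, 98, 101]))
```

In the daisy-chain case the list of violations is empty by construction, whereas in the tree case whether a given pair violates the ordering depends on skews that the designer does not control.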
The third advantage of using counter clock flow pipelining is that the output data from the last stage of processing is guaranteed to be synchronized with the incoming clock signal. This is because the incoming clock signal is immediately applied to the last module in the cascade of modules, without going through any delay-causing buffers. If the output data of the pipelined system or circuit is to be applied to another circuit or subsystem module that is being synchronized by the same clock signal, then it is critical for the output data to be synchronized with the incoming clock signal. If they are not synchronized, a wide variety of difficult problems can occur, such as switching hazards, races, and metastability problems. All these problems can be avoided by keeping the output data from the last stage of the pipelined system or circuit synchronized with the incoming clock signal. However, this can be very difficult to accomplish using a tree structured clock distribution scheme.
The fourth major advantage is the elimination of current surges on the power and ground supply rails. In a traditional pipelined system that utilizes a traditional clock fanout tree, where all modules are clocked at exactly the same time or at nearly the same time, a large surge of current flows through both the power and ground supply rails when the clock signal transitions. This can cause a large number of problems such as noise margin degradation, crosstalk, and timing and skew problems. With counter clock flow pipelining, no two modules in the pipeline are clocked at exactly the same time, limiting the power and ground rail surge current to the amount of surge current drawn by a single module.
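The surge current comparison can be sketched with a deliberately idealized model in which each module draws its surge at the exact instant its clock edge arrives, so that only modules clocked at the same instant add together. The function name and the current values are hypothetical.

```python
from collections import defaultdict


def peak_surge_current(arrivals_ps, surge_ma):
    """Peak supply surge, assuming surges are instantaneous.

    Modules whose clock edges arrive at the same time have their surge
    currents summed; modules clocked at different times do not overlap.
    """
    if len(arrivals_ps) != len(surge_ma):
        raise ValueError("one surge value is required per module")
    total_at_time = defaultdict(int)
    for t, i in zip(arrivals_ps, surge_ma):
        total_at_time[t] += i
    return max(total_at_time.values())


if __name__ == "__main__":
    surge = [5, 5, 5, 5]  # mA per module, illustrative
    print(peak_surge_current([0, 0, 0, 0], surge))     # ideal tree
    print(peak_surge_current([60, 30, 10, 0], surge))  # counter clock flow
```

Real surges have finite width, so closely spaced clock edges would partially overlap; the sketch shows only the limiting behavior described above.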
There is one significant disadvantage to the counter clock flow distribution method. The data input signal coming into the first data processing module in the pipeline is not synchronized with the clock signal that is applied to that module. This is because the input clock signal applied to the pipeline goes through a large number of clock buffers before it is applied to the first data processing module in the pipeline. These buffers cause delay, as do the wires that interconnect the clock buffers. Furthermore, the exact amount of delay is very difficult to predict in advance. When the data input to the first data processing module is out of phase with the clock signal applied to that stage, a wide variety of difficult problems can occur, including switching hazards, races, and metastability problems. The only known solution to this problem is to manually measure the skew after implementation and manually adjust the skew by adding delay at the clock input to the pipeline. However, this causes skew between the output data from the last module in the pipeline and the input clock signal. Essentially, correcting the skew between the input data to the pipeline and the clock signal applied to the first module creates skew between the output data from the last module in the pipeline and the input clock signal. This disadvantage is so significant that it has prevented the widespread adoption of the counter clock flow pipelined architecture.
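The trade-off between input skew and output skew can be illustrated with a small model. It rests on an assumption not stated above: that skew is measured as a phase offset modulo the clock period, so the input is aligned when the total delay to the first module is a whole number of periods. All names and values are hypothetical.

```python
def skews_ps(buffer_delays_ps, added_delay_ps, clock_period_ps):
    """Return (input_skew, output_skew) for a counter clock flow pipeline.

    added_delay_ps is the manual delay inserted at the clock input to the
    pipeline. Skew is modeled as a phase offset modulo the clock period,
    which is an assumption of this sketch.
    """
    chain_delay = sum(buffer_delays_ps)
    # The last module sees only the added delay.
    output_skew = added_delay_ps % clock_period_ps
    # The first module sees the added delay plus the whole buffer chain.
    input_skew = (added_delay_ps + chain_delay) % clock_period_ps
    return input_skew, output_skew


def delay_to_align_input(buffer_delays_ps, clock_period_ps):
    """Smallest added delay that brings the input skew to zero."""
    return (-sum(buffer_delays_ps)) % clock_period_ps


if __name__ == "__main__":
    delays, period = [30, 20, 10], 100  # picoseconds, illustrative
    print(skews_ps(delays, 0, period))
    fix = delay_to_align_input(delays, period)
    print(fix, skews_ps(delays, fix, period))
```

In this model, removing the input skew moves the same amount of misalignment to the output, unless the chain delay happens to be a whole number of clock periods.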