1. Field of the Invention
This invention relates to digital designs (such as microprocessors and computer systems) and, more particularly, to mechanisms and techniques for generating and using controlled, distributed clocking in digital designs.
2. Brief Description of the Related Technology
Most digital designs of today, such as microprocessors, are based on synchronous design methodology. The term "synchronous design" generally refers to the method employed to control the timing of the design. An external clock (a signal with a deterministic period of state change) generally controls, in a highly deterministic fashion, the time at which events are executed within a design. Either the external clock or a derivative of it is distributed in a disciplined manner throughout the chip, and all timed elements in the design use this centralized clocking mechanism for their operation. This guarantees the time synchronization of the various elements within the design. Most microprocessors of today use this methodology, and a wealth of Computer Aided Design (CAD) tools, verification tools, and methodologies exists to support it.
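The synchronizing effect of a shared clock can be sketched at the cycle level. In this minimal Python model (the three-stage shift register and the name `shift_register` are illustrative assumptions, not a circuit from this document), every register computes its next value from the state held before the edge, and all registers commit together, which is what a single distributed clock edge provides.

```python
def shift_register(inputs: list[int], width: int = 3) -> list[list[int]]:
    """Cycle-level model of a synchronous shift register driven by one clock.

    Hypothetical example: at each clock edge every stage captures the value its
    predecessor held *before* the edge, because all stages sample together.
    """
    regs = [0] * width
    history = []
    for x in inputs:
        # Next-state values are computed from the pre-edge snapshot...
        nxt = [x] + regs[:-1]
        # ...and committed at once, modelling one shared clock edge.
        regs = nxt
        history.append(list(regs))
    return history


for state in shift_register([1, 2, 3, 4]):
    print(state)
```

Each value moves exactly one stage per cycle. If the stages were instead updated one at a time in front-to-back order without the snapshot, a new input would race through every stage in a single cycle, which is the kind of ordering hazard the common clock prevents.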
To achieve higher performance, computers are running at ever higher clock frequencies. It is projected that clock frequencies will reach the gigahertz range by the end of this century. As the frequency increases, the clock period decreases. The term "clock period" refers to the interval of time between two consecutive rising (or falling) edges of the clock signal. Generally, this is the time available to the various elements of the design to perform their defined tasks. At high frequencies, this time is quite small, on the order of a nanosecond. In the centralized clocking mechanism described above, a certain percentage (10-15%) of this precious clock period must be allocated to clock skew and jitter, reducing the time available for useful work. The term "clock skew" refers to the time difference between the same clock edge arriving at different parts of the circuit. To reduce its effects, special attention is paid in the design to buffering and routing clocks as high-priority signals.
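The arithmetic behind this trade-off can be sketched briefly. The helper below (a hypothetical name) converts a clock frequency into a period and subtracts an assumed skew-and-jitter reservation; the 15% figure is the upper end of the range quoted above, and the frequencies are illustrative only.

```python
def clock_budget_ns(freq_hz: float, skew_jitter_fraction: float) -> tuple[float, float]:
    """Return (clock period, time left for useful work), both in nanoseconds.

    skew_jitter_fraction is the assumed share of the period reserved for
    clock skew and jitter (the text cites roughly 0.10 to 0.15).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0.0 <= skew_jitter_fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    period_ns = 1e9 / freq_hz
    useful_ns = period_ns * (1.0 - skew_jitter_fraction)
    return period_ns, useful_ns


for freq in (250e6, 500e6, 1e9):
    period, useful = clock_budget_ns(freq, 0.15)
    print(f"{freq / 1e6:6.0f} MHz: period {period:.3f} ns, useful {useful:.3f} ns")
```

At 1 GHz the period is 1 ns, so a 15% reservation leaves about 0.85 ns for useful work; the absolute time lost to skew and jitter shrinks with the period, but the fraction stays the same.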
A clock traditionally has two transitions in a clock period: one when it changes from a low level to a high level (known as the rising edge) and the other when it changes back from the high level to the low level (known as the falling edge). The time at which a transition occurs within a clock period defines the term "clock phase". Traditionally, designs have at most two clock phases available to them.
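Edges and phases can be illustrated on a sampled waveform. This Python sketch assumes an idealized clock sampled at a fixed interval; `find_edges` and `phase_within_period` are hypothetical helpers, and the phase is expressed as a fraction of the period measured from time zero.

```python
def find_edges(samples: list[int], sample_period_ns: float) -> list[tuple[str, float]]:
    """Return (edge kind, time in ns) for every transition in a sampled clock.

    A 0 -> 1 transition is a rising edge; a 1 -> 0 transition is a falling edge.
    The edge time is taken as the time of the first sample at the new level.
    """
    edges = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev == 0 and cur == 1:
            edges.append(("rising", i * sample_period_ns))
        elif prev == 1 and cur == 0:
            edges.append(("falling", i * sample_period_ns))
    return edges


def phase_within_period(t_ns: float, clock_period_ns: float) -> float:
    """Position of time t within its clock period, as a fraction in [0, 1)."""
    return (t_ns % clock_period_ns) / clock_period_ns


# Hypothetical waveform: 4 ns period, 0.5 ns per sample, 50% high time.
clk = [0, 0, 0, 0, 1, 1, 1, 1] * 2
for kind, t in find_edges(clk, 0.5):
    print(kind, t, phase_within_period(t, 4.0))
```

In this waveform every rising edge lands at phase 0.5 and every falling edge at phase 0.0, so the whole period offers only these two reference points.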
Most commonly, clock signals convey only timing information; for the most part, they do not convey any functional or control information. It is becoming common for the clock tree to account for 25-35% of the total power consumption in a high-performance microprocessor. The term "clock tree" refers to the clock signals, their routing channels, and the buffers associated with the clock in the circuit. The clock tree also accounts for the major portion of the harmonic noise emitted by the device. Some designs, such as the microprocessors by Advanced RISC Machines (ARM), use gated clocks to various elements in the circuit to reduce power dissipation. The term "gated clock" refers to conditionally allowing the input of a block or unit to change with the clock, which gives some control over the operation of the unit. However, gating does not address the power consumed by the clock tree itself, and in some cases generating the gating function introduces additional delay. Several other microprocessors, such as Intel's Pentium series, add low-power modes that are used during inactive periods to reduce power dissipation. These special low-power modes have overhead delays associated with entry and exit, and they do not help reduce "active power" dissipation. The term "active power" refers to the power dissipated when a design is in its normal mode of operation.
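The gated-clock idea, and its limitation, can be sketched as a cycle-level model. In this hypothetical Python example, an enable signal decides on each cycle whether the clock edge reaches a register. The function counts pulses on the ungated clock tree separately from pulses delivered through the gate, showing that the tree keeps toggling every cycle even while the register is idle. Pulse counts are only an assumed, rough proxy for switching power.

```python
def simulate_gated_register(data: list[int], enable: list[int]) -> tuple[list[int], int, int]:
    """Cycle-level sketch of a register behind a clock gate.

    Returns (register contents after each cycle, clock pulses on the ungated
    tree, clock pulses that reach the register through the gate).
    """
    if len(data) != len(enable):
        raise ValueError("data and enable must have the same length")
    q = 0
    history = []
    tree_pulses = 0
    gated_pulses = 0
    for d, en in zip(data, enable):
        tree_pulses += 1          # the clock tree toggles every cycle regardless
        if en:                    # the gate passes the clock edge only when enabled
            gated_pulses += 1
            q = d                 # register captures its input on the gated edge
        history.append(q)         # otherwise the register holds its old value
    return history, tree_pulses, gated_pulses


print(simulate_gated_register([5, 6, 7, 8], [1, 0, 0, 1]))
```

Here the register sees only two of the four clock pulses and holds its value in between, but all four pulses still travel through the clock tree.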
Asynchronous design methodology can solve most of the problems associated with synchronous design methodology because asynchronous designs do not have a central clock. Instead, request-and-acknowledge handshake protocols are used to communicate between internal units. To date, most asynchronous microprocessors are academic in nature. This methodology introduces many problems due to non-deterministic result generation, including an entire set of new problems associated with design verification, testing, and operation with (or interfacing to) other devices in the system. Because results are generated at the design's internal pace, it is not possible to determine externally when to expect them, and any glitch can result in incorrect operation of the device. In synchronous designs, results are evaluated deterministically with the clock; in asynchronous designs, they can appear at any time, introducing a whole new set of verification parameters.
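A request-and-acknowledge handshake can be sketched in software as an analogy. This Python example assumes the four-phase (return-to-zero) variant, which is one common handshake and not necessarily the one any particular design uses. Two threads stand in for two hardware units that share no clock; `Channel`, `sender`, `receiver`, and `transfer` are hypothetical names, and a `threading.Condition` plays the role of the wires being observed.

```python
import threading


class Channel:
    """Hypothetical req/ack wires and a data bus shared by two units."""

    def __init__(self) -> None:
        self.cond = threading.Condition()
        self.req = 0
        self.ack = 0
        self.data = None

    def drive(self, wire: str, level: int) -> None:
        # Caller must hold self.cond; wake anyone waiting on a wire change.
        setattr(self, wire, level)
        self.cond.notify_all()

    def await_level(self, wire: str, level: int) -> None:
        # Caller must hold self.cond; wait_for releases it while blocked.
        self.cond.wait_for(lambda: getattr(self, wire) == level)


def sender(ch: Channel, values: list) -> None:
    for v in values:
        with ch.cond:
            ch.data = v
            ch.drive("req", 1)        # 1. data valid: raise request
            ch.await_level("ack", 1)  # 2. wait for the receiver to acknowledge
            ch.drive("req", 0)        # 3. drop request
            ch.await_level("ack", 0)  # 4. wait for acknowledge to drop


def receiver(ch: Channel, count: int, out: list) -> None:
    for _ in range(count):
        with ch.cond:
            ch.await_level("req", 1)
            out.append(ch.data)       # latch the data, then acknowledge
            ch.drive("ack", 1)
            ch.await_level("req", 0)
            ch.drive("ack", 0)        # return to idle, ready for the next item


def transfer(values: list) -> list:
    ch = Channel()
    out: list = []
    rx = threading.Thread(target=receiver, args=(ch, len(values), out))
    tx = threading.Thread(target=sender, args=(ch, values))
    rx.start()
    tx.start()
    tx.join()
    rx.join()
    return out


print(transfer([10, 20, 30]))
```

Each item crosses the channel only when both sides have observed the other's signal, so progress is set by the units' own pace rather than by a clock; the thread scheduler's timing is free to vary, which mirrors the non-determinism described above even though the transferred values arrive in order.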
To increase throughput, high-performance devices such as microprocessors have traditionally used the concept of "pipelining". The term "pipelining" refers to subdividing an operation into multiple serial functions. When the first operation passes through the first functional logic and enters the second functional logic, the next operation is allowed to use the first functional logic. In synchronous designs, storage elements (e.g., registers) capture the value of the first operation on a rising or falling edge of a clock signal, allowing the next operation to enter the functional logic. Storage elements thus provide time isolation between the logic of different functional units. Pipelining allows the device to operate at a much higher frequency, increasing throughput and performance. As clock periods shrink, however, the pipeline registers have started to account for 10-15% of the clock period. Also, in a complex design, the pipeline registers can account for 10% of the total die area.
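The growing share of the period taken by pipeline registers can be sketched numerically. This Python model assumes the combinational logic divides evenly across stages, that each stage adds a fixed register overhead (setup plus clock-to-output), and that skew and jitter are ignored; the 8 ns logic delay and 0.1 ns overhead are hypothetical values chosen only for illustration.

```python
def pipeline_timing(logic_delay_ns: float, stages: int, reg_overhead_ns: float) -> tuple[float, float, float]:
    """Clock period, register share of the period, and latency of an evenly split pipeline.

    Assumes the combinational logic divides evenly across `stages` and each
    stage adds one pipeline register with a fixed overhead. Skew and jitter
    are ignored.
    """
    if stages < 1:
        raise ValueError("need at least one stage")
    period = logic_delay_ns / stages + reg_overhead_ns
    overhead_share = reg_overhead_ns / period
    latency = stages * period  # time for one operation to traverse the pipeline
    return period, overhead_share, latency


for n in (1, 2, 4, 8):
    p, share, lat = pipeline_timing(8.0, n, 0.1)
    print(f"{n} stages: period {p:.2f} ns, {1 / p:.2f} ops/ns, "
          f"register share {share:.1%}, latency {lat:.2f} ns")
```

With these assumed numbers, going from one stage to eight raises throughput from about 0.12 to about 0.91 operations per nanosecond, but the registers' share of each period climbs from roughly 1% to about 9%, and the latency of a single operation grows from 8.1 ns to 8.8 ns.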
Traditionally, a centralized clock provides two timing points within a period: one at the positive edge (when the signal switches from logic zero to logic one) and another at the negative edge. The duty cycle of the clock (the fraction of the clock period during which the signal is at logic one) is fixed throughout the design. Microprocessor designs also tend to use both edges (or phases) of the clock, which tends to double the clock skew and jitter problem in the centralized clocking scheme. Disadvantageously, the limit of two timing reference points per clock cycle and a fixed duty cycle restrict design alternatives.
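The two fixed timing points of a conventional clock can be listed directly from its period and duty cycle. This Python sketch takes the duty cycle as the fraction of the period spent at logic one and assumes an ideal clock with a rising edge at the start of every period; `timing_points` and `measured_duty_cycle` are hypothetical helpers.

```python
def timing_points(period_ns: float, duty_cycle: float, cycles: int) -> list[tuple[str, float]]:
    """Rising- and falling-edge times of an ideal clock over `cycles` periods.

    duty_cycle is the fraction of the period spent at logic one; a rising
    edge is assumed at the start of every period.
    """
    if not 0.0 < duty_cycle < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")
    points = []
    for k in range(cycles):
        start = k * period_ns
        points.append(("rising", start))
        points.append(("falling", start + duty_cycle * period_ns))
    return points


def measured_duty_cycle(samples: list[int]) -> float:
    """Fraction of samples at logic one; assumes a whole number of periods."""
    return sum(samples) / len(samples)


print(timing_points(2.0, 0.5, 2))
print(measured_duty_cycle([1, 1, 1, 0] * 3))
```

Whatever the period, each cycle yields exactly two reference points, and their spacing is fixed once the duty cycle is chosen; for example, the sampled waveform above measures a 0.75 duty cycle.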