In general, microprocessors (processors) achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle. The term “clock cycle” refers to an interval of time accorded to various stages of processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to a rising or falling edge of a clock signal defining the clock cycle. The storage devices store the values until a subsequent rising or falling edge of the clock signal, respectively. The phrase “instruction processing pipeline” is used herein to refer to the logic circuits employed to process instructions in a pipeline fashion. Although the pipeline may include any number of stages, where each stage processes at least a portion of an instruction, instruction processing generally includes the steps of: decoding the instruction, fetching data operands, executing the instruction and storing the execution results in the destination identified by the instruction.
Processor design consists of a central clock, generally phase lock loop (PLL) clock, with a clock tree network. The clock tree consists of many global clock buffers and local clock buffers. The clock buffers can be clock-gated to save power but the clock tree itself can still consume much power. In some estimate, the clock tree can consume 15% to 35% of total dynamic power of the processor. The distributed clock networks with local clock generators can significant reduce the power consumption of microprocessor as suggested in U.S. Pat. No. 5,987,620. Unfortunately, at system level, the clocking network is still inefficient with a single PLL clock or multiple PLL clocks. The globally-asynchronous-locally-synchronous (GALS) clocking allows the system modules to operate at different clock frequencies but these clock frequencies are still fixed by PLL clocks.