1. Field of the Invention
This invention relates generally to clock generation and, more specifically, to internal custom clock frequency generation.
2. Description of the Related Art
Designs of advanced superscalar microprocessor architectures require reliable timing relationships between the various frequencies propagated within the superscalar microprocessor architecture. An example of one superscalar microprocessor architecture is shown in FIG. 1, which is a diagram of a computer 100 illustrating a central processing unit (CPU) 110 with a microcore (core) 120. Computer 100 can have multiple CPUs 110 connected to one or more memory 160 elements via a system bus 170. Typically, memory 160 compositions include DRAM, SRAM or flip-flops, which function as storage for data and instructions. Within CPU 110, one or more cores 120 use an interconnect 150 to transfer data and instructions to a cache 130. Further, a control logic 140 uses interconnect 150 to control the flow of data and instructions within CPU 110.
Typically, current CPUs 110 operate at higher frequencies than other motherboard components, such as hard-wired device drivers and memory 160. Consequently, motherboard designs incorporate methods to manipulate the different frequencies to enable proper motherboard operation. Similarly, internal elements of cores 120 operate at multiple frequencies. Current optimal superscalar microprocessor architectures insert delay circuitry, which forces faster, higher-frequency components to wait while slower, lower-frequency components process data and instructions.
FIG. 2 is a diagram illustrating elements of core 120 (FIG. 1) in CPU 110. Core 120 can include an instruction cache 210, an instruction fetch unit (IFU) 220, multiple integer execution units (IEUs) 230 and multiple floating-point graphics units (FGUs) 240. An FGU interconnect 250 connects the output from FGU 240 to IFU 220. Typically, IFU 220 retrieves data and instructions from instruction cache 210. If necessary, core 120 can also retrieve data and instructions from cache 130 and memory 160. Each IEU 230 includes an arithmetic logic unit (ALU) for computation in addition to other logic elements. Further, each IEU 230 connects to one FGU 240. Within each FGU 240 are a multiplier pipeline and an adder pipeline, which perform floating-point arithmetic and other graphics computations. Ultimately, output from FGU 240 travels via FGU interconnect 250 for use by IFU 220.
One problem with the design illustrated in FIG. 2 is the one-to-one relationship between IEU 230 and FGU 240. Because each of the four FGUs 240 has a multiplier pipeline and an adder pipeline, core 120 uses eight pipelines. As the number of IEUs 230 increases to exploit parallel computation, the number of FGUs 240 correspondingly increases. This results in increased circuitry in core 120 and CPU 110. A possible solution to the problem of increasing circuitry is to remove elements within core 120. However, while this solution reduces circuitry, it creates another problem.
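The pipeline count described above can be illustrated with a minimal sketch. The function and constant names are hypothetical and are introduced only for illustration; the sketch assumes the one-to-one pairing of IEUs 230 and FGUs 240 and the two pipelines per FGU 240 described for FIG. 2.

```python
# Assumption: each FGU contains exactly one multiplier pipeline
# and one adder pipeline, as described for FIG. 2.
PIPELINES_PER_FGU = 2


def fgu_pipeline_count(num_ieus: int) -> int:
    """Return the number of FGU pipelines in a core that pairs
    every IEU with its own FGU (hypothetical helper)."""
    if num_ieus < 0:
        raise ValueError("num_ieus must be non-negative")
    num_fgus = num_ieus  # one-to-one IEU-to-FGU relationship
    return num_fgus * PIPELINES_PER_FGU


if __name__ == "__main__":
    # Four IEUs, as in the example of FIG. 2
    print(fgu_pipeline_count(4))
```

Under these assumptions the pipeline count grows linearly with the number of IEUs 230, which is the circuitry growth the preceding paragraph describes.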
Each IEU 230 synchronizes operations to the CPU system clock. In the example shown in FIG. 2, during one clock cycle, each IEU 230 sends output to an FGU 240. Upon removal of an FGU 240, only three FGUs 240 remain to process four IEU 230 outputs. Similarly, upon removal of another FGU 240, only two FGUs 240 remain to process four IEU 230 outputs. This requires the remaining FGUs 240 to delay the IEU 230 outputs to handle each output separately.
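The delay that results from removing FGUs 240 can be approximated with a simple model. The following is a sketch only, not a description of any actual circuit: it assumes that each FGU 240 accepts at most one IEU 230 output per clock cycle and that any output not accepted waits until the next cycle. The function name is hypothetical.

```python
import math


def cycles_to_accept_outputs(num_ieu_outputs: int, num_fgus: int) -> int:
    """Return the clock cycles needed for the FGUs to accept one batch
    of IEU outputs.

    Assumption: each FGU accepts at most one output per clock cycle,
    and outputs that are not accepted wait for the next cycle.
    """
    if num_fgus <= 0:
        raise ValueError("at least one FGU is required")
    if num_ieu_outputs < 0:
        raise ValueError("num_ieu_outputs must be non-negative")
    return math.ceil(num_ieu_outputs / num_fgus)


if __name__ == "__main__":
    # Four IEU outputs handled by four, three, and two FGUs
    for fgus in (4, 3, 2):
        print(fgus, cycles_to_accept_outputs(4, fgus))
```

In this model, four FGUs 240 accept four IEU 230 outputs in a single cycle, while three or two FGUs 240 each need a second cycle, so at least one IEU 230 output is delayed.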
Accordingly, what is needed is a solution to reduce circuitry on a core 120 while adhering to the goals of designing optimal superscalar microprocessor architectures.