1. Field of the Invention
The present invention generally relates to microprocessors, and more particularly to a microprocessor implemented with dynamic logic wherein control logic is implemented in dynamic programmable logic arrays.
2. Description of Related Art
The market for microprocessors is demanding increased processing power, which can be achieved through increased clock frequencies and complexity (for example parallel operation capacity). Increased complexity creates the need for increased circuit density, and use of architectures requiring fewer devices to implement the microprocessor circuits.
One technology that has been applied in recent years to an increasing number of microprocessors and other integrated circuits is dynamic logic, sometimes referred to as "Domino Logic Circuits". Although these circuits are very efficient from a device-per-gate standpoint, due to the dynamic nature of the logic (logic signals exist as pulses propagating through a logical network, rather than static states), design of large-scale integrated circuits can become very complex with respect to timing. As clock frequencies increase, timing problems increase, particularly with respect to the interconnection of various functional sub-systems that are displaced from each other within a monolithic circuit. In the past, at low frequencies, focus was placed on local timing issues and blocks could be interconnected easily, as the cycle time of the system was long compared to the propagation delay across the die. With present processor frequencies on the order of 1 gigahertz, the timing constraints on the interconnection within a monolithic circuit become significant.
At the heart of a typical microprocessor is the control logic, implemented as "microcode", wherein Read Only Memory (ROM) is used to implement a state sequencer via feedback connections and wherein dataflow elements are coupled to the control logic. The output of the control logic provides the next address that will be issued to access the microcode. ROM is not the most efficient way to implement control logic, as a separate row of hard-wired values is required for each combination of input address lines. The control logic in a microprocessor has to synchronize the operation of various execution units such as load/store units, pre-fetch units, floating point and arithmetic units, and instruction decoding. In order to integrate the operation of all of these functional units within a processor operating at one gigahertz or more, the timing of the signals provided to the control logic and the outputs of the control logic used to create the next machine state must be carefully controlled.
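The row-count observation above may be illustrated by the following minimal sketch. It is not taken from the disclosure: the function names, the pattern notation ('0', '1', and '-' for a don't-care input), and the example terms are all hypothetical. The sketch contrasts a ROM, which stores one row per combination of input lines, with an array that stores only the product terms that assert an output.

```python
from itertools import product

def rom_rows(num_inputs):
    """A ROM needs one hard-wired row per combination of input address lines."""
    return 2 ** num_inputs

def matches(term, bits):
    """True if the input bit string satisfies one product term ('-' = don't care)."""
    return all(t == '-' or t == b for t, b in zip(term, bits))

def pla_eval(terms, bits):
    """The output is asserted if any product term matches (OR of AND terms)."""
    return any(matches(t, bits) for t in terms)

# Hypothetical single-output function of four inputs, described by two product terms.
TERMS = ["1--0", "-11-"]

def rom_contents(terms, num_inputs):
    """Expand the same function into the full table a ROM would have to store."""
    table = {}
    for combo in product("01", repeat=num_inputs):
        bits = "".join(combo)
        table[bits] = pla_eval(terms, bits)
    return table
```

In this hypothetical case the ROM form holds 16 rows while the array form holds 2 product terms, and the gap widens as the number of input lines grows.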
Some modern high clock frequency microprocessors use custom designed logic to perform control logic functions, rather than using a ROM approach. The difficulty in this approach is that the synthesis of the logic will yield timing variations. These timing variations can be cured by insertion of delays, but this may not provide optimum performance. The approach is also iterative, which requires adjustments to the entire logic network when the timing of a node on which other logic depends has been changed. In addition, design changes that are made during the development and evolution of a microprocessor require a complete re-evaluation of the timing paths.
The control logic controls data flow, bus operations, and next state sequencing in the microprocessor. Because data from data flow elements must be stabilized at some point in the processor cycle in order to use the data, and next state values must be stabilized at some point in the cycle to reliably sequence the control logic, latches are typically used and timing is typically controlled to prepare and hold this information. This limits processor speed and uses power, since latches use higher power than many other blocks, and the set-up and hold times for the latches constrain the processor speed, as all signals must propagate and remain valid for the set-up and hold time of the latches.
Part of the complexity of timing design is created by increased microprocessor die area. Propagation delays from various parts of the microprocessor have increased, making it difficult to align data flow and control information so that data manipulation and next state sequencing can proceed properly. This is generally the upper limit on clock frequency for a given processor design.
A second part of timing constraints is device skew. With a synthesized control logic design, and data flow components that are not equal in propagation delay, the latching of data and control provides equalization for device skew. This is another effective upper bound on clock frequency. Therefore, it would be desirable to produce an improved control and dataflow logic for a microprocessor such that circuit power and size can be decreased, while providing high frequency operation. It would be further desirable to provide a means for using dynamic logic in a microprocessor such that timing of the interaction of blocks across the die and with varying device skews can be simplified.
The above objectives are achieved in a microprocessor having a control logic block implemented solely in programmable logic arrays. The microprocessor has a plurality of processing blocks for performing pipelined operations, a control logic means made solely from programmable logic arrays for operating each of the processing blocks by decoding a last state, and pulse stretching means coupled to the outputs of the programmable logic arrays for synchronizing the outputs.
The microprocessor may further include a multiplexer latch coupled to the inputs of the dynamic programmable logic arrays, so that the control logic outputs can be combined with data and comparison results.
The processor may also include a dataflow block for providing operand data, and a multiplexer latch coupled to said dataflow block for providing operand data as an input to the programmable logic arrays. The processor may further include a comparator block for comparing operand data and immediate values.
The programmable logic arrays may use dummy devices to balance the loading of the input plane, and may have a sub-divided output plane for decreasing propagation delay.
The pulse stretching means may include preset means for generating a preset strobe from one edge of a system clock and means for stretching outputs until the assertion of the preset clock so that a change in state computed at any time during a cycle of the preset clock until the assertion of said preset strobe can be provided to an output.
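The stretching behavior described above may be modeled, purely for illustration, as a discrete-time sketch. The function name and the sample-per-time-step representation are assumptions and are not part of the disclosure; in particular, the sketch assumes that the preset strobe takes priority over a coincident input pulse and returns the output to its preset (de-asserted) state.

```python
def stretch(pulses, strobe):
    """Hold a pulsed signal asserted until the next preset strobe.

    pulses: sequence of 0/1 samples of a dynamic (pulsed) logic output.
    strobe: sequence of 0/1 samples of the preset strobe, same length.
    Returns the stretched output, one boolean per time step.
    """
    out = []
    held = False
    for p, s in zip(pulses, strobe):
        if s:
            held = False      # assumption: the strobe presets the output
        elif p:
            held = True       # a pulse at any time in the cycle is captured
        out.append(held)
    return out
```

Under these assumptions, a pulse arriving early or late in the cycle produces the same held level up to the strobe, which is the sense in which the outputs are synchronized.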
The invention also includes a method for implementing a microprocessor including the steps of determining a high-level description of a logic network required to decode and execute operands, synthesizing the logic network in programmable logic array form, and fabricating the microprocessor using at least one programmable logic array as the sole implementation of said logic network.
The method may further determine that an output plane of a programmable logic array has a propagation delay that is greater than the sum of the propagation delay of an output combining means plus the propagation delay of a divided programmable logic array output plane and, responsive to the determination, divide the output plane into partial result conductors and couple the partial result conductors using an output combining means.
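The determination described above reduces to a single comparison, which may be sketched as follows. The function and parameter names are hypothetical, and the delay values in the usage note are invented for illustration only.

```python
def should_divide_output_plane(t_full, t_divided, t_combine):
    """Decide whether to split a PLA output plane.

    t_full:    propagation delay of the undivided output plane
    t_divided: propagation delay of a divided output plane
    t_combine: propagation delay of the output combining means
    All delays must be in the same unit (for example, picoseconds).
    """
    return t_full > t_divided + t_combine
```

For example, with hypothetical delays of 100 for the undivided plane, 40 for a divided plane, and 30 for the combining means, the comparison favors division; with a divided-plane delay of 80 it does not.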
The method may also determine that at least one given programmable logic array input signal line has a greater number of attached contribution devices than the number of contribution devices attached to another of the array input signal lines, and connect at least one additional device as a loading device to at least one other array input signal line having a lesser number of attached contribution devices.
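The load-balancing step above may be sketched as a simple count. The function name and the line labels are hypothetical, and equalizing every line to the most heavily loaded line is an assumption made for illustration; the method as stated requires only that at least one loading device be added to a less heavily loaded line.

```python
def dummy_devices_needed(devices_per_line):
    """Return how many loading (dummy) devices to add to each input line.

    devices_per_line: mapping of input line name -> number of attached
    contribution devices. Assumption: every line is padded up to the
    count of the most heavily loaded line.
    """
    if not devices_per_line:
        return {}
    target = max(devices_per_line.values())
    return {line: target - count for line, count in devices_per_line.items()}
```

With hypothetical counts of 5, 3, and 5 devices on three input lines, the sketch adds two loading devices to the second line and none to the others.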
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.