1. Field of the Invention
The invention relates to data processing apparatus, and more particularly to control mechanisms by which data and control paths having a path delay of greater than one clock cycle may be accommodated in order to avoid the necessity of adding redundant latch points close to the functional units in which they are needed.
2. Description of Related Art
Typical data processing apparatus can be thought of as being divided into data paths and latch points. A latch point is any component that provides an output on the basis of system conditions existing on a particular edge of a clock signal. The system conditions may include the logic levels of various signals coupled to inputs of the latch point, the previous state of the latch point, or both. Latch points may be implemented using, for example, edge-triggered or master-slave flip-flops or registers.
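The definition of a latch point given above can be illustrated with a minimal behavioral sketch. The class name `LatchPoint`, the `next_state` callback, and the input names `d` and `t` are hypothetical and are introduced here only for illustration; they do not describe any particular circuit.

```python
class LatchPoint:
    """Hypothetical behavioral model of an edge-triggered latch point."""

    def __init__(self, next_state, initial=0):
        # next_state(inputs, previous_state) -> new state (assumed interface)
        self.next_state = next_state
        self.state = initial

    def clock_edge(self, inputs):
        # The output changes only here, on the basis of the input levels
        # and the previous state that exist at the clock edge.
        self.state = self.next_state(inputs, self.state)
        return self.state


# A plain register ignores its previous state and captures its input.
register = LatchPoint(lambda inputs, prev: inputs["d"])

# A toggle element depends on its previous state as well as on an input.
toggle = LatchPoint(lambda inputs, prev: prev ^ inputs["t"])

register.clock_edge({"d": 1})
toggle.clock_edge({"t": 1})
toggle.clock_edge({"t": 1})
```

In this sketch the register holds 1 after its clock edge, while the toggle element returns to 0 after two edges with `t` asserted, since each new state depends on the state before it.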
Data paths are the connections by which data and/or control signals pass between one latch point and another, or between a latch point and the outside world. As the term is used herein, a data path may include various combinational logic circuits. For example, two source latch points, which provide alternate sources for data to be received by a destination latch point, may each be coupled to the destination latch point through a single data selector. In response to a control signal, the data selector selects one of the two source latch points to provide the data to the destination latch point. In this situation, the path from each of the two source latch points, through the data selector, to the destination latch point constitutes a separate data path. Additionally, it will be understood that the terms "data" and "control signal" are frequently used interchangeably, such that a "data path", as the term is used herein, can carry either data signals or control signals or both. Thus, the path from the latch point containing the control signal, through the data selector, to the destination latch point constitutes an additional data path.
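The data selector example above can be sketched as a small connectivity table from which the three data paths are enumerated. The names `SRC_A`, `SRC_B`, `CTRL`, `SEL`, and `DEST` are hypothetical labels chosen for this illustration, and the sketch assumes the combinational logic contains no loops.

```python
# Hypothetical names: SRC_A, SRC_B, and CTRL are source latch points,
# SEL is the data selector, and DEST is the destination latch point.
LATCH_POINTS = {"SRC_A", "SRC_B", "CTRL", "DEST"}
WIRES = {
    "SRC_A": ["SEL"],
    "SRC_B": ["SEL"],
    "CTRL": ["SEL"],
    "SEL": ["DEST"],
    "DEST": [],
}


def data_paths(source):
    """Return every path from a source latch point to the next latch point."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        for nxt in WIRES[path[-1]]:
            if nxt in LATCH_POINTS:
                paths.append(path + [nxt])  # the path ends at a latch point
            else:
                stack.append(path + [nxt])  # combinational logic: keep going
    return paths


all_paths = [p for s in ("SRC_A", "SRC_B", "CTRL") for p in data_paths(s)]
for p in all_paths:
    print(" -> ".join(p))
```

Under these assumptions the enumeration yields three paths, one from each source latch point and one from the latch point holding the control signal, all passing through the same selector.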
A data path has associated with it a path delay, which is the amount of time required from the clock signal edge on which data is latched into a source latch point, until the data reaches the input of a destination latch point with sufficient stability such that if a clock edge occurred at that time, the data would be latched into the destination latch point accurately. A path delay may be caused by long wires, by chip crossings, by logic delays inherent in combinational circuits at intermediate points in the data path, by the clock skew between the source and destination latch points, or by the maximum latch-up time of the latches used. Additionally, some implementations of a latch point will recognize the system conditions on the rising edge of a clock pulse, but transfer such conditions to the outputs only on the falling edge of the clock pulse. For such implementations, the path delay includes the half-cycle between the rising and falling edges of the clock signal.
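The contributions to path delay listed above can be tallied in a short sketch. All figures are hypothetical and chosen only for illustration, and treating the contributions as simply additive is a simplifying assumption rather than a statement about any particular technology.

```python
CLOCK_PERIOD_NS = 10.0  # hypothetical clock period

# Hypothetical delay contributions for one data path, in nanoseconds.
contributions_ns = {
    "long_wires": 3.0,
    "chip_crossings": 4.0,
    "combinational_logic": 3.5,
    "clock_skew": 1.0,
    "latch": 1.5,
}


def path_delay(contributions, clock_period, half_cycle_output=False):
    """Sum the contributions; optionally add the half-cycle incurred when a
    latch point samples on the rising edge but drives on the falling edge."""
    total = sum(contributions.values())
    if half_cycle_output:
        total += clock_period / 2
    return total


delay = path_delay(contributions_ns, CLOCK_PERIOD_NS)
delay_half_cycle = path_delay(contributions_ns, CLOCK_PERIOD_NS, True)
fits_in_one_cycle = delay <= CLOCK_PERIOD_NS

print(delay, delay_half_cycle, fits_in_one_cycle)
```

With these assumed figures the path delay is 13.0 ns, or 18.0 ns when the half-cycle is included, so the path does not fit within a single 10 ns clock cycle.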
Latch points and data paths are typically grouped together in specific combinations to form functional units of the data processing apparatus. Such functional units may include, for example, a divider complex, a register complex, etc. Depending on where the lines are drawn, a functional unit may be considered to include or exclude various latch points or data paths, or parts thereof. For example, a multiplier complex functional unit may include an adder functional unit within its boundaries. It also may or may not be considered to include ingating and outgating combinational logic.
The various latch points and combinational logic circuits typically operate under the control of a control unit. A control unit may be microcoded or hardwired, and it generates control signals which are updated once every clock cycle. Programmed signals may be presented to the control unit from an Instruction Unit, for example, which tell the control unit to have a certain machine level instruction executed. In general, the cycle time can be no shorter than the longest path delay between any two latch points under the control of the control unit. Where a path delay would exceed the desired cycle time, the usual practice is either to lower the clock frequency, which degrades the throughput of the entire computer system, or to divide the longest data paths into two or more segments by inserting additional intermediate latch points.
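The two conventional remedies described above can be compared numerically in a brief sketch. The path names and delays are hypothetical, and the sketch assumes that a path can be divided into equal segments and that an inserted latch point adds no delay of its own.

```python
import math

# Hypothetical path delays between pairs of latch points, in nanoseconds.
path_delays_ns = {"A->B": 6.0, "B->C": 9.0, "C->D": 22.0}


def min_cycle_time(delays):
    """Remedy 1: stretch the clock to cover the longest path delay."""
    return max(delays.values())


def segments_needed(delay, cycle_time):
    """Remedy 2: number of segments so that each fits in one cycle."""
    return math.ceil(delay / cycle_time)


slow_cycle_ns = min_cycle_time(path_delays_ns)

target_cycle_ns = 10.0  # hypothetical desired cycle time
segments = {
    name: segments_needed(d, target_cycle_ns)
    for name, d in path_delays_ns.items()
}
# Each additional segment costs one intermediate latch point.
extra_latch_points = {name: n - 1 for name, n in segments.items()}

print(slow_cycle_ns, segments, extra_latch_points)
```

With these assumed figures, the first remedy forces a 22 ns cycle on every path, while the second keeps the 10 ns cycle at the cost of two intermediate latch points on the longest path.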
In order to maintain fast cycle times, the prior art teaches that the various latch points which are needed for a given operation should be placed as close as possible to (or within) the functional unit which performs that operation. For example, for a multi-step multiplier unit, the prior art teaches that the multiplicand should be brought into a local register in order to keep the path delay between the multiplicand register and the other latch points in the multiplier unit short, to thereby minimize the cycle time required. One problem with this philosophy is that, taken to its extreme, it would require all latch points to be placed near each other for speed. This is obviously impossible to achieve.
Additionally, LSI real estate is still at a premium in those technologies employed in high speed data processing systems. A redundant local register can occupy as much as 20% of the space used to implement the functional unit. The designer is therefore forced to choose between long path delays and consequently long cycle times on one hand, or extra LSI real estate to hold redundant local registers on the other hand.