Recent years have seen phenomenal growth and development in the field of digital data processing, both with respect to component structures and system implementations. Important aspects of that development have involved increased operating speeds and simplicity of design. In general, the present invention is directed to an improved data processing system that is pertinent to both the criteria of speed and simplicity of design. The system is asynchronous, operating without clock signals. Accordingly, systems of the present invention can function at speeds determined by the circuitry within the system.
An embodiment of the present invention fashioned as an asynchronous pipeline processor is disclosed herein. As disclosed, the asynchronous pipeline processor is proposed to incorporate elements of high-speed memory.
In a specific form, the system of the present invention may be implemented as a high-speed multiplier. Generally, two main types of high-speed multipliers are in common use. The first type is called the combinatorial or array multiplier. In such multipliers, a combinational circuit free of data storage elements is used to form the product of two input numbers, i.e., a multiplier and a multiplicand. Consider an example of the operation of such a multiplier.
The numerical values may be represented in a binary format, each being represented by N bits. A suitable array for developing the product of the two values may contain N binary adders that feed values to each other in sequence. Each adder either adds zero or a suitably shifted version of the multiplicand to a partial product developed by previous adders in the system. Whether a particular adder adds zero or the multiplicand value depends on the current bit from the multiplier.
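The adder chain just described can be illustrated with a short behavioral sketch. It is a software model only, not the disclosed circuit; the function name `array_multiply` and the restriction to unsigned operands are assumptions made for the illustration.

```python
def array_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Behavioral model of an N-adder array for unsigned N-bit operands.

    Hypothetical helper for illustration; each loop pass stands in for
    one adder in the chain.
    """
    limit = 1 << n_bits
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must be unsigned values of n_bits width")

    partial_product = 0
    for stage in range(n_bits):
        # The current multiplier bit selects what this adder contributes.
        current_bit = (multiplier >> stage) & 1
        # Either zero or the multiplicand shifted to this stage's weight.
        addend = (multiplicand << stage) if current_bit else 0
        partial_product += addend
    return partial_product


print(array_multiply(13, 11, 4))
```

In this model the loop index plays the role of the adder position, so the N additions that the array performs in space are performed here in sequence.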
As traditionally implemented, combinatorial multipliers lack sufficient operating speed for certain applications. That is, while a combinatorial multiplier produces answers faster than a single adder (since a single adder must perform N separate additions in sequence), the combinatorial multiplier is still too slow in certain applications. Roughly speaking, the time it takes to pass information through the array of a combinatorial multiplier is equal to the time it takes for signals to progress through the carry chain of a single adder plus the time it takes for signals to progress through the least significant bits of all N adders, without regard for carry propagation.
An N by N array multiplier can be designed in which the operating time approximates 2 N gate delays. By using Booth's well-known algorithm to reduce the number of adders required from N to N/2, an array multiplier can be designed with a time delay of about 1.5 N gate delays. Other techniques for increasing speed can be implemented to reduce the time delay somewhat further (however, such designs tend to become very complex).
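The halving of the adder count can be illustrated with a sketch of radix-4 (modified) Booth recoding, which is assumed here to be the variant intended. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the multiplier is assumed to be a two's-complement value with an even number of bits.

```python
from typing import List


def booth_radix4_digits(multiplier: int, n_bits: int) -> List[int]:
    """Recode a two's-complement multiplier into n_bits/2 digits in -2..2."""
    if n_bits <= 0 or n_bits % 2:
        raise ValueError("n_bits must be a positive even number")
    if not -(1 << (n_bits - 1)) <= multiplier < (1 << (n_bits - 1)):
        raise ValueError("multiplier does not fit in n_bits")

    bits = multiplier & ((1 << n_bits) - 1)  # two's-complement bit pattern
    digits = []
    previous_bit = 0  # implicit bit to the right of the least significant bit
    for position in range(0, n_bits, 2):
        low_bit = (bits >> position) & 1
        high_bit = (bits >> (position + 1)) & 1
        digits.append(low_bit + previous_bit - 2 * high_bit)
        previous_bit = high_bit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Sum one partial product per recoded digit (N/2 additions)."""
    digits = booth_radix4_digits(multiplier, n_bits)
    return sum(digit * multiplicand * 4 ** k for k, digit in enumerate(digits))


print(booth_radix4_digits(7, 4), booth_multiply(13, -3, 4))
```

Each recoded digit selects zero, plus or minus the multiplicand, or plus or minus twice the multiplicand, so an N-bit multiplier needs only N/2 partial products.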
Improved performance in digital multipliers is offered by an alternative prior-art form, the synchronous pipeline multiplier. Such multipliers are characterized by storage elements in the array of adders to hold partially developed answers. The storage elements receive a common clock so that signal-represented data values march through the multiplier to the timing of the clock. As the number of storage elements is increased in such multipliers, the amount of logic between storage elements decreases, and the clock rate can be increased because the data signals need pass through less circuitry between storage elements. Unfortunately, the number of clock pulses required to move data through the entire system (from input to output) also increases. Thus, a basic dilemma occurs in the design of synchronous pipeline multipliers: adding storage elements raises the attainable clock rate, but the time to process an isolated value may also increase.
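The dilemma can be made concrete with a simple timing model. The model is an assumption introduced for illustration: the total logic delay is taken to divide evenly among the stages, and each storage element is taken to add a fixed overhead to every clock period. The function name and the numerical values are hypothetical.

```python
def synchronous_pipeline_timing(logic_delay: float, stages: int,
                                register_overhead: float):
    """Return (clock_period, latency) for an evenly divided pipeline.

    Assumed model: period = logic_delay / stages + register_overhead,
    and one value needs `stages` clock periods to pass from input to output.
    """
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    clock_period = logic_delay / stages + register_overhead
    latency = stages * clock_period
    return clock_period, latency


for stage_count in (1, 4, 8):
    period, latency = synchronous_pipeline_timing(32.0, stage_count, 2.0)
    print(stage_count, period, latency)
```

Under these assumptions, going from one stage to eight shortens the clock period from 34 to 6 time units while the input-to-output time for an isolated value grows from 34 to 48 time units.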
Moreover, in a prior-art synchronous pipeline multiplier, the speed at which the clock can run depends on the slowest operating speed of any stage in the pipeline. Consequently, as a design consideration, the fact that the slowest unit must establish the speed of the clock necessitates accurately computing the speed of each stage. In systems utilizing integrated circuits, in which operating speed can vary from batch to batch, and where speed may depend on specific physical layouts of circuit parts, accurately computing operating speeds can be very difficult.
In general, the system of the present invention simplifies the design of pipeline data processing systems. Specifically, an asynchronous pipeline processor in accordance herewith is easier to design because each stage can proceed at its own pace independent of any external clock signal. The design will produce correct answers independently of the speed of individual stages, provided that the control part of each stage functions faster than the data part. That is, in the system as set forth below in detail, the control elements for each stage function faster than the data processing cells for each stage.
Although the overall speed of an asynchronous pipeline processor in accordance herewith is still set by the slowest element, the burst speed for both input and output functions can be higher than that of the slowest element if faster stages of the pipeline precede and follow the slowest stage. Accordingly, consideration might well be given to selective arrangement of individual processing apparatus.
In general, the system of the present invention provides a useful alternative to traditional processors, specifically combinatorial array processors and synchronous pipeline processors. The system of the present invention generally retains the advantages of each system and substantially avoids the disadvantages. In operation, an asynchronous pipeline processor in accordance herewith behaves like a combinatorial array processor when it is empty and therefore is easy to test and responds quickly to new data. When the asynchronous pipeline processor is partly full, it can process several data values simultaneously, similarly to the operation of a synchronous pipeline processor.
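The two regimes can be illustrated with a simplified timing model of a self-timed chain of stages. The model is an assumption and not the disclosed control structure: each stage starts on a value as soon as the value arrives and the stage has finished the preceding value, all inputs are available at time zero, and back-pressure between stages is ignored. The function name and the stage delays are hypothetical.

```python
def async_pipeline_exit_times(stage_delays, n_items):
    """Return the time at which each of n_items leaves the last stage."""
    # Time at which each stage finished its most recent value.
    stage_free_at = [0.0] * len(stage_delays)
    exit_times = []
    for _ in range(n_items):
        arrival = 0.0  # assumed: every input is ready at time zero
        for index, delay in enumerate(stage_delays):
            # A stage waits for both the value and its own availability.
            start = max(arrival, stage_free_at[index])
            arrival = start + delay
            stage_free_at[index] = arrival
        exit_times.append(arrival)
    return exit_times


print(async_pipeline_exit_times([1.0, 3.0, 2.0], 3))
```

In this model the first value through an empty pipeline emerges after the sum of the stage delays, as it would from a combinatorial array, while successive values emerge at intervals equal to the delay of the slowest stage.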
While the system of the present invention has been discussed in terms of a multiplier, it is important to appreciate that it can be utilized to introduce pipelining into virtually any logical array. That is, as will be apparent to those skilled in the art from the following description, the principles of the present asynchronous pipeline processor are valid for implementation of any process to be performed by a series of logical steps.
In general, the system of the present invention involves processing cells which are linked in a chain to perform logical processing steps. The processing cells are interconnected to provide retrograde data paths and are controlled by an associated series of control elements. Specifically, the control elements actuate switching structures in the processing cells to move data for processing and accomplish storage in the pipeline.