There has been increasing interest over the last few years in the bit-serial approach to digital integrated circuit (IC) design. The major advantages this approach offers are the small bit-width required of signal ports to and from the integrated circuit and the reduced number of computational elements, as compared with parallel computation. The bit-serial approach was advocated by L. B. Jackson, J. F. Kaiser and H. S. McDonald in their article entitled "An Approach to the Implementation of Digital Filters", IEEE Transactions on Audio and Electroacoustics, Vol. AU-16, No. 3, September 1968, pp. 413-421, as offering savings in routing and computational hardware as compared with parallel architecture.
In a bit-serial circuit, data flows from one computational element to another along serial lines. The steady stream of bits is divided into words of a fixed number of bits in length. Arithmetic data values are represented in two's complement format and are passed least significant bit first. Since data flows least significant bit first, the sign bit is the last bit of the word. Separate words of data follow each other directly with no idle bits separating them. Each computational element receives a synchronized control signal (if needed) to indicate to it where one word ends and the next starts. This signal may be synchronized with the most significant bit (sign bit) of each word.
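The word format described above can be illustrated with a minimal Python sketch. The function names `to_bit_serial` and `from_bit_serial` and the list-based `frame` control signal are hypothetical, chosen only for illustration; the sketch assumes the control signal is asserted on the sign bit of each word, which is one of the options the text mentions.

```python
def to_bit_serial(values, n):
    """Serialize signed integers into one stream of n-bit two's complement
    words, least significant bit first, with no idle bits between words."""
    bits, frame = [], []
    for v in values:
        if not -(1 << (n - 1)) <= v < (1 << (n - 1)):
            raise ValueError(f"{v} does not fit in {n} bits")
        word = v & ((1 << n) - 1)  # two's complement encoding
        for i in range(n):
            bits.append((word >> i) & 1)
            # Assumed control signal: high on the sign (last) bit of a word.
            frame.append(1 if i == n - 1 else 0)
    return bits, frame


def from_bit_serial(bits, frame):
    """Recover the signed word values using the word-boundary signal."""
    values, acc, pos = [], 0, 0
    for b, f in zip(bits, frame):
        if f:
            acc -= b << pos  # the sign bit carries negative weight
            values.append(acc)
            acc, pos = 0, 0
        else:
            acc += b << pos
            pos += 1
    return values
```

For example, serializing the 4-bit words 3 and -2 yields the bit stream 1,1,0,0 followed directly by 0,1,1,1, and decoding that stream with the same framing signal recovers the original values.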
Each operator in a bit-serial circuit has a fixed latency, which is the number of cycles that elapse between the time the first bit of the input signal arrives and the time the first bit of the output signal response is available. Since each operator may have a different latency, it is usually necessary to insert clocked delays (implemented as shift registers) into the circuit in order to synchronize the different inputs to an operator. Minimizing the number of delays that need to be inserted is important in a bit-serial integrated circuit design. It is desirable to reduce unnecessary digital hardware to make room on the integrated circuit die for more important circuitry or, alternatively, to permit reduced die size so more dies can be cut from each silicon wafer. Eliminating unnecessary circuitry reduces system power requirements and tends to improve system reliability. In the particular case of delay circuits it is usually desired to reduce their number or relocate them so as to reduce the latency involved in arithmetic or logic processes and thereby improve system speed of response.
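The need for synchronizing delays can be sketched as follows. This is only an illustrative model, not a delay-minimizing procedure: it assumes a hypothetical netlist given as two dictionaries, listed in topological order, and simply pads every early input with shift-register stages until it lines up with the latest input of the same operator.

```python
def balance_delays(latency, inputs):
    """latency: operator name -> latency in clock cycles, listed so that
                every operator appears after the operators feeding it.
    inputs:  operator name -> list of source operator names.
    Returns (ready, delays): the cycle at which each operator's first
    output bit is available, and the number of delay stages per edge."""
    ready, delays = {}, {}
    for op, lat in latency.items():
        srcs = inputs.get(op, [])
        # The operator can start only when its latest input arrives.
        start = max((ready[s] for s in srcs), default=0)
        for s in srcs:
            delays[(s, op)] = start - ready[s]  # stages needed on this edge
        ready[op] = start + lat
    return ready, delays


# Hypothetical example: an adder combines a multiplier output with input "a".
latency = {"a": 0, "b": 0, "mul": 3, "add": 1}
inputs = {"mul": ["a", "b"], "add": ["mul", "a"]}
ready, delays = balance_delays(latency, inputs)
```

In this example the direct path from "a" to the adder needs three delay stages to match the multiplier latency, which is the kind of inserted hardware the designer would like to reduce.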
It has been demonstrated that bit-serial circuitry is particularly suited to automatic chip generation using silicon compilers. See, for example, P. Denyer and D. Renshaw, VLSI Signal Processing, A Bit-Serial Approach, Addison-Wesley (1985); J. R. Jasica, S. Noujaim, R. I. Hartley, and M. J. Hartman, "A Bit-Serial Silicon Compiler", Proceedings of the IEEE International Conference on Computer Aided Design, pp. 91-93 (1985); F. F. Yassa, J. R. Jasica, R. I. Hartley, and S. E. Noujaim, "A Silicon Compiler for Digital Signal Processing: Methodology, Implementation and Applications", Proceedings of the IEEE, Special Issue on Hardware and Software for Digital Signal Processing, Vol. 75, No. 9, Sept. 1987, pp. 1272-1282; and R. Jain, F. Catthoor, J. Vanhoff, B. J. S. DeLoore, G. Goossens, N. F. Goncalvez, L. J. M. Claesen, J. K. J. Van Ginderdeuren, J. Vandewalle, and H. J. De Man, "Custom Design in a VLSI PCM-FDM Transmultiplexer from System Specifications to Circuit Layout Using a Computer-Aided Design System", IEEE Journal of Solid State Circuits, Vol. SC-21, No. 1, February 1986, pp. 73-85.
A silicon compiler is generally described as a combination of software and hardware which accepts high-level language instructions from a human and produces chip production masks which are used in the fabrication of electronic circuitry designed to carry out the high-level function specified by the human. The silicon compiler stores in its memory information concerning how to design masks for certain standard circuit configurations. For circuit configurations which implement a large number of similar electronic processing steps, the number of which may vary from design to design, it is most memory-efficient to store information concerning masks for portions of the circuitry denominated "cells", at least some of which can be iterated as many times as needed to generate a range of different integrated circuit designs. Accordingly, designs for integrated circuits are sought which are flexible in regard to accepting digital signals of different word size, etc., but can be constructed from basic cells or partial circuits so the masks for making the circuits can be readily generated using a silicon compiler.
There has been particular interest in the bit-serial procedure for performing digital arithmetic tasks. Addition procedures (which may be signed) are readily implemented by bit-serial processing, with reduced hardware requirements as compared to parallel processing. Multiplication procedures for bit-serial operands have been and continue to be a subject of study, because digital multiplication by parallel processing requires a large number of hardware elements. Signed multiplication procedures are especially challenging.
Fully pipelined bit-serial multipliers, in which multiplication proceeds as both operands are serially received, are described by I-N. Chen and R. Willoner in "An O(n) Parallel Multiplier with Bit-Sequential Input and Output", IEEE Transactions on Computers, Vol. C-28, No. 10, October 1979, pp. 721-727, and by N. R. Strader and V. T. Rhyne in "A Canonical Bit-Sequential Multiplier", IEEE Transactions on Computers, Vol. C-31, No. 8, August 1982, pp. 791-795. These multipliers have two significant drawbacks. Firstly, they are not easily extended to two's complement calculation, operating only on unsigned integers. Secondly, they can accept new input data only once every 2n cycles.
The design of the Chen et alii and Strader et alii multipliers was modified as described by J. T. Scanlon and W. K. Fuchs in "High Performance Bit-Serial Multiplication", Proceedings of the IEEE International Conference on Computer Design, pp. 114-117 (1986). Scanlon et alii observed that the individual cells in the Chen et alii and Strader et alii arrays are underused, being used on the average only half of the time. An ingenious but somewhat cumbersome bidirectional array of multiplier slices was used by Scanlon et alii, that allows new input data every n+1 cycles. The design is easily further modified to allow new samples every n cycles by the addition of one extra bit slice. A significant drawback of this multiplier, however, is that it does not handle two's complement numbers easily. Furthermore, the external control circuitry required is complex, since control signals and input data must be fed to alternate ends of the multiplier array. Furthermore, the output data of consecutive calculations come from alternate ends of the multiplier array.
R. I. Hartley and P. F. Corbett describe a fully pipelined serial-bit multiplier in their U.S. Pat. No. 4,860,240 issued Aug. 22, 1989, entitled "LOW LATENCY TWO'S COMPLEMENT BIT-SERIAL MULTIPLIER" and assigned to General Electric Company. The Hartley et alii serial-bit multiplier supplies the major product (i.e., the higher order bits of the full product) in a separate bit stream from the minor product (i.e., the lower-order bits of the full product). The major product is supplied immediately following the minor product, which is advantageous in that one can select a product of desired precision on a floating point basis, selecting bits from either or both the major product and minor product bit streams using a time-division multiplexer. Dual-bit carries are used in the partial summation procedures used to generate the major product.
A number of the bit-serial multiplication procedures used prior to the development of fully pipelined bit-serial multipliers are of a type in which each successive word of one of the bit-serial operands is converted to parallel form before actual multiplication proceeds. This type of bit-serial multiplier was described by R. F. Lyon in a concise paper, "Two's Complement Pipeline Multipliers", IEEE Transactions on Communications, Vol. COM-12, No. 4, April 1976, pp. 418-425. The Lyon multiplier will accept new n-bit operand values only every n+1 cycles, which undesirably requires that one idle bit be inserted between each pair of successive operand words.
In the Lyon multiplier and its descendants the bit-serial multiplicand is supplied to a serial-to-parallel converter during the time interval a preceding multiplication is carried out, and the parallel bits of the multiplicand are then latched into a multiplicand, or "icand", register throughout the ensuing time interval that multiplication actually proceeds. The successive bits of the bit-serial multiplier are then multiplied by each bit of the multiplicand in respective successive clock cycles. The partial sum is continuously being revised by serial addition while multiplication progresses.
To accommodate this, the low-order bits of the product (i.e., the minor product) are discarded as they occur in the Lyon multiplier, and only those portions of the partial sums needed for generating the n high-order bits (i.e., the major product) are kept. This procedure does not permit multiplication with a fractional multiplier signal, and it does not permit double-precision multiplication. The Lyon multiplier has the additional drawback that sign bit extension for the multiplicand involves quite complex circuitry.
J. T. Scanlon and W. K. Fuchs describe in their 1986 paper "High Performance Bit-Serial Multiplication", Proceedings of the IEEE International Conference on Computer Design, pp. 114-117, a modification of the Lyon bit-serial multiplier in which the bits of the major product flow through one pipeline, while the bits of the minor product are preserved and delivered into another pipeline. This multiplier, like Lyon's, accepts new operand values only every n+1 cycles, which makes it difficult to apply bit-serial operands directly to the multiplier in a pipelined operation.
Lyon contrasts his multiplication procedure with the prior-art serial-parallel multiplier as modified to a pipeline form. In the serial-parallel multiplier modified to pipeline form, as Lyon describes in regard to FIG. 1 of his concise paper, each successive bit of the serial multiplier signal simultaneously multiplies all bits of the multiplicand as held in parallel in the icand register to form a partial product, which is subsequently added in parallel, with appropriate bit shift, to the preceding partial sum to generate a new partial sum. Lyon dismisses the serial-parallel multiplier as not being an attractive method for multiplying a k-bit bit-serial multiplicand by an n-bit bit-serial multiplier, because the full product (i.e., both major and minor products) must be generated over n+k or n+k-1 clock cycles. Lyon viewed this as restricting successive operations to begin every (n+k+1)th or (n+k)th clock cycle and making the serial-parallel multiplier unattractive for performing successive multiplications on a pipelined basis.
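The shift-and-add behavior of a serial-parallel multiplier can be modeled at the arithmetic level with a short Python sketch. It is not a gate-level or cycle-accurate description of any of the cited circuits. The function name `serial_parallel_multiply` is hypothetical, and the handling of signed operands (subtracting the partial product formed by the multiplier's sign bit) is an assumed, commonly used convention rather than something taken from the cited papers.

```python
def serial_parallel_multiply(icand, multiplier_bits):
    """icand: signed multiplicand held in parallel (as in an icand register).
    multiplier_bits: two's complement multiplier bits, LSB first.
    Returns (minor_bits, major): the n low-order product bits, LSB first,
    and the remaining high-order part of the product as a signed integer."""
    n = len(multiplier_bits)
    partial = 0
    minor_bits = []
    for i, bit in enumerate(multiplier_bits):
        # One multiplier bit multiplies all bits of the multiplicand at once.
        pp = icand if bit else 0
        if i == n - 1:
            pp = -pp  # assumed: the sign bit has weight -2^(n-1)
        partial += pp
        # The lowest bit of the partial sum is final, so it leaves as a
        # minor-product bit while the rest shifts down by one place.
        minor_bits.append(partial & 1)
        partial >>= 1
    return minor_bits, partial


def full_product(minor_bits, major):
    """Recombine the two parts into the full signed product."""
    n = len(minor_bits)
    return (major << n) + sum(b << i for i, b in enumerate(minor_bits))
```

With a 4-bit multiplier of 5 (bits 1,0,1,0 LSB first) and a multiplicand of -3, the model yields minor bits 1,0,0,0 and a major part of -1, which recombine to -15.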
The inventors disagree with the view that the serial-parallel multiplier is unsatisfactory for performing successive multiplications on a pipelined basis. The inventors find that successive multiplications of bit-serial numbers, each n bits long, can be initiated every nth bit interval in pipelined multiplication through the serial-parallel multiplier, provided the full products are removed in two bit streams. One bit stream supplies successive minor product terms and the other bit stream supplies successive major product terms, with n bit intervals more latency. Groups of n successive bits can be selected from these bit streams to supply a product that has n-bit precision and that has its binary point located where desired.
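The selection of n successive bits from the two product streams can be illustrated with a small Python sketch. The helper names `split_product` and `select_product` are hypothetical, and the model treats each stream as a plain list of bits rather than modeling the timing of a time-division multiplexer.

```python
def split_product(a, b, n):
    """Form the 2n-bit two's complement product of two n-bit signed values
    and split it, LSB first, into minor (low) and major (high) bit lists."""
    full = (a * b) & ((1 << (2 * n)) - 1)
    bits = [(full >> i) & 1 for i in range(2 * n)]
    return bits[:n], bits[n:]


def select_product(minor, major, shift):
    """Pick n successive bits starting `shift` places above the LSB of the
    full product and read them as an n-bit two's complement value.
    shift = 0 returns the minor product, shift = n the major product."""
    n = len(minor)
    window = (minor + major)[shift:shift + n]
    value = sum(bit << i for i, bit in enumerate(window))
    return value - (1 << n) if window[-1] else value
```

For 4-bit operands -3 and 5 the full product is -15. Selecting with a shift of 4 returns the major product, -1, while a shift of 2 returns -4, which corresponds to placing the binary point two places higher and truncating toward negative infinity. The selected value is meaningful only when the chosen window retains the sign of the product.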