A commonly employed encoding technique in present-day digital data processing systems is two's-complement notation. For non-negative numbers the binary representation is unchanged. For negative numbers, however, a conversion is required: all bits of the number's magnitude are inverted, and a one is then added at the least significant bit position to obtain the two's-complement representation.
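The invert-and-add-one conversion can be illustrated with a minimal Python sketch. The function name `twos_complement` and the fixed `width` parameter are illustrative assumptions and do not come from the text above.

```python
def twos_complement(value, width):
    """Return the width-bit two's-complement encoding of value as an unsigned integer."""
    mask = (1 << width) - 1
    if value >= 0:
        # Non-negative numbers keep their ordinary binary representation.
        return value & mask
    magnitude = -value
    inverted = ~magnitude & mask   # invert all bits of the magnitude
    return (inverted + 1) & mask   # add one at the least significant bit


print(format(twos_complement(5, 8), "08b"))    # 00000101
print(format(twos_complement(-5, 8), "08b"))   # 11111011
```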
In order to carry out arithmetic operations on such digitally encoded data, signal processing systems may manipulate the data in a parallel or serial fashion. Parallel processing requires the data to be encoded into a fixed word length, with all of the data bits being processed simultaneously. Serial processing, on the other hand, which processes data one bit at a time, does not require structuring of the data into a fixed word length. In a bit-serial data processing system, a data word (of any length) enters a bit-serial operator one bit per clock cycle, least significant bit first. The bit-serial operator (e.g. a multiplier) calculates an output bit based upon the input bit and information derived and stored from previous bit-serial operations (such as a carry in the case of an add operation). The output data is produced serially, least significant bit first, synchronized with the system control clock by which the input data is applied to the operator. Because data is applied to the operator in a continuous, serial fashion, an external control signal is applied to the operator when a new word begins, so as to demarcate the end of one word and the beginning of the next. Typically, the bit-serial operator employs this notification signal to clear its internal registers.
Examples of mathematical operations that can be performed bit-serially include addition, subtraction and multiplication. Components which execute such functions can be connected with pipelined registers to evaluate mathematical equations. The maximum clock rate at which processing can be carried out depends upon the propagation delay of the bit-serial components between pipelined registers. Also, the word throughput rate depends upon the number of bits in the word being processed: for a given clock rate, the more bits per word, the lower the word rate.
As one example of a bit-serial operation, consider the equation R = K1(A+B) - K2(C+D), where R is the result. In this equation, each of the letters A, B, C and D represents a respective multi-bit input data word, while K1 and K2 correspond to prescribed constants. Each data word A, B, C, D may be stored in a respective multi-bit shift register. The outputs of the shift registers storing data words A and B may be coupled serially to a first adder, while the outputs of the shift registers storing data words C and D may be coupled serially to a second adder. The outputs of the adders are then coupled to respective multipliers to which the constants K1 and K2 are applied as additional inputs, with the outputs of the multipliers being applied to a subtraction circuit. The output of the subtraction circuit is coupled to an output shift register from which the result R is obtained.
For each clock cycle, one bit of each of the data words A, B, C and D is shifted out of its respective shift register through the arithmetic logic which executes the equation. After N clock cycles, a new word may be loaded into the respective shift registers from which data words A, B, C and D have been derived. In other words, the word rate is 1/N of the clock rate, where N corresponds to the number of bits per data word.
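The datapath for this equation can be sketched behaviorally in Python. This is only a rough model under stated assumptions, not the circuit itself: all names (`SerialAdder`, `add_streams`, `sub_streams`, `scale_stream`, `evaluate`) are hypothetical, the constants are assumed non-negative, the multipliers are modeled as sums of delayed copies of the stream, the subtractor is modeled as an adder with an inverted operand and a carry preset to one, and the word width is assumed large enough to hold every intermediate value.

```python
class SerialAdder:
    """One-bit full adder with a carry latch (hypothetical model)."""

    def __init__(self, carry_in=0):
        self.carry = carry_in

    def clock(self, a, b):
        total = a + b + self.carry
        self.carry = total >> 1      # carry latched for the next bit
        return total & 1             # sum bit for this clock cycle


def to_bits(value, width):
    """Two's-complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def to_signed(bits):
    """Interpret an LSB-first bit list as a two's-complement number."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def add_streams(x, y):
    adder = SerialAdder()
    return [adder.clock(a, b) for a, b in zip(x, y)]


def sub_streams(x, y):
    # x - y is computed as x + (inverted y) + 1, so the carry starts at one.
    adder = SerialAdder(carry_in=1)
    return [adder.clock(a, b ^ 1) for a, b in zip(x, y)]


def scale_stream(x, k):
    """Multiply a stream by a non-negative constant k by summing delayed copies."""
    acc = [0] * len(x)
    shift = 0
    while k >> shift:
        if (k >> shift) & 1:
            delayed = ([0] * shift + x)[:len(x)]   # delay by `shift` clocks
            acc = add_streams(acc, delayed)
        shift += 1
    return acc


def evaluate(a, b, c, d, k1, k2, width):
    A, B, C, D = (to_bits(v, width) for v in (a, b, c, d))
    left = scale_stream(add_streams(A, B), k1)
    right = scale_stream(add_streams(C, D), k2)
    return to_signed(sub_streams(left, right))


print(evaluate(3, 4, 1, 2, 5, 6, 16))   # 5*(3+4) - 6*(1+2) = 17
```

Each word occupies `width` clock cycles in this model, which matches the word rate of 1/N of the clock rate described above.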
A commonly employed component in bit-serial signal processing systems is the bit-serial adder, which adds two bit-serial data streams and produces a bit-serial data stream output corresponding to the sum of the input streams. Each sum bit is calculated as the sum of the two input bits and a carry from the addition of the previous bits. At the same time that the sum bit is being computed, the carry bit for the current input bits is also derived. The carry bit is latched by the input data clock for use during the addition of the next two input bits. When a new data word is to be processed, a control signal is applied to the carry latch to clear its current contents.
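A minimal Python sketch of such an adder follows. The class and helper names (`BitSerialAdder`, `clear`, `clock`, `serial_add`) are illustrative assumptions; the model simply keeps one carry bit between clock cycles and clears it at the start of each word.

```python
class BitSerialAdder:
    """Full adder with a carry latch, processing one bit pair per clock."""

    def __init__(self):
        self.carry = 0

    def clear(self):
        # Models the control signal applied when a new data word begins.
        self.carry = 0

    def clock(self, a, b):
        total = a + b + self.carry
        self.carry = total >> 1   # carry latched for the next pair of bits
        return total & 1          # sum bit output for this clock cycle


def to_bits_lsb_first(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(x, y, width):
    """Add two width-bit words one bit per clock, least significant bit first."""
    adder = BitSerialAdder()
    adder.clear()
    stream_x = to_bits_lsb_first(x, width)
    stream_y = to_bits_lsb_first(y, width)
    return from_bits_lsb_first([adder.clock(a, b) for a, b in zip(stream_x, stream_y)])


print(serial_add(13, 7, 8))   # 20
```

Because only `width` sum bits are produced, a result that does not fit in the word wraps modulo 2 to the power `width` in this sketch.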
Using a combination of bit-serial adders, a bit-serial multiplier may be implemented which accepts a bit-serial data stream and outputs a bit-serial data stream composed of each input data word multiplied by a constant. The constant may be stored in a loadable register or hard-wired into the multiplier. Since an N-bit word multiplied by an N-bit constant yields a 2N-bit result, additional clock cycles are required to maintain a steady flow of data through the multiplier. If all numbers are treated as positive, N zeroes can be inserted between input data words to equalize clock rates. If the input data words are in two's-complement notation and the constant is positive, N clock cycles of sign-extended data can be inserted between successive data words.
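The two padding schemes can be sketched in Python as follows. The helper names (`pad_word`, `to_bits_lsb_first`, `from_bits_signed`) are hypothetical, and the sketch assumes the stream is ordered least significant bit first, so that the last bit of a word is its sign bit.

```python
def to_bits_lsb_first(value, width):
    """Two's-complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits_signed(bits):
    """Interpret an LSB-first bit list as a two's-complement number."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def pad_word(bits, extra, signed):
    """Append `extra` fill bits after a word: zeroes, or copies of the sign bit."""
    fill = bits[-1] if signed else 0
    return bits + [fill] * extra


word = to_bits_lsb_first(-3, 4)          # 1101 in two's complement
padded = pad_word(word, 4, signed=True)  # N = 4 cycles of sign extension
print(padded)                            # [1, 0, 1, 1, 1, 1, 1, 1]
print(from_bits_signed(padded))          # -3
```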
An example of a bit-serial multiplier formed from a tree of bit-serial adders is illustrated in FIG. 1. As shown therein, an input bit-serial data stream is applied over link 11 to a shift register 12 and successively shifted therethrough by a system clock signal CLK. In the exemplary adder tree shown in FIG. 1, shift register 12 has a capacity of eight bits and the adder tree is configured to multiply an eight-bit constant stored in a separate register 13 by a data word (of any length) that is coupled over input link 11 and shifted through the respective stages of shift register 12. The successive stages of register 13 are coupled as inputs to a set of masking AND gates 14-1 . . . 14-8. Second inputs of masking gates 14-1 . . . 14-8 are derived from the contents of the respective stages of shift register 12.
The outputs of masking AND gates 14-1 . . . 14-8 are coupled in pairs to respective bit-serial adders 21, 22, 23, 24, which form a first tier of the adder tree. Each of adders 21-24 of the first tier of the tree, as well as each adder of the other tiers of the tree, comprises a full adder with a carry latch (flip-flop). The outputs of bit-serial adders 21 and 22 are coupled as inputs to a bit-serial adder 31 of a second tier of the tree, while the outputs of bit-serial adders 23 and 24 are coupled as inputs to a bit-serial adder 32 of the second tier of the tree. Finally, the outputs of bit-serial adders 31 and 32 are coupled as inputs to bit-serial adder 41 at the top tier of the tree, the output of which on link 50 represents the bit-serial product.
For successive clock cycles, successive bits of an input data word on link 11 are shifted into and through the shift register 12. Because of masking gates 14-1 . . . 14-8, any bit position of the constant that is loaded into the register 13 corresponding to a zero effectively masks the corresponding bit or bits of the shift register 12, preventing that respective data from entering the adder tree. Bits which are permitted to enter the adder tree are summed together producing a product on link 50 at the top of the tree.
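The operation of the adder tree can be approximated by the following Python sketch. It is a behavioral model under several assumptions that are not stated in the text: numbers are unsigned, N zeroes follow the N-bit input word, the first shift register stage carries the weight of the constant's least significant bit, and no pipeline registers sit between tiers. The names `BitSerialAdder` and `serial_multiply` are hypothetical.

```python
class BitSerialAdder:
    """Full adder with a carry latch (one bit pair per clock)."""

    def __init__(self):
        self.carry = 0

    def clock(self, a, b):
        total = a + b + self.carry
        self.carry = total >> 1
        return total & 1


def serial_multiply(x, constant, n=8):
    """Multiply an n-bit unsigned word by an n-bit constant, bit-serially."""
    # Input word, least significant bit first, followed by n zeroes.
    stream = [(x >> i) & 1 for i in range(n)] + [0] * n
    stages = [0] * n                                        # shift register 12
    const_bits = [(constant >> k) & 1 for k in range(n)]    # register 13
    tier1 = [BitSerialAdder() for _ in range(n // 2)]       # adders 21-24
    tier2 = [BitSerialAdder() for _ in range(n // 4)]       # adders 31, 32
    top = BitSerialAdder()                                  # adder 41
    product_bits = []
    for bit in stream:
        stages = [bit] + stages[:-1]                        # shift one stage
        # Masking AND gates 14-1 . . . 14-8: a zero constant bit blocks its stage.
        masked = [s & c for s, c in zip(stages, const_bits)]
        t1 = [tier1[i].clock(masked[2 * i], masked[2 * i + 1]) for i in range(n // 2)]
        t2 = [tier2[i].clock(t1[2 * i], t1[2 * i + 1]) for i in range(n // 4)]
        product_bits.append(top.clock(t2[0], t2[1]))        # link 50
    return sum(b << i for i, b in enumerate(product_bits))


print(serial_multiply(13, 0b01111100))   # 1612
```

A stage that lies k positions into the shift register sees the input delayed by k clocks, which in a least-significant-bit-first stream is the input multiplied by 2 to the power k. Summing the unmasked stages therefore yields the product.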
For applications involving the use of a known and fixed constant, the hardware complexity of the adder tree configuration shown in FIG. 1 can be reduced. Rather than employing a separate constant register 13 and masking gates 14, the tree includes only those adders which would receive unmasked data from stages of the input data shift register 12, and those shift register stages are connected directly to the corresponding adders. An example of an 8-bit multiplier which is hard-wired to multiply an input data stream by the binary constant 01111100 is shown in FIG. 2. As can be seen therein, the configuration shown in FIG. 1 has been considerably simplified, requiring only adders 22, 23, 32 and 41 of the adder tree, no masking gates and no separate constant register. Because of this ability to simplify the hardware configuration of a bit-serial adder tree, such components are particularly attractive for present-day, high-density, complex integrated signal processors.
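A hedged Python sketch of such a hard-wired multiplier follows. The wiring is an assumption inferred from the text, since FIG. 2 is not reproduced here: the third and fourth shift register stages feed adder 22, the fifth and sixth feed adder 23, adder 32 combines adder 23 with the seventh stage, and adder 41 combines adders 22 and 32. The input is assumed to be unsigned and followed by N zeroes. The names `BitSerialAdder` and `multiply_by_01111100` are hypothetical.

```python
class BitSerialAdder:
    """Full adder with a carry latch (one bit pair per clock)."""

    def __init__(self):
        self.carry = 0

    def clock(self, a, b):
        total = a + b + self.carry
        self.carry = total >> 1
        return total & 1


def multiply_by_01111100(x, n=8):
    """Multiply an n-bit unsigned word by binary 01111100 using four adders."""
    stream = [(x >> i) & 1 for i in range(n)] + [0] * n   # LSB first, then zeroes
    stages = [0] * 8                                      # shift register 12
    adder22, adder23 = BitSerialAdder(), BitSerialAdder()
    adder32, adder41 = BitSerialAdder(), BitSerialAdder()
    product_bits = []
    for bit in stream:
        stages = [bit] + stages[:-1]
        # Only the stages aligned with the one bits of the constant are wired in
        # (assumed wiring; indices here are zero-based).
        s22 = adder22.clock(stages[2], stages[3])
        s23 = adder23.clock(stages[4], stages[5])
        s32 = adder32.clock(s23, stages[6])
        product_bits.append(adder41.clock(s22, s32))
    return sum(b << i for i, b in enumerate(product_bits))


print(multiply_by_01111100(13))   # 13 * 124 = 1612
```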
In a pipelined bit-serial signal processing system, for every N bits input to a signal processing operator, N bits of processed data must be serialized out in order to avoid the necessity of rate buffering. If only the N most significant bits of a multiplication are required, then only half of the 2N clock cycles required to shift a 2N-bit data word through the N-bit shift register of the multiplier are necessary. However, after the first N clocks have shifted the required result out of the multiplier, the pipeline and shift registers still contain information relating to the current multiplication. Even if a clear signal is supplied to the multiplier to purge that information and clear the multiplier for new data, several clock cycles are required to refill the pipeline. This constitutes an undesirable and, in highly pipelined environments such as those referenced above, sometimes unacceptable signal processing delay.