1. Field of the Invention
The invention relates to digital signal processing that embeds mixed length encoding schemes within a multiply-accumulate (MAC) architecture. More particularly, the invention preferably relates to mixed length 12/16 bit (12/16-b) encoding algorithms within MACs.
2. Background Information
Electronic products may be thought of as those products that involve the controlled conduction of electrons or other charge carriers, especially through microprocessors. Just about all electronic products employ microprocessors. These microprocessors employ arithmetic blocks that process signals of data such as digital data. As the demand for higher performing microprocessors increases, the demand for higher speed arithmetic blocks used in these microprocessors increases. For example, clock frequencies of one gigahertz (GHz) require considerable computational power, with which the arithmetic blocks must keep pace.
Conventional digital signal processing (DSP) generally involves processing a digital signal having thirty-two bits of data or information. A single bit of data is represented by a zero or a one. Part of processing these thirty-two bits (32-b) involves passing them through a series of multiplications and/or accumulations (which can be thought of as additions) to generate a single output vector as a final result. Mathematically, this multiplication and addition of bits may be represented as A*B+C=S1, where vector A may be a thirty-two bit multiplicand, vector B may be a thirty-two bit multiplier, and vector C may be thirty-two bits of accumulated data, and where the solution may be sent to storage S1.
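The relation A*B+C=S1 can be illustrated with a short Python sketch. The function names `mac` and `dot_product` are hypothetical and are used only for illustration. The sketch uses unbounded Python integers and does not model the thirty-two bit operand width or any hardware behavior.

```python
def mac(a, b, c):
    """Return a*b + c, the value the text sends to storage S1."""
    return a * b + c


def dot_product(xs, ys):
    """Chain MAC operations so each result feeds the next accumulation."""
    acc = 0  # accumulated data, playing the role of vector C
    for x, y in zip(xs, ys):
        acc = mac(x, y, acc)
    return acc


if __name__ == "__main__":
    print(mac(3, 4, 5))
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

Chaining the operation in this way shows why a series of multiplications and accumulations yields a single output as the final result.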
Latency is the time between the start of processing a signal and the completion of that signal processing. Throughput is the total capability of equipment to process data during a specified time period. High performance involves low latency and high throughput. The series of multiplications and/or accumulations has a large influence over the latency and throughput of the entire DSP application. Thus, multiplications and/or accumulations with low latency and high throughput are desirable.
A unit of the above series is known as a multiply-accumulate unit (MAC). For thirty-two bits of data, there are two methods that are available and widely used to implement 32-b MACs. The first method is a fixed length, twelve-bit (12-b) Booth encoding algorithm for multiplication. A 12-b Booth encoding algorithm is fixed when it encodes twelve bits during each clock cycle. The second method is a fixed length, sixteen-bit (16-b) Booth encoding algorithm for multiplication. A 16-b Booth encoding algorithm is fixed when it encodes sixteen bits during each clock cycle.
A high throughput MAC is a key element to achieving high digital signal processing performance. For a MAC, the latency and throughput depend on the number of multiplier bits encoded during each clock cycle. The greater the number of encoded multiplier bits processed per cycle, the higher the throughput.
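The dependence of cycle count on the number of multiplier bits encoded per cycle can be sketched with simple arithmetic. The model below is an assumption made for illustration: it counts only encoding cycles as the multiplier width divided by the bits encoded per cycle, rounded up, and ignores any cycles spent on final addition or accumulation. The name `encoding_cycles` is hypothetical.

```python
def encoding_cycles(multiplier_bits, bits_per_cycle):
    """Cycles needed to encode the whole multiplier under the simple model above."""
    if multiplier_bits <= 0 or bits_per_cycle <= 0:
        raise ValueError("widths must be positive")
    # Integer ceiling division: a partially filled last cycle still costs a cycle.
    return -(-multiplier_bits // bits_per_cycle)


if __name__ == "__main__":
    for width in (16, 32):
        for per_cycle in (12, 16):
            print(width, per_cycle, encoding_cycles(width, per_cycle))
```

Under this model, a sixteen bit multiplier takes two cycles when twelve bits are encoded per cycle and one cycle when sixteen bits are encoded per cycle.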
Conventionally, the method of implementing the above 12-b and 16-b MACs involves two basic steps. First, create a group of partial products. Then, add these partial products together to produce the final product. Compared with a 16-b encoding scheme, the main advantage of a 12-b encoding scheme is that its Wallace Tree is about 25% faster than that of the 16-b encoding scheme. However, the 12-b encoding scheme needs two cycles to create the final sum and carry vectors for 16-b signed digital signal processing (DSP) applications, whereas the 16-b encoding scheme needs only one cycle.
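The two basic steps, creating partial products and then adding them, can be sketched in Python. The sketch assumes radix-4 (modified) Booth recoding, in which each pair of multiplier bits yields one digit in {-2, -1, 0, 1, 2}, so that sixteen multiplier bits yield eight partial products and twelve bits yield six. The text does not state the radix, so this is an assumption, and the function names are hypothetical. The final addition is done here with an ordinary sum rather than a Wallace Tree.

```python
def booth_radix4_digits(multiplier, bits=16):
    """Recode a signed two's-complement multiplier into bits//2 Booth digits."""
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)  # digit in {-2, -1, 0, 1, 2}
        prev = high
    return digits


def booth_partial_products(multiplicand, multiplier, bits=16):
    """Step 1: one partial product per Booth digit, weighted by 4**k."""
    digits = booth_radix4_digits(multiplier, bits)
    return [d * multiplicand * (4 ** k) for k, d in enumerate(digits)]


if __name__ == "__main__":
    pps = booth_partial_products(6789, -12345, bits=16)
    # Step 2: adding the partial products gives the final product.
    print(len(pps), sum(pps) == 6789 * -12345)
```

In hardware, the digits -2, -1, 1, and 2 correspond to shifted and/or negated copies of the multiplicand, which is why halving the number of partial products does not require a general multiplier.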
Although conventional digital signal processing generally involves processing a digital signal having thirty-two bits of data, some digital signal processing applications process digital signals having only sixteen bits of data. Portable electronic products, for example, typically receive information in strings of sixteen bits. These 16-b DSP applications include products such as portable radios, televisions, and camera recorders. Because they are portable, low power designs for the microprocessors of portable electronic products are desirable. What is needed is a high performance, low power MAC implementation with enhanced DSP features that overcomes the two drawbacks noted above, namely the slower Wallace Tree of the 16-b encoding scheme and the extra cycle required by the 12-b encoding scheme, without losing the desirable low power characteristic.
Embodiments of the present invention include a mixed length encoding unit. The mixed length encoding may be a 12/16 bit (12/16-b) encoding algorithm within a multiply-accumulate (MAC) unit. The mixed length encoding unit includes a 16-b Booth encoder adapted to produce eight partial product vectors from sixteen bits of data. The 16-b Booth encoder is coupled to a four stage Wallace Tree. During a first cycle, a multiplex system directs the eight partial product vectors and an accumulation vector to the four stage Wallace Tree. During subsequent cycles, the multiplex system directs six partial product vectors, an accumulation vector, one carry-feedback input vector, and one sum-feedback input vector to the four stage Wallace Tree.
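In both cases described above the Wallace Tree receives nine input vectors: eight partial products plus one accumulation vector in the first cycle, and six partial products plus three other vectors in subsequent cycles. The Python sketch below shows how nine vectors can be reduced to a sum vector and a carry vector in four stages. It assumes a tree built from 3:2 carry-save adders, which the text does not specify. It uses non-negative Python integers in place of fixed-width hardware vectors, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: return (sum, carry) with sum + carry == a + b + c."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def wallace_reduce(vectors):
    """Reduce the vectors to at most two rows, counting the stages used."""
    rows = list(vectors)
    stages = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3  # rows that form complete triples
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[grouped:])  # leftover rows pass through unchanged
        rows = next_rows
        stages += 1
    return rows, stages


if __name__ == "__main__":
    inputs = [11, 22, 33, 44, 55, 66, 77, 88, 99]  # nine non-negative vectors
    rows, stages = wallace_reduce(inputs)
    print(stages, sum(rows) == sum(inputs))
```

With nine inputs the row count falls from nine to six, four, three, and finally two, which is consistent with a four stage tree. The remaining sum and carry rows would still need one carry-propagate addition to form a single result.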