This invention is in the field of digital signal processing, and is more specifically directed to infinite impulse response (IIR) digital filters as used in such processing.
Digital signal processing has become a mainstay technology in modern electronic systems and devices that involve audio input and output. The well-known medium of compact discs (CD) is a prime example of the digital nature of modern audio replication and playback. More recently, audio content is also now distributed and used from other digital formats, including digital video disk (DVD) and purely electronic forms, such as audio files encoded according to the MP3 standard. In addition to these formats, audio processing for real-time transmissions, such as broadcast audio transmissions, audio communications over the Internet, and even audio telephony, is now largely carried out in the digital domain.
The digital filter is an important building block in the digital signal processing of audio information. Of course, the audio processing of digitally stored content may be carried out by converting the digital information to analog, and then applying analog signal processing techniques such as filters and the like to the converted analog signal. However, as is well known in the art, digital filters can provide high precision processing of audio signals at very low cost, especially for audio applications in which the audio content emanates from a digital source to begin with. The capabilities of digital filters to precisely process audio signals have especially increased with the high performance digital signal processors (DSPs) that are now available. These advances have also resulted in custom and semi-custom logic circuits that have built-in digital filter blocks, and also in the design and production of digital audio processors (DAPs), such as the TAS3103 digital audio processor available from Texas Instruments Incorporated.
The infinite impulse response (IIR) digital filter is an important type of digital filter for audio processing. The second order IIR digital filter, commonly referred to as a “biquad”, is a popular IIR building block, and can be cascaded to provide very high order digital filter functions at low cost and high efficiency. For example, conventional digital audio processing devices, such as the TAS3103 mentioned above, include on the order of twelve biquad IIR filters per audio channel to provide graphic equalization, speaker parameter equalization, phase compensation, and the like; additional biquads are used in treble and bass control, and other audio functions.
By way of background, FIG. 1 schematically illustrates the direct form of a conventional biquad, second order IIR digital filter 10. Input datastream x{n} is a sequence of discrete input values, which are processed by filter 10 to produce output datastream y{n}, also as a sequence of discrete values. The filter equation implemented by filter 10 of FIG. 1 can be expressed as:

y(n) = b0·x(n) + b1·x(n−1) + b2·x(n−2) + a1·y(n−1) + a2·y(n−2)

where the sample indices n−1, n−2 refer to previous values of the input and output datastreams. Referring to FIG. 1, the feed-forward side of digital filter 10 is implemented by multiplier 2₀ for multiplying current input value x(n) by coefficient b0, multiplier 2₁ for multiplying the next previous input value x(n−1) from delay stage 3₀ by coefficient b1, and multiplier 2₂ for multiplying twice-delayed input value x(n−2) from delay stage 3₁ by coefficient b2. On the feedback side, multiplier 4₀ multiplies the previous (once-delayed) output value y(n−1) from delay stage 5₀ by coefficient a1, and multiplier 4₁ multiplies twice-delayed previous output value y(n−2) from delay stage 5₁ by coefficient a2. The outputs of multipliers 2 and 4 are all applied to inputs of adder (or accumulator) 6, and the resulting sum from adder 6 constitutes the current output sample value y(n), after clipping by limiter 7. This direct-form representation is typical for second-order IIR digital filters, as is fundamental in the art.
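The filter equation above can be illustrated with a minimal Python sketch. The function name `biquad_direct_form` and the use of floating-point arithmetic are assumptions made for illustration only; the clipping performed by limiter 7 is omitted, and the delay stages are assumed to start at zero.

```python
def biquad_direct_form(samples, b0, b1, b2, a1, a2):
    """Apply y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) + a1*y(n-1) + a2*y(n-2).

    Hypothetical helper: the sign convention follows the equation in the
    text (feedback terms are added), and limiter 7 is not modeled.
    """
    x1 = x2 = 0.0  # contents of the input delay stages, assumed zero at start
    y1 = y2 = 0.0  # contents of the output delay stages, assumed zero at start
    out = []
    for x in samples:
        # Five multiplications summed into one output sample.
        y = b0 * x + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2
        out.append(y)
        # Shift the delay lines for the next sample.
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    # Impulse response of a one-pole example (b0 = 1, a1 = 0.5).
    print(biquad_direct_form([1.0, 0.0, 0.0], 1.0, 0.0, 0.0, 0.5, 0.0))
```

Each pass through the loop corresponds to one output sample: five multiply-accumulate steps followed by the shifting of the four stored values.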
From this representation, one can readily derive the number of digital operations necessary for implementing a biquad digital filter. The necessary operations for conventional realizations (using registers for temporary storage) are as follows:
| Operation | Number of instances |
| --- | --- |
| Clear accumulator | 1 |
| Data load | 5 |
| Coefficient load | 5 |
| Multiplications | 5 |
| Accumulate | 5 |
| Store | 4 |

These twenty-five operations can readily be seen from the direct form illustration of FIG. 1. Each of multipliers 2, 4 requires register loads of a data value and a coefficient; each delay stage 3, 5 involves a store operation, and adder 6 requires clearing of the previous result and accumulating of the current result.
Modern logic architectures have achieved some efficiencies in the execution of a biquad digital filter by identifying those operations that can be performed in parallel with one another. FIG. 2 schematically illustrates a conventional biquad architecture, implemented by way of a single multiply-and-accumulate stage.
In this conventional architecture, coefficient random access memory (RAM) 10 stores the IIR coefficients. As known in the art, and as will be discussed in further detail below, the same multiply-and-accumulate stage architecture as shown in FIG. 2 may be used in a cascade manner, in which case coefficient RAM 10 may store multiple sets of IIR coefficients, corresponding to each of the multiple cascaded IIR filters. Coefficient register 12 is coupled to receive a selected coefficient value from coefficient RAM 10, and to apply this coefficient to multiplier 15 for one of the multiplications in the IIR filter. Similarly, data RAM 14 stores the input datastream x{n} values and the output datastream values y{n}, and data register 16 stores a selected one of these data values for application to multiplier 15. The output of multiplier 15 is stored in product register 18, and then applied to accumulator 19, which has an output coupled back to data RAM 14 and data register 16. Address and control circuitry 13 is logic circuitry for controlling the addressing and accessing of coefficient RAM 10 and data RAM 14 in the performing of an IIR sequence, and also refers to control circuitry for clocking the various registers, including coefficient register 12, data register 16, and product register 18, and for controlling the other functions in this implementation such as clearing accumulator 19.
In operation, this conventional architecture implements a three-stage pipeline with up to four parallel operations, to perform a biquad, second order IIR filter, in eight instruction cycles, or clock cycles. These instructions can be summarized as:
| Cycle | Operations at data register 16 | Operations at coefficient register 12 | Operations at product register 18 | Operations at accumulator 19 | Operations at data RAM 14 |
| --- | --- | --- | --- | --- | --- |
| 1 | Load x(n)₀ | Load b0 | | | |
| 2 | Load x(n−1)₀ | Load b1 | Load b0·x(n)₀ | Clear ACC | Store x(n)₀ as x(n−1)₁ |
| 3 | Load x(n−2)₀ | Load b2 | Load b1·x(n−1)₀ | Add b0·x(n)₀ to ACC | Store x(n−1)₀ as x(n−2)₁ |
| 4 | Load y(n−1)₀ | Load a0 | Load b2·x(n−2)₀ | Add b1·x(n−1)₀ to ACC | |
| 5 | Load y(n−2)₀ | Load a1 | Load a0·y(n−1)₀ | Add b2·x(n−2)₀ to ACC | Store y(n−1)₀ as y(n−2)₁ |
| 6 | | | Load a1·y(n−2)₀ | Add a0·y(n−1)₀ to ACC | |
| 7 | | | | Add a1·y(n−2)₀ to ACC | |
| 8 | | | | | Store y(n)₀ in ACC as y(n−1)₁ |

In this summary of the IIR filter execution, the operations at each of registers 12, 16, 18, at accumulator (ACC) 19, and at data RAM 14, are indicated for each clock cycle, relative to a 0th instance of the IIR filter execution (denoted by the subscript 0 on each data value). In the first clock cycle, data register 16 is loaded from data RAM 14 with input data value x(n)₀, which is the most recent input sample value, and coefficient register 12 is loaded with coefficient b0 from coefficient RAM 10; these values then appear at the output of registers 16, 12, respectively, and are multiplied by multiplier 15. In clock cycle 2, registers 16, 12 are loaded with data value x(n−1)₀ and coefficient b1, respectively; meanwhile, the product b0·x(n)₀ generated by multiplier 15 during clock cycle 1 is stored in product register 18, and accumulator 19 is cleared. Also in this cycle 2, the previous contents x(n)₀ of data register 16 are stored in data RAM 14 as sample value x(n−1)₁ for the next iteration of the IIR filter. This storing operation may be a replacement of the previous contents x(n−1)₀ of this location of data RAM 14, or may be accomplished by incrementing an address register accordingly. In any event, the current sample value x(n)₀ for iteration 0 becomes the previous sample value x(n−1)₁ for iteration 1 of the IIR filter.
In cycle 3, registers 12, 16 are loaded with coefficient b2 and data value x(n−2)₀, respectively, product register 18 is loaded with the cycle 2 product b1·x(n−1)₀, and the previous contents x(n−1)₀ of data register 16 are stored in data RAM 14 as sample value x(n−2)₁ for the next iteration. In addition, the previous contents of product register 18 (namely the product of data value x(n)₀ and coefficient b0) are accumulated into accumulator 19. In cycle 4, registers 12, 16 are loaded with coefficient a0 and previous output data value y(n−1)₀, respectively, product register 18 is loaded with the cycle 3 product b2·x(n−2)₀, and the previous contents of product register 18 are accumulated into accumulator 19. In clock cycle 5, registers 12, 16 are loaded with coefficient a1 and previous output data value y(n−2)₀, respectively, product register 18 is loaded with the cycle 4 product a0·y(n−1)₀, the previous contents of product register 18 are accumulated into accumulator 19, and the previous output data value y(n−1)₀ is stored in data RAM 14 as output data value y(n−2)₁ for the next IIR iteration. Clock cycles 6 and 7 effectively empty the pipeline, by forwarding the results of the multiplication by multiplier 15 into product register 18 and accumulator 19, resulting in the final output value y(n)₀ for this iteration 0 being present in accumulator 19 after clock cycle 7. In clock cycle 8, the result y(n)₀ in accumulator 19 is stored in data RAM 14 as previous output value y(n−1)₁, preparing for the next iteration.
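The eight-cycle schedule described above can be traced with a small register-level simulation. This is only a sketch under stated assumptions: the names `SCHEDULE`, `run_biquad_iteration` and `filter_stream`, and the dictionary keys used for the data RAM locations, are hypothetical; every register is assumed to update from the values held at the end of the previous cycle; and a data RAM location that is read and written in the same cycle is assumed to return its old contents to the data register. The feedback coefficients are named a0 and a1 as in the table, which appear to correspond to a1 and a2 of the filter equation.

```python
# One entry per clock cycle (cycles 1 through 8):
# (data RAM location loaded into data register 16,
#  coefficient loaded into coefficient register 12,
#  whether product register 18 latches the multiplier output,
#  operation at accumulator 19,
#  data RAM store as (destination, source register))
SCHEDULE = [
    ("x0", "b0", False, None,    None),
    ("x1", "b1", True,  "clear", ("x1", "data")),
    ("x2", "b2", True,  "add",   ("x2", "data")),
    ("y1", "a0", True,  "add",   None),
    ("y2", "a1", True,  "add",   ("y2", "data")),
    (None, None, True,  "add",   None),
    (None, None, False, "add",   None),
    (None, None, False, None,    ("y1", "acc")),
]


def run_biquad_iteration(ram, coef):
    """Run one filter iteration; ram["x0"] must hold the current input x(n)."""
    data = c = prod = acc = 0.0
    for load_data, load_coef, latch_prod, acc_op, store in SCHEDULE:
        # Next-state values are all computed from the previous-cycle contents.
        new_prod = data * c if latch_prod else prod
        if acc_op == "clear":
            new_acc = 0.0
        elif acc_op == "add":
            new_acc = acc + prod
        else:
            new_acc = acc
        # Assumption: the RAM read happens before the same-cycle RAM write.
        new_data = ram[load_data] if load_data else data
        new_c = coef[load_coef] if load_coef else c
        if store:
            dst, src = store
            ram[dst] = data if src == "data" else acc
        data, c, prod, acc = new_data, new_c, new_prod, new_acc
    return acc  # y(n), also stored as y(n-1) for the next iteration


def filter_stream(samples, coef):
    """Filter a sequence by running one eight-cycle iteration per sample."""
    ram = {"x0": 0.0, "x1": 0.0, "x2": 0.0, "y1": 0.0, "y2": 0.0}
    out = []
    for x in samples:
        ram["x0"] = x
        out.append(run_biquad_iteration(ram, coef))
    return out
```

With these assumptions the simulated accumulator holds the complete sum after cycle 7 and the data RAM holds the shifted values after cycle 8, so successive calls reproduce the direct-form recurrence sample by sample.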
This operation of the conventional biquad architecture effects a second-order IIR digital filter, as mentioned above. Higher-order filters can be implemented by cascading biquads in sequence. FIG. 3 illustrates this conventional cascading, in the example of a fourth-order IIR digital filter, implemented by biquads 20, 22 arranged in sequence. In this arrangement, input sample datastream x{n} is applied to the input of biquad 20, which produces output sample datastream y{n} at its output after the application of a second-order IIR filter using coefficients a0, a1, b0, b1, b2, in the manner described above relative to FIG. 2. Datastream y{n} is effectively an intermediate result, and is applied to the input of biquad 22. Biquad 22 applies another second-order IIR digital filter to datastream y{n}, producing ultimate output datastream z{n} at its output, using coefficients a0′, a1′, b0′, b1′, b2′ (typically differing from those used in first stage biquad 20). In practice, the number of sequential biquads 20, 22 is arbitrary, and in fact can be quite large. For example, it is contemplated that as many as twelve biquad IIR digital filters are typically cascaded for the processing of a single audio channel in a conventional digital sound system, and it is contemplated that, in the near future, digital audio systems may incorporate a sequence of as many as seventy-two biquads into each audio channel. As known in the art, and as mentioned above, the cascaded biquads are typically implemented by a single multiply-and-accumulate stage architecture, as shown in FIG. 2, with multiple sets of coefficients stored in coefficient RAM 10, and sequentially applied to execute the cascaded biquad filters.
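The cascading of FIG. 3 can be sketched in Python as follows. The function names `biquad` and `cascade` are hypothetical, the coefficient order (b0, b1, b2, a0, a1) simply follows the naming used for FIG. 2, and each stage is assumed to start with zeroed delay values. For simplicity the sketch filters the whole datastream through one stage before starting the next, whereas the hardware described processes every stage for each sample in turn; the resulting output sequence is the same.

```python
def biquad(samples, b0, b1, b2, a0, a1):
    """One second-order stage: y = b0*x + b1*x1 + b2*x2 + a0*y1 + a1*y2."""
    x1 = x2 = y1 = y2 = 0.0  # delayed inputs and outputs, assumed zero at start
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 + a0 * y1 + a1 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def cascade(samples, stages):
    """Feed the output datastream of each stage into the next stage.

    `stages` is a list of (b0, b1, b2, a0, a1) tuples, one per biquad,
    mirroring the multiple coefficient sets held in coefficient RAM 10.
    """
    for coeffs in stages:
        samples = biquad(samples, *coeffs)
    return samples


if __name__ == "__main__":
    # Fourth-order example: two identical one-pole stages applied to an impulse.
    stage = (1.0, 0.0, 0.0, 0.5, 0.0)
    print(cascade([1.0, 0.0, 0.0, 0.0], [stage, stage]))
```

Here x{n} enters the first stage, the intermediate datastream y{n} is passed to the second stage, and the returned list plays the role of z{n}.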
Simplistically, the number of cycles necessary to effect the cascaded biquads, using the conventional architecture of FIG. 2, may be calculated as the number of cascaded biquad stages times eight clock cycles. However, it is known that some efficiencies can be accomplished with the conventional architecture of FIG. 2, such that each biquad stage after the first can be executed in seven clock cycles, one fewer than the eight clock cycles required for the first biquad. This is accomplished by using the eighth clock cycle of the previous biquad stage to begin processing for the next biquad. An example of this conventional execution can be summarized, for a second biquad 22, beginning with clock cycle 8 of the first biquad 20, as follows:
| Cycle | Operations at data register 16 | Operations at coefficient register 12 | Operations at product register 18 | Operations at accumulator 19 | Operations at data RAM 14 |
| --- | --- | --- | --- | --- | --- |
| 8 | Load ACC as y(n)₁ | Load b0′ | | | |
| 9 | Load y(n−1)₁ | Load b1′ | Load b0′·y(n)₁ | Clear ACC | Store y(n)₁ as y(n−1)₂ |
| 10 | Load y(n−2)₁ | Load b2′ | Load b1′·y(n−1)₁ | Add b0′·y(n)₁ to ACC | Store y(n−1)₁ as y(n−2)₂ |
| 11 | Load z(n−1)₁ | Load a0′ | Load b2′·y(n−2)₁ | Add b1′·y(n−1)₁ to ACC | |
| 12 | Load z(n−2)₁ | Load a1′ | Load a0′·z(n−1)₁ | Add b2′·y(n−2)₁ to ACC | Store z(n−1)₁ as z(n−2)₂ |
| 13 | | | Load a1′·z(n−2)₁ | Add a0′·z(n−1)₁ to ACC | |
| 14 | | | | Add a1′·z(n−2)₁ to ACC | |
| 15 | | | | | Store ACC as z(n−1)₂ |

To save the clock cycle in second stage biquad 22, the operation of first stage biquad 20 is changed by delaying the storing of the contents of accumulator 19 for one clock cycle, until clock cycle 9 (rather than clock cycle 8), because this value y(n)₁ is needed as an input to biquad 22. In clock cycle 8 in this cascaded approach, the contents of accumulator 19 are loaded into data register 16 as input value y(n)₁ for biquad 22. Also in clock cycle 8, coefficient register 12 is loaded with coefficient b0′ from coefficient RAM 10. In clock cycle 9, registers 12, 16 are loaded with coefficient b1′ and data value y(n−1)₁, respectively; meanwhile, product register 18 loads the product b0′·y(n)₁ generated by multiplier 15 during clock cycle 8, accumulator 19 is cleared, and the output value y(n)₁ is stored in data RAM 14 as output value y(n−1)₂ for the next iteration. In cycles 10 through 15, biquad 22 operates in the same manner as biquad 20, described above, operating upon biquad 20 output values y(n)₁, y(n−1)₁, y(n−2)₁ applied as input values, deriving a new output value z(n)₂ as a result.
As mentioned above, additional biquad stages may be appended to the output of biquad 22 of FIG. 3, producing still higher order filter results. If such is the case, the storing of the contents of accumulator 19 in data RAM 14 in cycle 15 is delayed one cycle, as it was in the case of storing the contents of accumulator 19 at the end of biquad 20, and the next biquad stage is then executed in similar manner as biquad 22 described above. As a result, the overall number of cycles required for an IIR digital filter in which k biquad stages follow the first biquad is 8+7k clock cycles; the two-biquad example above (k=1) thus completes in fifteen clock cycles.
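The cycle count can be worked through in a short sketch. It assumes the reading suggested by the two-biquad example, in which the first biquad takes eight clock cycles and each later biquad takes seven, so that two biquads finish in cycle 15. The function names are hypothetical.

```python
def naive_cascade_cycles(total_stages):
    """Cycle count if every biquad stage took the full eight clock cycles."""
    return 8 * total_stages


def overlapped_cascade_cycles(total_stages):
    """Cycle count when each stage after the first starts in the previous
    stage's eighth cycle: eight cycles for the first biquad, seven for each
    of the remaining ones (assumed reading of the description)."""
    if total_stages < 1:
        raise ValueError("at least one biquad stage is required")
    return 8 + 7 * (total_stages - 1)


if __name__ == "__main__":
    for stages in (1, 2, 12, 72):
        print(stages, naive_cascade_cycles(stages), overlapped_cascade_cycles(stages))
```

For the twelve-biquad channel mentioned earlier, this gives 85 clock cycles with overlap against 96 without it.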
The number of clock cycles required for execution of a biquad, second-order, IIR digital filter can become a critical parameter in the implementation of a digital signal processing function. In the audio processing context, the degree or extent to which digital filtering can be performed on an audio channel is limited by the amount of latency that can be tolerated in the system, and by the available clock rate. Conversely, if the desired level of filtering can be accomplished with fewer clock cycles, either the clock rate of the digital filters can be reduced, reducing the cost of the audio processor, or alternatively additional functionality may be implemented within the audio signal flow. In either case, a reduction in the number of clock cycles that are required to carry out digital filters directly translates into lower cost, or improved functionality, in an audio processing system.
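The trade-off between cycle count and clock rate can be made concrete with a rough calculation. Everything numeric here is an assumption for illustration: a single audio channel, a hypothetical 48 kHz sample rate, one operation slot per clock cycle, no other work sharing the multiply-and-accumulate stage, and the cycle count of eight for the first biquad plus seven for each later one. The function names are likewise hypothetical.

```python
def min_clock_hz(sample_rate_hz, total_stages):
    """Lowest clock rate that completes all biquad stages within one sample
    period, assuming 8 cycles for the first stage and 7 for each later one."""
    cycles_per_sample = 8 + 7 * (total_stages - 1)
    return sample_rate_hz * cycles_per_sample


def max_stages(clock_hz, sample_rate_hz):
    """Largest number of cascaded biquads that fits in one sample period."""
    cycles_available = clock_hz // sample_rate_hz
    if cycles_available < 8:
        return 0  # not even the first biquad fits
    return 1 + (cycles_available - 8) // 7


if __name__ == "__main__":
    fs = 48_000  # assumed sample rate, not taken from the text
    print(min_clock_hz(fs, 12))         # clock needed for twelve biquads
    print(max_stages(fs * 85, fs))      # stages that fit at that clock
```

Under these assumptions, any cycle saved per biquad either lowers the minimum clock rate returned by `min_clock_hz` or raises the stage count returned by `max_stages` for a fixed clock.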