Many digital signal processing tasks involve computationally intensive data manipulation. Such tasks include the processing of audio, video, electrocardiogram, radar, or other time-varying signals. The intensive nature of such processing generally results from both high data throughput and algorithm complexity. Furthermore, most signal processing algorithms require data buffer resources to maintain computed values for feedback into subsequent calculations. Thus, digital signal processing burdens both processing and storage resources.
Of course, the overall complexity of a signal processing task dramatically impacts resource requirements. Regardless of complexity, however, a number of common filters are applied to many different types of data. Accordingly, general processing techniques which can be applied to a filter (or even a portion of a filter) may be applied to a mathematically similar filter portion employed in another context (i.e. filtering a different type of data).
One common digital filter structure, illustrated in FIG. 1, includes a sum of parallel terms. In this system, an output signal y[n] is derived from an input signal x[n] through multiple parallel individual filters, each having at least one coefficient and delay value. In more complex systems, each parallel filter could have multiple delay taps and coefficients; however, the simplest case sufficiently illustrates shortcomings of traditional processing techniques.
In the illustrated system, a summation block 170 sums the outputs of the separate transfer functions x.sub.1 through x.sub.N. A first transfer function is formed by a summation block 105 which sums an input value of x[n] with a feedback value delayed by a delay (storage) block (Z.sup.-D1) 145 and scaled by a factor of a.sub.1 at a feedback multiplier 125. Similarly, the second, third, and Nth terms are respectively formed by summation blocks 110, 115, and 120, delay blocks 150, 155, and 160, and feedback multipliers 130, 135, and 140. Additional scaling blocks are often used to multiply the outputs x.sub.1 [n]-x.sub.N [n] by additional scaling factors b.sub.1 -b.sub.N before the summation block 170.
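The parallel feedback structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an implementation taken from FIG. 1: it assumes each branch obeys x.sub.i [n] = x[n] + a.sub.i * x.sub.i [n-D.sub.i ], that all state is zero before the first sample, and that the optional scaling factors b.sub.1 -b.sub.N are all one. The function and variable names are hypothetical.

```python
def parallel_feedback_filter(x, delays, gains):
    """Evaluate y[n] = sum over i of x_i[n], where
    x_i[n] = x[n] + a_i * x_i[n - D_i] (assumed zero initial state)."""
    n_branches = len(delays)
    # Full history of every branch output; deliberately unoptimized.
    branch_out = [[0.0] * len(x) for _ in range(n_branches)]
    y = []
    for n, sample in enumerate(x):
        total = 0.0
        # All N branches are computed for sample n before moving to n + 1.
        for i in range(n_branches):
            delayed = branch_out[i][n - delays[i]] if n >= delays[i] else 0.0
            branch_out[i][n] = sample + gains[i] * delayed
            total += branch_out[i][n]
        y.append(total)
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(parallel_feedback_filter(impulse, delays=[2, 3], gains=[0.5, 0.25]))
```

Keeping the complete history of each branch, as this sketch does, is wasteful; only the most recent D.sub.i values of each branch are actually needed.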
One example of a filter which employs such a parallel structure is an audio filter which produces a reverberation (reverb) effect. This filter, known as a Schroeder reverb processor, includes a parallel structure such as that shown in FIG. 1 as well as additional scaling blocks prior to the summation block 170 and series filtering functions subsequent to the summation block 170 (see Introduction to Signal Processing, Orfanidis, Sophocles J., p. 372, Prentice Hall, 1996). In this case, the delay values (D.sub.1 -D.sub.N) may be varied to model an environment of changing sound reflections.
Computations for the parallel branches of such filters are usually more efficiently performed in parallel, meaning that a number of computations are performed for each new data item before advancing to the next data item. In this example, each individual data manipulation (i.e. each parallel filter computation) is performed on some discrete quantum of data for each of the N parallel branches prior to repeating any manipulation for new data. Thus, x.sub.1 [n.sub.1 ]-x.sub.N [n.sub.1 ] are all computed before x.sub.1 [n.sub.2 ], n.sub.2 being subsequent to n.sub.1.
To compute the value of y[n], N buffers must be used to maintain, respectively, D.sub.1 through D.sub.N historical values of x.sub.1 [n] through x.sub.N [n]. For example, since the transfer function x.sub.1 includes a delay element Z.sup.-D1 (block 145), each value of x.sub.1 is stored for a computation which utilizes this value D.sub.1 data values later (i.e. x.sub.1 [1] is used to compute x.sub.1 [D.sub.1 ]). In this example, a pointer indicates the location of x.sub.1 [1] when x.sub.1 [D.sub.1 ] is computed (x.sub.1 [D.sub.1 ]=x[D.sub.1 ]+a.sub.1 * x.sub.1 [1]). The value of x.sub.1 [D.sub.1 ] is then stored at the location indicated and the pointer is advanced to x.sub.1 [2]. Typically, buffers storing this data are implemented as circular buffers such that when the pointer reaches the end of the buffer, it is reset to the beginning to allow continuous processing.
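The read, overwrite, advance, and reset sequence described above can be sketched for a single branch as follows. This is a minimal sketch under stated assumptions: the buffer holds exactly D values, starts filled with zeros, and the class name FeedbackDelayLine and its fields are hypothetical names chosen for illustration.

```python
class FeedbackDelayLine:
    """One feedback branch backed by a circular buffer of `delay` values."""

    def __init__(self, delay, gain):
        self.buf = [0.0] * delay  # assumed zero initial history
        self.gain = gain
        self.pos = 0              # points at the oldest stored value

    def step(self, sample):
        # Read the value stored `delay` samples ago and apply the feedback gain.
        out = sample + self.gain * self.buf[self.pos]
        # Store the new value at the location the pointer indicates.
        self.buf[self.pos] = out
        # Advance the pointer; reset it when it reaches the end of the buffer.
        self.pos += 1
        if self.pos == len(self.buf):
            self.pos = 0
        return out


if __name__ == "__main__":
    line = FeedbackDelayLine(delay=2, gain=0.5)
    print([line.step(s) for s in [1.0, 0.0, 0.0, 0.0, 0.0]])
```

The end-of-buffer comparison in step() is the pointer "service" that the remainder of this discussion is concerned with.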
One possibility for implementing such buffers is to use N congruent buffers (i.e. buffers of the same length) which are each long enough for the largest delay. It may be desirable to use buffers of the same length if congruent buffers are available in hardware or are conveniently reserved in memory. Additionally, while many pointers may be needed to index into incongruent buffers, one pointer may suffice to track the position for a number of congruent buffers. This one pointer only requires service (i.e. needs to be reset to the beginning of the buffer) when all of the buffers have reached their endpoints. Unfortunately, the use of congruent buffers wastes buffer space for each delay which is less than the longest delay D.sub.MAX. That is, D.sub.MAX values are stored for each function x.sub.1 -x.sub.N when, for example, only D.sub.1 values are necessary to accurately compute x.sub.1. Excessive buffer size is particularly undesirable considering the already significant resource drain involved in digital signal processing.
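The storage penalty of congruent buffers is simple arithmetic, sketched below. The helper name buffer_storage is hypothetical, and the example delays of six hundred and nine hundred values are chosen only for illustration.

```python
def buffer_storage(delays):
    """Return (congruent, exact, wasted) storage, counted in stored values.

    congruent: N buffers, each as long as the longest delay D_MAX
    exact:     one buffer of length D_i per branch
    wasted:    the difference between the two
    """
    if not delays:
        return (0, 0, 0)
    congruent = len(delays) * max(delays)
    exact = sum(delays)
    return (congruent, exact, congruent - exact)


if __name__ == "__main__":
    # Two branches with delays of 600 and 900 values.
    print(buffer_storage([600, 900]))
```

With these two delays, congruent buffers hold 1800 values where 1500 would do, so 300 locations are never needed.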
Multiple incongruent buffers sized appropriately for each delay D.sub.1 -D.sub.N help eliminate the problem of wasted buffer space occurring when congruent buffers are used. If a buffer of exactly the right length is used for each of the delays D.sub.1 -D.sub.N, no more storage than is necessary to compute x.sub.1 -x.sub.N is consumed. Unfortunately, pointers which are uniformly advanced through incongruent buffers can reach an endpoint and require service at different points in time even though the buffers are being used to process data in parallel. That is, a buffer of length six hundred may reach an endpoint (become "exhausted") and require a pointer to be reset at a different time than a buffer of length nine hundred. If proper checks are not performed for each buffer, the pointer for the shorter buffer may over-run its endpoint and corrupt data. Thus, tracking of each incongruent buffer is necessary to prevent over-run. Such tracking can significantly reduce the performance of otherwise tight computation loops performing the desired data manipulation.
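The per-buffer tracking described above can be made concrete with a small sketch. It assumes one exactly sized circular buffer and one pointer per branch, zero initial state, and unity output scaling; run_incongruent and its return values are hypothetical names used only for illustration. The sketch counts the end-of-buffer checks executed in the inner loop and records the sample index at which each pointer is reset.

```python
def run_incongruent(x, delays, gains):
    """Filter x with one exactly sized circular buffer per branch.

    Returns (y, wrap_checks, resets), where resets[i] lists the sample
    indices at which the pointer of branch i was reset.
    """
    bufs = [[0.0] * d for d in delays]
    pos = [0] * len(delays)
    resets = [[] for _ in delays]
    wrap_checks = 0
    y = []
    for n, sample in enumerate(x):
        total = 0.0
        for i, buf in enumerate(bufs):
            out = sample + gains[i] * buf[pos[i]]
            buf[pos[i]] = out
            total += out
            pos[i] += 1
            # Every branch needs its own endpoint check on every sample.
            wrap_checks += 1
            if pos[i] == len(buf):
                pos[i] = 0
                resets[i].append(n)
        y.append(total)
    return y, wrap_checks, resets


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(run_incongruent(impulse, delays=[2, 3], gains=[0.5, 0.25]))
```

In this sketch the number of endpoint checks is N times the number of samples processed, and the two pointers are reset at different sample indices, which is why no single shared check can serve both buffers.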
Accordingly, prior art approaches to parallel data manipulation may not provide adequate efficiency. The prior art approaches tend to either consume excessive memory or have an inner computation loop performing data manipulations in a manner incurring significant overhead. Thus, a need has arisen for an approach efficiently utilizing available memory and maintaining a simple efficient inner computation loop.