1. Field of the Invention
This invention relates to implementation of wavelet analysis in hardware.
Wavelet analysis provides a powerful method for analysing time-varying signals. Conceptually, wavelet analysis can be considered as being related to Fourier analysis. As is well-known, Fourier analysis can transform a signal varying in amplitude in the time domain into a signal that varies in the frequency domain. Fourier analysis thereby provides an indication of the frequency content of the signal. Commonly, Fourier analysis uses sine and cosine as basis functions, whereby the transform is indicative of the sine and cosine content of the original signal across a frequency range.
An important limitation of a Fourier transform is that it is applied across the entire time extent of the original signal: all time information is lost in the transformed signal. This means that any variation in the character of the original signal with time cannot be deduced from the transform. Moreover, a Fourier transform cannot be used to analyse discrete time segments in a continuous signal. For example, if the signal is a continuous speech signal, a Fourier transform cannot be used to perform a frequency analysis on a time-limited segment such as a single word within the speech signal.
Wavelet analysis has evolved as a more powerful analysis tool. Two features of wavelet functions contribute in particular to their power.
First, wavelet analysis can be performed over a part of the original signal that is limited in time. Moreover, the time over which the analysis operates can be varied simply by making relatively small changes to the analysis procedure. This allows the analysis to be tuned to give results that are more accurate in either their resolution in frequency or in time, as best suits the objective of the analysis (although it should be noted that an increase in accuracy in one domain will inevitably result in a decrease in accuracy in the other).
Second, wavelet analysis can be based on an arbitrary basis function (referred to as a “mother wavelet”). It might be used to express the frequency content of a time-varying signal in terms of, for example, sine and cosine waves, square waves, triangular waves, or any other arbitrary wave shape. The basis functions can be chosen to give the most useful result based upon the content and nature of the original function and upon the result that is being sought in performing the analysis.
More formally, a continuous wavelet transform (CWT) analyses a signal x(t) in terms of shifted and scaled versions of the mother wavelet, where b is the shift in time and a is the scale. This is represented as follows:

$$\mathrm{CTWT}(b, a) = \frac{1}{\sqrt{a}} \int h^{*}\!\left(\frac{t - b}{a}\right) x(t)\, dt \tag{1}$$
The wavelet transform performs a decomposition of the signal x(t) into a weighted set of basis functions h(t), which are typically time-limited, finite energy signals that oscillate like waves (hence the term “wavelets”).
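By way of illustration only, Equation 1 can be approximated numerically by a Riemann sum. The sketch below assumes a 1/sqrt(a) normalisation, a positive scale a, and a real-valued "Mexican hat" wavelet; the wavelet choice, the function names `mexican_hat` and `ctwt`, and the integration limits are assumptions made for this example and are not taken from the specification.

```python
import math

def mexican_hat(t):
    # Real-valued example wavelet (an assumption chosen for illustration).
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def ctwt(x, b, a, h=mexican_hat, t_min=-10.0, t_max=10.0, dt=0.01):
    """Approximate Equation 1 by a Riemann sum.

    x is a callable signal x(t), b is the shift and a > 0 is the scale.
    h is assumed real-valued, so its complex conjugate equals h itself.
    """
    steps = int(round((t_max - t_min) / dt)) + 1
    total = 0.0
    for i in range(steps):
        t = t_min + i * dt
        total += h((t - b) / a) * x(t) * dt
    return total / math.sqrt(a)

# The signal here is the wavelet itself, so the response is largest
# when the analysing wavelet is aligned with it (b = 0, a = 1).
aligned = ctwt(mexican_hat, b=0.0, a=1.0)
shifted = ctwt(mexican_hat, b=5.0, a=1.0)
```

The nested evaluation over every pair (b, a) of interest is what makes a software implementation costly: each output value requires a full pass over the signal.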
As will be appreciated, this transform is complex, and consumes a considerable amount of computer power if it is to be performed by a computer executing a software program.
This may be acceptable if it is to be used where real time processing is of lesser importance. For example, wavelet transforms are used in software compression or decompression of files representing still images. However, where speed in performing the analysis is critical, this can cause the method to become, for practical purposes, unworkable. For example, if the analysis is to be applied to compression or decompression of moving images in real time, the cost of providing sufficiently powerful computers may be prohibitive. It has therefore been recognised that there may be significant advantage in implementing the transform directly in hardware, for example, for incorporation into an ASIC or FPGA design, or as a core for inclusion in a signal processing hardware system.
As a first step to this end, it has been shown that a discrete representation of the wavelet function allows the transform to be calculated by a small number of relatively simple components, namely, a high-pass filter and a low-pass filter, each filter being followed by a downsampler (otherwise known as a decimator) with a factor of 2. As will be recognised, each of these components can be implemented in hardware in a reasonably straightforward manner. Mathematically, a discrete wavelet transform (DWT) of the discrete function x(n) is represented in Equation 2, below:

$$\mathrm{DWT}_{x(n)} = \begin{cases} c_{j,k} = \sum_{n} x[n]\, h_{j}^{*}[n - 2^{j} k] \\ s_{j,k} = \sum_{n} x[n]\, g_{j}^{*}[n - 2^{j} k] \end{cases} \tag{2}$$
The coefficients c_{j,k} describe the detail components in the signal and the coefficients s_{j,k} refer to the approximation components in the signal. The transfer functions h(n) and g(n) in this equation represent the coefficients of the high-pass and the low-pass filters and are derived from the wavelet function and the inverse (scaling) function respectively. The low-pass filtered and downsampled output of each stage is fed forward to the following stage, which gives a successively reduced time resolution and increased frequency resolution after each stage. Several stages can therefore be cascaded to provide transform outputs at several levels of resolution.
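The cascade of filter and downsampler stages described above can be sketched in software as follows. This is a minimal illustration, assuming Haar filter coefficients, zero initial conditions, and a convention in which the odd-indexed filter outputs are retained; the function names `fir_decimate` and `dwt` are hypothetical.

```python
import math

def fir_decimate(x, taps):
    """FIR filter y(n) = sum_k taps[k] * x(n - k), followed by a
    downsampler of factor 2 (here the odd-indexed outputs are kept)."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(1, len(x), 2)
    ]

def dwt(x, h, g, levels):
    """Cascade of analysis stages: h is the high-pass filter and g the
    low-pass filter. Only the low-pass output feeds the next stage."""
    details = []
    approx = list(x)
    for _ in range(levels):
        details.append(fir_decimate(approx, h))  # detail coefficients
        approx = fir_decimate(approx, g)         # approximation
    return details, approx

s = 1.0 / math.sqrt(2.0)
haar_high = [s, -s]   # assumed example high-pass filter
haar_low = [s, s]     # assumed example low-pass filter

# A constant signal has no detail content at any level.
details, approx = dwt([4.0] * 8, haar_high, haar_low, levels=3)
```

Each level halves the number of samples, so the three detail sequences here have lengths 4, 2 and 1.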
Wavelet packet decomposition is an important development of wavelet analysis. The principle behind these decompositions is to selectively choose the basis function (mother wavelet) and the frequency bands to be decomposed. The basic structure of hardware for performing wavelet packet decomposition is shown in FIG. 12. Improved results in the area of speech and image coding, as well as signal detection and identification, have been reported using this kind of wavelet analysis. There are two issues related to the implementation of wavelet packet transforms. The first is the choice of a different wavelet function for each filter bank and the second relates to the arbitrary connection between the outputs of a filter bank stage and the inputs of the succeeding stages. As will be seen, the characteristic arrangement of a filter followed by a downsampler is clearly present in this circuit as illustrated at 1210.
2. Summary of the Prior Art
A hardware implementation of a DWT is most typically based upon the filter bank arrangement defined in Equation 2. A typical hardware implementation of such a three-level DWT is shown schematically in FIG. 1. 110′, 110″ and 110′″ represent, respectively, a low-pass filter of the first, second and third stages, 112′, 112″ and 112′″ represent, respectively, a high-pass filter of the first, second and third stages, and 114′, 114″ and 114′″ represent, respectively, downsamplers of the first, second and third stages. Each of the filters 110, 112 of the circuit shown in FIG. 1 has a general form shown in FIG. 2, comprising, in each of a plurality of stages, a multiplier 210, a delay line 212 and an adder 214. The characteristics of such a filter (which has a structure very familiar to those skilled in the technical field) are determined by a set of coefficients C1 . . . Cn applied respectively to each of the multipliers. Variation of the values of these coefficients therefore allows the designer to control the characteristics of the wavelet transform operation.
This circuit is typically quite demanding in terms of circuit area requirement. The decreasing sampling rate (a result of the downsampling operations) and increasing word length (a result of the filtering operations) as the stages progress add to the complexity of the circuit design. In particular, each multiplier 210 represents a considerable demand upon resources.
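The general filter form described above (one multiplier per coefficient, a delay line and a chain of adders) can be modelled behaviourally as below. This is a software sketch only, not a description of the circuit of FIG. 2; the class name `DirectFormFIR` and its `step` method are hypothetical.

```python
class DirectFormFIR:
    """Behavioural model of a direct-form FIR filter: one multiplier
    per coefficient, a delay line, and a sum over all products."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0] * len(self.coeffs)  # delay[i] holds x(n - i)

    def step(self, sample):
        # Shift the new sample into the delay line.
        self.delay = [sample] + self.delay[:-1]
        # Every tap performs one multiplication on every sample clock.
        return sum(c * d for c, d in zip(self.coeffs, self.delay))

# Feeding a unit impulse reproduces the coefficient set at the output.
fir = DirectFormFIR([1, 2, 3])
impulse_response = [fir.step(v) for v in [1, 0, 0, 0]]
```

Note that a filter with n coefficients performs n multiplications per input sample, which is the resource cost the remainder of the discussion is concerned with.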
It has previously been proposed that the design of such hardware might be optimised through production of custom designs which typically use various data organisation formats such as bit-serial and digit-serial designs. However, in most cases, these designs are limited in scope and applicability; typically a new design must be produced for each application. Moreover, these previous proposals have implemented only a limited range of basis functions with limited levels of decomposition. This is particularly disadvantageous because it denies the flexibility and the wide range of transform possibilities that the mathematical analysis suggests should be available using wavelet techniques. Moreover, many previous proposals have proven to be difficult to use in practice, and offer versatility at the sacrifice of efficient use of silicon area and of power.
An aim of this invention is to provide a design method for implementing hardware capable of performing DWT operations. In particular, its aim is to provide such a method that is versatile and easy to use, and that produces a hardware design that is efficient in its use of components and in the area of silicon that it occupies.
Furthermore, it is desired that the invention provide an implementation for wavelet transforms in hardware that possesses advantageous architectural arrangements as well as flexibility in wavelet choice, levels of decomposition and the word lengths.
In arriving at this invention, the inventors have realised that the downsampling that occurs at each stage means that, in conventional designs, the circuits typically operate at less than maximum capacity. At the first stage, each alternate output from the filters is discarded. Moreover, the input bandwidth at each subsequent stage is successively halved, giving rise to additional excess capacity in typical systems in which each stage operates at an identical clock frequency.
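The excess capacity can be put in rough numerical terms. The sketch below assumes that every stage is clocked at the input sample rate and could compute one filter output per clock; under that assumption the fraction of clock cycles producing a retained output at stage j is 1/2^j. The function name `stage_utilisation` is hypothetical.

```python
from fractions import Fraction

def stage_utilisation(levels):
    """Fraction of clock cycles at each stage that yield a retained
    output, assuming all stages share the input sample clock.

    Stage j receives one input every 2**(j - 1) clocks, and the
    downsampler then discards every second filter output.
    """
    return [Fraction(1, 2 ** j) for j in range(1, levels + 1)]

three_level = stage_utilisation(3)
```

For a three-level transform this gives utilisations of one half, one quarter and one eighth for the first, second and third stages respectively.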
More specifically, the filter output y_{lp}(n) from a conventional low-pass analysis filter can be written as:

$$y_{lp}(n) = \sum_{k=0}^{7} h_{lp}(k)\, x(n - k) \tag{3}$$
Since the output of each wavelet filter is downsampled by a factor of two, alternate output values produced by the filter (those of either odd or even index) are discarded.
The bi-phase decomposition of wavelet filters can be obtained by observing the output sequence. This allows the impulse response h_{lp}(n) to be written in the form of even and odd order coefficients. The polyphase decomposition of the above filter H(z) can be mathematically described as:

$$H(z) = \sum_{n=0}^{7} h(2n)\, z^{-2n} + z^{-1} \sum_{n=0}^{7} h(2n+1)\, z^{-2n} \tag{4}$$

where, for the eight-tap filter of Equation 3, h(n) is taken to be zero for n > 7.
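The practical consequence of the polyphase decomposition can be checked numerically: filtering at the full rate and then discarding alternate outputs gives the same result as filtering the even-indexed samples with the even-order coefficients, filtering the odd-indexed samples with the odd-order coefficients, and adding the two. The sketch below assumes zero initial conditions and that the even-indexed outputs are the ones retained; the function names are hypothetical.

```python
def fir(x, taps):
    """Full-rate FIR filter with zero initial conditions."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]

def decimate_direct(x, h):
    """Filter at the full rate, then keep the even-indexed outputs."""
    return fir(x, h)[0::2]

def decimate_polyphase(x, h):
    """Two half-length filters, each running at half the input rate."""
    h_even, h_odd = h[0::2], h[1::2]
    x_even = x[0::2]                     # x(2m)
    x_odd = ([0] + x[1::2])[:len(x_even)]  # x(2m - 1), with x(-1) = 0
    y_even = fir(x_even, h_even)
    y_odd = fir(x_odd, h_odd)
    return [a + b for a, b in zip(y_even, y_odd)]

# Arbitrary integer test data so that the comparison is exact.
h = [1, 2, 3, 4, 5, 6, 7, 8]
x = list(range(1, 17))
direct = decimate_direct(x, h)
poly = decimate_polyphase(x, h)
```

The polyphase form performs half as many multiplications per retained output, because no product is computed for an output that would subsequently be discarded.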
From a first aspect, the invention provides an architecture component for use in performing a wavelet transform of a sampled signal, the component including a multiplier, and a multiplexor to multiplex a number n of filter coefficients onto the multiplier, in which the multiplier processes n consecutive samples with consecutive coefficients, successive multiplier outputs being stored for subsequent processing to generate an output of the filter after every n samples.
It will be recognised that such a component can serve the same purpose as a filter and a downsampler (for example, as shown at 120 in FIG. 1), this being achieved with a significantly smaller number of components than is possible with a conventional configuration.
Architectures embodying the invention process data at the same rate as a full-length direct-form filter but each multiplier uses a different coefficient multiplier value ‘a_n’ (the coefficients) and multiplicand value ‘x(n)’ (even or odd sample) in each processing cycle. Each filter generates output only when both the even and odd index samples have been processed.
Wavelet transforms to which the invention may be applied include discrete wavelet transforms and wavelet packet decomposition.
In typical embodiments, the result from the odd index samples is temporarily stored in a memory and it is added to the result from even index samples to generate a complete filtered and decimated output.
Thus by using time-interleaved coefficients for the multipliers and an accumulator in the output, up to a 50% reduction in the number of multipliers over a direct-form FIR filter structure is achieved.
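The time-interleaved scheme can be modelled behaviourally as follows. This is a sketch under stated assumptions rather than a description of any particular embodiment: the filter length is taken to be even, initial conditions are zero, the even-indexed outputs are the ones retained, and the function names are hypothetical. Each of the m = len(h)/2 multipliers alternates between two coefficients, and the partial sum from the odd sample is held in memory until the following even sample arrives.

```python
def direct_then_downsample(x, h):
    """Reference: full-rate FIR filter, then keep every second output."""
    y = [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]
    return y[0::2]

def time_interleaved(x, h):
    """Model with m = len(h) // 2 multipliers. Multiplier j sees the
    sample delayed by 2*j periods and alternates between coefficient
    h[2j] (even samples) and h[2j + 1] (odd samples)."""
    m = len(h) // 2
    delay = [0] * (2 * m)   # delay[i] holds x(n - i)
    stored = 0              # memory for the odd-sample partial sum
    out = []
    for n, sample in enumerate(x):
        delay = [sample] + delay[:-1]
        phase = n % 2       # multiplexor select: 0 = even, 1 = odd
        partial = sum(h[2 * j + phase] * delay[2 * j] for j in range(m))
        if phase == 1:
            stored = partial              # hold until the even sample
        else:
            out.append(partial + stored)  # one output per two samples
    return out

h = [1, 2, 3, 4, 5, 6, 7, 8]
x = list(range(1, 17))
folded = time_interleaved(x, h)
reference = direct_then_downsample(x, h)
```

In this model an eight-coefficient filter uses four multipliers rather than eight, and every multiplication contributes to a retained output.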
In a conventional system used to carry out a wavelet transform, half of the data processed by the filter is abandoned in the downsampling stage. The present invention takes advantage of the filter bandwidth that is effectively wasted in conventional systems.
In a first group of embodiments, an architecture component embodying the invention comprises a plurality of multipliers and associated multiplexors. In such embodiments, the subsequent processing may include generating an output by summing n sequential multiplier outputs.
To produce a configuration with a folding number of 1, the value of n may be equal to 2.
An architecture component embodying the invention may have a number m of processing stages, each stage including a multiplier. In such embodiments, sample data may be conveyed from each processing stage to a subsequent processing stage with a delay of n times a sampling period of the signal. For example, such embodiments may include a chain of n buffers through which data is conveyed from each stage to a subsequent stage.
Embodiments of the invention may further include a buffering stage in which an output value from each processing stage can be stored. A summing stage may be provided that is operative to sum the values stored in the buffering stage to produce an output for every n samples received.
In order to achieve a greater degree of folding, an architecture component embodying the invention may further comprise a multiplexor operative to present each sample to a multiplier a plurality of times for multiplication by a plurality of coefficients.
Embodiments of the invention may be configured to operate as a filter of order m.n followed by a downsampler of order n. (For example, it may act as a Daubechies 8-tap filter followed by a downsampling of 2 in which case m=4 and n=2.)
In embodiments according to the last-preceding paragraph, the filter operation performed by the architecture component may be a low-pass filter or a high-pass filter operation.
The (or each) multiplier in embodiments of the invention may be constituted by separate multiplier units or may be embodied within a MAC unit.
Architecture components embodying the first aspect of the invention are most typically incorporated into an architecture for performing a discrete wavelet transform of a sampled signal. The invention provides such an architecture from a second of its aspects. For example, such an architecture may comprise an architecture component configured to operate as a low-pass filter followed by a downsampling connected in parallel with an architecture component configured to operate as a high-pass filter followed by a downsampling. Such an architecture may also comprise a plurality of series-connected architecture components of the invention's first aspect.
From a third aspect, the invention provides a core for incorporation into an integrated circuit for implementing a wavelet transform, the core having an architecture according to any preceding claim. Such a core can be used as a “plug in” component in the construction of a wide range of complete electronic systems.
From a fourth aspect, the invention provides an architecture component for use in performing a wavelet transform, the architecture implementing the function of an n-tap filter followed by a downsampler, the architecture including a number of multipliers less than n, and a multiplexor for applying to the multiplier a plurality of multiplier coefficients, whereby successive data samples are processed by the multiplier using alternative filter coefficients multiplexed onto the multiplier.
For a folding factor of 1, embodiments of this aspect of the invention may have n/2 multipliers.
Most typically, each multiplier acts upon sequential data samples with alternative filter coefficients. Where n=2, two filter coefficients for each multiplier may be applied to the multiplier alternately.
In processing the signal, such an architecture component may be configured to multiply each sample by a total of n/2 coefficients. These multiplications may be carried out by a single multiplier (spaced in time, typically by an integer multiple of the sample period), or may be carried out by a plurality of multipliers, or both.
The time-interleaved approach is not only attractive in terms of hardware implementation of wavelet transforms, but also readily lends itself to parameterisation and is thus suitable for rapid design and synthesis. Attention here is focused mainly on high throughput applications and consequently a bit-parallel, word-serial filter implementation has been assumed. However the basic architecture can be simply extended to create silicon generators in which other word formats, such as bit-serial or digit-serial data streams are used. This allows flexibility across applications in trading silicon area with performance specifications.
Therefore, from a fifth aspect, the invention provides a method for defining an architecture for performing a discrete wavelet transform of a sampled signal in which the wavelet transform is defined in terms of a plurality of numeric parameters. These parameters may specify one or more of the following: the choice of wavelet function; the input word length; the output word length; the rounding or truncation required; additional bits to prevent adder over-flow; and the amount of folding within the architecture.
A method of this aspect of the invention may define the architecture in a hardware definition language such as VHDL, Verilog etc., a netlist format such as EDIF, XNF, etc., an ASIC or FPGA layout or a combination of any of the above, amongst others.
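One way of picturing such a parameter set is as a single record that is validated before any architecture is generated. The sketch below is purely illustrative: the class name `DwtCoreParams`, the field names, the default values and the validation rules are all assumptions made for this example and do not represent the generics of any actual core.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DwtCoreParams:
    """Hypothetical parameter record mirroring the list given above."""
    wavelet: str                 # choice of wavelet function
    input_word_length: int       # bits per input sample
    output_word_length: int      # bits per output sample
    rounding: str = "truncate"   # "round" or "truncate"
    guard_bits: int = 0          # extra bits against adder over-flow
    folding_factor: int = 1      # amount of folding

    def __post_init__(self):
        if self.rounding not in ("round", "truncate"):
            raise ValueError("rounding must be 'round' or 'truncate'")
        if self.input_word_length < 1 or self.output_word_length < 1:
            raise ValueError("word lengths must be positive")
        if self.guard_bits < 0:
            raise ValueError("guard_bits must not be negative")
        if self.folding_factor < 1:
            raise ValueError("folding_factor must be at least 1")

# Example values only; "db4" is an arbitrary label for this sketch.
params = DwtCoreParams(wavelet="db4", input_word_length=8,
                       output_word_length=16, guard_bits=2)
```

Collecting the parameters in one place in this way corresponds to the role played by generic parameters in a hardware definition language.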
Through the use of systematic folding and multiplexing, efficient silicon hardware for a variety of throughput specifications can be produced by embodiments of this invention. In most practical applications, the required throughput is orders of magnitude lower than the processing speed obtainable from these cores. An increase in silicon efficiency is therefore possible by the re-use of hardware resources at multiplier-accumulator level. The architecture described in this section is based on a systematic folding and retiming methodology which multiplexes low level operations (multiplication, addition, accumulation etc.) on to a reduced number of components. The amount of folding depends on the wavelet choice and the downsampling ratio. The folding factor is specified as a generic parameter along with the required wavelet type. Based on this information, an efficient folded architecture for a multi-level discrete wavelet transform is generated.
From another aspect, the invention provides a method for defining an architecture component for performing a wavelet transform, the definition including generating a list of generic parameters and ports for the component, connecting the delay line to the multipliers and appropriate coefficients to the multiplier through a multiplexor.
Such a method may further comprise one or more of the following steps: connecting an adder to the output of the multiplier; generating delay line taps; generating MAC units required; generating an adder and buffer memory and a latch; or generating interconnections within the architecture.
Embodiments of the invention will now be described in detail, by way of example only, and with reference to the accompanying drawings.