Embodiments according to the invention are related to a signal processor for providing a processed version of an input signal in dependence on the input signal, to a window provider for providing signal processing window values, to an encoded media signal, to a method for processing a signal and to a method for providing signal processing window values.
An embodiment according to the invention is related to an apparatus for encoding or decoding an audio or video signal using variable window functions. Another embodiment according to the invention is related to a method for encoding or decoding an audio or video signal using variable window functions.
Embodiments according to the present invention generally relate to signal analysis and processing methods, such as those which may be used in audio or video coding systems.
Finite impulse response (FIR) filtering of discrete signals, particularly in the context of filter banks, is widely employed in spectral analysis, processing, synthesis, and media data compression, amongst other applications. It is well understood that the temporal (or spatial) finiteness of an FIR filter, and hence the finiteness of the signal interval which can be processed at one instant in time or space, can lead to a phenomenon known as bias or leakage. When modifying the filtered interval, for example by varying gain changes or quantization, blocking or ringing artifacts can occur upon inversion of the filtering operation. It has been found that the cause of these artifacts can be ascribed to discontinuities between the endpoints of the signal waveform of the processed interval (hereafter referred to as segment), as well as those of its differentials. It has been found that, in order to reduce such unwanted effects of leakage, it is thus helpful or even necessary to minimize discontinuities in the segment and some of its differentials. This can be achieved by multiplying each sample s(n), n=0, 1, ..., N−1, of the N-length segment by a certain weight w(n) prior to filtering, and in the case of signal manipulation in the filtered domain, also after inverse filtering, such that the endpoints of the segment and of its differentials are tapered to zero. An equivalent approach is to apply the weights to each basis filter of the filter bank (see, for example, reference [2]). Since the weighting factors are often described by an analytical expression, a set of factors is commonly known as a weighting function or window function.
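The sample-wise weighting described above may be illustrated by the following minimal Python sketch. The test signal (4.3 cycles per segment), the sine-shaped taper, the helper names taper_segment and far_leakage, and the bin threshold of 12 are assumptions chosen purely for illustration; they are not taken from the cited references.

```python
import numpy as np

def taper_segment(s, w):
    """Multiply each sample s(n) of a segment by its weight w(n)."""
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    if s.shape != w.shape:
        raise ValueError("segment and window must have the same length")
    return s * w

def far_leakage(x, first_bin=12):
    """Fraction of spectral energy found at DFT bins >= first_bin."""
    power = np.abs(np.fft.rfft(x)) ** 2
    return power[first_bin:].sum() / power.sum()

N = 64
n = np.arange(N)

# Hypothetical test signal: 4.3 cycles per segment, so the segment
# endpoints do not join smoothly when the segment is repeated.
segment = np.cos(2 * np.pi * 4.3 * n / N)

# Illustrative taper (assumption): a sine-shaped weighting that
# decays toward both segment edges.
window = np.sin(np.pi * (n + 0.5) / N)

tapered = taper_segment(segment, window)
raw_leak = far_leakage(segment)
tapered_leak = far_leakage(tapered)
print(f"far-bin energy fraction: raw {raw_leak:.2e}, tapered {tapered_leak:.2e}")
```

In this sketch the weighted segment starts and ends close to zero, and the share of energy far away from the signal frequency is smaller than for the unweighted segment.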
In typical audio and video coding systems, a source waveform is segmented as above, and each segment is quantized to a coarser representation to accomplish high data compression, i.e. a low bit rate required for storing or transmitting the signal. In an attempt to obtain coding gain by means of energy compaction into fewer than N samples (or, in other words, to increase perceptual quality of the coded signal for a given bit rate), filter-bank transformations of the segments prior to quantization have become popular. Recently developed systems use lapped orthogonal time-to-frequency transformation in the form of the modified discrete cosine transform (MDCT), a filter bank allowing adjacent segments to overlap while still permitting critical sampling. For improved performance, the forward and inverse MDCT operations are combined with weighting of each segment: on the sender side, an analysis window $w_a(n)$ is applied before the forward MDCT, and on the receiver side, a synthesis window $w_s(n)$ is employed after the inverse MDCT. Unfortunately, not all weighting functions are suitable for use with the MDCT. Assuming predetermined (time/space invariant) windows, it has been found that in order for the entire architecture to yield perfect input reconstruction in the absence of quantization or transmission errors, $w_a(n)$ and $w_s(n)$ have to be chosen as follows:

$$w_a(n)\cdot w_s(n) + w_a(N/2+n)\cdot w_s(N/2+n) = 1, \quad n = 0, 1, \ldots, N/2-1. \tag{1}$$

If $w_a(n)$ and $w_s(n)$ are to be identical, i.e. $w_a(n) = w_s(n) = w(n)$, eq. (1) reduces to the better-known constraint

$$w(n)^2 + w(N/2+n)^2 = 1, \quad n = 0, 1, \ldots, N/2-1, \tag{2}$$

published in reference [7]. For best energy compaction, w(n) which are symmetric about n=N/2−½, i.e.

$$w(N-1-n) = w(n), \quad n = 0, 1, \ldots, N/2-1, \tag{3}$$

are usually adopted. In the Advanced Audio Coding (AAC) standard (reference [8]), two window functions are available. One is the sine window, given by

$$w_{\sin}(n) = \sin\!\left(\pi\cdot\frac{n+\tfrac{1}{2}}{N}\right), \quad n = 0, 1, \ldots, N-1, \tag{4}$$

the other is a Kaiser-Bessel-derived (KBD) window described in the patents of Fielder and Davidson, entitled “Low bit rate transform coder, decoder, and encoder/decoder for high-quality audio,” U.S. Pat. Nos. 5,109,417 and 5,142,656. The latter window is also utilized in the AC-3 (Dolby Digital) coding standard (ATSC, Inc., “Digital Audio Compression Standard (AC-3, E-AC-3), Revision B,” document A/52B, June 2005), albeit in a different configuration (α=5). The Vorbis specification (reference [9]) defines the window

$$w_{\mathrm{vorbis}}(n) = \sin\!\left(\frac{\pi}{2}\cdot\sin^2\!\left(\pi\cdot\frac{n+\tfrac{1}{2}}{N}\right)\right), \quad n = 0, 1, \ldots, N-1. \tag{5}$$
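The constraints of eqs. (2) and (3) can be checked numerically for the sine window of eq. (4) and the Vorbis window of eq. (5). The following Python sketch is illustrative only; the function names, the tolerance, and the squared-sine counterexample are assumptions introduced here and do not appear in the cited standards.

```python
import numpy as np

def sine_window(N):
    """Sine window of eq. (4)."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

def vorbis_window(N):
    """Vorbis window of eq. (5)."""
    n = np.arange(N)
    return np.sin(np.pi / 2 * np.sin(np.pi * (n + 0.5) / N) ** 2)

def satisfies_eq2(w, tol=1e-12):
    """Check w(n)^2 + w(N/2+n)^2 = 1 for n = 0..N/2-1 (eq. (2))."""
    half = len(w) // 2
    return bool(np.all(np.abs(w[:half] ** 2 + w[half:] ** 2 - 1.0) < tol))

def is_symmetric(w, tol=1e-12):
    """Check w(N-1-n) = w(n) (eq. (3))."""
    return bool(np.allclose(w, w[::-1], atol=tol))

N = 64
w_sin = sine_window(N)
w_vorbis = vorbis_window(N)

# Counterexample (assumption for illustration): a squared-sine,
# Hann-like weighting sampled on the same grid violates eq. (2).
w_squared = w_sin ** 2

for name, w in [("sine", w_sin), ("vorbis", w_vorbis), ("squared sine", w_squared)]:
    print(name, satisfies_eq2(w), is_symmetric(w))
```

The squared-sine case illustrates the remark that not all weighting functions are suitable for use with the MDCT: it is symmetric per eq. (3) but does not fulfil eq. (2).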
FIG. 5 shows the frequency responses of the AAC and Vorbis window functions, obtained via Fourier transformation, according to reference [4]. It can be seen that the sine window has relatively high close-frequency selectivity (narrow main lobe) and relatively low stopband rejection (low side lobe attenuation). The KBD window, on the contrary, has high stopband attenuation and low close-frequency selectivity. The Vorbis window lies about midway between the former two windows.
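A frequency response of the kind discussed above can be approximated with a zero-padded FFT. The following Python sketch is a rough illustration, not the procedure of reference [4]: the oversampling factor, the function names, and the way the first spectral null is located (first point at which the response starts rising again) are assumptions made here.

```python
import numpy as np

def sine_window(N):
    """Sine window of eq. (4)."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

def vorbis_window(N):
    """Vorbis window of eq. (5)."""
    n = np.arange(N)
    return np.sin(np.pi / 2 * np.sin(np.pi * (n + 0.5) / N) ** 2)

def response_db(w, oversample=64):
    """Magnitude response in dB, normalized to 0 dB at zero frequency."""
    spectrum = np.abs(np.fft.rfft(w, n=oversample * len(w)))
    spectrum = spectrum / spectrum[0]
    return 20 * np.log10(np.maximum(spectrum, 1e-12))

def main_lobe_and_side_lobe(w, oversample=64):
    """Return (first null position in DFT bins, peak side-lobe level in dB)."""
    resp = response_db(w, oversample)
    # First index at which the response starts to rise again,
    # taken here as the edge of the main lobe.
    first_null = int(np.nonzero(np.diff(resp) > 0)[0][0])
    return first_null / oversample, float(resp[first_null:].max())

N = 64
null_sine, psl_sine = main_lobe_and_side_lobe(sine_window(N))
null_vorbis, psl_vorbis = main_lobe_and_side_lobe(vorbis_window(N))
print(f"sine:   first null at {null_sine:.2f} bins, peak side lobe {psl_sine:.1f} dB")
print(f"vorbis: first null at {null_vorbis:.2f} bins, peak side lobe {psl_vorbis:.1f} dB")
```

The first-null position serves as a simple measure of close-frequency selectivity, and the peak side-lobe level as a simple measure of stopband rejection.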
It has been found that for some applications, it may be desirable to exert finer control over the passband selectivity and stopband rejection of a weighting function satisfying eq. (2). More specifically, it has been found that, to improve coding efficiency, a window parameter may be required to continuously adapt the characteristics of the window to those of the input spectrum. Of all three functions discussed above, only the KBD function offers such a parameter, α, which can be varied to achieve different selectivity/attenuation tradeoffs. This function, however, incorporates computationally expensive mathematics (Bessel function, hyperbolic sine, square root, and division), potentially prohibiting its re-computation for every signal segment on low-power devices or in real-time systems. The same applies to the class of window functions presented in the article of Sinha and Ferreira, entitled “A New Class of Smooth Power Complementary Windows and their Application to Audio Signal Processing,” AES 119th Convention, October 2005, paper 6604, which necessitates complex-valued operations, spectral factorization, and Fourier transformation. It has also been found that interpolation between two functions (for example KBD and sine), most efficiently by weighted summation, can be used to somewhat control the frequency response, but this approach offers only limited flexibility.
A multitude of window functions, optimized toward different criteria, have been documented, for example, in references [1], [2], [3], [4], [5]. Arguably three of the most popular functions in use today are the ones reported by von Hann, Hamming, and Blackman.
In the following, the aforementioned classic window functions (for example, Hann, Hamming and Blackman) will be revisited and the underlying general design equation will be identified.
For the sake of consistency and comparability with seminal investigations of window functions, Nuttall's methodology and notation (see, for example, reference [4]) shall be adopted in the present discussion. In particular, let L denote the duration (length) of a window realization, t the location (time) within the weighting, and f the frequency within the window's power density spectrum, obtained by Fourier transformation of the window function. Additionally, all window functions shall be normalized to a peak amplitude of one. Since only symmetrical (advantageously even length), bell-shaped windows will be studied here, this implies w(L/2)=1. The first weighting function to be considered is known as the Hann (or Hanning) function. It is specified in reference [2] as
$$w_{\mathrm{Hann}}(t) = \sin^2\!\left(\pi\cdot\frac{t}{L}\right) \tag{11}$$

for DSP applications (nonnegative values of t). As shown in reference [2] and evident from (11), the Hann function is a special case of a class of exponentiated sine functions:

$$w_a(t) = \sin^a\!\left(\pi\cdot\frac{t}{L}\right), \quad a \ge 0. \tag{12}$$

In practice, positive integers are typically assigned to a. Note that (11), i.e. (12) with a=2, can also be written as the sum of an offset and a scaled cosine:

$$w_{\mathrm{Hann}}(t) = 0.5 - 0.5\cos\!\left(2\pi\cdot\frac{t}{L}\right). \tag{13}$$
This formulation allows for a particular spectral optimization of the Hann window (see the discussion below regarding evaluation and optimization) by changing the offset and the scaling factor. The outcome is the Hamming function, whose exact parameterization is given in reference [4] as
$$w_{\mathrm{Hamming}}(t) = 0.53836 - 0.46164\cos\!\left(2\pi\cdot\frac{t}{L}\right). \tag{14}$$
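The relationship between the exponentiated sine form of eq. (12) and the offset-plus-cosine form of eqs. (13) and (14) may be verified numerically. The following Python sketch is illustrative; the function names, the unit length L=1 and the grid of 257 points are assumptions chosen here.

```python
import numpy as np

def sine_power_window(t, L, a):
    """Eq. (12): w_a(t) = sin^a(pi * t / L), a >= 0."""
    return np.sin(np.pi * t / L) ** a

def two_term_window(t, L, offset, scale):
    """Offset minus scaled cosine, the common form of eqs. (13) and (14)."""
    return offset - scale * np.cos(2 * np.pi * t / L)

L = 1.0
t = np.linspace(0.0, L, 257)  # t[128] is exactly L/2

hann_from_sine = sine_power_window(t, L, 2)            # eq. (11)
hann_from_cosine = two_term_window(t, L, 0.5, 0.5)     # eq. (13)
hamming = two_term_window(t, L, 0.53836, 0.46164)      # eq. (14)

print("max |eq.(11) - eq.(13)|:", np.max(np.abs(hann_from_sine - hann_from_cosine)))
print("Hamming at t=0 and t=L/2:", hamming[0], hamming[128])
```

Both Hann formulations agree to within rounding error. Unlike the Hann window, the Hamming window does not reach zero at the endpoints (its value there is 0.53836−0.46164), while its peak at t=L/2 remains one.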
As pointed out by Nuttall (see, for example, reference [4]), the Hann and Hamming windows are two-term realizations of a class of (K+1)-term functions which shall be referred to as sum-of-cosines functions. Simplifying Nuttall's notation, they can be written as
$$w_b(t) = \sum_{k=0}^{K} (-1)^k\, b_k \cos\!\left(2k\pi\cdot\frac{t}{L}\right) \tag{15}$$

for usage in DSP applications. This equals equation 11 of reference [4] with scalar 1/L omitted. Three-term implementations are also common. A simple case is (15) with K=2 and factors

$$b_0 = 0.375, \quad b_1 = 0.5, \quad b_2 = 0.125, \tag{16}$$

which is equivalent to (12) with a=4. Similar to Hamming's approach, Blackman (see, for example, reference [1]) derived the following optimized $b_k$:

$$b_0 = 0.42, \quad b_1 = 0.5, \quad b_2 = 0.08. \tag{17}$$

Nuttall (see, for example, reference [4]) further refined Blackman's values for better near-field spectral response (first side lobes, see the discussion below regarding evaluation and optimization):

$$b_0 = 0.40897, \quad b_1 = 0.5, \quad b_2 = 0.09103. \tag{18}$$
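The sum-of-cosines form of eq. (15) can be evaluated with a few lines of code. The following Python sketch is illustrative; the function name, the dictionary keys and the evaluation grid are assumptions introduced here, whereas the coefficient values are those of eqs. (13), (14) and (16) to (18).

```python
import numpy as np

def sum_of_cosines(t, L, b):
    """Eq. (15): sum over k of (-1)^k * b_k * cos(2*k*pi*t/L)."""
    t = np.asarray(t, dtype=float)
    w = np.zeros_like(t)
    for k, b_k in enumerate(b):
        w += (-1) ** k * b_k * np.cos(2 * k * np.pi * t / L)
    return w

# Coefficient sets b_0..b_K; the key names are hypothetical labels.
COEFFS = {
    "hann": (0.5, 0.5),                    # eq. (13)
    "hamming": (0.53836, 0.46164),         # eq. (14)
    "sin4": (0.375, 0.5, 0.125),           # eq. (16)
    "blackman": (0.42, 0.5, 0.08),         # eq. (17)
    "nuttall3": (0.40897, 0.5, 0.09103),   # eq. (18)
}

L = 1.0
t = np.linspace(0.0, L, 257)  # t[128] is exactly L/2
windows = {name: sum_of_cosines(t, L, b) for name, b in COEFFS.items()}

# Eq. (16) should reproduce eq. (12) with a = 4.
sin4_direct = np.sin(np.pi * t / L) ** 4
print("max |eq.(16) - sin^4|:", np.max(np.abs(windows["sin4"] - sin4_direct)))
for name, w in windows.items():
    print(f"{name}: edge {w[0]:.5f}, peak {w[128]:.5f}")
```

Since the coefficients of every set listed here sum to one, each window attains the peak amplitude w(L/2)=1 assumed in the normalization above.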
The interested reader is referred to reference [4] for other optimized 3- and 4-term sum-of-cosines windows.
In view of the above discussion, what is needed is an alternative window function having a moderate computational complexity, but providing a good design flexibility.