Many conventional signal-processing techniques are frame-based. In such techniques, a stream of data is divided into discrete frames, and the data within each such frame ordinarily is processed in a fairly uniform manner. In one example, an input audio signal is divided into frames of equal length. Then, each frame is processed in a particular manner. A common processing parameter to be determined for each frame is block length or, equivalently, the number of equal-sized blocks into which the frame should be divided for processing purposes. Block length determines resolution in the original domain (e.g., time for an audio signal) and in the frequency (or other transform) domain. More specifically, shorter block lengths provide greater resolution in the original domain and lesser resolution in the frequency domain.
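The tradeoff described above can be illustrated with a short numerical sketch. The function name `block_resolution`, the 1024-sample frame, and the 48 kHz sample rate are illustrative assumptions only, as is the assumption that each block yields as many transform coefficients as it has samples, spread evenly from zero to half the sample rate.

```python
def block_resolution(frame_len, num_blocks, sample_rate_hz):
    """Return (block_len, time_span_s, freq_spacing_hz) for one frame
    divided into num_blocks equal-sized blocks (hypothetical helper)."""
    if frame_len % num_blocks != 0:
        raise ValueError("frame_len must be divisible by num_blocks")
    block_len = frame_len // num_blocks
    # Time span covered by one block: shorter blocks localize events better.
    time_span_s = block_len / sample_rate_hz
    # Assumption: block_len coefficients spread evenly over 0..sample_rate/2,
    # so shorter blocks give coarser frequency spacing.
    freq_spacing_hz = sample_rate_hz / (2 * block_len)
    return block_len, time_span_s, freq_spacing_hz


if __name__ == "__main__":
    for num_blocks in (1, 2, 4, 8):
        block_len, t, f = block_resolution(1024, num_blocks, 48000)
        print(f"{num_blocks} block(s): {block_len} samples, "
              f"{t * 1000:.2f} ms per block, {f:.2f} Hz spacing")
```

Halving the block length halves the time span of each block and doubles the spacing between adjacent frequency coefficients.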
An audio signal often consists of quasi-stationary episodes, each including a number of tonal frequency components, which are interrupted by dramatic transients. Thus, an individual frame of such an audio signal often will include a few samples corresponding to a transient, but with the vast majority of the samples corresponding to quasi-stationary portions of the signal.
Because transients in audio signals can be as short as a few samples, the block size that is used within a frame that has been detected as including a transient ideally should be just a few samples as well, thereby matching the filter's temporal resolution to the transient. Unfortunately, it usually is not practical to use different block sizes within the same frame. Making all of the blocks within a frame having a detected transient just a few samples wide would result in extremely poor frequency resolution within the frame and, therefore, is inappropriate for the rest of the samples in the frame; that is, such other samples, provided they are sufficiently far away from the transient, are quasi-stationary and therefore are better processed using high frequency resolution. This conflict conventionally has resulted in a compromise block size that is optimal neither for the transient samples nor for the quasi-stationary samples in the same frame.
A block diagram of a conventional system for processing a frame of input samples 12 is illustrated in FIG. 1. Initially, samples 12 are analyzed in transient detector 14 to determine whether the frame includes a transient.
Based on that detection, a window function is selected in module 16. In this regard, audio-coding algorithms often employ a filter bank that has different temporal-frequency resolutions. One commonly used filter bank is the MDCT (Modified Discrete Cosine Transform), having an impulse response that can be described by the following basis function:
$$h(k,n) = w(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\frac{\pi}{M}\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\right],$$

where k=0, 1, . . . , M−1; n=0, 1, . . . , 2M−1; and w(n) is a window function of length 2M. See, e.g., H. S. Malvar, “Signal Processing with Lapped Transforms”, Artech House, 1992 (referred to herein as Malvar).
In this case, the temporal-frequency resolution is determined by M, which sometimes is referred to herein as block size. A large M means low temporal resolution but high frequency resolution, while a small M means high temporal resolution and low frequency resolution.
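As a rough illustration of the basis function h(k, n) given above, the following sketch builds the M-by-2M table of basis values and applies it to one block. The sine window used as the default w(n) is an assumption chosen for illustration (the text does not specify a particular window), the square-root normalization follows the formulation in Malvar, and the helper names are hypothetical.

```python
import math


def sine_window(M):
    """Assumed example window of length 2M (not specified by the text)."""
    return [math.sin((n + 0.5) * math.pi / (2 * M)) for n in range(2 * M)]


def mdct_basis(M, window=None):
    """Return h[k][n] for k = 0..M-1 and n = 0..2M-1."""
    w = sine_window(M) if window is None else list(window)
    if len(w) != 2 * M:
        raise ValueError("window must have length 2M")
    scale = math.sqrt(2.0 / M)
    return [
        [
            w[n] * scale
            * math.cos(math.pi / M * (n + (M + 1) / 2) * (k + 0.5))
            for n in range(2 * M)
        ]
        for k in range(M)
    ]


def mdct_block(samples, M, window=None):
    """Transform 2M input samples into M coefficients (direct, O(M^2))."""
    if len(samples) != 2 * M:
        raise ValueError("expected 2M samples")
    h = mdct_basis(M, window)
    return [sum(h[k][n] * samples[n] for n in range(2 * M)) for k in range(M)]


if __name__ == "__main__":
    coeffs = mdct_block([1.0] * 16, 8)  # M = 8, constant input
    print(len(coeffs))
```

The direct double loop is used only for clarity; practical implementations typically rely on fast algorithms. With the assumed sine window, the M basis functions of a block are mutually orthonormal, which makes a convenient sanity check.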
For purposes of implementing module 16 (as shown in FIG. 1), conventional coding algorithms typically use two block sizes. A large block size, implemented as a single block covering the entire frame, is used if no transient was detected in module 14. Alternatively, a small block size, implemented as a predetermined number of blocks covering the frame, is used if a transient was detected.
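The two-way choice described above can be sketched as follows. The function name `plan_blocks` and the default of eight short blocks per frame are illustrative assumptions; the text says only that the number of short blocks is predetermined.

```python
def plan_blocks(frame_len, has_transient, num_short_blocks=8):
    """Return a list of (start, length) pairs describing the blocks
    used for one frame (hypothetical helper)."""
    if not has_transient:
        # No transient: one long block covering the entire frame.
        return [(0, frame_len)]
    if frame_len % num_short_blocks != 0:
        raise ValueError("frame_len must be divisible by num_short_blocks")
    # Transient detected: a predetermined number of equal short blocks.
    short_len = frame_len // num_short_blocks
    return [(i * short_len, short_len) for i in range(num_short_blocks)]


if __name__ == "__main__":
    print(plan_blocks(1024, has_transient=False))
    print(plan_blocks(1024, has_transient=True))
```

Note that every block in a transient frame receives the same short length, regardless of where in the frame the transient actually occurs.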
The principal window functions corresponding to these two block sizes are window function 30 (shown in FIG. 2 and labeled as WIN_LONG_LONG2LONG) and window function 40 (shown in FIG. 3 and labeled as WIN_SHORT_SHORT2SHORT), respectively. In order for the MDCT to be able to properly switch between these two principal window functions, the perfect reconstruction conditions (e.g., as described in Malvar) require the use of three transitional window functions, e.g.: window function 50 (shown in FIG. 4 and labeled as WIN_LONG_LONG2SHORT), window function 60 (shown in FIG. 5 and labeled as WIN_LONG_SHORT2LONG), and window function 70 (shown in FIG. 6 and labeled as WIN_LONG_SHORT2SHORT). It is noted that all three such transitional window functions 50, 60 and 70 are for use with the long block (i.e., covering an entire frame).
Thus, in the conventional techniques a frame is assigned a single long block (and corresponding long window 30, 50, 60 or 70) or a sequence of identical short blocks (and corresponding identical short windows 40). Because each window (of length 2M) is longer than the block-to-block spacing (M), the result is an overlapping sequence of long and short windows, such as the sequence 80 of window functions shown in FIG. 7, with each window covering the M new samples of the current block together with M samples in the previous block. For reference purposes, the middle of each block corresponding to a window function 30, 40, 50, 60 or 70 is designated as 31, 41, 51, 61 or 71, respectively, in the drawings.
It is noted that such conventional techniques select the window function for a frame that does not include a transient, based not only on the detection made by module 14 for such current frame, but also based on similar detections made for the previous and subsequent frames. That is, window functions 50, 60 and 70 are used as transition window functions between transient frames and non-transient frames.
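A minimal sketch of this selection logic follows. It assumes that, in a label of the form WIN_LONG_X2Y, X reflects the previous frame and Y the subsequent frame (SHORT for a transient frame, LONG otherwise); that reading of the labels is an interpretation rather than something stated explicitly, and the function names are hypothetical.

```python
def select_window(prev_transient, cur_transient, next_transient):
    """Pick a window label for the current frame from the transient
    detections for the previous, current and subsequent frames."""
    if cur_transient:
        # Transient frames use the sequence of identical short windows.
        return "WIN_SHORT_SHORT2SHORT"
    # Assumed label convention: WIN_LONG_<previous>2<subsequent>.
    left = "SHORT" if prev_transient else "LONG"
    right = "SHORT" if next_transient else "LONG"
    return f"WIN_LONG_{left}2{right}"


def select_sequence(transient_flags):
    """Label every frame; frames beyond either end are treated as
    non-transient (an assumption made for this sketch)."""
    padded = [False] + list(transient_flags) + [False]
    return [
        select_window(padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    for label in select_sequence([False, False, True, False, False]):
        print(label)
```

Because the choice for a non-transient frame depends on the subsequent frame, a selector of this kind needs one frame of look-ahead.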
Referring back to FIG. 1, in module 17 the window function selected in module 16 is then applied (multiple times for a transient frame) to the input samples 12 for the current frame. That is, for each block the sample values are multiplied by the window function values corresponding to that block in order to obtain a set of weighted values.
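The weighting step can be sketched as below, assuming a window of length 2M that spans the M samples of the preceding block followed by the M new samples of the current block. The function name `windowed_blocks` and its argument layout are hypothetical, and the same window is applied to every block in the frame, as in the conventional approach.

```python
def windowed_blocks(prev_tail, frame, window):
    """Multiply each 2M-sample segment by the window values.

    prev_tail: the M samples immediately preceding the frame.
    frame:     the samples of the current frame (a multiple of M).
    window:    window values of length 2M, reused for every block.
    """
    if len(window) % 2 != 0:
        raise ValueError("window length must be even (2M)")
    M = len(window) // 2
    if len(prev_tail) != M:
        raise ValueError("prev_tail must contain M samples")
    if len(frame) % M != 0:
        raise ValueError("frame length must be a multiple of M")
    history = list(prev_tail) + list(frame)
    blocks = []
    # Advance by M samples per block so that adjacent windows overlap by M.
    for start in range(0, len(frame), M):
        segment = history[start:start + 2 * M]
        blocks.append([s * w for s, w in zip(segment, window)])
    return blocks


if __name__ == "__main__":
    # One frame of 4 samples processed as two short blocks (M = 2).
    print(windowed_blocks([1, 1], [2, 2, 3, 3], [0.5, 1, 1, 0.5]))
```

For a non-transient frame the loop runs once, with M equal to the frame length; for a transient frame it runs once per short block.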
Those weighted values are then processed in module 19 using the selected window function to provide the output values 22. The specific type of processing performed in module 19 can vary depending upon the desired application. For example, with respect to an audio signal, the processing might involve analysis, coding, and/or enhancement.