Generally speaking, within the timeframes in which audio signal processing occurs, most of a typical audio signal is quasi-stationary in nature, meaning that its statistics (e.g., in the frequency domain) change relatively slowly. However, it is also fairly common for such quasi-stationary portions to be punctuated and/or separated by transients. A transient can be defined in a variety of different ways, but generally it is a portion of the signal having a very short duration in which the statistics differ significantly from those of the portion of the signal immediately preceding it and the portion of the signal immediately following it (often manifesting as a sudden change in signal energy). It is noted that such preceding and following portions also may differ from each other, depending upon whether the transient occurs during an otherwise quasi-stationary segment or whether it marks a change from one quasi-stationary portion to another.
In order to encode a given audio signal both efficiently and accurately, all or nearly all conventional audio-signal-processing techniques encode data in frames (e.g., each consisting of 1,024 new samples together with some overlap of a preceding frame). For the quasi-stationary portions of the signal, a frequency transform typically is applied over the entire frame, thereby providing good frequency resolution.
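As a minimal sketch of the framing just described, the following Python function splits a sample sequence into frames, each consisting of a block of new samples preceded by the tail of the preceding frame. The function name, the default overlap length and the zero-filled history before the first frame are assumptions made purely for illustration; the description above specifies only 1,024 new samples per frame and "some overlap".

```python
def frames_with_overlap(samples, new_per_frame=1024, overlap=1024):
    """Return frames made of `overlap` carried-over samples plus `new_per_frame` new ones.

    Assumptions (not from the source): the overlap length, and that silence
    (zeros) precedes the first frame. Trailing samples that do not fill a
    whole frame are dropped.
    """
    history = [0.0] * overlap
    frames = []
    for start in range(0, len(samples) - new_per_frame + 1, new_per_frame):
        new = list(samples[start:start + new_per_frame])
        frame = history + new
        frames.append(frame)
        # Keep the tail of this frame as the overlap for the next one.
        history = frame[-overlap:] if overlap else []
    return frames
```

For example, with eight samples, four new samples per frame and an overlap of two, the second frame begins with the last two samples of the first.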
However, as is well known, the cost of good frequency resolution is poor time resolution. While that result is acceptable for a quasi-stationary portion of the signal, applying a long transform to a portion of an audio signal that includes a transient would essentially spread the transient's energy over the entire transform interval, thereby resulting in significant audible distortion.
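The trade-off can be made concrete with simple arithmetic. The sketch below assumes a 48 kHz sampling rate and compares a 1,024-sample transform with a hypothetical 128-sample transform; neither the sampling rate nor the shorter length comes from the description above. For a transform of N samples, the spacing between frequency bins is roughly the sampling rate divided by N, while the time span covered is N divided by the sampling rate, so improving one necessarily degrades the other.

```python
def resolution(transform_len, sample_rate_hz=48_000):
    """Return (frequency-bin spacing in Hz, time span in ms) for one transform.

    The 48 kHz default is an illustrative assumption.
    """
    bin_spacing_hz = sample_rate_hz / transform_len
    span_ms = 1000.0 * transform_len / sample_rate_hz
    return bin_spacing_hz, span_ms

long_bins, long_span = resolution(1024)   # 46.875 Hz bins, about 21.3 ms
short_bins, short_span = resolution(128)  # 375.0 Hz bins, about 2.7 ms
```

Under these assumptions, shortening the transform by a factor of eight narrows the time span by the same factor while widening the bin spacing eightfold.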
Thus, most conventional audio-signal-processing techniques attempt to identify where transients occur and then perform different processing within the immediate neighborhood of a transient than is performed for the quasi-stationary portions of the signal. For example, by using a much shorter transform interval, it often is possible to confine the transient's effects approximately to the time interval in which the transient actually occurs. Of course, the cost of such increased time resolution is proportionately poorer frequency resolution. However, good frequency resolution typically is not as important when reproducing a transient, because human auditory perception is not as sensitive to frequency detail over such a short period of time.
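The effect of transform length on a transient can be illustrated with a toy experiment. In the sketch below, an idealized click is placed in a 64-sample frame and passed through a deliberately crude stand-in for lossy coding: a discrete Fourier transform in which all but the few lowest-frequency coefficients are discarded. This stand-in, the frame and block sizes, and all function names are assumptions chosen for illustration; they do not represent any particular codec. The frame is processed once with a single long transform and once as eight short transforms.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def lossy_roundtrip(block, keep):
    """Toy lossy coding (an assumption): keep only frequencies 0..keep and their mirrors."""
    X = dft(block)
    n = len(X)
    kept = [X[k] if min(k, n - k) <= keep else 0 for k in range(n)]
    return idft(kept)

def code_frame(frame, block_len, keep):
    """Process the frame as consecutive, non-overlapping blocks of `block_len` samples."""
    out = []
    for start in range(0, len(frame), block_len):
        out.extend(lossy_roundtrip(frame[start:start + block_len], keep))
    return out

def energy(x):
    return sum(v * v for v in x)

frame = [0.0] * 64
frame[40] = 1.0  # a single click as an idealized transient

long_out = code_frame(frame, block_len=64, keep=4)   # one long transform
short_out = code_frame(frame, block_len=8, keep=1)   # eight short transforms

# Energy appearing in the first half of the frame, before the click occurs.
pre_echo_long = energy(long_out[:32])
pre_echo_short = energy(short_out[:32])
```

With the long transform, the reconstructed signal carries energy well before the click, whereas with the short transforms the blocks preceding the click remain silent and the distortion stays within the one block that contains it.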
In order for the foregoing differential processing (between quasi-stationary portions and transient portions) to occur, it is necessary to accurately identify where transients occur in the first instance. Several different conventional approaches have been employed for detecting transients within an audio signal. Examples include simply declaring a transient whenever an amplitude change of sufficient magnitude occurs, or transforming the audio signal into the frequency domain and then declaring a transient whenever a frequency change of sufficient magnitude occurs. However, each such approach has its own limitations.
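The first of these conventional approaches, amplitude-change detection, might be sketched as follows. The block length, the energy-ratio threshold, the noise floor and the function name are all hypothetical values chosen for illustration, and the sketch flags only sudden increases in energy, which is a simplifying assumption.

```python
def detect_transients(samples, block_len=128, ratio_threshold=8.0, floor=1e-9):
    """Return the start indices of blocks whose energy jumps relative to the previous block.

    All parameter defaults are illustrative assumptions, not values from the source.
    """
    energies = [sum(v * v for v in samples[i:i + block_len])
                for i in range(0, len(samples) - block_len + 1, block_len)]
    hits = []
    for b in range(1, len(energies)):
        # `floor` avoids flagging tiny fluctuations when the previous block is silent.
        if energies[b] > ratio_threshold * (energies[b - 1] + floor):
            hits.append(b * block_len)
    return hits

# A quiet signal with a loud burst starting at sample 512.
signal = [0.01] * 512 + [1.0] * 128 + [0.01] * 384
onsets = detect_transients(signal)
```

A fixed threshold of this kind can miss a transient that occurs within an already loud passage and can trigger spuriously on gradual crescendos sampled at block boundaries, which is one example of the limitations noted above.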