The present disclosure relates to digital audio signals, and to systems and methods for detecting the occurrence of transients in digital audio signals.
Digital-based electronic media formats have become widely accepted. The development of faster computer processors, high-density storage media, and efficient compression and encoding algorithms has led to an even more widespread implementation of digital audio media formats in recent years. Digital compact discs (CDs) and digital audio file formats, such as MP3 (MPEG Audio-layer 3) and WAV, are now commonplace. Some of these formats store the digitized audio information in an uncompressed state while others use compression. The ease with which digital audio files can be generated, duplicated, and disseminated has also helped increase their popularity.
Audio information can be detected as an analog signal and represented using an almost infinite number of electrical signal values. An analog audio signal is subject to electrical signal impairments, however, that can negatively affect the quality of the recorded information. Any change to an analog audio signal value can result in a noticeable defect, such as distortion or noise. Because an analog audio signal can be represented using an almost infinite number of electrical signal values, it is also difficult to detect and correct defects. Moreover, the methods of duplicating analog audio signals cannot approach the speed with which digital audio files can be reproduced. These and many other problems associated with analog audio signals can be overcome, without a significant loss of information, simply by digitizing the audio signals.
FIG. 1 presents a portion of an analog audio signal 100. The amplitude of the analog audio signal 100 is shown with respect to the vertical axis 105 and the horizontal axis 110 indicates time. In order to digitize the analog audio signal 100, the waveform 115 is sampled at periodic intervals, such as at a first sample point 120 and a second sample point 125. A sample value representing the amplitude of the waveform 115 is recorded for each sample point. If the sampling rate is less than twice the frequency of the waveform being sampled, the resulting digital signal will be substantially identical to the result obtained by sampling a waveform of a lower frequency. As such, in order to be adequately represented, the waveform 115 must be sampled at a rate greater than twice the highest frequency that is to be included in the reconstructed signal. To ensure that the waveform is free of frequencies higher than one-half of the sampling rate, which is also known as the Nyquist frequency, the audio signal 100 can be filtered prior to sampling. Therefore, in order to preserve as much audible information as possible, the sampling rate should be sufficient to produce a reconstructed waveform that cannot be differentiated from the waveform 115 by the human ear.
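The aliasing effect described above can be illustrated with a minimal sketch. The 8 kHz sampling rate, the 7 kHz tone, and the helper name sample_sine are assumptions chosen for illustration and do not appear in the disclosure. Sampled at 8 kHz, a 7 kHz tone lies above the 4 kHz Nyquist frequency and yields the same sample values, apart from a sign inversion, as a 1 kHz tone.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Return num_samples of a unit-amplitude sine sampled at sample_rate_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


# Illustrative values only (assumptions, not taken from the disclosure).
SAMPLE_RATE_HZ = 8000.0                          # Nyquist frequency is 4000 Hz
HIGH_TONE_HZ = 7000.0                            # above the Nyquist frequency
ALIAS_TONE_HZ = SAMPLE_RATE_HZ - HIGH_TONE_HZ    # 1000 Hz

high = sample_sine(HIGH_TONE_HZ, SAMPLE_RATE_HZ, 64)
low = sample_sine(ALIAS_TONE_HZ, SAMPLE_RATE_HZ, 64)

# sin(2*pi*7000*n/8000) equals -sin(2*pi*1000*n/8000) for every integer n,
# so adding the two sample sequences cancels them up to rounding error.
max_diff = max(abs(h + l) for h, l in zip(high, low))
print("largest |high + low| over 64 samples:", max_diff)
```

Because the two sample sequences cannot be told apart after sampling, the 7 kHz content would have to be removed by filtering before sampling, as noted above.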
The human ear generally cannot detect frequencies greater than 16-20 kHz, so the sampling rate used to create an accurate representation of an acoustic signal should be at least 32 kHz. For example, compact disc quality audio signals are generated using a sampling rate of 44.1 kHz. Once the sample value associated with a sample point has been determined, it can be represented using a fixed number of binary digits, or bits. Encoding the infinite possible values of an analog audio signal using a finite number of binary digits will almost necessarily result in the loss of some information. Because high-quality audio is encoded using up to 24 bits per sample, however, the digitized values closely approximate the original analog values. The digitized values of the samples comprising the audio signal can then be stored using a digital audio file format.
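The relationship between bit depth and lost information can be sketched as follows. The uniform rounding scheme, the [-1.0, 1.0] sample range, and the helper names quantize and max_quantization_error are assumptions made for illustration; practical encoders may quantize differently.

```python
import math


def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest level of a signed code.

    Assumes a simple uniform quantizer with 2**(bits - 1) - 1 positive levels.
    """
    max_code = 2 ** (bits - 1) - 1
    code = round(value * max_code)
    code = max(-max_code, min(max_code, code))   # clamp to the legal range
    return code / max_code


def max_quantization_error(bits, num_samples=1000):
    """Largest absolute error when quantizing one cycle of a sine wave."""
    worst = 0.0
    for n in range(num_samples):
        x = math.sin(2.0 * math.pi * n / num_samples)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst


errors = {bits: max_quantization_error(bits) for bits in (8, 16, 24)}
for bits, err in errors.items():
    print(bits, "bits: worst-case error", err)
```

Under these assumptions the worst-case error is bounded by half of one quantization step, so each additional bit roughly halves it.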
The acceptance of digital audio has increased dramatically as the amount of information that is shared electronically has grown. Digital audio file formats, such as MP3 and WAV, that can be transferred between a wide variety of hardware devices are now widely used. In addition to music and soundtracks associated with video information, digital audio is also being used to store information such as voice-mail messages, audio books, speeches, lectures, and instructions.
The characteristics of digital audio and the associated file formats also can be used to provide greater functionality in manipulating audio signals than was previously available with analog formats. One such type of manipulation is filtering, which can be used for signal processing operations including removing various types of noise, enhancing certain frequencies, or equalizing a digital audio signal. Another type of manipulation is time stretching, in which the playback duration of a digital audio signal is increased or decreased, either with or without altering the pitch. Time stretching can be used, for example, to increase the playback duration of a signal that is difficult to understand or to decrease the playback duration of a signal so that it can be reviewed in a shortened time period. Compression is yet another type of manipulation, by which the amount of data used to represent a digital audio signal is reduced. Through compression, a digital audio signal can be stored using less memory and transmitted using less bandwidth. Digital audio processing strategies that employ compression include MP3, AAC (MPEG-2 Advanced Audio Coding), and Dolby Digital AC-3.
Many digital audio processing strategies manipulate the digital audio data in the frequency domain. In performing this processing, the digital audio data can be transformed from the time domain into the frequency domain block by block, each block comprising multiple discrete audio samples. By manipulating data in the frequency domain, however, some characteristics of the audio signal can be lost. For example, an audio signal can include a substantial signal change, referred to as a transient, that can be differentiated from a steady-state signal. A transient is typically characterized by a sharp increase and decrease in amplitude that occur over a very short period of time. The signal information representing a transient can be distorted during frequency domain processing, which commonly results in a pre-echo or transient smearing that diminishes the quality of the digital audio signal.
In order to transform a digital audio signal from the time domain into the frequency domain, a processing algorithm may convert the blocks of samples using a Discrete Fourier Transform (DFT), which can be computed efficiently using the Fast Fourier Transform (FFT). The number of individual samples included in a block defines the time resolution of the transform. Once transformed into the frequency domain, the digital audio signal can be represented using magnitude and phase information, which describe the spectral characteristics of the block. After the block of digital audio data has been processed, and the spectral characteristics of the block have been determined, the digital audio data can be converted back into the time domain using an Inverse Discrete Fourier Transform (IDFT), which can be computed using the Inverse Fast Fourier Transform (IFFT).
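The round trip from the time domain to the frequency domain and back can be sketched as follows. For brevity the sketch evaluates the DFT definition directly rather than using an FFT, which produces the same result with fewer operations. The block size of 32 samples, the test tone, and the function names dft and idft are assumptions made for illustration.

```python
import cmath
import math


def dft(block):
    """Direct evaluation of the Discrete Fourier Transform of one block."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, since the input block was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


BLOCK_SIZE = 32   # assumed block length
# Test block: a cosine completing exactly 4 cycles within the block.
block = [math.cos(2.0 * math.pi * 4 * t / BLOCK_SIZE)
         for t in range(BLOCK_SIZE)]

spectrum = dft(block)
magnitudes = [abs(c) for c in spectrum]        # spectral magnitude per bin
phases = [cmath.phase(c) for c in spectrum]    # spectral phase per bin

# Rebuild the complex spectrum from magnitude and phase, then invert it.
rebuilt = idft([cmath.rect(m, p) for m, p in zip(magnitudes, phases)])

peak_bin = max(range(BLOCK_SIZE // 2 + 1), key=lambda k: magnitudes[k])
round_trip_error = max(abs(a - b) for a, b in zip(block, rebuilt))
print("peak bin:", peak_bin, "round-trip error:", round_trip_error)
```

When the magnitude and phase values are left unmodified, the inverse transform recovers the original block up to rounding error. Distortion such as pre-echo arises only when the spectral data is altered between the two transforms.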
In order to control pre-echo, some processing algorithms attempt to detect transient signals in the time domain, before the digital audio data is converted into the frequency domain. If a transient is detected in the time domain, a different, often shorter, block of samples can be identified for frequency domain processing. Using a shorter block does not eliminate the pre-echo but essentially constrains the effect of the pre-echo to the shorter block, where it may not be audible. This approach can be computationally difficult and expensive, as the processing algorithm cannot employ a standard block size. Nonetheless, transients in a digital audio signal ideally should be identified in order to process the signal at high quality.
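One generic way to detect a transient in the time domain is to compare the energy of each block of samples with that of the preceding block. The following is a minimal sketch of that idea only; it is not the detection method of the present disclosure. The block size of 64 samples, the energy ratio threshold of 8, the energy floor, and the function names block_energies and find_transient_blocks are all assumptions chosen for illustration.

```python
import math


def block_energies(samples, block_size):
    """Sum of squared sample values for each complete, non-overlapping block."""
    return [sum(x * x for x in samples[i:i + block_size])
            for i in range(0, len(samples) - block_size + 1, block_size)]


def find_transient_blocks(samples, block_size=64, ratio_threshold=8.0,
                          floor=1e-9):
    """Return indices of blocks whose energy jumps sharply over the prior block.

    The floor keeps near-silent blocks from producing a zero reference energy.
    """
    energies = block_energies(samples, block_size)
    hits = []
    for i in range(1, len(energies)):
        if energies[i] > ratio_threshold * max(energies[i - 1], floor):
            hits.append(i)
    return hits


# Test signal: a quiet steady tone with a decaying burst starting at sample 300.
signal = [0.05 * math.sin(2.0 * math.pi * 5 * n / 64) for n in range(512)]
for n in range(300, 340):
    signal[n] += 0.9 * math.exp(-(n - 300) / 10.0)

transient_blocks = find_transient_blocks(signal)
print("blocks flagged as containing a transient:", transient_blocks)
```

With these assumed parameters, only the block covering samples 256 through 319, which contains the onset of the burst, is flagged. A processing algorithm could respond to such a flag by selecting a shorter block around the onset, at the cost of the variable block size discussed above.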