Pulse code modulation (PCM) is typically used for broadcasting digital audio signals. To broadcast or record digital audio signals more efficiently, the amount of digital information needed to reproduce the PCM-coded samples can be reduced by applying a digital compression algorithm, which produces a digitally compressed representation of the original signal. Digital compression is useful wherever bandwidth is limited and there is an economic benefit in reducing the amount of information passed at any time. For example, digital compression is typically used for high-quality audio transmission in video conferencing systems, satellite or terrestrial audio broadcasting systems, and coaxial or optical cable audio transmission systems. It is also used for storing audio signals on magnetic, optical, and semiconductor storage devices. A standard format for digitally encoded audio signals has been set forth by the Moving Picture Experts Group (see, for example, ISO/IEC 11172-3 and ISO/IEC 13818-3). This format is commonly referred to as "MPEG Audio."
The term "psychoacoustics" refers to the study of sound as it is perceived by humans. According to psychoacoustic theory, certain sounds cannot be perceived at all, and others cannot be perceived as accurately as other sounds. Therefore, when compressing a digital representation of an audio signal, one may exploit this property by allocating more bits of data to the sounds that the human ear perceives more readily and fewer bits of data to the sounds that it perceives less readily.
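As an illustration of allocating more bits to readily perceived sounds, the following is a minimal sketch of a greedy allocator. It is a simplified scheme and not the procedure specified in ISO/IEC 11172-3. The sketch assumes that each frequency band comes with a hypothetical signal-to-mask ratio (SMR) in dB, which measures how far the band sits above the level at which it would become inaudible. It also assumes the common rule of thumb that each extra quantizer bit lowers quantization noise by about 6.02 dB. The names `allocate_bits`, `smr_db`, and `max_bits_per_band` are illustrative only.

```python
# Hypothetical sketch: greedy perceptual bit allocation.
# Assumption: each extra bit of a uniform quantizer lowers quantization
# noise by roughly 6.02 dB (standard rule of thumb).
DB_PER_BIT = 6.02


def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Give bits one at a time to the band whose quantization noise is most audible.

    smr_db[i] is an assumed signal-to-mask ratio (dB) for band i.
    Bands with SMR <= 0 are already inaudible and receive no bits.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio per band: positive means the noise is still audible.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits))
                      if bits[i] < max_bits_per_band and nmr[i] > 0]
        if not candidates:
            break  # all remaining noise is masked; stop spending bits
        worst = max(candidates, key=lambda i: nmr[i])
        bits[worst] += 1
    return bits


# A loud, clearly audible band, a quieter one, and a fully masked one.
print(allocate_bits([30.0, 12.0, -5.0], 8))  # [5, 2, 0]
```

In this example, the masked band receives no bits. The allocator also stops after 7 of the 8 available bits, because at that point the quantization noise in every band already falls below its mask.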
Two primary aspects of psychoacoustics make it possible to represent an audio signal with fewer bits of data than would otherwise be necessary: quantization and masking. With respect to quantization, psychoacoustic theory recognizes that, within the range of human hearing, the ear is more sensitive to lower frequencies than to higher frequencies. It has therefore been recognized that the higher frequencies of an audio signal may be represented with fewer bits of data than the lower frequencies without significant diminution in sound quality.
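The trade-off between bit count and accuracy can be made concrete with a uniform quantizer. The following minimal sketch quantizes a full-scale sine wave and measures the resulting signal-to-noise ratio (SNR). For such a signal, the SNR is expected to follow the textbook approximation of about 6.02 dB per bit plus 1.76 dB. The test frequency, the sample count, and the function names are arbitrary choices made for illustration.

```python
import math


def quantize(x, bits):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = math.floor((x + 1.0) / step)
    idx = min(max(idx, 0), levels - 1)  # clip so x = 1.0 maps to the top level
    return -1.0 + (idx + 0.5) * step


def snr_db(bits, n=20000, freq=0.01234):
    """Measured SNR (dB) of a full-scale sine after quantization to `bits` bits."""
    signal = [math.sin(2 * math.pi * freq * i) for i in range(n)]
    noise = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)


for b in (4, 8, 12):
    print(b, round(snr_db(b), 1), round(6.02 * b + 1.76, 1))
```

Each bit removed from a band therefore costs roughly 6 dB of SNR in that band. This is the quantity a perceptual coder trades away in the bands where the ear is less sensitive.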
With respect to masking, when a person hears an audio signal (e.g., music), certain tones are perceived to overpower or "mask" other tones in the signal. In the digital signal processing field, frequency-domain "masking" is a phenomenon whereby a tone or narrowband noise signal at one frequency affects the sensitivity of the ear to a tone or noise signal at a different frequency. The higher-power, or dominant, signal is typically called the "masking tone," and a lower-power, or subservient, signal is typically called a "masked tone." One method for determining which tones in a signal are masked is described in a co-pending application titled "Method For Computing Masking Thresholds in Digital Audio Encoded Signals." That application was filed Jun. 14, 1996 as Ser. No. 60/019,907. It is now U.S. patent application Ser. No. 08/855,118, filed May 13, 1997 and now abandoned. It also has a corresponding Japanese convention application, No. 157,156/97, filed Jun. 13, 1997 and now Japanese Laid-open No. 107,642/98, laid open Apr. 28, 1998. Tones that are masked may be omitted from a digital representation of the original audio signal without significant diminution of sound quality. In addition, tones that are partially masked may be represented by fewer bits of data than tones that are not masked. Therefore, a digital audio signal may be compressed by omitting masked tones and by representing some tones with fewer bits of data than other tones.
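To illustrate how a masking tone can hide a nearby masked tone, the following is a minimal sketch of a generic two-slope spreading model. It does not reproduce the method of the co-pending application cited above. Frequencies are mapped to the critical-band (Bark) scale using the widely used Zwicker and Terhardt approximation. The masking threshold then falls off linearly in Bark on either side of the masker, and it falls off more slowly toward higher frequencies because masking spreads upward more easily. The offset and slope constants, along with all function names, are illustrative assumptions and are not values from MPEG.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Illustrative constants (assumptions, not taken from MPEG or the cited application).
OFFSET_DB = 10.0       # threshold at the masker's own frequency, below the masker level
SLOPE_BELOW_DB = 27.0  # dB per Bark for maskees below the masker frequency
SLOPE_ABOVE_DB = 10.0  # dB per Bark for maskees above it (upward spread of masking)


def masking_threshold_db(masker_hz, masker_db, maskee_hz):
    """Level (dB) below which a tone at maskee_hz is hidden by the masker."""
    dz = hz_to_bark(maskee_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz > 0 else SLOPE_BELOW_DB
    return masker_db - OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz, masker_db, maskee_hz, maskee_db):
    """True if the maskee falls below the masker's spreading threshold."""
    return maskee_db < masking_threshold_db(masker_hz, masker_db, maskee_hz)


# An 80 dB tone at 1 kHz hides a 50 dB tone at 1.1 kHz but not one at 4 kHz.
print(is_masked(1000, 80, 1100, 50), is_masked(1000, 80, 4000, 50))
```

Under this model, the 50 dB tone at 1.1 kHz could be omitted from the encoded signal. A tone just above its threshold would be a candidate for a reduced bit allocation rather than removal.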
To determine which tones in the digital audio signal are masked, and to allocate appropriately the number of bits used to represent the various frequencies in the signal, MPEG standards require a frequency representation of the digital audio signal. Conventionally, this frequency analysis is obtained by performing either a 512-point or a 1024-point fast Fourier transform (FFT) on the digital audio signal. However, the number of calculations required to perform an FFT is proportional to N log N, where N is the number of points in the transform. Performing such a transform may therefore require a large number of calculations and may slow the encoding process.
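The N log N cost can be checked by counting operations in a textbook recursive radix-2 FFT. The following sketch counts twiddle-factor multiplications, including the trivial multiplications by 1. The count comes to (N/2) log2 N, compared with N² for a direct discrete Fourier transform (DFT). The function names and the counting convention are choices made for this illustration only. This code is not the transform used by any particular MPEG encoder.

```python
import cmath
import math


def fft(x, counter):
    """Recursive radix-2 decimation-in-time FFT.

    counter[0] accumulates the number of twiddle-factor multiplications.
    len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], counter)
    odd = fft(x[1::2], counter)
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # one twiddle multiplication
        counter[0] += 1
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def dft(x):
    """Direct DFT: n multiplications per output bin, n**2 in total."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


for n in (512, 1024):
    counter = [0]
    fft([0.0] * n, counter)
    print(n, counter[0], n * n)  # FFT multiplications vs. direct DFT
# 512 2304 262144
# 1024 5120 1048576
```

Doubling the transform from 512 to 1024 points more than doubles the FFT cost, from 2304 to 5120 multiplications. Even so, both figures remain far below the cost of a direct DFT. This cost is nonetheless incurred for every analysis frame during encoding.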