1. Field of the Invention
The present invention relates generally to encoding and decoding audio signals, and more particularly, to encoding and decoding audio signals with an audio bandwidth up to approximately 22 kHz using at least two transforms.
2. Description of the Related Art
Audio signal processing is utilized in many systems that create sound signals or reproduce sound from such signals. With the advancement of digital signal processors (DSPs), many signal processing functions are performed digitally. To do so, audio signals are created from acoustic waves, converted to digital data, processed for desired effects, converted back to analog signals, and reproduced as acoustic waves.
The analog audio signals are typically created from acoustic waves (sound) by microphones. The amplitude of the analog audio signal is sampled at a certain frequency, and the amplitude is converted to a number that represents the amplitude. The typical sampling frequency is approximately 8 kHz (i.e., sampling 8,000 times per second), 16 kHz to 196 kHz, or something in between. Depending on the quality of the digitized sound, each sample of the sound may be digitized using 8 bits to 128 bits or something in between. To preserve high quality sound, it may take a lot of bits. For example, at a very high end, to represent one second of sound at a 196 kHz sampling rate and 128 bits per sample, it may take 128 bits×192 kHz=24 Mbit=3 MB. For a typical song of 3 minutes (180 seconds), it takes 540 MB. At the low end, in a typical telephone conversation, the sound is sampled at 8 kHz and digitized at 8 bits per sample, it still takes 8 kHz×8 bit=64 kbit/second=8 kB/second. To make the digitized sound data easier to use, store and transport, they are typically encoded to reduce their sizes without reducing the sound quality. When they are about to be reproduced, they are decoded to restore the original digitized data.
There are various ways that have been suggested to encode or decode audio signals to reduce their size in the digital format. A processor or a processing module that encodes and decodes a signal is generally referred to as a codec. Some are lossless, i.e., the decoded signal is exactly the same as the original. Some are lossy, i.e., the decoded signal is slightly different from the original signal. A lossy codec can usually achieve more compression than a lossless codec. A lossy codec may take advantage of some features of human hearing to discard some sounds that are not readily perceptible by humans. For most humans, only sound within an audio spectrum between approximately 20 Hz to approximately 20 kHz is perceptible. Sound with frequency outside this range is not perceived by most humans. Thus, when reproducing sound for human listeners, producing sound outside the range does not improve the perceived sound quality. In most audio systems for human listeners, sounds outside the range are not reproduced. In a typical public telephone system, only frequencies within approximately 300 Hz to approximately 3000 Hz are communicated between the two telephone sets. This reduces data transmission.
One popular method for encoding/decoding music is the method used in an MP3 codec. A typical music CD can store about 40 minutes of music. When the same music is encoded with an MP3 encoder at comparable acoustic quality, such a CD may store 10-16 times more music.
ITU-T (International Telecommunication Union Telecommunication Standardization Sector) Recommendation G.722 (1988), entitled “7 kHz audio-coding within 64 kbit/s,” which is hereby incorporated by reference, describes a method of 7 kHz audio-coding within 64 kbit/s. ISDN lines have the capacity to transmit data at 64 kbit/s. This method essentially increases the bandwidth of audio through a telephone network using an ISDN line from 3 kHz to 7 kHz. The perceived audio quality is improved. Although this method makes high quality audio available through the existing telephone network, it typically requires ISDN service from a telephone company, which is more expensive than a regular narrow band telephone service.
A more recent method that is recommended for use in telecommunications is the ITU-T Recommendation G.722.1 (1999), entitled “Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss,” which is hereby incorporated herein by reference. This Recommendation describes a digital wideband coder algorithm that provides an audio bandwidth of 50 Hz to 7 kHz, operating at a bit rate of 24 kbit/s or 32 kbit/s, much lower than the G.722. At this data rate, a telephone having a regular modem using the regular analog phone line can transmit wideband audio signals. Thus, most existing telephone networks can support wideband conversation, as long as the telephone sets at the two ends can perform the encoding/decoding as described in G.722.1.