1. Field of Invention
The invention relates to audio signal transmission, and more particularly to varying the sample-rate to improve coding gain for audio signals.
2. Description of Related Art
There are a number of decisions which must be made in setting up an audio compression system. Among the most important variables that affect audio quality during encoding are the sampling rate, bit rate, and the frequencies that will be encoded, such as 20 Hz-20 KHz or some lesser range, for example. For a given level of distortion and a given algorithm, more bits are required to transmit more signal frequencies. Therefore, there is a optimal match between bit rate and frequency range such that if the bit rate is specified, distortion will increase if more frequencies are encoded then is optimal for that bit rate.
Most high-quality audio algorithms, such as MPEG AAC (MPEG Advanced Audio Coder), PAC (Perceptual Audio Coder), MPEG layer3, Dolby AC3 (Advanced Coder 3), and NTT's TwinVQ, encode a fixed number of samples into each frame which then represent a unit of time for a particular algorithm. Each audio frame carries side information. The number of bits needed to encode the side information per frame is roughly constant. This side information imposes a per-frame overhead.
The frame frequency (i.e., the number of frames per second) used by an audio algorithm is proportional to the sampling rate because each frame encodes a constant number of samples.
Decreasing the sampling rate decreases the number of frames-per-second, which in turn decreases the number of bits diverted for overhead, allowing more bits to be used for audio coding. Thus, lowering the sampling rate results in more bits being available for audio coding which results in a higher quality signal as long as sufficient frequency range is preserved.
To a similar end, the statistical properties of music indicate that an optimal frame duration is about 40 ms. For AAC and PAC at sampling rates of 44100 sps (samples per second) (i.e., the CD sample rate) the frame duration is about 23 ms; at 22050 sps, the frame duration is 46 ms.
The lower the sampling rate, the lower the frequency range that can be transmitted, as described by the Nyquist rule, which limits the maximum frequency range to half of the sampling rate. In practical implementations a "guard band" is needed which further lowers the achievable maximum frequency range. For example, for any algorithm (e.g. AAC), at a sampling rate of 22050 sps, the maximum frequency range is 8 to 10 KHz.
Thus, for a given algorithm, and for a given bit rate b.sub.0 that is not sufficient for encoding the entire human-audible frequency range in a transparent manner without audible distortion, and for a specified acceptable level of distortion, there is a maximum frequency range f.sub.0 that one can encode, and that maximum will be associated with a sample rate f.sub.s0.
If there were no outside constraints, then one would use f.sub.s0 as the sampling rate. However, several outside constraints exist. For example, PCs and Macintoshes work mostly at 44100, 22050 and 11025 sps. Some PCs work at one or more of the rates 48000, 32000, 24000, 16000 and 8000 sps, but very few PCs will work at all of these sample rates. In fact, Macintosh audio hardware will not work at all at these latter sample rates, so a user is constrained to a small set of sample rates if he or she want to interact with PCs and an even smaller set of sample rates if one wants to interact transparently with Macs without involving potentially inferior resampling in the PC or Mac.