In the thirty years since the introduction of the Compact Disc (CD), the general public has come to accept “CD-quality” as the norm for digital audio. Meanwhile, two opposing arguments have raged in audio circles. One centres on the proposition that the 16-bit resolution and 44.1 kHz sampling rate of the CD are wasteful of data, and that equivalent sound can be conveyed by a more compact lossy-compressed format such as MP3 or AAC. The other takes the diametrically opposite view, asserting that the resolution and sampling rate of the CD are inadequate and that audibly better results are obtained using, for example, 24 bits and a sampling rate of 96 kHz, a specification commonly abbreviated to 96/24.
If 44.1 kHz is indeed not good enough, the question arises whether 96 kHz is the answer, or whether 192 kHz or even 384 kHz should be the sampling rate for ‘ultimate’ quality. Many audiophiles assert that 96 kHz does sound better than 44.1 kHz, and that 192 kHz sounds better still than 96 kHz.
Historically, the transition from a continuous-time representation of an analogue waveform to a sampled digital representation has been justified by the sampling theorem (www.en.wikipedia.org/wiki/Sampling_theorem), which states that a continuous-time waveform containing only frequencies up to a maximum fmax can be reconstructed exactly from a sampled representation having more than 2×fmax samples per second. The frequency corresponding to half the sample rate is known as the Nyquist frequency: for example, 48 kHz when sampling at 96 kHz.
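As an illustration of the theorem, the sketch below samples a two-tone signal at 96 kHz and recovers values between the sample instants by Whittaker-Shannon (sinc) interpolation. The tone frequencies, block length and test instants are arbitrary assumptions. The theorem strictly requires an infinite sequence of samples, so a finite block leaves a small truncation error; the test instants are therefore taken well inside the block.

```python
import numpy as np

fs = 96_000.0                 # sample rate in Hz; Nyquist frequency is fs / 2 = 48 kHz
T = 1.0 / fs                  # sample period
n = np.arange(-4000, 4001)    # a finite block of sample indices

def signal(t):
    # Illustrative band-limited input: two tones, both well below 48 kHz.
    return np.sin(2 * np.pi * 10_000 * t) + 0.5 * np.cos(2 * np.pi * 19_000 * t + 0.3)

samples = signal(n * T)

def reconstruct(t):
    # Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc((t - nT) / T),
    # where np.sinc is the normalised sinc, sin(pi x) / (pi x).
    return np.sum(samples * np.sinc((t - n * T) / T))

# Instants that fall between samples, far from the block edges.
t_test = np.array([0.37, 12.5, -101.81]) * T
errors = [abs(reconstruct(t) - signal(t)) for t in t_test]
print(f"largest reconstruction error: {max(errors):.1e}")
```

Lengthening the block shrinks the residual error further, consistent with exact reconstruction in the limit of infinitely many samples.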
Therefore, the continuous-time waveform is first filtered by a bandlimiting ‘anti-alias’ filter in order to remove frequencies above fmax that would otherwise be ‘aliased’ by the sampling process and reproduced as images below fmax. Following standard communications practice, this anti-alias filter usually approximates a flat frequency response up to fmax and then falls steeply, so its frequency response graph has the appearance of a ‘brickwall’. The same applies to the reconstruction filter used to regenerate a continuous waveform from the sampled representation.
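The folding that the anti-alias filter guards against can be made concrete with a small helper. This is a minimal sketch assuming an ideal sampler with no anti-alias filter; `alias_frequency` is a hypothetical helper introduced for illustration. It confirms that a 60 kHz tone sampled at 96 kHz yields exactly the same samples as a 36 kHz tone.

```python
import numpy as np

def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a tone at f after sampling at fs with no
    anti-alias filter; the result always lies between 0 and fs / 2."""
    f_mod = f % fs                        # f and f + k*fs give identical samples
    return fs - f_mod if f_mod > fs / 2 else f_mod

fs = 96_000
n = np.arange(64)
above = np.cos(2 * np.pi * 60_000 * n / fs)                        # 60 kHz, above Nyquist
image = np.cos(2 * np.pi * alias_frequency(60_000, fs) * n / fs)   # its image at 36 kHz
print(alias_frequency(60_000, fs), np.allclose(above, image))
```

At the CD rate the same arithmetic gives `alias_frequency(30_000, 44_100) == 14_100`: an unfiltered 30 kHz component would reappear at 14.1 kHz, well inside the audible band.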
According to this methodology, the process of sampling and subsequent reconstruction is exactly equivalent to a time-invariant linear filtering process that removes frequencies above fmax and makes little or no change to frequencies significantly lower than fmax. It is therefore hard to understand how sampling at 192 kHz could sound better than sampling at 96 kHz, since the only difference would be the presence or absence of frequencies above about 40 kHz, twice the 20 kHz upper limit of the conventional human hearing range of 20 Hz to 20 kHz.
Two papers that attempt to explain this paradox, at least in part, are Dunn, J., “Anti-alias and anti-image filtering: The benefits of 96 kHz sampling rate formats for those who cannot hear above 20 kHz”, preprint 4734, 104th AES Convention, 1998; and Story, M., “A Suggested Explanation For (Some Of) The Audible Differences Between High Sample Rate And Conventional Sample Rate Audio Material”, available from http://www.cirlinca.com/include/aes97ny.pdf.
Both suggest that the paradox may be resolved by examining the filter's time-domain response. Dunn finds that passband ripple produces an effect similar to pre- and post-echoes, whilst Story looks at how the filter disperses the energy of an impulse in time. Although they point to different attributes, both authors find that the problems diminish as the sample rate increases. This is especially so if a flat response is maintained only to 20 kHz rather than to near the Nyquist frequency, thereby widening the transition band before full alias rejection is required at the Nyquist frequency.
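Why passband ripple behaves like an echo can be seen from an idealised model. This is an illustrative assumption, not necessarily Dunn's own formulation: take a linear-phase filter with group delay τ₀ whose passband magnitude carries a single cosine ripple of relative depth ε and period 1/T in frequency. Expanding the cosine into exponentials and using the transform pair e^{-j2πfτ} ↔ δ(t − τ):

```latex
\begin{align}
H(f) &= \bigl[1 + \varepsilon\cos(2\pi f T)\bigr]\,e^{-j2\pi f\tau_0}
      = e^{-j2\pi f\tau_0}
      + \tfrac{\varepsilon}{2}\,e^{-j2\pi f(\tau_0 - T)}
      + \tfrac{\varepsilon}{2}\,e^{-j2\pi f(\tau_0 + T)} \\
h(t) &= \delta(t-\tau_0)
      + \tfrac{\varepsilon}{2}\,\delta(t-\tau_0+T)
      + \tfrac{\varepsilon}{2}\,\delta(t-\tau_0-T)
\end{align}
```

The main response is thus accompanied by a pre-echo T earlier and a post-echo T later, each at relative amplitude ε/2 (in a real filter each impulse is further band-limited by the overall passband). For example, a ripple of ±0.1 dB (ε ≈ 0.0115) gives echoes about 45 dB below the main response, and a ripple that repeats every 1 kHz across the band places them at T = 1 ms.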
Story's approach is taken further in Craven, P. G., “Antialias Filters and System Transient Response at High Sample Rates”. Here Craven teaches that even if the decimation and interpolation systems in a 96 kHz system have a ‘brickwall’ response, with the sonic disadvantage of wide dispersion of impulse energy, an ‘apodising’ filter operating at the 96 kHz rate can widen the effective transition band and so reduce the dispersion of impulse energy. FIG. 1 shows the frequency response (solid line) of an illustrative brickwall filter downsampling to 96 kHz, and also the response (dashed line) of an apodising filter. The corresponding impulse responses of the filters are shown in FIGS. 2A and 2B, illustrating how the highly dispersive time response of the brickwall filter in FIG. 2A is shortened by application of the apodising filter to the compact time response in FIG. 2B.
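The effect of apodising can be sketched numerically. The filters below are hypothetical designs chosen only for illustration (Kaiser-windowed sincs at a single 96 kHz rate, not the filters plotted in FIG. 1). The “99.9% energy span” is likewise an assumed, crude proxy for the dispersion of impulse energy: the number of samples between the points where the cumulative energy passes 0.05% and 99.95% of the total.

```python
import numpy as np

FS = 96_000  # sample rate in Hz (Nyquist frequency 48 kHz)

def windowed_sinc(cutoff_hz, numtaps, fs=FS, beta=8.0):
    """Linear-phase low-pass FIR: an ideal sinc truncated by a Kaiser window."""
    fc = cutoff_hz / fs                              # cutoff as a fraction of fs
    n = np.arange(numtaps) - (numtaps - 1) / 2       # tap indices centred on zero
    h = 2 * fc * np.sinc(2 * fc * n)                 # np.sinc(x) = sin(pi x) / (pi x)
    h *= np.kaiser(numtaps, beta)                    # taper to limit stopband leakage
    return h / h.sum()                               # unity gain at DC

def energy_span(h, fraction=0.999):
    """Samples between the points where cumulative energy reaches
    (1 - fraction)/2 and (1 + fraction)/2 of the total."""
    e = np.cumsum(h ** 2)
    e /= e[-1]
    first = np.searchsorted(e, (1 - fraction) / 2)
    last = np.searchsorted(e, (1 + fraction) / 2)
    return int(last - first + 1)

# Hypothetical designs: a long 'brickwall' that stays flat almost to Nyquist
# (transition band roughly 1 kHz wide), and a short apodiser that has largely
# rolled off by about 42 kHz, before the brickwall's transition band begins.
brickwall = windowed_sinc(45_600, 511)
apodiser = windowed_sinc(38_000, 63)
apodised = np.convolve(brickwall, apodiser)   # brickwall followed by apodiser

for name, h in (("brickwall", brickwall), ("apodised", apodised)):
    span = energy_span(h)
    print(f"{name:9s}: 99.9% of impulse energy within {span} samples "
          f"({1e6 * span / FS:.0f} us)")
```

The apodised cascade should report a markedly shorter span than the brickwall alone: because the apodiser is already deep in its stopband where the brickwall rolls off, the brickwall's long ringing near 45 kHz is suppressed and the cascade's impulse response is dominated by the short apodiser.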
However, even with apodising, it is still the case today that sampling at rates higher than 96 kHz can give audible improvements, described in the same terms as those Story reports: “less cluttered”, “more air”, “better hf detail” and, in particular, “better spatial resolution”. A corollary is that the current state of the art loses some of these sonic attributes when a moderate sample rate such as 96 kHz is used, despite useful progress in identifying what may be causing this loss.
Consequently, the highest-quality reproduction requires extremely high sample rates, with a consequent impact on file sizes and bandwidth requirements. The prospects for interesting the public at large in high-resolution sound therefore appear bleak: either the format makes onerous demands, or listeners realise that quality has been lost. Accordingly, there is a need for an alternative methodology for distributing high-quality audio at moderate sample rates that preserves the perceptual benefits associated with higher sample rates.
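The scale of the file-size and bandwidth penalty follows from simple arithmetic for uncompressed linear PCM. This sketch assumes two channels and ignores container overhead and lossless compression (such as FLAC), so real files may be smaller; `pcm_bit_rate` is a hypothetical helper introduced for illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed linear-PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

formats = {
    "CD (44.1/16)": (44_100, 16),
    "96/24": (96_000, 24),
    "192/24": (192_000, 24),
    "384/24": (384_000, 24),
}
cd_rate = pcm_bit_rate(44_100, 16)

for name, (fs, bits) in formats.items():
    rate = pcm_bit_rate(fs, bits)
    mb_per_minute = rate * 60 / 8 / 1e6      # bits/s -> megabytes per minute
    print(f"{name:13s} {rate / 1e6:6.3f} Mbit/s  {mb_per_minute:6.1f} MB/min  "
          f"x{rate / cd_rate:.2f} CD")
```

On these assumptions, 96/24 needs about 3.3 times the CD data rate, 192/24 about 6.5 times and 384/24 about 13 times.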