1. Field of the Invention
This invention relates to the processing of digital audio signals.
2. Description of the Prior Art
Many modern audio signal processing devices perform audio processing operations on sampled digital audio signals rather than on analogue signals. Digital audio signals, which signify discrete and clearly defined voltage levels, have the advantage over analogue signals that they can be repeatedly copied without any degradation of the audio signal quality. Furthermore, error correction techniques can be applied to digitally encoded data prior to reproduction and some types of noise can be removed. Once an audio signal has been stored in digital form, the data is not restricted to a particular time domain but can be manipulated and processed at will. The compact disc (CD) is an example of a digital audio storage device.
An analogue audio signal can be converted to its digital equivalent using a technique known as pulse code modulation (PCM). In the PCM technique the analogue signal is sampled in time at a frequency high enough to enable the desired audio bandwidth to be achieved. In the case of the CD the specified audio bandwidth is 20 Hz to 20 kHz and a sampling frequency of 44.1 kHz is typically used. In a process known as quantisation the measured signal voltage level at the instant of sampling is represented numerically as its nearest equivalent value in binary form. The mapping of voltage values from a continuous range to a finite number of discrete levels results in quantisation error. The larger the number of bits per sample, the smaller the quantisation error. Consumer audio applications typically use 16-bit numbers, which can represent 2^16 (65,536) distinct voltage levels.
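The relationship between word length and quantisation error described above can be illustrated with a minimal sketch. The function names, the normalised input range of -1.0 to 1.0, the 1 kHz test tone and the uniform rounding quantiser are all assumptions made for illustration and are not taken from any particular converter design.

```python
import math

def quantise(x, bits):
    """Map x in [-1.0, 1.0) to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / levels                 # spacing between adjacent levels
    code = round(x / step)              # nearest integer code
    # keep the code within the signed range of a `bits`-wide word
    code = max(-levels // 2, min(levels // 2 - 1, code))
    return code * step

def max_quantisation_error(bits, n=1000, fs=44100.0, f=1000.0):
    """Largest absolute error over n samples of a sine of frequency f sampled at fs."""
    worst = 0.0
    for i in range(n):
        x = 0.9 * math.sin(2 * math.pi * f * i / fs)
        worst = max(worst, abs(x - quantise(x, bits)))
    return worst

levels_16 = 2 ** 16                     # number of distinct levels for 16-bit samples
err_8 = max_quantisation_error(8)
err_16 = max_quantisation_error(16)
```

In this sketch the error never exceeds half a quantisation step, so each additional bit halves the worst-case error.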
The CD has a comparatively large dynamic range, typically greater than 90 dB whereas an analogue tape may have a dynamic range of only 40 dB.
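As a rough check on the figure quoted above, the ideal dynamic range of a uniformly quantised signal can be estimated as the ratio, in decibels, between the number of levels and a single level. This is a simplified estimate under that assumption; the function name is hypothetical and practical systems will differ.

```python
import math

def ideal_dynamic_range_db(bits):
    """Ratio between full scale (2**bits levels) and one level, in dB."""
    return 20.0 * math.log10(2 ** bits)

dr_16 = ideal_dynamic_range_db(16)      # roughly 96 dB for 16-bit samples
```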
Although the perception of sound is a complex topic and is not well understood, it has been proposed that the ear responds to average levels rather than peak levels when judging loudness.
In almost every sound system there is a need to control the audio signal dynamics to ensure that quiet segments are audible to the listener and that loud segments do not cause distortion or system damage. Since it has also been proposed that the highest peak will generally determine the comfortable listening level for the volume setting, a large dynamic range may mean that the quieter audio segments are too quiet for the listener to distinguish. The process of dynamic range reduction is known as dynamics processing and it involves non-linear adjustment (or “compression”) of the gain applied to an audio signal. Compression typically results in the audio signal sounding louder on replay.
A compressor is a voltage controlled amplifier with an input, an output and at least one control port fed by a level or peak detector. The signal level at which compression kicks in is known as the threshold. In upward compression, the threshold defines the point below which boost ensues. In downward compression the threshold defines the point above which gain reduction occurs. A compressor whose output level changes by 1 dB for an N dB input level change above threshold has an N:1 compression ratio. Typical compression ratios would be 2:1 or 4:1.
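The threshold and ratio definitions above can be sketched as a static level mapping for downward compression. Levels are expressed in dB; the function names and the example threshold of -20 dB are assumptions chosen for illustration.

```python
def compress_level_db(in_db, threshold_db, ratio):
    """Downward compression: above threshold, output rises 1 dB per `ratio` dB of input."""
    if in_db <= threshold_db:
        return in_db                    # below threshold the level is unchanged
    return threshold_db + (in_db - threshold_db) / ratio

def gain_db(in_db, threshold_db, ratio):
    """Gain change (in dB) implied by the static curve; negative means reduction."""
    return compress_level_db(in_db, threshold_db, ratio) - in_db

# An input 12 dB above a -20 dB threshold, at 4:1, emerges 3 dB above threshold.
out_db = compress_level_db(-8.0, -20.0, 4.0)
reduction_db = gain_db(-8.0, -20.0, 4.0)
```

With these example values the output level is -17 dB, corresponding to 9 dB of gain reduction.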
Compression is not applied independently to each input sample. The delay between the onset of a transient and the time when compression commences is known as the attack. The delay between the subsidence of a transient and the time the compressor returns to a resting gain is the release or decay.
If the attack takes too long the signal may “clip”, but if it happens too quickly the audio content loses dynamic impact for the listener. The release time determines what frequencies the compressor can process without inducing undue distortion. Frequencies below the reciprocal of the release time are subject to increased distortion; for example, for a release time of 10 ms, distortion increases below 1/(10 ms) = 100 Hz. Since no single attack/release setting suits all audio signals, split-band compressors are sometimes used in which each audio band feeds a separate compressor whose attack and release are optimised for the particular band.
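One common way of realising attack and release behaviour, shown here only as an assumed sketch, is a one-pole envelope follower that uses a fast smoothing coefficient while the level is rising and a slower one while it is falling. The exponential coefficient formula, the function names and the 1 ms / 10 ms time constants are illustrative assumptions, not details of any specific compressor.

```python
import math

def smoothing_coeff(time_s, fs):
    """One-pole coefficient for a time constant of time_s seconds at sample rate fs."""
    return math.exp(-1.0 / (time_s * fs))

def follow_envelope(samples, fs, attack_s, release_s):
    """Track |x| with separate attack (rising) and release (falling) time constants."""
    a_att = smoothing_coeff(attack_s, fs)
    a_rel = smoothing_coeff(release_s, fs)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        a = a_att if level > env else a_rel   # pick the coefficient by direction
        env = a * env + (1.0 - a) * level
        out.append(env)
    return out

fs = 44100.0
burst = [1.0] * 441 + [0.0] * 4410            # 10 ms of signal, then 100 ms of silence
env = follow_envelope(burst, fs, attack_s=0.001, release_s=0.010)
release_corner_hz = 1.0 / 0.010               # reciprocal of a 10 ms release time
```

In this sketch the envelope has almost reached the burst level by the end of the 10 ms burst and has decayed almost to zero after the following 100 ms.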
FIG. 1 of the accompanying drawings schematically illustrates a known dynamics processor. The input digital audio signal is supplied to a peak/level detector 10 and an output signal 5 of the peak/level detector is supplied as input to a dynamics processing unit 20. The dynamics processing unit 20 can be arranged to produce various different dynamics processing functions depending on the relationship between the gain control value generated and the detected envelope of the input digital audio signal.
Such techniques involving compression operations in the time domain can be used to make reproduced audio sound louder to the listener. Similar analogue techniques for making reproduced audio signals sound louder have been in use for many years in the broadcasting industry, for example to make a particular radio station sound louder than its competitors while still being subject to the same FM deviation limits.
The output voltage levels on replay of a PCM digital recording are constrained by the number of quantisation levels, which is determined by the number of bits used to encode each sample. The maximum output voltage will be dependent upon the maximum binary sample value and the discrete voltage levels will typically be equally spaced. Any excursions of the voltage level above the maximum output voltage will result in a distortion of the waveform known as “clipping”.
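Clipping can be illustrated with a short sketch in which sample values are limited to the range representable by the word length. The signed two's complement range assumed here, and the function name, are illustrative assumptions.

```python
def clip_to_pcm(value, bits=16):
    """Limit an integer sample to the signed range of a `bits`-wide PCM word."""
    hi = 2 ** (bits - 1) - 1            # largest representable sample value
    lo = -(2 ** (bits - 1))             # smallest representable sample value
    return max(lo, min(hi, value))

samples = [1000, 30000, 40000, -40000]
clipped = [clip_to_pcm(s) for s in samples]
```

The first two values lie within range and pass unchanged, whereas the last two are flattened to the extreme values, which is the waveform distortion referred to above.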
Consider an example where the dynamics processing involves a simple 4:1 compression above an input envelope threshold T. This example compression response is illustrated schematically by FIG. 2A of the accompanying drawings. FIG. 2B shows the corresponding gain which is unity below the threshold and ¼ above the threshold. The output of the dynamics processing unit 20 is a gain control value 15 that will typically be a time dependent function calculated in dependence upon numerous input samples. The gain control value 15 is supplied as an input to a gain controller 40. The peak/level detection unit 10 outputs a signal 25 corresponding to a sequence of input signal values to a time delay unit 30. The time delay unit 30 delays the input samples to compensate for the processing time required by the dynamics processing unit 20. The output 35 of the time delay unit 30 is supplied to the gain controller where the input sample values are multiplied by the appropriate gain control value 15 to produce an output signal.
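The signal flow described above (detection, gain calculation, delay and multiplication) can be sketched as follows. This is a simplified illustration only: the sliding-window peak detector, the window length, the two-sample delay, the threshold value and the dB-domain reading of the 4:1 ratio are all assumptions, and the response of FIGS. 2A and 2B may be defined differently.

```python
from collections import deque

def peak_detect(samples, window):
    """Peak of |x| over the current sample and the previous window-1 samples."""
    recent = deque(maxlen=window)
    env = []
    for x in samples:
        recent.append(abs(x))
        env.append(max(recent))
    return env

def gain_for_envelope(env, threshold, ratio):
    """Linear gain for an N:1 ratio applied in the dB domain (assumed gain law)."""
    if env <= threshold:
        return 1.0                      # unity gain below threshold
    return (env / threshold) ** (1.0 / ratio - 1.0)

def process(samples, threshold=0.25, ratio=4.0, window=4, delay=2):
    env = peak_detect(samples, window)                       # peak/level detection
    gains = [gain_for_envelope(e, threshold, ratio) for e in env]
    # delay the samples so each meets a gain computed from a window containing it
    delayed = [0.0] * delay + list(samples[:len(samples) - delay])
    return [g * x for g, x in zip(gains, delayed)]           # gain control stage

samples = [0.1, 0.1, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1]
out = process(samples)
```

In this sketch the delay ensures that the gain reduction is already in force when the loud sample reaches the multiplier, so the peak emerges at a reduced level rather than passing through at unity gain.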