An audio signal may be transmitted from a local (transmitting) device, such as a user device or a media server, to a remote (receiving) device, such as another user device, via a communication channel. For example, the audio signal may be transmitted as a stream of audio data (audio stream). The communication channel may, for example, be a channel over a communication network, such as a packet-based communication network (e.g. the Internet), where the devices are endpoints of the network.
The transmission may be based on VoIP (Voice over Internet Protocol) technology in a call or other real-time media communication event conducted over the network. That is, the audio stream may be transmitted as part of a call between two or more users or some other real-time media communication event conducted via the network. To enable the communication event to take place, a user of the receiving device may execute an instance of a communication client on the receiving device. The communication client sets up the necessary VoIP connections to allow communication with the transmitting device during the communication event. The transmitting device may also be a user device, on which another instance of the communication client is executed. Alternatively, the transmitting device may be a media server; for example, in a group call (conference call) between three or more users, each user may transmit their audio stream to a media relay server, and the server may selectively mix the received audio streams for transmission to the other users participating in the conference call.
The transmitted audio data may be coded audio data generated by an audio codec of the local device applying audio coding to the audio signal before transmission. The audio codec may be configured to code the audio signal according to a target bitrate, so as to generate a stream of coded audio data having a bitrate no more than the target bitrate, or in the case of a variable bitrate codec, one with a short-term average which does not exceed the target bitrate. As long as the target bitrate does not exceed an available bitrate of the communication channel (i.e. channel bitrate), the coded audio stream can be transmitted via the communication channel in real-time without having to drop any packets of coded audio data. The coded audio stream is received at the receiving device, decoded and output via an audio output device of the receiving device, such as a loudspeaker or headset.
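The short-term average constraint for a variable bitrate codec can be illustrated with a minimal sketch. The function names, the sliding-window definition of "short-term", and the frame sizes below are assumptions made for illustration only; the text above does not specify how the average is measured.

```python
from collections import deque


def short_term_bitrates(frame_sizes_bytes, frame_ms, window_frames):
    """Sliding-window average bitrate in bits per second, one value per frame.

    Assumption: "short-term" means the most recent `window_frames` frames.
    """
    window = deque(maxlen=window_frames)  # bit counts of the most recent frames
    rates = []
    for size in frame_sizes_bytes:
        window.append(size * 8)
        window_ms = len(window) * frame_ms
        rates.append(sum(window) * 1000 / window_ms)
    return rates


def meets_target(frame_sizes_bytes, frame_ms, window_frames, target_bps):
    """True if the short-term average never exceeds the target bitrate."""
    rates = short_term_bitrates(frame_sizes_bytes, frame_ms, window_frames)
    return all(rate <= target_bps for rate in rates)


# Hypothetical coded frame sizes in bytes, one frame every 20 ms.
sizes = [40, 60, 40, 60]
print(short_term_bitrates(sizes, frame_ms=20, window_frames=2))
print(meets_target(sizes, frame_ms=20, window_frames=2, target_bps=20000))
```

In this hypothetical stream the 60-byte frames on their own correspond to 24 kbit/s, yet the two-frame average stays at 20 kbit/s, so the stream would satisfy a 20 kbit/s target under the sliding-window assumption.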
The audio coding may involve lossless compression such as entropy encoding, whereby the amount of data needed to code the signal is reduced without losing any information from the signal. Whilst this can reduce the bitrate needed to transmit the audio signal to some extent, in practice it is unlikely to be enough in itself to meet the target bitrate. To further reduce the bitrate of the coded audio stream, lossy audio coding can be used, whereby information is discarded from the audio signal as part of the audio coding process. For some speech and audio codecs, the lossy coding includes an initial down-sampling of the input signal. This is used when coding at very low target bitrates, where coding distortion begins to severely impact the quality of the coded signal. Internally in the codec, the potentially down-sampled input signal is then modelled using, e.g., mathematical models that are chosen for their ability to model human speech using only a limited set of coefficients. This can be interpreted as a joint quantization of the samples within each frame, often with a dependency on previous samples as well.
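Why lossless entropy coding alone gives only a limited saving can be sketched as follows. The Shannon entropy of the symbol distribution is a lower bound on the average number of bits per symbol achievable by an entropy coder that treats symbols independently. The function names and the symbol sequence are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits/symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fixed_length_bits_per_symbol(symbols):
    """Bits per symbol when every distinct symbol gets an equal-length code."""
    distinct = len(set(symbols))
    return max(1, math.ceil(math.log2(distinct)))


# Hypothetical quantized samples: four levels with probabilities 1/2, 1/4, 1/8, 1/8.
symbols = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
print(entropy_bits_per_symbol(symbols))       # lower bound for entropy coding
print(fixed_length_bits_per_symbol(symbols))  # plain fixed-length coding
```

For this distribution the entropy is 1.75 bits per symbol against 2 bits for a fixed-length code, a saving of 12.5%. Larger reductions would require discarding information, which is the role of the lossy coding described above.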
Broadly speaking, for an audio codec to meet a certain target bitrate while ensuring the best quality at that bitrate, the sample rate, the allowed coding noise at a given sample rate, or a combination of both can be adapted.
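The trade-off between sample rate and allowed coding noise can be made concrete with a rough budget calculation. This sketch assumes, purely for illustration, that the whole target bitrate is spread evenly over the samples with no side information; real codecs do not allocate bits this simply, and the function name is hypothetical.

```python
def bits_per_sample(target_bitrate_bps, sample_rate_hz):
    """Average bit budget per sample under the even-spread assumption."""
    return target_bitrate_bps / sample_rate_hz


target_bps = 16000  # hypothetical target bitrate
for rate_hz in (8000, 16000, 32000, 48000):
    budget = bits_per_sample(target_bps, rate_hz)
    print(f"{rate_hz} Hz -> {budget:.3f} bits per sample")
```

At a fixed target bitrate, halving the sample rate doubles the average budget per sample, which is why down-sampling can leave room for less coding noise at the cost of a narrower audio bandwidth.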
With regard to sample rate, the coded audio data has an audio bandwidth, which is the range of audio frequencies spanned by the coded audio data, i.e. the coded audio data contains only enough information to reproduce audio frequencies from the original audio signal within this range of audio frequencies. It is generally accepted that audio frequencies above 20 kHz are inaudible to most humans; thus, by discarding frequencies above this, information can be discarded from the signal with no or negligible impact on perceived quality. According to the Nyquist theorem, the audio bandwidth and sample rate are tightly coupled, in that to capture all frequencies up to R/2 Hz without distortion due to aliasing, a sample rate of at least R samples per second is required. Thus, to reproduce the full range of audible frequencies in the audio signal, sampling significantly above 40 kHz (i.e. 40,000 samples per second) is generally considered unnecessary; for example, 44.1 kHz and 48 kHz are two commonly used sample rates that are generally accepted as full-band sampling rates. Conversely, sampling at a rate R significantly below 40 kHz can result in a loss of audible high-frequency components of the audio signal between R/2 and 20 kHz, and in aliasing artefacts, as the audio bandwidth is reduced. In practice, an audio codec may also include an anti-aliasing filter (AAF) that filters out frequencies above R/2 from the audio signal before sampling it at rate R, as this can prevent aliasing artefacts.
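The coupling between sample rate and audio bandwidth, and the aliasing that occurs without an anti-aliasing filter, can be sketched numerically. The function names and the example tone frequencies are assumptions for illustration only.

```python
import math


def audio_bandwidth_hz(sample_rate_hz):
    """Highest frequency representable without aliasing (R/2)."""
    return sample_rate_hz / 2


def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone sampled without an anti-aliasing filter."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_cosine(tone_hz, sample_rate_hz, num_samples):
    """Samples of a unit-amplitude cosine at the given sample rate."""
    return [
        math.cos(2 * math.pi * tone_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


rate_hz = 8000
print(audio_bandwidth_hz(rate_hz))        # 4 kHz audio bandwidth
print(alias_frequency_hz(5000, rate_hz))  # a 5 kHz tone folds down to 3 kHz

# The sampled 5 kHz and 3 kHz cosines are numerically indistinguishable.
high = sample_cosine(5000, rate_hz, 16)
low = sample_cosine(3000, rate_hz, 16)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))
```

Because the two sampled sequences coincide, a 5 kHz component that is not filtered out before sampling at 8 kHz would be reproduced as a spurious 3 kHz tone, which is the kind of artefact the anti-aliasing filter is there to prevent.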
With regard to coding, broadly speaking, more aggressive coding (i.e. coarser quantization and less accurate modelling) results in a higher level of coding distortion. Modern audio codecs provide increasingly sophisticated modelling of the signal so as to minimize the coding distortion for a given sample rate and target bitrate, but nevertheless this basic relation between coarseness and distortion stands.
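The basic relation between coarseness and distortion can be illustrated with a simple uniform scalar quantizer. This is a deliberately simplified stand-in: as noted above, real codecs use model-based, joint quantization. The function names, the 440 Hz test tone and the step sizes are all assumptions for illustration.

```python
import math


def quantize(samples, step):
    """Uniform scalar quantization: snap each sample to the nearest multiple of step."""
    return [step * round(x / step) for x in samples]


def mean_squared_error(original, coded):
    """Average squared difference between original and coded samples."""
    return sum((a - b) ** 2 for a, b in zip(original, coded)) / len(original)


def levels_used(samples, step):
    """Number of distinct quantization levels the signal actually occupies."""
    return len({round(x / step) for x in samples})


# Hypothetical test signal: 20 ms of a 440 Hz sine sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]

for step in (0.01, 0.1, 0.5):
    coded = quantize(signal, step)
    print(step, levels_used(signal, step), mean_squared_error(signal, coded))
```

A larger step leaves fewer levels to encode, and so needs fewer bits, but the mean squared error grows with it, which is the coarseness-versus-distortion relation described above.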