In most mobile communication applications, voice is still the most important media component. Speech encoders and the mechanisms surrounding them are optimized for voice; music was not considered important when the mobile communication components were designed.
Recently, music has become more important in applications such as “Music-on-Hold” and “Music-ring-back-Tones”.
To save radio and network link capacity, a voice activity detector (VAD) was developed to discriminate between speech and pauses. When a pause was detected, nothing was transmitted to the other party during the silent parts. It later turned out that users find it very unpleasant when the loudspeaker falls completely silent between the other party's utterances. Comfort noise was therefore introduced: the terminal receiving the speech signal generates the noise itself, based only on a few silence descriptor (SID) parameters that are transmitted at intervals.
This mode of operation, controlled by the VAD within the speech codec, is called discontinuous transmission (DTX). However, the VAD does not work well for music signals: music is often falsely classified as background noise and replaced by comfort noise.
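The DTX behaviour described above can be sketched in Python as a per-frame transmit scheduler. This is a minimal illustration under stated assumptions, not a codec implementation: the VAD decision is taken as given (one boolean per frame), the names FrameType and dtx_schedule are hypothetical, and the SID repetition interval sid_interval (default 8 frames) is an assumed parameter. Hangover periods and the comfort-noise synthesis at the receiver are omitted.

```python
from enum import Enum
from typing import Iterable, List


class FrameType(Enum):
    SPEECH = "speech"    # full speech frame, sent while the VAD reports activity
    SID = "sid"          # silence descriptor carrying comfort-noise parameters
    NO_DATA = "no_data"  # nothing is transmitted for this frame


def dtx_schedule(vad_flags: Iterable[bool], sid_interval: int = 8) -> List[FrameType]:
    """Map per-frame VAD decisions to transmitted frame types (hypothetical sketch).

    Assumption: during a pause, a SID frame is sent on the first silent frame
    and then once every sid_interval frames; all other silent frames are not
    transmitted. Real codecs add a hangover period and further rules.
    """
    if sid_interval < 1:
        raise ValueError("sid_interval must be >= 1")
    out: List[FrameType] = []
    silent_run = 0  # number of consecutive non-speech frames seen so far
    for active in vad_flags:
        if active:
            # Speech detected: transmit the frame and reset the pause counter.
            silent_run = 0
            out.append(FrameType.SPEECH)
        else:
            # Pause: send a SID update only at the start and every sid_interval frames.
            if silent_run % sid_interval == 0:
                out.append(FrameType.SID)
            else:
                out.append(FrameType.NO_DATA)
            silent_run += 1
    return out


if __name__ == "__main__":
    # Music falsely classified as background noise: the VAD flag is False
    # throughout, so only occasional SID frames are transmitted.
    music_frames = [False] * 16
    print([f.value for f in dtx_schedule(music_frames)])
```

Feeding the scheduler a run of music frames that the VAD has misclassified as noise shows the problem described above: apart from occasional SID frames, nothing is transmitted, so the receiving terminal can only play comfort noise.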
One solution would be to switch VAD/DTX off in the downlink direction, either for the entire duration of the call or only during the alerting phase. However, disabling VAD/DTX for all calls in a network leads to increased radio interference.
Enabling downlink VAD/DTX allows the operator to optimize radio planning (e.g. fewer radio base stations may be needed), so using DTX in the downlink direction is advantageous. To optimize radio capacity while still transmitting music signals in a call correctly, the best solution would be to disable VAD/DTX only while music is played towards the mobile terminal.