In most applications of mobile communication, voice has always been, and still is, the most important media component. All speech codecs and the mechanisms around them were optimized for voice. Music was not considered important in the design of the mobile communication components.
But since the early days of GSM, music has played a small, but not unimportant, role, e.g. in “Music-on-Hold”. Recently, “Customized-Alerting-Tones” and “Musical-ring-back-Tones” have become popular services, and the perception of music has become more important.
The current solutions are not satisfactory for these services. One important observation of real-time telephony is that most of the time only one partner is talking, while the other is listening. The one talking does not pay much attention to what he is hearing, as long as it is not the other partner responding. The voice feedback is important, but otherwise the background noise is simply there and carries no importance.
From this observation the conclusion was drawn to cut out speech pauses and not transmit them. The hope was to save 50% or more of radio and network link capacity on average. A “Voice Activity Detection” (VAD) was developed to discriminate between speech and pause. Later it turned out that it is very unpleasant for the user when the loudspeaker is totally silent between talk spurts of the other partner. Therefore, the so-called “Comfort Noise” was invented. The terminal receiving the speech signal creates this comfort noise on its own, on the basis of just a few “Silence Descriptor” (SID) parameters that are transmitted every now and then.
This operation is called “Discontinuous Transmission” (DTX). It is controlled by the VAD within the speech codec at the sender side and uses SID frames to feed the Comfort Noise generation at the receiver side. DTX works satisfactorily for voice communication and for most music signals.
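The DTX operation described above can be illustrated with a minimal sketch. It is not the VAD or SID scheme of any real speech codec: the energy-threshold VAD, the constants `ENERGY_THRESHOLD` and `SID_INTERVAL`, the `Frame` type, and the function names are all hypothetical and chosen only for illustration. The sender transmits speech frames during activity, an occasional SID frame during pauses, and nothing otherwise; the receiver fills pauses with locally generated noise at the level carried by the last SID frame.

```python
import math
import random
from dataclasses import dataclass

ENERGY_THRESHOLD = 0.01  # hypothetical VAD threshold (mean squared amplitude)
SID_INTERVAL = 8         # hypothetical: one SID frame every 8 pause frames


@dataclass
class Frame:
    kind: str               # "SPEECH", "SID" or "NO_DATA"
    payload: object = None  # samples for SPEECH, noise energy for SID


def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def is_active(samples, threshold=ENERGY_THRESHOLD):
    """Toy VAD: a frame counts as speech if its energy exceeds the threshold."""
    return frame_energy(samples) > threshold


def dtx_encode(frames, sid_interval=SID_INTERVAL):
    """Sender side: speech frames are sent, pauses are replaced by sparse SID frames."""
    out = []
    pause_count = 0
    for samples in frames:
        if is_active(samples):
            pause_count = 0
            out.append(Frame("SPEECH", list(samples)))
        else:
            if pause_count % sid_interval == 0:
                # Describe the background noise by its energy only.
                out.append(Frame("SID", frame_energy(samples)))
            else:
                out.append(Frame("NO_DATA"))  # nothing goes over the air
            pause_count += 1
    return out


def dtx_decode(encoded, frame_len, rng):
    """Receiver side: pauses are filled with locally generated comfort noise."""
    level = 0.0  # noise standard deviation from the most recent SID frame
    out = []
    for frame in encoded:
        if frame.kind == "SPEECH":
            out.append(list(frame.payload))
            continue
        if frame.kind == "SID":
            level = math.sqrt(frame.payload)
        out.append([rng.gauss(0.0, level) for _ in range(frame_len)])
    return out


def radio_activity(encoded):
    """Fraction of frames that actually require a radio transmission."""
    sent = sum(1 for f in encoded if f.kind != "NO_DATA")
    return sent / len(encoded)


if __name__ == "__main__":
    talk = [[0.5, -0.5, 0.5, -0.5]] * 2
    pause = [[0.0, 0.0, 0.0, 0.0]] * 8
    encoded = dtx_encode(talk + pause + talk)
    print([f.kind for f in encoded])
    print(radio_activity(encoded))
```

In this toy model the decision rests entirely on frame energy, so any signal that happens to fall below the threshold would be replaced by comfort noise regardless of its content.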
The VAD, however, does not work well for all music signals. Over time the VADs were improved, but some music signals are still falsely classified as “background noise” and replaced by Comfort Noise. This is unacceptable if the music is offered as a specific, paid service.
There is not much hope currently that a VAD could ever be designed that works well for all kinds of music.
FIG. 1 shows a mobile-to-mobile call in which an originating mobile station oMS is the calling party and the terminating mobile station tMS is the called party. The embodiment of FIG. 1 shows the different components included in the speech path and the signaling path. The signaling path is indicated by reference numeral 10 and relates to the path from the oBTS (Base Transceiver Station) via the BSC (Base Station Controller), the originating Mobile Switching Center (oMSC), the intermediate Mobile Switching Center (iMSC) and the terminating Mobile Switching Center (tMSC) to the terminating Mobile Station (tMS). The speech path 20 runs through the corresponding media gateways oMGW, iMGW, and tMGW to the terminating mobile station. In each of these media gateways or in the BSC there could be a speech codec including a VAD that potentially destroys a music signal.
A music signal could basically be inserted in any of these media gateways without the VADs becoming aware of this event. By way of example, the music ring-back tone is typically inserted in the terminating Media Gateway tMGW and propagates backwards through iMGW, oMGW and BSC to the originating user at the originating mobile station. Up to now, in most systems this drawback of the imperfect VADs has been accepted and the VADs have simply been switched off in the wireline part of the network for the whole duration of the call. The VADs then only work in the mobile stations oMS and tMS.
DTX furthermore works well in the two radio uplinks from the mobile stations oMS to oBTS and tMS to tBTS, respectively, controlled by the VADs in the mobile stations. DTX also works in the two radio downlinks in mobile-to-mobile calls for all signals coming from a mobile station if end-to-end transcoding-free operation is applied. DTX also works on all other links in this example and reduces the load everywhere. In a mobile-to-mobile call the VADs are thus active only in the mobile stations, and the transcoding-free operation helps to save downlink transmission resources.
But for mobile-to-PSTN (Public Switched Telephone Network) calls the VADs in the media gateways and the BSC are permanently switched off, and so all signals coming from the PSTN are transmitted downlink 100% of the time, even if they include speech pauses. In most networks mobile-to-PSTN calls still represent the majority of the calls. This of course works well for all signals coming from the PSTN, including music, as intended, since the music was the reason that the VADs were switched off. An embodiment of a mobile-to-PSTN call is shown in FIG. 2, again with the signaling path 10 and the speech path 20. As a consequence, the radio signal transmission in the downlink to the mobile station is always switched on, and the activity on the downlink radio channel is higher than hoped for when using DTX.
Accordingly, a need exists to provide a possibility to switch on a voice activity detector when voice is transmitted and to switch off a voice activity detector when music is transmitted, in order to minimize the radio signal transmission from and to the Base Transceiver Station BTS. This problem has been unsolved for years. One possibility to meet the above-referenced need is to insert inband signaling, such as specific tones or tone sequences, before and after the music signal to control the VADs in the path. However, these inband signals are normally audible to the end user, and they are not fully reliable and could lead to misbehavior. Additionally, these inband signals would have to pass through one or several transcoding stages and could therefore become unrecognizable for the VADs. Finally, it would be necessary to update all the existing VADs.