The present invention relates generally to processing telecommunication signals. More particularly, the invention provides a method and apparatus for performing DTMF (i.e., Dual-Tone Multi-Frequency) detection and voice mixing in the CELP (i.e., Code Excited Linear Prediction) domain. Specifically, it relates to a method and apparatus for detecting the presence of DTMF tones in a compressed signal from the CELP parameters, and also for mixing multiple input compressed voice signals, represented by multiple sets of CELP parameters, into a single set of CELP parameters. Merely by way of example, the invention has been applied to voice transcoding, but it would be recognized that the invention has a much broader range of applicability.
Telecommunications techniques have developed over the years. Recently, a variety of digital voice coders have been developed to meet the bandwidth demands of different packet networks and mobile communication systems. Digital voice coders provide compression of a digitized voice signal as well as the reverse transformation. Rapid growth in the diversity of networks and wireless communication systems generally requires that speech signals be converted between different compression formats. A conventional method for such conversion is to place two voice coders in tandem to serve a single connection. In such a case, the first compressed speech signal is decoded to a digitized signal through the first voice decoder, and the resulting digitized signal is re-encoded to a second compressed speech signal through the second voice encoder. This arrangement is commonly referred to as a “tandem coding” approach: the compressed signal is fully decoded back to a digitized representation, such as Pulse Code Modulation (PCM), and then re-encoded. This often requires a large amount of processing and incurs increased delays. More efficient approaches include technologies called smart transcoding, among others.
In addition to the requirements of voice transcoding among current diverse networks and wireless communication systems, it is also required to provide functionality for advanced feature processing. A specific example of an advanced feature is Dual-Tone Multi-Frequency (DTMF) signal detection. DTMF signaling is widely used in telephone dialing, voice mail, and electronic banking systems, and even with Internet Protocol (IP) phones to key in an IP address. In telecommunications speech codecs, in-band DTMF signals are encoded into the compressed bitstream. Conventional DTMF signal detection is performed in the speech signal space. As merely an example, the Goertzel algorithm with a two-pole Infinite Impulse Response (IIR) type filter is widely used to extract the necessary spectral information from an input digitized signal and to form the basis of DTMF detection.
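As merely an illustration of the conventional speech-domain approach mentioned above, the following is a minimal sketch of Goertzel-based DTMF detection. It is not taken from the invention. The 8 kHz sample rate, the 205-sample block, the energy-ratio threshold of 0.1, and the names goertzel_power and detect_dtmf are assumptions chosen for the example. The tone frequencies are the standard DTMF row and column frequencies.

```python
import math

# Standard DTMF row (low-group) and column (high-group) frequencies in Hz.
DTMF_LOW = (697.0, 770.0, 852.0, 941.0)
DTMF_HIGH = (1209.0, 1336.0, 1477.0, 1633.0)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Squared magnitude at freq_hz using the two-pole Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate_hz=8000.0, ratio=0.1):
    """Return the detected key, or None if no tone pair is present.

    Assumption: a tone counts as present when its Goertzel power exceeds
    ratio * N * (block energy). This threshold is illustrative only.
    """
    n = len(samples)
    energy = sum(x * x for x in samples)
    low = [goertzel_power(samples, f, sample_rate_hz) for f in DTMF_LOW]
    high = [goertzel_power(samples, f, sample_rate_hz) for f in DTMF_HIGH]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    threshold = ratio * n * energy
    if low[row] > threshold and high[col] > threshold:
        return KEYS[row][col]
    return None
```

A practical detector would add further checks, such as twist between the two tones, second-harmonic rejection, and minimum tone duration, which this sketch omits. The point relevant here is that the recursion runs over decoded PCM samples, which is why this style of detection presupposes a signal in the speech domain.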
When DTMF signal detection is required in voice transcoding, a tandem approach is commonly used. In this case, the input compressed speech signal has to be decoded back to the speech domain for DTMF signal detection, and then re-encoded to a compressed format. Since the processing in smart voice transcoding is performed in the CELP parameter space, known DTMF detection methods are often not suitable. Furthermore, known smart voice transcoding methods do not include DTMF signal detection functionality and are therefore limited.
Another specific example of an advanced feature for voice transcoding is the ability to handle multiple input signals. If the input signals are multiple speech signals, the voice mixer simply mixes the speech signals and outputs the mixed speech signal. However, in a packet network or a wireless communication system, the input signals are multiple compressed signals. Furthermore, with the current diversity of packet networks and wireless communication systems, the input signals may be in various compression formats. The conventional voice mixing solution performs mixing of the input packets by decoding the input packets into speech signals, mixing the speech signals, and re-encoding the mixed speech signals into output packets. This requires significant computational complexity to decode and re-encode each input compressed signal.
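As merely an illustration of the speech-domain mixing step in the conventional solution described above, the following sketch sums decoded PCM streams sample by sample. It assumes 16-bit PCM at a common sample rate and uses simple saturation to avoid overflow. The name mix_pcm16 and the choice of saturating addition are assumptions for the example, not a method stated in this document. The decode and re-encode stages are codec-specific and are not shown.

```python
def mix_pcm16(streams):
    """Mix decoded 16-bit PCM streams by summation with saturation.

    streams: list of lists of int samples, all at the same sample rate.
    Shorter streams are treated as silent once they run out of samples.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        # Clamp to the 16-bit signed range instead of wrapping around.
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

In the tandem arrangement, every input packet must be decoded before this step and the mixed output must be re-encoded afterwards, which is where most of the computational cost arises.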
In an attempt to improve the voice quality produced by voice mixing for packet networks, certain “smart” conference bridging methods have been proposed. Although such methods can provide side information and can improve the quality of mixed voice signals, they still use a tandem approach that involves decoding, mixing in the speech space, and re-encoding. This approach is often not suitable for a voice transcoder that operates in the CELP parameter space without going to the speech space.
From the above, it is seen that techniques for improved processing of telecommunication signals are highly desired.