In digital networks, voice is encoded with a PCM (Pulse Code Modulation) coding system which provides a constant bit rate of 64 Kbits per second. This bit rate corresponds to sampling the analog voice signal 8,000 times per second, each sample being represented by 8 bits. In order to transport this voice signal across a packet or cell network, individual 8-bit samples are assembled into packets.
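The arithmetic behind the 64 Kbps figure, and the assembly of samples into packets, can be illustrated with a minimal sketch. The function names and the payload size of 40 samples are assumptions chosen for illustration only; the text does not specify a packet size.

```python
SAMPLE_RATE_HZ = 8000   # samples per second (from the text)
BITS_PER_SAMPLE = 8     # bits per PCM sample (from the text)


def pcm_bit_rate(sample_rate_hz=SAMPLE_RATE_HZ, bits_per_sample=BITS_PER_SAMPLE):
    """Constant bit rate of the PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample


def packetize(samples, payload_size):
    """Split a bytes object of 8-bit samples into payloads of payload_size bytes.

    payload_size is a hypothetical parameter; the last payload may be shorter.
    """
    return [samples[i:i + payload_size] for i in range(0, len(samples), payload_size)]


def payload_duration_ms(payload_size, sample_rate_hz=SAMPLE_RATE_HZ):
    """Speech duration carried by one payload, in milliseconds."""
    return 1000.0 * payload_size / sample_rate_hz
```

With these assumptions, a payload of 40 samples carries 5 ms of speech, so a continuous PCM stream produces 200 such packets per second.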
Packetized systems exploit the bursty nature of voice and data traffic to multiplex the traffic of several users so that they can share transmission bandwidth and switching resources. The packet header includes the necessary control information, allowing a wide range of coding schemes for voice and data and thus easy integration of multimedia traffic (voice, video and data). The `packet networks` considered for the present invention are either packet based, such as Frame Relay, or cell based, such as Asynchronous Transfer Mode (ATM) networks.
One way of saving bandwidth is to reduce the bit rate required from the 64 Kbps standard rate. Adaptive Differential PCM (ADPCM) is a compression algorithm reducing the bit rate to 32 Kbps without measurable loss of quality: it encodes each sample as the difference between it and the last sample, rather than as an absolute amplitude value. The principle of voice compression relies on the fact that a voice signal has considerable redundancy, which means that the general characteristics of the next few samples can be predicted from the last few samples. The Global System for Mobile Communications (GSM) standard for the European cellular telecommunications system provides one example of such a compression algorithm. As well as saving bandwidth, the compression algorithm must preserve quality, which means that a listener must not detect the difference between an original analog signal and one that has been encoded and later decoded.
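The idea of encoding differences rather than absolute amplitudes can be sketched as follows. This is plain differential coding and not ADPCM itself: it omits the adaptive predictor and quantizer, which are what actually reduce the number of bits per sample. The function names are illustrative assumptions.

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the previous sample.

    The first sample is encoded relative to an assumed initial value of 0.
    """
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples
```

Because successive voice samples are strongly correlated, the differences are typically much smaller than the amplitudes themselves, which is what allows a real coder to represent them with fewer bits.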
Speech occurs in `talk spurts` and is basically half-duplex (since most of the time only one person is talking). Sixty percent of any (one way) voice conversation consists of silence. Another way of saving bandwidth is to detect when there is no actual speech and simply stop sending voice samples during the gaps. This silence removal function consists in detecting silence and ceasing to send voice packets at the originating end, and in generating a background noise instead of silence at the terminating end. Generating a background noise rather than pure silence is more comfortable for the listener, because it minimizes the discontinuity between the background noise heard during speech and the silence periods. Careful selection of the noise power is necessary to avoid the problem of `noise pumping`, an annoying contrast between the background noise during the silence period and the background noise during speech spurts.
Most of the so-called comfort noise generators take this quality requirement into account when choosing the noise power. The GSM standard for digital cellular telecommunications provides a comfort noise generator embedded in its compression algorithm. The parameters of the comfort noise are estimated on the transmit side and transmitted to the receive side before the radio transmission is cut, and at a regular low rate afterwards. This allows the comfort noise to adapt to changes of the noise on the transmit side. The comfort noise generated is of good quality, but this generator can only be used with the GSM coding scheme: it cannot be used to add a `comfort noise generator` to other voice coding algorithms such as PCM (no compression), ADPCM, SBC, and CELP.
The standard G.764 is a Voice Packetization Protocol of the CCITT which has been defined to operate with any voice compression algorithm. In order to play out a background noise at the terminating end, the level of noise is specified in a 4-bit field of the packet header. An additional bit, called the more (M) bit, is used to distinguish between gaps due to silence and gaps due to missing or discarded cells or packets. This solution provides a comfort noise of good quality and is independent of the compression algorithm, but implies a 5-bit overhead on each packet or cell transported. When used with ATM cells, the additional 5 bits of voice header cannot be protected with the ATM Adaptation Layer of type 1 (AAL1). Thus, either the Adaptation Layer of type 2 (AAL2), which is intended for variable-rate information, or AAL5 needs to be used. Both alternatives have a disadvantage: AAL2 is not yet fully defined by international standards, and AAL5 implies a large overhead.
It is an object of the present invention to provide a method and a system for silence removal independent from the voice coding or voice compression algorithms.
Another object of the present invention is to provide a method and a system for silence removal which can be used in ATM networks and is compatible with the ATM adaptation layer of type one (AAL1).
It is another object of the present invention to provide such a method and system wherein the overhead on the packets or cells is minimized, while offering a good quality in terms of comfort during speech.
The method according to the present invention for transporting a stream of packets between a transmitting side and a receiving side through a communication network, said stream including voice packets corresponding to speech periods and silence packets corresponding to silence periods, comprises: at the transmitting side, detecting the ends of speech periods and transmitting only the voice packets corresponding to the speech periods, followed by the silence packets following the speech periods for a first period of time; and at the receiving side, reconstituting said stream by interleaving, between the voice packets received from the transmitting side, white noise packets corresponding to the background noise of the preceding voice packets.
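The method summarized above can be sketched as a small simulation. This is an illustration under stated assumptions and not the implementation of the invention: packets are lists of linear (non-companded) samples, a suppressed packet is modeled as None in its time slot, the `first period of time` is a hypothetical count of two packets, the noise level is taken as the RMS of the most recently received packet, and white noise is drawn from a Gaussian source. All names are hypothetical.

```python
import math
import random

HANGOVER_PACKETS = 2  # hypothetical length of the "first period of time", in packets


def transmit(packets, is_speech, hangover=HANGOVER_PACKETS):
    """Return one slot per input packet: the packet if sent, None if suppressed.

    Voice packets are always sent; silence packets are sent only for
    `hangover` packets after the end of a speech period.
    """
    slots = []
    since_speech = None  # silence packets seen since the last speech packet
    for packet, speech in zip(packets, is_speech):
        if speech:
            since_speech = 0
            slots.append(packet)
        elif since_speech is not None and since_speech < hangover:
            since_speech += 1
            slots.append(packet)
        else:
            slots.append(None)
    return slots


def rms(packet):
    """Root-mean-square level of a packet of linear samples."""
    return math.sqrt(sum(s * s for s in packet) / len(packet))


def receive(slots, packet_len, rng=None):
    """Rebuild the stream, filling suppressed slots with white noise packets."""
    rng = rng or random.Random(0)
    noise_level = 0.0  # no noise is generated before any packet has been received
    stream = []
    for slot in slots:
        if slot is not None:
            noise_level = rms(slot)  # level of the most recently received packet
            stream.append(list(slot))
        else:
            stream.append([rng.gauss(0.0, noise_level) for _ in range(packet_len)])
    return stream
```

Because the last packets received before a gap are the silence packets sent during the first period of time, the level measured on them reflects the background noise at the transmitting side rather than the speech level.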
In a first embodiment it comprises:
at the transmitting side: detecting the end of speech periods; transmitting the silence packets during the first period of time after the end of speech periods; calculating, at the end of speech periods, a white noise level corresponding to the background noise of at least one preceding packet; and transmitting to the receiving side a control packet including said white noise level;
at the receiving side: receiving the voice, silence and control packets; and reading the white noise level in the control packet to generate the white noise packets which are interleaved between the voice packets received from the transmitting side.
In a second embodiment it comprises:
at the transmitting side: detecting the end of speech periods; and transmitting the silence packets during the first period of time after the end of speech periods;
at the receiving side: detecting the end of speech periods in the received packets; calculating a white noise level from the background noise of at least the last one of the received packets; and generating the white noise packets to be interleaved between the voice packets which are received from the transmitting side.
The silence removal function of the invention is independent of the voice coding for ATM networks, and has only a minimal dependence for non-ATM networks, which is limited to distinguishing the control packet from the ordinary voice sample packets. As required, there is no overhead on each packet transported in the network, unlike the prior art: for non-ATM networks, only one control packet is sent by the transmitting side; for ATM networks, there is no overhead on each cell transported in the network and AAL1 can be used. A packet loss control mechanism is also implemented. Moreover, the processing resources used by the silence removal function of the present invention are very low compared to those necessary for the compression algorithm.
In the preferred embodiment of the invention, the silence removal method is implemented by a Digital Signal Processor. A Voice Activity Detector (VAD) function is used to detect the silence packets in the input voice packet stream. In the telecommunications network access nodes, both the voice compression and the silence removal functions may be implemented in dedicated voice adapter cards, which also include the usual adapter routing functions.
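The text does not describe how the Voice Activity Detector decides that a packet is silence. One simple possibility, given here only as an assumption, is to compare the mean energy of each packet with a fixed threshold. The threshold value and the function names are hypothetical, and practical detectors are usually more elaborate.

```python
def packet_energy(packet):
    """Mean squared amplitude of a packet of linear samples (0.0 if empty)."""
    if not packet:
        return 0.0
    return sum(s * s for s in packet) / len(packet)


def classify_packets(packets, threshold=100.0):
    """Return True for packets classified as speech, False for silence.

    threshold is a hypothetical fixed energy level; it is not taken
    from the text.
    """
    return [packet_energy(packet) > threshold for packet in packets]
```

The resulting list of flags is the kind of per-packet speech or silence decision on which a silence removal function at the transmitting side can operate.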