1. Field of the Invention
The present invention relates generally to the field of speech coding and, more particularly, to embedded silence and noise compression.
2. Related Art
Modern telephony systems use digital speech communication technology. In digital speech communication systems the speech signal is sampled and transmitted as a digital signal, as opposed to the analog transmission of plain old telephone service (POTS). Examples of digital speech communication systems include the public switched telephone network (PSTN), the well-established cellular networks, and the emerging voice over internet protocol (VoIP) networks. Various speech compression (or coding) techniques, such as ITU-T Recommendations G.723.1 or G.729, can be used in digital speech communication systems in order to reduce the bandwidth required for the transmission of the speech signal.
Further bandwidth reduction can be achieved by using a lower bit-rate coding approach for the portions of the speech signal that contain no actual speech, such as the silence periods that are present when a person is listening to the other talker and does not speak. The portions of the speech signal that include actual speech are called “active speech,” and the portions of the speech signal that do not contain actual speech are referred to as “inactive speech.” In general, inactive speech signals contain the ambient background noise at the location of the listening person as picked up by the microphone. In a very quiet environment this ambient noise will be very low and the inactive speech will be perceived as silence, while in noisy environments, such as in a motor vehicle, inactive speech includes environmental background noise. Usually, the ambient noise conveys very little information and therefore can be coded and transmitted at a very low bit-rate. One approach to low bit-rate coding of ambient noise employs only a parametric representation of the noise signal, such as its energy (level) and spectral content.
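For illustration, extracting such a parametric representation (an energy level plus a coarse spectral shape) can be sketched as follows. The function name, band count, and energy floor are illustrative choices, not the parameters of any particular standard; real coders such as G.729 Annex B use quantized LPC coefficients rather than raw band energies.

```python
import math

def noise_parameters(frame, num_bands=4):
    """Illustrative sketch: describe a noise frame by its overall energy
    (in dB) and the energy in a few frequency bands of a naive DFT.
    Not the quantizers of any standard codec."""
    n = len(frame)
    # Frame energy in dB; the floor avoids log of zero on silent frames.
    energy = sum(s * s for s in frame) / n
    energy_db = 10.0 * math.log10(max(energy, 1e-10))

    # Coarse spectral content: squared DFT magnitudes accumulated into
    # num_bands equal-width bands between 0 and the Nyquist frequency.
    half = n // 2
    band_energy = [0.0] * num_bands
    for k in range(half):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        band_energy[k * num_bands // half] += re * re + im * im
    return energy_db, band_energy
```

Because only a handful of numbers describe each frame, such a representation can be transmitted at a far lower bit rate than a full speech frame.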
Another common approach for bandwidth reduction, which makes use of the stationary nature of the background noise, is sending only intermittent updates of the background noise parameters, instead of continuous updates.
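The intermittent-update idea above can be sketched with a hypothetical decision rule: transmit a new noise-parameter (SID-style) frame only when the background level has drifted beyond a threshold since the last update. The threshold value and single-parameter criterion are illustrative assumptions; actual DTX schemes (e.g., G.729 Annex B) use richer criteria covering both level and spectrum.

```python
def sid_update_decisions(frame_levels_db, threshold_db=3.0):
    """Illustrative sketch: for each inactive-speech frame, decide whether
    to transmit a background-noise parameter update. An update is sent
    only when the level drifts more than threshold_db from the value
    last transmitted; otherwise nothing is sent and the receiver keeps
    synthesizing noise from the previous parameters."""
    decisions = []
    last_sent = None
    for level in frame_levels_db:
        if last_sent is None or abs(level - last_sent) > threshold_db:
            decisions.append(True)   # transmit an update frame
            last_sent = level
        else:
            decisions.append(False)  # stationary noise: no transmission
    return decisions
```

For stationary background noise this transmits only occasionally, which is precisely the bandwidth saving the intermittent-update approach exploits.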
Bandwidth reduction can also be implemented in the network if the transmitted bitstream has an embedded structure. An embedded structure implies that the bitstream includes a core layer and enhancement layers. The speech can be decoded and synthesized using only the core bits, while using the enhancement-layer bits improves the decoded speech quality. For example, ITU-T Recommendation G.729.1, entitled “G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729,” dated May 2006, which is hereby incorporated by reference in its entirety, uses a core narrowband layer and several narrowband and wideband enhancement layers.
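The network-side use of an embedded structure can be sketched as follows: a frame is the concatenation of core bits and ordered enhancement layers, and a congested node fits the frame to a bit budget by dropping layers from the top, never touching the core. The layer sizes and function name are illustrative, not those of G.729.1.

```python
def fit_to_budget(core, layers, budget_bytes):
    """Illustrative sketch of network-side layer dropping for an embedded
    bitstream. 'core' and each element of 'layers' are byte strings; the
    layers are ordered from most to least important. The core is never
    dropped, since a decoder can always synthesize speech from it alone."""
    kept = [core]
    size = len(core)
    for layer in layers:
        if size + len(layer) <= budget_bytes:
            kept.append(layer)
            size += len(layer)
        else:
            break  # layers build on one another: once one is dropped,
                   # all higher layers must be dropped as well
    return b"".join(kept)
```

A receiver of the truncated frame decodes at a lower quality, but the channel is never interrupted, which is what makes embedded coding attractive for congestion control.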
The traffic congestion in networks that handle a very large number of speech channels depends on the average bit rate used by each codec rather than the maximal rate used by each codec. For example, assume a speech codec that operates at a maximal bit rate of 32 Kbps but at an average bit rate of 16 Kbps. A network with a bandwidth of 1600 Kbps can handle about 100 voice channels, since on average all 100 channels will use only 100*16 Kbps=1600 Kbps. Obviously, with small probability, the overall required bit rate for the transmission of all channels might exceed 1600 Kbps, but if that codec also employs an embedded structure the network can easily resolve this problem by dropping some of the embedded layers of a number of channels. Of course, if the planning/operation of the network is based on the maximal bit rate of each channel, without taking into account the average bit rate and the embedded structure, the network will be able to handle only 50 channels.
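The capacity arithmetic above can be checked with a one-line calculation comparing the two provisioning rules: planning on the average rate (feasible with an embedded codec, since rare peaks are absorbed by shedding enhancement layers) versus worst-case planning on the maximal rate. The function name is illustrative.

```python
def channel_counts(link_kbps, avg_kbps, max_kbps):
    """Illustrative sketch: number of voice channels a link carries under
    average-rate provisioning versus worst-case (maximal-rate) provisioning.
    With the section's figures (1600 Kbps link, 16 Kbps average, 32 Kbps
    maximum) this yields 100 versus 50 channels."""
    return link_kbps // avg_kbps, link_kbps // max_kbps
```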