This invention relates in general to a method for reducing the total bandwidth requirement of voice-enabled applications over the Internet, and specifically to a method of separating speech signals from non-speech signals.
Given the rapid growth in Internet traffic, there is a shortage of bandwidth available for the transfer of data in voice over IP applications. A speech signal consists of speech segments and non-speech segments. Non-speech segments do not contribute to comprehension, and may contain undesirable noise or disturbances that degrade the signal. However, all segments, speech or otherwise, demand bandwidth for transmission. Moreover, in the context of speech recognition, segmentation of the input speech stream into “speech” and “non-speech” is the precursor to applying recognition algorithms.
Bandwidth optimization is achieved by speech compression using low bit rate codecs integrated with Voice Activity Detection (VAD). Further optimization is usually achieved by one of the following two methods. In the first method, a VAD scheme, usually based on energy and zero-crossing measures, is embedded in the codec. Examples are G.729, Global System for Mobile communication (GSM), Adaptive Multi Rate (AMR), G.722 and 3rd Generation Partnership Project (3GPP). In the second method, the VAD scheme is not embedded in the codec block. Selecting talk spurts and avoiding codec processing of non-speech segments at the transmitter has the additional advantage of reducing the computational load on the codec itself. This is particularly significant as the number of streams grows. In such a setup, the VAD is independent of the speech codec. Portability across codecs is an added advantage, since any codec can be used after applying a stand-alone VAD that removes the non-speech part of the stream.
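By way of illustration only, a stand-alone VAD of the energy and zero-crossing kind mentioned above might be sketched as follows. The function names, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate) and both thresholds are hypothetical values chosen for the example; they are not taken from any of the codecs named above, and a practical detector would typically adapt its thresholds to the background noise.

```python
def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) for one frame of samples."""
    energy = sum(s * s for s in frame) / len(frame)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


def detect_speech_frames(samples, frame_len=160, energy_thresh=0.01, zcr_thresh=0.25):
    """Label each full frame True (speech) or False (non-speech).

    Assumed rule: a frame counts as speech when its energy is above
    energy_thresh and its zero-crossing rate is below zcr_thresh.
    Both thresholds are illustrative, not tuned values.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy, zcr = frame_features(samples[start:start + frame_len])
        flags.append(energy > energy_thresh and zcr < zcr_thresh)
    return flags


def drop_non_speech(samples, frame_len=160, **thresholds):
    """Return only the samples of frames labelled as speech.

    The result can then be handed to any codec, which is the
    portability advantage of a VAD placed outside the codec block.
    """
    flags = detect_speech_frames(samples, frame_len, **thresholds)
    kept = []
    for i, is_speech in enumerate(flags):
        if is_speech:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept
```

Because the detector in this sketch runs before the codec, frames labelled as non-speech never reach the encoder, which is where the reduction in computational load described above would come from.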
There is an unmet market need for a method and a system that effectively remove the non-speech component in a voice over internet protocol (VoIP) based communication system.