The present invention relates to a type of digital voice switch which is generally used in voice communication channels to detect speech in the presence of noise. In particular, the present invention relates to a digital voice switch which employs a speech detector having a variable speech threshold level, a noise detector having a variable noise threshold level, a disabling detector having a fixed maximum threshold level, and threshold adjustment circuitry which provides rapid adjustment of the speech and noise threshold levels.
Voice switches are known in the art as devices which distinguish between vocal sounds and noise carried by a communications channel. Devices of this nature have a number of known uses. For example, in a communication system which includes n voice input channels and m voice output channels, where m < n, voice switches are used to determine when there are vocal sounds on any of the n input channels. Only those channels carrying vocal sounds at any instant are connected to an output channel. Clearly, the acceptable performance of the communication system depends upon the ability of the voice switches to recognize speech in the presence of noise and to establish and maintain a communications link between the input and output channels. A failure to detect speech signals may result in excessively long clipping of speech utterances and cause user dissatisfaction. Another important function of voice switches is to prevent noise signals from activating the communication channel during the silence intervals in speech so that optimum system loading may be achieved.
Previously known voice switches use various techniques to distinguish between noise and speech signals. The earliest and simplest prior art voice switches employ a detector which compares digitally encoded samples of a signal on a channel with a fixed threshold level. If the samples of the signal are above the threshold level, it is assumed that the signal represents voice. If the samples of the signal are equal to or below the threshold level, it is assumed that the signal represents noise. Typically, the voice detector detects speech by detecting a given number of consecutive samples in excess of the threshold value. Detection of four samples in succession has been considered suitable.
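By way of illustration only, the fixed threshold detection described above may be sketched in Python as follows. The function name, the use of the sample magnitude, and the parameter names are assumptions introduced for this sketch and are not taken from any particular prior art device.

```python
def detect_speech_fixed(samples, threshold, run_length=4):
    """Flag speech once `run_length` consecutive samples exceed `threshold`.

    Assumptions for illustration: samples are signed integers and the
    comparison is made on their magnitude. A sample equal to the
    threshold is treated as noise, as in the description above.
    """
    flags = []
    run = 0  # number of consecutive samples seen above the threshold
    for s in samples:
        if abs(s) > threshold:
            run += 1
        else:
            run = 0  # any sample at or below the threshold resets the count
        flags.append(run >= run_length)
    return flags
```

With a threshold of 5, the sequence 1, 9, 9, 9, 9, 2, 9 is flagged as speech only at the fifth sample, which is the fourth consecutive sample above the threshold.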
Many vocal sounds result in a signal having an amplitude which tapers off toward the end of the sound. Should the amplitude fall below the threshold level, the described voice switch would be turned off before the completion of the sound, resulting in a clipped speech pattern. To prevent clipping of the trailing portion of transmitted sounds, the voice switch is constructed to operate with a hangover time. For example, when speech is detected, the voice switch is turned on to pass the detected samples of the channel signal. Once turned on, the voice switch will remain on for a hangover period to ensure passage of all samples of the sound. Typically, the prior art voice switches have a hangover time of 150 milliseconds.
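The hangover behavior may be sketched as follows. This is a minimal illustration under stated assumptions: the detector output is taken to be one boolean per sample, the hangover is expressed as a number of samples, and the function and parameter names are hypothetical.

```python
def apply_hangover(detections, hangover_samples):
    """Hold the switch on for `hangover_samples` after the last detection.

    `detections` is an assumed per-sample boolean detector output.
    """
    gate = []
    remaining = 0  # samples of hangover still to run
    for detected in detections:
        if detected:
            remaining = hangover_samples  # restart the hangover period
            gate.append(True)
        elif remaining > 0:
            remaining -= 1  # no detection, but still within the hangover
            gate.append(True)
        else:
            gate.append(False)
    return gate
```

At an assumed sampling rate of 8 kHz, a hangover time of 150 milliseconds would correspond to hangover_samples = 1200.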
Clipping of the front end of the speech segment may also occur because in certain vocal sounds the amplitude of the leading portion of the signal is low. To avoid front end clipping, all samples of the signal are delayed a fixed period of time, say 4 milliseconds, after the samples are received at the input of the voice switch to permit ample time for the detection of speech. After the delay period, the samples are applied to the output of the voice switch, which actually controls the passage of speech samples and the blockage of noise and other non-speech samples. Consequently, the voice switch detects speech prior to the time the leading portion of the speech signal arrives at the output. Thus, clipping of the front end of the speech signal is minimized.
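The effect of the fixed delay may be illustrated by the following sketch, in which the detector operates on the undelayed samples while the output gate acts on the delayed samples. The function name, the zero fill for blocked samples, and the per-sample boolean detector output are assumptions made for illustration.

```python
from collections import deque


def gate_with_delay(samples, detections, delay_samples):
    """Gate delayed samples using a detector that sees undelayed samples.

    At time n the detector decision for sample n controls the passage
    of sample n - delay_samples. Blocked samples are replaced by 0
    (an assumption of this sketch).
    """
    delay_line = deque([0] * delay_samples)  # pre-filled with silence
    out = []
    for sample, detected in zip(samples, detections):
        delay_line.append(sample)
        delayed = delay_line.popleft()  # sample received delay_samples ago
        out.append(delayed if detected else 0)
    return out
```

For example, if detection first occurs at the third of six samples and the delay is two samples, the first two low level samples reach the gate only after the switch has turned on, and so are passed rather than clipped. At an assumed sampling rate of 8 kHz, a delay of 4 milliseconds would correspond to 32 samples.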
The described prior art threshold voice switches have many disadvantages. For example, because the amplitude of speech signals varies from speaker to speaker, the prior art voice switches cannot accurately distinguish the speech of low level talkers from channel noise. Moreover, the prior art switches may clip speech if the amplitude of the low level speech signals falls below the fixed threshold. The value of the threshold usually is set at a level which is a compromise between a high level, yielding minimum noise triggering, and a low level, yielding maximum speech detection. Another disadvantage exists because noise on a typical communication channel also varies over a considerable range and a high noise level could trigger the voice switch during the silence intervals in speech. The transmission of noise will use available channel capacity and increase system loading.
To overcome the shortcomings of the fixed threshold systems, voice switches having a variable threshold level have been introduced which adjust the threshold to a level that yields maximum noise immunity and maximum sensitivity to speech. One such system is disclosed in U.S. Pat. No. 3,832,491 filed Aug. 27, 1974, issued to Joseph A. Sciulli et al. and assigned to the assignee of the present application. That patent discloses a voice switch having a digital adaptive threshold generating device. The threshold level is varied in accordance with the loudness of the talker by comparing the number of times the threshold is exceeded over a given period with a reference number. Maximum and minimum threshold levels are also provided to prevent the threshold level from rising too high when there is continuous talking by a loud talker and from falling too low when there is continuous silence.
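The general principle of such an adaptive threshold may be sketched as follows. This sketch is not the circuit of the cited patent. The fixed step size, the block-by-block update, the direction of adjustment (raising the threshold when the count exceeds the reference number), and all names are assumptions introduced for illustration.

```python
def adapt_threshold(samples, threshold, period, reference_count,
                    step, t_min, t_max):
    """Return the threshold after each complete block of `period` samples.

    In each block the number of samples whose magnitude exceeds the
    current threshold is compared with `reference_count`; the threshold
    is stepped up or down accordingly and clamped to [t_min, t_max].
    """
    history = []
    for start in range(0, len(samples) - period + 1, period):
        block = samples[start:start + period]
        exceed = sum(1 for s in block if abs(s) > threshold)
        if exceed > reference_count:
            # Many exceedances: assume a loud talker, raise up to the maximum
            threshold = min(threshold + step, t_max)
        elif exceed < reference_count:
            # Few exceedances: lower the threshold down to the minimum
            threshold = max(threshold - step, t_min)
        history.append(threshold)
    return history
```

The clamping to t_min and t_max corresponds to the maximum and minimum threshold levels mentioned above, which keep the threshold bounded during continuous loud talking or continuous silence.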
Another type of prior art voice switch having a variable threshold is taught in U.S. Patent application Ser. No. 606,828, filed Aug. 21, 1975 by Raymond H. Lanier and assigned to the assignee of the present invention. In the application of Lanier the threshold is shifted in response to changes in the noise level itself. That invention is based upon the recognition that over a given interval of time T speech will appear as random talk spurts separated by periods of silence, while noise (generally Gaussian distributed) will be continuous. This difference between speech and noise makes it possible to detect the noise level with respect to the voice switch threshold. To detect noise, a time interval T is divided into equal subintervals τ. The number of samples that exceed the threshold in each subinterval is then counted. If the counts tend to be non-uniform over the interval T, then it is assumed that active speech is present. If, on the other hand, the counts tend to be uniform over the time interval T, then it is assumed that noise is present. In the latter case, when the number of samples accumulated during τ is large, the threshold level would be raised, whereas when the number of samples accumulated is small, the threshold level would be lowered. To maintain the threshold level just above the noise level, a threshold zone is provided wherein the zone is varied to cause the peak of the noise level to be above a minimum level of the zone but below a maximum level of the zone.
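The subinterval counting approach may be sketched as follows. This is an illustration under stated assumptions and not the implementation of the Lanier application. Uniformity is judged here by the spread between the largest and smallest subinterval counts, and the parameters spread_limit, low_count, high_count and step are hypothetical.

```python
def classify_and_adjust(samples, threshold, sub_len, spread_limit,
                        low_count, high_count, step):
    """Classify one interval T as speech or noise and adjust the threshold.

    `samples` covers the interval T and `sub_len` is the number of
    samples in each subinterval tau. Returns (label, threshold, counts).
    """
    if len(samples) < sub_len:
        raise ValueError("interval T must contain at least one subinterval")
    # Count the samples exceeding the threshold in each complete subinterval
    counts = [
        sum(1 for s in samples[i:i + sub_len] if abs(s) > threshold)
        for i in range(0, len(samples) - sub_len + 1, sub_len)
    ]
    # Non-uniform counts suggest talk spurts: leave the threshold alone
    if max(counts) - min(counts) > spread_limit:
        return "speech", threshold, counts
    # Uniform counts suggest noise: move the threshold toward the noise peak
    mean_count = sum(counts) / len(counts)
    if mean_count > high_count:
        threshold += step
    elif mean_count < low_count:
        threshold -= step
    return "noise", threshold, counts
```

In this sketch, low_count and high_count play a role loosely analogous to the limits of the threshold zone: the threshold is left unchanged while the uniform count lies between them.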
In the prior art variable threshold voice switches described above, the adjustment time initially required to increase or decrease the threshold level, and subsequently to vary the threshold level in response to a change in noise level, is relatively long. The delay in system response resulting from these adjustments results in unsatisfactory switch performance. Another problem with the described systems is that the voice threshold level, when adjusted to uniform noise samples, is positioned too close to the noise level. Consequently, high noise pulses, which are present in normal telephone line noise, quite often exceed the voice threshold level and cause false triggering of the voice switch.