1. Field of the Invention
The present invention concerns the transmission of speech signals.
2. Description of the Prior Art
Detecting the presence of speech has numerous applications, especially in telephony, in relation to concentrators. It is well known that only 30 to 40% of the duration of a telephone conversation is devoted to speech and that one means of increasing the capacity of a multichannel transmission system consists in using these periods of silence on existing channels to create additional channels.
A known method of detecting the presence or absence of speech on a channel is based on the use of a power level criterion according to which any signal which, during an elementary time interval, has a mean power level higher than a predetermined threshold above the mean noise level is considered as constituting a speech signal, the remainder constituting silence. Unfortunately, defining this threshold involves a compromise between two mutually conflicting requirements: the threshold must be as high as possible for good speech/silence discrimination and as low as possible to prevent "chopping" of the speech signal, which reduces its intelligibility. One known method of effecting this compromise is to modify the power level threshold according to the mean power level for the person speaking, evaluated over a relatively long period, and to systematically extend sequences detected as constituting speech by a "holding interval" the duration of which is inversely proportional to the mean power level for the person speaking.
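By way of illustration only, the power level criterion with a holding interval may be sketched as follows. The function name, the frame length, the 6 dB margin and the fixed holding interval of two frames are assumptions chosen for the example and do not come from any cited detector. For simplicity the sketch keeps the threshold and the holding interval constant instead of adapting them to the mean power level of the person speaking.

```python
def detect_speech_power(samples, frame_len=128, noise_power=1.0,
                        margin_db=6.0, hold_frames=2):
    """Return one speech/silence decision per elementary time interval.

    All parameter values are illustrative assumptions.
    """
    # Threshold placed a fixed margin above the mean noise power.
    threshold = noise_power * 10 ** (margin_db / 10)
    decisions = []
    hold = 0  # frames of holding interval still to run
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len  # mean power
        if power > threshold:
            hold = hold_frames      # restart the holding interval
            decisions.append(True)
        elif hold > 0:
            hold -= 1               # extend the detected speech sequence
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

With a frame length of 4, a burst occupying a single frame is reported as speech for that frame and for the two frames that follow it, which is the effect of the holding interval.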
A major disadvantage of this power criterion is that it is not possible to reduce the power level threshold sufficiently to provide for detecting non-voiced beginnings of words corresponding to fricative and sibilant consonants without seriously compromising the effectiveness of the detector. The result is a certain degradation in message intelligibility. An example concerns the word "seven" which is often transmitted as "even".
Another known method of detecting the presence or absence of speech on a channel is based on criteria associated with passages of the signal through zero level. In the case of a speech signal these passages through zero level contain a high proportion of the information content, since the intelligibility of speech signals is only slightly degraded by peak limiting. These "zero crossing" criteria are concerned with the frequency characteristics of the analysed signal sequence, rather than its power level. They consist in assuming that the sequence analysed constitutes speech if the distribution of zero crossings is indicative of a frequency component with maximum amplitude in the top or bottom part of the speech band, as most speech signals show a power spectrum with a peak which is off-center relative to the speech band, towards the lower end for the vowels and certain semi-vowel, nasal and plosive consonants and towards the upper end for fricative and sibilant consonants. The disadvantage of these zero crossing criteria is that their effectiveness is largely dependent on the statistical and frequency distribution properties of the noise content of the signal being analysed.
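A zero crossing criterion of this general kind may be sketched as follows, purely as an illustration. The sketch counts sign changes between consecutive samples, converts the count into a rough estimate of the dominant frequency (two crossings per cycle), and reports whether that estimate lies towards the lower or upper end of the band. The function names, the 8000 Hz sampling rate and the band edges of 1000 Hz and 2500 Hz are assumptions made for the example.

```python
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def estimated_frequency(frame, sample_rate=8000):
    """Rough dominant frequency in Hz, assuming two crossings per cycle."""
    duration = (len(frame) - 1) / sample_rate
    return zero_crossings(frame) / (2 * duration)


def spectral_position(frame, sample_rate=8000, low_edge=1000, high_edge=2500):
    """Classify the estimated peak as 'low', 'high' or 'centre'.

    The band edges are hypothetical values chosen for illustration.
    """
    freq = estimated_frequency(frame, sample_rate)
    if freq < low_edge:
        return "low"
    if freq > high_edge:
        return "high"
    return "centre"


def tone(freq, count=800, sample_rate=8000, phase=0.1):
    """Test signal: a sine wave with a small phase offset."""
    return [math.sin(2 * math.pi * freq * n / sample_rate + phase)
            for n in range(count)]
```

Under these assumptions, a sequence would be taken as speech when `spectral_position` returns "low" or "high". The sketch takes no account of noise, which is precisely the weakness noted above.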
It is known to compensate for the insensitivity of the power level criterion to non-voiced fricative and sibilant consonants by associating this criterion with a zero crossing criterion, more precisely by lowering the threshold when the zero crossing criterion shows that the peak in the power spectrum of the signal being analysed is near the top end of the speech band. An example of such a speech detector is described in French Patent Specification No 2 158 720. The criterion used in this detector is the exceeding of a value of 16 by the algebraic sum of two marks attributed to consecutive samples of the signal being analysed taken at intervals of 125 microseconds. One of these marks has an integer value between -1 and +2, according to the absolute amplitude of the signal sample, and represents, after summation, the rms value or power level of the signal. The other mark has the value 1 if the sample has an absolute value higher than a predetermined threshold and if it and the two preceding samples are of alternate sign. Otherwise its value is 0. The value of this mark integrated over a number of consecutive signal samples is, in view of the sampling rate, indicative of the proportion of frequencies at the upper end of the speech band in the signal being analysed.
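The two-mark scheme just described may be sketched as follows. This is a loose illustration and not a reproduction of the detector of the cited specification. The amplitude levels (4, 16, 64), the minimum amplitude for the alternation mark, the summation over a whole block of samples and all function names are assumptions. Only the range of the first mark (-1 to +2), the definition of the second mark and the limit value of 16 are taken from the description above.

```python
def amplitude_mark(sample, levels=(4, 16, 64)):
    """Integer mark from -1 to +2 according to absolute amplitude.

    The three levels are hypothetical; each level reached adds 1.
    """
    magnitude = abs(sample)
    return -1 + sum(1 for level in levels if magnitude >= level)


def alternation_mark(prev2, prev1, sample, min_amplitude=4):
    """1 if the sample exceeds the threshold and it and the two
    preceding samples are of alternate sign, otherwise 0."""
    alternating = ((prev2 < 0) != (prev1 < 0)) and ((prev1 < 0) != (sample < 0))
    return 1 if abs(sample) > min_amplitude and alternating else 0


def score(samples):
    """Algebraic sum of both marks over a block of consecutive samples."""
    total = 0
    for i, sample in enumerate(samples):
        total += amplitude_mark(sample)
        if i >= 2:
            total += alternation_mark(samples[i - 2], samples[i - 1], sample)
    return total


def is_speech(samples, limit=16):
    """Speech is declared when the summed marks exceed the limit."""
    return score(samples) > limit
```

With these assumed levels, twenty samples of amplitude 10 and constant sign score 0 and are not detected, whereas twenty samples of the same amplitude with alternating sign score 18 and are detected. This shows how the second mark lowers the effective power threshold for signals whose energy lies near the top of the band.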
The object of the present invention is to provide a method of detecting the presence of speech in a telephone signal, and a speech detector implementing that method, which are based on the use of a power level criterion in conjunction with a zero crossing criterion and which offer increased effectiveness.