This invention relates to signal detecting arrangements and, more particularly, to arrangements for detecting speech activity in the presence of noise.
Speech detection arrangements are useful in a variety of communication systems in which speech transmission paths are established in response to the occurrence of speech signal activity. Examples of the use of speech detection include speech interpolation and break-in for echo suppression. The signal measurement criteria, or speech definition, of conventional speech detectors are sufficiently deficient that it has been necessary to extend the speech indication beyond each instance in which the signal satisfies the speech definition by a rather long hangover interval (100 milliseconds or more) in order to provide the perception of quality transmission over speech interpolation systems.
Speech detectors, particularly those used for speech interpolation, should ideally define the minimum time intervals during which a transmission path is needed by a customer so that the listener will perceive the connection as having a "good" quality. These intervals, expressed as a percentage of total time, are called activity. Hence, a speech detector should be highly sensitive to the presence of speech signals while at the same time remaining insensitive to non-speech signals. This may be achieved by an improved speech definition which enables the hangover interval to be minimized without degrading performance. Known arrangements for detecting speech, however, have generally suffered from such limitations as undesirable speech clipping on the one hand and excessive activity on the other hand, the latter being due to a poor speech definition coupled with excessive hangover and to noise sensitivity.
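The relationship between hangover and activity described above can be illustrated with a minimal sketch. It is not the detector of this invention. It assumes, purely for illustration, that the speech definition has already been applied to fixed-length frames to give a sequence of true/false indications; the function names, the frame-based representation, and the sample values are hypothetical.

```python
def apply_hangover(raw_indications, hangover_frames):
    """Extend each raw speech indication by a fixed number of frames.

    raw_indications: per-frame results of the speech definition (assumed).
    hangover_frames: hangover length in frames (hypothetical unit).
    """
    extended = []
    remaining = 0  # hangover frames still to be held after the last speech frame
    for is_speech in raw_indications:
        if is_speech:
            remaining = hangover_frames  # restart hangover on every speech frame
            extended.append(True)
        elif remaining > 0:
            remaining -= 1               # hold the path open during hangover
            extended.append(True)
        else:
            extended.append(False)
    return extended


def activity_percent(indications):
    """Activity: percentage of total time the transmission path is held."""
    if not indications:
        return 0.0
    return 100.0 * sum(indications) / len(indications)


# Illustrative values only: ten frames, two of which satisfy the speech definition.
raw = [False, True, False, False, False, False, True, False, False, False]
held = apply_hangover(raw, hangover_frames=2)
print(activity_percent(raw), activity_percent(held))
```

In this sketch a hangover of two frames raises the activity from 20 percent to 60 percent, which illustrates why a long hangover interval leads to excessive activity.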
A primary object of this invention is to provide an improved speech definition which allows a substantial reduction in the duration of hangover without producing the aforementioned drawbacks.
Another object of the present invention is to provide an improved method and arrangement for detecting speech activity in the presence of noise wherein noise level estimates are independently derived while talker volume estimates are derived in relationship to the noise level estimates.
A related object is to provide a signal classification process using an average representation of several signal samples wherein the signal classification process assigns appropriate time constants to signal measures of the representation while identifying which portions constitute speech and which constitute noise.