This invention relates generally to signal processing, and more particularly to a system for measuring speech content in sound.
A difficult problem in voice recognition is electronically differentiating speech-like sound from other sounds. Speech-like sound is sound with a time-frequency description that changes like that of speech sounds. The human brain readily recognizes the difference between speech-like sounds and other sounds. For example, humans can easily differentiate between the sibilant whirring of computer fan noise and a person talking. However, it is extremely complicated to produce electronics which can tell the difference between noise and speech-like sound. One complication is that the noises we hear have energy in much the same frequencies as speech-like sound. Another complication is that some vocalizations are not simple talking and therefore may have sound characteristics which are closer to non-speech sound than to speech-like sound. Yet another complication is that there is a variety of different speech-like sounds which demonstrate substantially different characteristics. For example, simple talking and singing are readily distinguishable to the human ear, yet the differences between them may confuse systems attempting to electronically differentiate speech-like sound from other sounds.
Thus, there is a need in the art for a system which differentiates speech-like sound from other sounds. The system should be able to characterize vocalizations which are not simple talking. Furthermore, the system should be useful for differentiating a variety of different kinds of speech-like sounds from other sounds.
Upon reading and understanding the present disclosure, it will be recognized that the inventive subject matter described herein satisfies the foregoing needs in the art and several other needs in the art not expressly noted herein. The following summary is not intended to be exhaustive or limiting; the scope of the invention is defined by the attached claims and the equivalents thereof.
One embodiment of the present invention provides a method and apparatus for measuring speech content in sound. In one embodiment, a method is provided which includes receiving an input signal; extracting a signal related to a time-dependent power of the input signal; determining a time-dependent mean of the signal, M; determining a time-dependent deviation of the signal from the mean, D; and estimating a time-dependent speech-to-noise ratio from M and D. In one embodiment, the extracted signal is an envelope produced using a non-negative function of the input signal. In one embodiment, the estimating includes comparing M and D to a predetermined mapping which relates M and D to speech-to-noise ratio, to obtain an estimated speech-to-noise ratio. Embodiments in which the deviation D is the standard deviation are demonstrated. Other deviations are demonstrated. Various analog and digital embodiments are demonstrated herein. Single band and multiple band embodiments are provided. Various filtering systems are demonstrated, including recursive and nonrecursive. Multiple signal extraction methods are demonstrated. Uses of the estimated time-dependent speech-to-noise ratios are demonstrated.
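By way of illustration only, the single-band method summarized above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the squared sample as the non-negative function, the recursive (exponential) averaging with constant `alpha`, the use of the ratio D/M as the quantity looked up, and every value in `PLACEHOLDER_MAP` are hypothetical choices made for the example and are not taken from this disclosure.

```python
import math

# HYPOTHETICAL lookup table of (D/M ratio, speech-to-noise ratio in dB).
# The values are placeholders for illustration; a real mapping would be
# predetermined by other means.
PLACEHOLDER_MAP = [(0.0, -20.0), (0.5, -10.0), (1.0, 0.0), (2.0, 10.0), (4.0, 20.0)]


def interpolate(table, r):
    """Piecewise-linear lookup of r in a table sorted by its first column."""
    if r <= table[0][0]:
        return table[0][1]
    if r >= table[-1][0]:
        return table[-1][1]
    for (r0, s0), (r1, s1) in zip(table, table[1:]):
        if r0 <= r <= r1:
            return s0 + (s1 - s0) * (r - r0) / (r1 - r0)


def track_speech_to_noise(samples, alpha=0.01, table=PLACEHOLDER_MAP):
    """Return one (M, D, snr_db) tuple per input sample.

    M is a running mean of the envelope and D a running standard
    deviation of the envelope about M, both computed recursively.
    """
    mean = 0.0      # running mean of the envelope, M
    mean_sq = 0.0   # running mean of the squared envelope
    out = []
    for x in samples:
        env = x * x  # assumed non-negative function: instantaneous power
        mean += alpha * (env - mean)
        mean_sq += alpha * (env * env - mean_sq)
        dev = math.sqrt(max(mean_sq - mean * mean, 0.0))  # D
        ratio = dev / mean if mean > 0.0 else 0.0
        out.append((mean, dev, interpolate(table, ratio)))
    return out


# Usage: a constant-power input versus an input whose power switches on and off.
steady = [1.0, -1.0] * 2500
bursty = ([1.0] * 50 + [0.0] * 50) * 200
steady_snr = track_speech_to_noise(steady)[-1][2]
bursty_snr = track_speech_to_noise(bursty, alpha=0.001)[-1][2]
```

In this sketch, an input of constant power yields a small D relative to M and is therefore mapped to a low estimated speech-to-noise ratio, whereas an input whose power fluctuates yields a larger D relative to M and is mapped to a higher estimate.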
In one embodiment, an apparatus and process are provided relating to a system for receiving an audio signal; converting the audio signal to an electrical signal; converting the electrical signal to a digital representation; bandpass filtering the digital representation using one or more digital filters to produce a plurality of filtered digital signals; for each digital signal of the plurality of filtered digital signals: extracting an envelope related to a time-dependent power of the signal; determining a time-dependent mean of the envelope, M; determining a time-dependent deviation of the envelope from the mean, D; and estimating a time-dependent speech-to-noise ratio from M and D using a predetermined mapping or relationship which relates M and D to speech-to-noise ratio; producing a processed digital signal using the plurality of filtered digital signals and their respective estimated time-dependent speech-to-noise ratios; and converting the processed digital signal to a processed analog signal. Alternate embodiments are provided to demonstrate the subject matter of the present patent application. Several applications of the present subject matter are discussed.
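By way of illustration only, the digital portion of the multiple band embodiment summarized above may be sketched in Python as follows; the conversions between audio, analog, and digital signals are outside the sketch. Several elements are assumptions made for the example and are not taken from this disclosure: the second-order band-pass sections (coefficients following the widely published Audio EQ Cookbook form), the band centers and Q, the squared sample as the envelope, the function `placeholder_ratio_to_db`, and the rule that weights each band by snr/(1 + snr) before summing. All function names are hypothetical.

```python
import math


def bandpass_coeffs(center_hz, q, fs):
    """Second-order band-pass with unity peak gain (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * center_hz / fs
    k = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + k
    b = (k / a0, 0.0, -k / a0)                       # feed-forward terms
    a = (-2.0 * math.cos(w0) / a0, (1.0 - k) / a0)   # feedback terms
    return b, a


def biquad(samples, b, a):
    """Direct-form I recursive filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def placeholder_ratio_to_db(ratio):
    """HYPOTHETICAL mapping from D/M to speech-to-noise ratio in dB."""
    return max(-20.0, min(20.0, 20.0 * (ratio - 1.0)))


def snr_track(band, alpha):
    """Per-sample speech-to-noise estimate (dB) for one filtered band."""
    mean = mean_sq = 0.0
    out = []
    for x in band:
        env = x * x  # assumed envelope: instantaneous power
        mean += alpha * (env - mean)                      # M
        mean_sq += alpha * (env * env - mean_sq)
        dev = math.sqrt(max(mean_sq - mean * mean, 0.0))  # D
        out.append(placeholder_ratio_to_db(dev / mean if mean > 0.0 else 0.0))
    return out


def process_multiband(samples, fs, centers=(500.0, 1000.0, 2000.0),
                      q=1.0, alpha=0.001):
    """Weight each band by an SNR-derived gain and sum the bands."""
    out = [0.0] * len(samples)
    for fc in centers:
        b, a = bandpass_coeffs(fc, q, fs)
        band = biquad(samples, b, a)
        for i, (y, snr_db) in enumerate(zip(band, snr_track(band, alpha))):
            lin = 10.0 ** (snr_db / 10.0)
            out[i] += y * lin / (1.0 + lin)  # assumed gain rule
    return out


# Usage: one second of a steady 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs)]
processed = process_multiband(tone, fs)
```

In this sketch, a band whose envelope shows little deviation relative to its mean receives a low estimated speech-to-noise ratio and is attenuated, so the steady tone of the usage example emerges at a reduced level once the running statistics have settled.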