The present invention relates generally to the removal of noise or an unwanted signal portion from an input audio signal. More particularly, it pertains to removing the noise portion of the sound of the spoken letter "s" in English, for use in amplifiers, musical instruments, and the like.
A typical problem for an audio or acoustic sound system is the high-pitched screech associated with signal feedback. For example, consider a person speaking into a microphone to an audience through an amplification system. The microphone picks up the person's speech and transforms the acoustic waves into an analog audio signal, which is then sent to an amplifier and on to the speaker system. When a high-amplitude, high-frequency signal is played through the speakers, the microphone picks it up again, and it passes back through the amplifier to the speakers. This loop repeats, and the result is the high-pitched screech normally associated with feedback. The "ess" sound in spoken language, also known as a sibilant, can initiate this feedback loop.
The prior art teaches that speech sounds can be organized into three distinct classes based on their mode of excitation: voiced sounds, fricative sounds, and plosive sounds. Unvoiced fricatives are created by forming a constriction at some point in the vocal tract and forcing air through it at a velocity high enough to produce turbulence.
Unvoiced fricatives are generally high frequency in nature. This class of speech sounds includes sibilants, commonly known as the "ess" sound. Sibilants are composed primarily of high-frequency components, with a sharp rise in amplitude above 1 kHz, and most of their energy lies in the 4 kHz to 10 kHz region.
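To make the spectral description concrete, the following Python sketch measures what fraction of a frame's energy falls in the 4 kHz to 10 kHz band. It is an illustrative assumption, not part of the disclosed method; the function name `band_energy_fraction` is hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def band_energy_fraction(frame, sample_rate, lo_hz=4000.0, hi_hz=10000.0):
    """Fraction of a frame's spectral energy lying between lo_hz and hi_hz."""
    n = len(frame)
    total = 0.0
    in_band = 0.0
    # Naive DFT over the non-negative frequency bins 0..n//2.
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy = abs(coeff) ** 2
        freq = k * sample_rate / n  # centre frequency of bin k in Hz
        total += energy
        if lo_hz <= freq <= hi_hz:
            in_band += energy
    return in_band / total if total > 0 else 0.0
```

A frame dominated by a 6 kHz tone yields a fraction close to 1, while a 500 Hz tone yields a fraction close to 0. A practical detector would use an FFT with windowing instead of this O(n²) loop.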
The high-frequency, high-amplitude nature of sibilants often causes significant problems in audio equipment across all fields of audio engineering, including live sound, recording, and broadcast. Specific problems include amplifier clipping and over-modulation in FM sound transmission.
Past methods for solving problems caused by sibilants have included compression and equalization (EQ). These methods are suitable for limited applications, but if they are not applied selectively they cause unnecessary processing of the audio signal.
One example of these past solutions is frequency-dependent compression, commonly known as a de-esser. Most de-essers consist of a compressor with a side-chained equalizer (EQ), set up so that any sound in the sibilant frequency range triggers compression. These processors are generally effective, but they also compress other signals, such as cymbals, that fall within the sibilant frequency range detected by the EQ.
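The conventional de-esser described above can be sketched as follows, assuming a one-pole high-pass filter as the side-chain EQ, an envelope follower with separate attack and release times, and a simple threshold/ratio gain law. All names and parameter values (`cutoff_hz`, `threshold`, `ratio`, `attack_ms`, `release_ms`) are hypothetical. The gain reduction is applied to the full-band signal, which is why cymbals or any other content that excites the side chain get compressed as well.

```python
import math

def de_ess(x, sample_rate, cutoff_hz=4000.0, threshold=0.1, ratio=4.0,
           attack_ms=1.0, release_ms=50.0):
    """Broadband compressor keyed by a high-passed side chain (classic de-esser)."""
    # One-pole high-pass coefficient: the side-chain "EQ".
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a_hp = rc / (rc + dt)
    # Envelope follower smoothing coefficients.
    a_att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    y = []
    prev_x = 0.0
    hp = 0.0
    env = 0.0
    for s in x:
        hp = a_hp * (hp + s - prev_x)  # side-chain high-pass output
        prev_x = s
        level = abs(hp)
        coeff = a_att if level > env else a_rel  # fast rise, slow fall
        env = coeff * env + (1 - coeff) * level
        if env > threshold:
            # Above threshold, the detected level grows only 1/ratio as fast.
            gain = (threshold * (env / threshold) ** (1.0 / ratio)) / env
        else:
            gain = 1.0
        y.append(s * gain)  # gain applied to the whole signal, not just the band
    return y
```

A low-frequency tone passes through unchanged because its side-chain level stays below the threshold, while a strong 8 kHz tone is pulled down substantially.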
In past research, a detection filter has been used to detect sibilants before any dynamic processing occurs. However, these prior art detection algorithms have either been hardware based or too computationally expensive to run in real time.
This invention presents a digital adaptive technique for detecting and removing sibilants in real time. It provides a digital algorithm that detects the undesirable sibilant signal and limits modification of the input signal to that undesired portion. In this way, the invention teaches how to use both detection and estimation filters to recognize and filter out the unwanted signal.
The present invention teaches a method and apparatus for creating, in real time, a clean output audio signal from an input signal that contains an unwanted signal or noise portion. The system detects the unwanted portion of the input signal using a high-resolution adaptive detection filter and then reduces that portion. The reduction is performed by compressing the unwanted signal, by subtracting the unwanted portion from the signal, or by suppressing the output signal until the unwanted portion is no longer detected. The system is specifically designed to detect high-frequency, high-amplitude sounds such as sibilants.
In one embodiment of the invention, the unwanted signal portion is detected by comparing the input signal with an example of the unwanted portion. The comparison yields a similarity value, and if that value exceeds a preset threshold, the system outputs a detection signal. The example may be selected from an unwanted signal database holding multiple examples that vary with voice parameters and other factors affecting human speech, such as age, gender, primary language, and geographic dialect.
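As one possible reading of this comparison step, the sketch below uses normalized cross-correlation as the similarity value and reports every window whose similarity exceeds the threshold. The metric is an assumption: the text does not say how the similarity value is computed, and for noise-like sibilants a comparison of spectral envelopes would likely be more robust than raw waveform correlation. The names `similarity` and `detect` are hypothetical.

```python
import math

def similarity(window, example):
    """Normalized correlation between an input window and the example, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(window, example))
    norm = math.sqrt(sum(a * a for a in window) * sum(b * b for b in example))
    return dot / norm if norm > 0 else 0.0  # silent windows count as dissimilar

def detect(signal, example, threshold=0.8, hop=1):
    """Start indices of windows whose similarity to the example exceeds threshold."""
    n = len(example)
    hits = []
    for start in range(0, len(signal) - n + 1, hop):
        if similarity(signal[start:start + n], example) > threshold:
            hits.append(start)  # this is where a detection signal would be emitted
    return hits
```

Because the correlation is normalized, a quieter copy of the example still scores 1.0, so the threshold reflects shape rather than loudness.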
The comparison is performed by a high-resolution detection filter that checks the incoming data stream against a model or example of the unwanted signal portion.
In one embodiment, the system reduces the unwanted signal portion by compressing the limited frequency band normally associated with it; the signal modification unit applies this compression selectively to that band only. A second method filters the frequency band of the unwanted portion with an adaptive noise cancellation estimation filter. A third method subtracts an estimate of the unwanted portion from the input signal. Each of these methods may be used for partial or complete removal of the sibilant or other unwanted portion from the signal.
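The second and third reduction methods can be illustrated together with a classic least-mean-squares (LMS) adaptive noise canceller: an FIR filter driven by a reference signal estimates the unwanted component, and that estimate is subtracted from the input. This sketch assumes a reference correlated with the unwanted portion is available, which the text does not state; the function name `lms_cancel`, the tap count, and the step size `mu` are hypothetical choices.

```python
def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Adaptive noise cancellation with an LMS FIR estimation filter.

    primary:   input signal = wanted signal + unwanted portion
    reference: signal correlated with the unwanted portion (assumed available)
    Returns primary minus the running estimate of the unwanted portion.
    """
    w = [0.0] * taps    # FIR weights of the estimation filter
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, buf))  # unwanted-portion estimate
        e = d - estimate  # subtraction; also the LMS error signal
        # LMS update: step against the gradient of the instantaneous squared error.
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

In a synthetic test with a white-noise reference and an unwanted component formed by short FIR filtering of that reference, the output converges toward the wanted signal. The step size `mu` must stay small relative to the reference power times the tap count, or the weights diverge.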
In another embodiment, the unwanted signal portion detection apparatus uses a computer system running a computer program. The program uses an unwanted signal example selected from a sibilant database. Alternatively, the unwanted signal example may be produced by a signal generator that creates a sibilant example from input voice characteristics. A signal comparator then compares the unwanted signal example with the input signal in real time to generate a similarity value representing how closely the input signal resembles the unwanted signal portion. A threshold detector compares the similarity value against a threshold level and generates a modification signal when the similarity value exceeds the threshold. The signal modification unit then modifies the input signal whenever it receives a modification signal.
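The apparatus of this embodiment can be sketched as a frame-by-frame pipeline: a hypothetical signal generator builds a sibilant example, a signal comparator computes a similarity value, a threshold detector decides whether to issue a modification signal, and a signal modification unit attenuates flagged frames. Everything here is an illustrative assumption, including the generator (random-phase sinusoids across an assumed 4 kHz to 10 kHz band), comparison of coarse band-energy profiles by cosine similarity, and a fixed attenuation factor.

```python
import cmath
import math
import random

def band_profile(frame, n_bands=8):
    """DFT energy summed into n_bands equal-width bands (leftover top bins ignored)."""
    n = len(frame)
    bins = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2)]
    width = len(bins) // n_bands
    return [sum(bins[b * width:(b + 1) * width]) for b in range(n_bands)]

def generate_sibilant_example(n, sample_rate, low_hz, high_hz, seed=0):
    """Hypothetical signal generator: random-phase sinusoids on DFT bins in [low_hz, high_hz]."""
    rng = random.Random(seed)
    freqs = [k * sample_rate / n for k in range(n // 2)
             if low_hz <= k * sample_rate / n <= high_hz]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in freqs]
    return [sum(math.sin(2 * math.pi * f * t / sample_rate + p) for f, p in zip(freqs, phases))
            for t in range(n)]

def cosine(a, b):
    """Similarity value; lies in [0, 1] for non-negative energy profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def process(frames, example, threshold=0.9, attenuation=0.1):
    """Attenuate every frame whose band profile resembles the example's profile."""
    reference = band_profile(example)
    out = []
    for frame in frames:
        sim = cosine(band_profile(frame), reference)  # signal comparator
        modify = sim > threshold                      # threshold detector
        if modify:                                    # signal modification unit
            out.append([s * attenuation for s in frame])
        else:
            out.append(list(frame))
    return out
```

Setting `attenuation=0.0` turns the modification unit into a gate that silences the output while the unwanted portion is detected.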
The sibilant or unwanted signal example may be selected from a database of unwanted signals, based on known characteristics of the input signal. The database can therefore hold sibilant examples representing the physical characteristics of many different voices, so that the example can be selected according to the voice characteristics of the person producing the input signal.
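Selecting an example according to voice characteristics could be as simple as a best-match lookup. The sketch below assumes a hypothetical `VoiceProfile` record with gender, age group, and language fields, and returns the database entry sharing the most fields with the speaker's profile; ties go to the entry inserted first. The stored values are placeholder strings standing in for sample arrays.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class VoiceProfile:
    gender: str
    age_group: str
    language: str

def select_example(database, profile):
    """Return the stored example whose key matches the most profile fields."""
    def score(key):
        # Count attributes that agree between the stored key and the speaker profile.
        return sum(getattr(key, f.name) == getattr(profile, f.name) for f in fields(profile))
    best_key = max(database, key=score)  # first maximal key wins on ties
    return database[best_key]
```

Example usage, with placeholder entries:

```python
db = {
    VoiceProfile("female", "adult", "en"): "ex_f_adult_en",
    VoiceProfile("male", "adult", "en"): "ex_m_adult_en",
    VoiceProfile("male", "child", "es"): "ex_m_child_es",
}
select_example(db, VoiceProfile("female", "child", "es"))  # returns "ex_m_child_es"
```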