Various communication devices, such as a cell phone, a mobile phone, a Personal Digital Assistant (PDA), or a wireless telephone, may be used for communication over a telecommunication network or the Internet. The communication devices may be used at home, in an office, inside a car or train, at an airport, beach, restaurant, or bar, on the street, and at almost any other venue, each of which may have variable levels of environmental noise. The environmental noise may be picked up by a microphone of a communication device and may degrade the quality of speech signals transmitted or received at the communication device. As a result, in an ongoing call the speech of a caller may be unintelligible to a receiver. Further, the communication device may use more bandwidth or network capacity when there is noise in the environment, especially during non-speech segments in a two-way conversation when a user is not speaking. Consequently, noise reduction and improvement in Signal-to-Noise Ratio (SNR) may be performed prior to transmitting the signals from the communication device.
The pitch of a signal such as a speech signal is an acoustic parameter used in speech recognition, compression, and synthesis. The pitch plays a significant role in both production and perception of speech. Generally, the pitch is perceived with great accuracy at a fundamental frequency that characterizes the vibrations of the speaker's vocal cords. The speech signal is a quasi-periodic or virtually periodic signal. Therefore, harmonic components of the speech signal are present at integer multiples of the fundamental frequency.
Various techniques for noise reduction employ a Pitch Detection Algorithm (PDA) to estimate the pitch or the fundamental frequency of the speech signal. A PDA may be used in the time domain to estimate the period of the quasi-periodic signal, and then invert that value to obtain the frequency of the signal. One approach for pitch estimation may be to measure the distance between zero crossing points of the signal (i.e., the Zero Crossing Rate). However, this technique may not be effective in the case of complex waveforms that include multiple sine waves with differing periods. Nevertheless, zero-crossing techniques may be useful in some cases, for example in speech applications where a single source of sound is considered. This technique is simple and inexpensive; however, it may be inaccurate and may generate noisy signals.
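As a rough illustration of the zero-crossing approach described above, the following Python sketch estimates the fundamental frequency from the average spacing of positive-going zero crossings. The function names, the 8 kHz sample rate, and the synthetic test tones are assumptions made for illustration only and are not taken from any cited technique.

```python
import math


def zero_crossing_pitch(samples, sample_rate):
    """Estimate the fundamental frequency (Hz) from positive-going zero crossings.

    Returns None when fewer than two crossings are found.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation gives a sub-sample crossing position.
            frac = -prev / (cur - prev)
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None
    # Average period in samples, then invert to obtain a frequency.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period


def synth(partials, sample_rate, n, offset=0.5):
    """Hypothetical helper: sum of sine partials given as (freq_hz, amplitude)."""
    return [
        sum(a * math.sin(2.0 * math.pi * f * (i + offset) / sample_rate)
            for f, a in partials)
        for i in range(n)
    ]


if __name__ == "__main__":
    fs = 8000
    clean = synth([(200.0, 1.0)], fs, 800)
    rich = synth([(200.0, 1.0), (600.0, 1.5)], fs, 800)
    print("single sinusoid:", zero_crossing_pitch(clean, fs))
    print("with strong 3rd harmonic:", zero_crossing_pitch(rich, fs))
```

For the single 200 Hz sinusoid the estimate lands close to 200 Hz. When a strong third harmonic is added, extra zero crossings appear within each period and the estimate moves well above the true fundamental, which is the weakness with complex waveforms noted above.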
Further, a PDA may be used in the frequency domain for polyphonic detection. The Fast Fourier Transform (FFT) may be used to convert the signal to a frequency spectrum. Frequency domain algorithms include the harmonic product spectrum, cepstral analysis, and maximum likelihood, which attempt to match the frequency domain characteristics of the signal to pre-defined frequency maps. The FFT algorithm is efficient and can be applied in various scenarios. However, the processing power required increases with the desired accuracy. The frequency domain based PDA may be less expensive, more resistant to noise, and more adjustable to different kinds of inputs as compared to time domain based analysis. However, in this case, low pitches may be tracked less accurately than high pitches.
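One of the frequency domain approaches named above, the harmonic product spectrum, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a naive discrete Fourier transform instead of an FFT so that it needs only the Python standard library, and the function names, the Hann window, the frame length of 400 samples, and the test signal are all hypothetical choices.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (an FFT would be used in practice)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags


def windowed_spectrum(samples):
    """Apply a periodic Hann window, then take the magnitude spectrum."""
    n = len(samples)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(samples)]
    return magnitude_spectrum(windowed)


def strongest_peak_hz(samples, sample_rate):
    """Frequency of the single largest spectral peak (ignores harmonic structure)."""
    mags = windowed_spectrum(samples)
    best = max(range(1, len(mags)), key=lambda k: mags[k])
    return best * sample_rate / len(samples)


def hps_pitch(samples, sample_rate, harmonics=3):
    """Harmonic product spectrum: multiply the spectrum by its decimated copies."""
    mags = windowed_spectrum(samples)
    limit = len(mags) // harmonics
    best_bin, best_val = 1, -1.0
    for k in range(1, limit):
        product = 1.0
        for h in range(1, harmonics + 1):
            product *= mags[k * h]
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * sample_rate / len(samples)


def harmonic_tone(sample_rate=8000, n=400):
    """Hypothetical test tone: 200 Hz fundamental that is weaker than its 2nd harmonic."""
    partials = [(200.0, 0.5), (400.0, 1.0), (600.0, 0.8)]
    return [sum(a * math.sin(2.0 * math.pi * f * i / sample_rate) for f, a in partials)
            for i in range(n)]
```

With this test tone the largest single peak sits at the second harmonic, so `strongest_peak_hz` reports 400 Hz, while `hps_pitch` recovers the 200 Hz fundamental because only that bin has energy at all of its integer multiples. The frequency resolution is the sample rate divided by the frame length (20 Hz here), which is one reason low pitches tend to be tracked less accurately than high ones.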
The pitch of a signal is a perceptual parameter and not a physical parameter. For a single sinusoid, Equation 1 below defines the relation between the frequency ‘F’ and the pitch ‘P’ of the signal in the harmonic scale:
$$P(F) = P_{ref} + O \log_2\left(\frac{F}{F_{ref}}\right) \qquad \text{(Equation 1)}$$
where ‘Pref’ and ‘Fref’ are the pitch and the corresponding frequency, respectively, of a reference tone. The constant ‘O’ is the number of divisions of the octave. For example, a value of 12 for O leads to the classic dodecaphonic musical scale. This technique is computationally inexpensive, reasonably resistant to noise, and adjustable to different kinds of inputs. However, low pitches may be tracked less accurately than high pitches.
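Equation 1 can be evaluated directly, as in the short Python sketch below. The default reference values (a reference pitch of 69 at 440 Hz, as in the MIDI note numbering convention) are an assumption chosen for illustration; the text above does not fix Pref or Fref.

```python
import math


def pitch_from_frequency(freq_hz, ref_pitch=69.0, ref_freq_hz=440.0,
                         divisions_per_octave=12):
    """Equation 1: P(F) = Pref + O * log2(F / Fref).

    The defaults are an assumed reference tone (pitch 69 at 440 Hz), not values
    given in the text.
    """
    if freq_hz <= 0.0:
        raise ValueError("frequency must be positive")
    return ref_pitch + divisions_per_octave * math.log2(freq_hz / ref_freq_hz)


if __name__ == "__main__":
    for f in (220.0, 440.0, 880.0):
        print(f, pitch_from_frequency(f))
```

With O set to 12, doubling the frequency raises the pitch by exactly 12 steps (one octave) and halving it lowers the pitch by 12 steps.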
Various techniques are available for noise reduction. In the case of multi-microphone techniques, using more than two microphones results in effective noise reduction. However, communication devices pose spatial restrictions on the use of multiple microphones. Further, under a stationary noise environment such as fan or motor noise, a spectral subtraction method may be utilized for the noise reduction. In this technique, the noise spectrum to be subtracted is obtained during non-speech activity. Therefore, non-stationary noise may not be removed. In a monaural approach, the noise reduction is based on discrimination between properties of the voice and the noise. The spectrum of voiced sounds includes harmonic components that are integer multiples of the fundamental frequency. An existing technology such as the comb filter method may be used for the noise reduction. However, in the comb filter method, a detection error in the fundamental frequency may degrade the quality of the filtered voice.
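The sensitivity of the comb filter method to pitch detection errors can be illustrated with a simple feed-forward comb filter. The sketch below is an assumption-laden example, not the filter of any cited technique: the two-tap structure y[n] = 0.5 (x[n] + x[n - T]), the function names, and the numeric values are chosen only for illustration.

```python
import math


def comb_filter(samples, period):
    """Feed-forward comb: y[n] = 0.5 * (x[n] + x[n - period]).

    Components at integer multiples of sample_rate / period pass unchanged;
    components halfway between those harmonics are cancelled.
    """
    out = []
    for n, x in enumerate(samples):
        delayed = samples[n - period] if n >= period else 0.0
        out.append(0.5 * (x + delayed))
    return out


def comb_gain(freq_hz, period, sample_rate):
    """Magnitude response of the comb above at a given frequency."""
    return abs(math.cos(math.pi * freq_hz * period / sample_rate))


def rms(samples):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def tone(freq_hz, sample_rate, n):
    """Hypothetical helper: unit-amplitude sine."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    fs, period = 8000, 40          # 40 samples at 8 kHz matches a 200 Hz pitch
    voiced = comb_filter(tone(200.0, fs, 800), period)[period:]
    between = comb_filter(tone(300.0, fs, 800), period)[period:]
    print("200 Hz level after filtering:", rms(voiced))
    print("300 Hz level after filtering:", rms(between))
    # A 5% error in the detected period barely affects the fundamental,
    # but it can place a null on a higher harmonic.
    print("gain at 2000 Hz, correct period:", comb_gain(2000.0, 40, fs))
    print("gain at 2000 Hz, period off by 2:", comb_gain(2000.0, 42, fs))
```

With the correct period, the 200 Hz tone passes and the 300 Hz tone, which lies between harmonics, is cancelled. With the period misestimated by two samples, the tenth harmonic at 2000 Hz falls on a null of the filter, which shows how a small detection error in the fundamental frequency can degrade the filtered voice.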
A true fundamental frequency of the signal may be determined from several candidate frequencies using time continuity. Another existing technique uses the time continuity property of both the power spectrum envelopes (PSE) and the fundamental frequency to estimate the true fundamental frequency. Further, a reliable fundamental frequency may be determined by using the continuity of the power spectrum envelopes, which results from the quasi-stationary characteristics of the human voice. However, the fundamental frequency extracted from the noisy signal may include fluctuations because of noise interference. Therefore, the fundamental frequency is derived from both the latest frequency and the predicted frequency so as to keep the continuity in the frequency. Moreover, comb filtering of continuous speech with noise often generates strange sounds because the harmonic structure at higher frequencies is disturbed by the noise.
Another existing technique, as disclosed in U.S. Pat. No. 6,415,034, uses multiple microphones for noise cancellation. However, noise may leak past an ear capsule of the microphone and enter into a speech microphone. Further, the technique requires complex, power-consuming, and expensive digital circuitry, which may not be suitable for portable, battery-powered devices such as mobile phones.
Another existing technique for reducing noise, as disclosed in U.S. Pat. No. 5,969,838, utilizes two fiber optic microphones placed side by side. However, the technology uses light guides and other relatively expensive and/or fragile components that may not be suitable for communication devices. Yet another technique, as disclosed in U.S. Pat. No. 5,406,622, uses two adaptive filters for noise reduction. One of the adaptive filters is driven by a transmitter of the communication device to subtract the speech signal from a reference value to produce an enhanced reference signal. The other adaptive filter is driven by the enhanced reference signal to subtract noise from a transmitter of the communication device. However, the technique requires accurate detection of speech and non-speech regions in the speech signal. Therefore, an incorrect detection of the speech and non-speech regions may degrade the performance of noise reduction.
Another technique for noise cancellation includes passive expander circuits that are used in the electret-type telephonic microphone. However, only low level noise that occurs during periods when speech is not present may be reduced. Further, passive noise-canceling microphones may be used to reduce the background noise. However, passive noise-canceling microphones have a tendency to attenuate and distort the speech signal when the microphone is not in close proximity to the user's mouth. Moreover, such microphones are effective only in a frequency range up to about 1 kHz.
Active noise-cancellation circuitry may be used to reduce background noise. In this case, a noise-detecting reference microphone and adaptive cancellation circuitry are used to generate a continuous replica of the background noise signal that is subtracted from the total background noise signal. However, this technique may be susceptible to cancellation degradation because of a lack of coherence between the noise signal received by the reference microphone and the noise signal impinging on the transmit microphone. Further, the performance may vary based on the directionality of the noise and may tend to attenuate or distort the speech.
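The reference-microphone arrangement described above is commonly illustrated with a least-mean-squares (LMS) adaptive filter. The Python sketch below is one such illustration and is not the circuitry of any cited technique: the LMS update, the tap count, the step size, and the synthetic signals are all assumptions. In particular, the demo assumes the noise at the transmit microphone is a fixed filtered copy of the reference noise, that is, perfect coherence between the two microphones.

```python
import math
import random


def lms_noise_canceller(primary, reference, taps=4, mu=0.01):
    """Adaptive noise cancellation with an LMS filter.

    primary   : speech plus noise from the transmit microphone
    reference : noise-only signal from the reference microphone
    Returns the error signal, which serves as the cleaned output.
    """
    weights = [0.0] * taps
    history = [0.0] * taps
    cleaned = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]                      # newest sample first
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate                        # replica subtracted
        weights = [w + 2.0 * mu * error * h               # LMS weight update
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def make_demo_signals(n=5000, sample_rate=8000, seed=0):
    """Hypothetical test data: a weak 200 Hz tone buried in filtered white noise."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.1 * math.sin(2.0 * math.pi * 200.0 * i / sample_rate)
              for i in range(n)]
    # Assumed acoustic path from the noise source to the transmit microphone.
    noise = [0.8 * reference[i] + (0.3 * reference[i - 1] if i > 0 else 0.0)
             for i in range(n)]
    primary = [s + v for s, v in zip(speech, noise)]
    return speech, noise, primary, reference


def power(samples):
    """Mean squared value of a sequence."""
    return sum(v * v for v in samples) / len(samples)


if __name__ == "__main__":
    speech, noise, primary, reference = make_demo_signals()
    cleaned = lms_noise_canceller(primary, reference)
    residual = [c - s for c, s in zip(cleaned[-1000:], speech[-1000:])]
    print("noise power before:", power(noise[-1000:]))
    print("residual noise power after:", power(residual))
```

Under the perfect-coherence assumption the residual noise power after convergence is a small fraction of the original noise power. When the noise reaching the two microphones is only partly coherent, the filter cannot form an accurate replica, which corresponds to the cancellation degradation described above.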
Therefore, techniques for noise reduction of a speech signal at a communication device are desired.