Real-time communication (RTC) through the personal computer (PC) and other computing devices is fast becoming a powerful means for users to communicate with each other. Speech recognition is also important as a natural interface to the PC environment for dictation, command and control. However, today's PC is generally too noisy for effective speech recognition and for real-time communication scenarios that depend on accurate processing of audio, such as voice over IP (VOIP) and PC-telephony integration. In terms of signal-to-noise ratio, the difference between a computer that accurately resolves a sound to a single linguistic counterpart and a computer that cannot distinguish among multiple linguistic choices for that sound can be small. Thus, even a small increase in signal-to-noise ratio can determine whether sound, such as speech, is resolved consistently and accurately.
In this regard, noise from the PC and its components interferes with accurate processing of audio input signals by speech recognition technologies employing digital analysis. PC system noise raises the noise floor and concomitantly lowers the signal-to-noise ratio (SNR), which reduces the effective signal for speech recognition and other programs that process audio input.
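The effect of a raised noise floor on SNR can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the power values are hypothetical, and the ambient noise and PC noise are assumed to be uncorrelated so that their powers add.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels for two mean power values."""
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical mean powers, chosen only for illustration.
speech_power = 1.0e-2    # speech reaching the microphone
ambient_noise = 1.0e-4   # noise floor without any PC noise
pc_noise = 3.0e-4        # extra noise from fans, disks, chassis rattle

# Assumption: uncorrelated noise sources add in power.
quiet_snr = snr_db(speech_power, ambient_noise)
noisy_snr = snr_db(speech_power, ambient_noise + pc_noise)

print(f"SNR without PC noise: {quiet_snr:.1f} dB")
print(f"SNR with PC noise:    {noisy_snr:.1f} dB")
```

With these illustrative figures the speech power is unchanged, yet the SNR falls by roughly 6 dB solely because the noise floor has risen.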
Consumers who want to capture audio input in today's PC environment must understand enough about acoustics and microphone technology to make an informed purchase of a high-quality microphone and place it in an optimal location. For consumers who are unwilling or unable to do this, today's PC environment is typically too noisy for scenarios that depend on accurate processing of audio, such as speech recognition, VOIP and PC telephony.
Many prior techniques focused solely on the noise reduction algorithm and not on the entire system and method for characterization of a particular type of noise source. For instance, Silverberg et al., “Feedback Method of Noise Control Having Multiple Inputs and Outputs” (U.S. Pat. No. 5,953,428), mentions only computational aspects whereby the feedback system computation is optimized. Furthermore, most existing solutions are aimed at complete hardware fixes that focus on literal damping of the acoustic noise to be eliminated. For reference, see e.g., http://www.quietpc.com/. A seminal work on sound absorption is Olson, H. F. and May, E. G. (1953), Electronic Sound Absorber, Journal of the Acoustical Society of America 25, 1130-1136. Many historical aspects of vibration and control can be found in Elliott, S. J. and Nelson, P. A., “Active Control of Sound” as well.
Dolby noise reduction (NR) is an existing noise reduction technique. Dolby NR makes no attempt to remove noise once it has been mixed in with the music. Rather, it prevents noise from being added to music as it is recorded in the first place. Dolby NR utilizes a two-step process that first encodes the music when it is recorded, and then decodes it when the tape is played back. This is why the Dolby noise reduction system in one's recorder/playback system should be switched on both when a cassette is recorded and when the cassette is being played.
In recording, the Dolby NR circuit makes the quiet parts of the music, which are most susceptible to noise, louder than normal. When the encoded tape is played back, the Dolby NR circuit is switched around to lower, in turn, the previously boosted parts of the music. This automatically lowers any noise added to the music by the recording process, and it restores the music to its original form so that nothing is changed or lost but the noise.
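The encode/decode principle described above can be sketched with a generic broadband compander. This is not the actual Dolby NR circuit, which is level- and frequency-dependent; the square-root law, the sample values and the function names `encode` and `decode` are assumptions chosen only to show why boosting quiet passages before recording reduces the noise heard after playback.

```python
import math

def encode(sample, exponent=0.5):
    """Boost quiet samples before recording (compression, full scale = 1.0)."""
    return math.copysign(abs(sample) ** exponent, sample)

def decode(sample, exponent=0.5):
    """Undo the boost on playback (expansion), the inverse of encode()."""
    return math.copysign(abs(sample) ** (1.0 / exponent), sample)

# Hypothetical values for illustration only.
quiet_passage = 0.01   # a quiet sample of the music
tape_hiss = 0.001      # noise added by the recording medium

# Without noise reduction, the hiss is added directly to the music.
plain = quiet_passage + tape_hiss

# With companding, the hiss is added to the boosted signal and then
# lowered together with it when the playback stage expands the signal.
companded = decode(encode(quiet_passage) + tape_hiss)

plain_error = abs(plain - quiet_passage)
companded_error = abs(companded - quiet_passage)
print(plain_error, companded_error)
```

In this sketch the residual error on the quiet passage is about one fifth of the uncompanded error. In the absence of added noise, decoding an encoded sample returns the original sample.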
While Dolby NR has been utilized effectively for music being recorded on a cassette, Dolby NR has never been applied to the PC environment. Moreover, while Dolby NR works well for white noise having equal distribution across the spectrum of frequencies (a constant hiss, for example, that results from cassette head recording electronics), its use is not specifically optimized for the type of noise being made. Instead, the algorithms of Dolby NR are applied based upon knowledge of how the human ear tends to work, i.e., that a human ear tends to hear in a spectrally related way. The Dolby NR algorithms are thus generalized based upon characteristics of the human ear, and accordingly are not suited to optimizing the signal-to-noise ratio of a signal input to a PC in view of the type of noise that a PC generates.
In another signal analysis prior art area involving the analysis of noise, snapping shrimp noise is thought to be a major component of ambient noise at high frequencies (2 kHz-300 kHz) in warm shallow water. Accordingly, experiments have been designed to investigate its temporal and spatial distribution and variability, having applications to many underwater acoustic systems in providing background knowledge about the structure of this class of noise. For instance, in Li et al.'s “Estimating Snapping Shrimp Noise in Warm Shallow Water” (1999), three noise models, namely system noise, observational noise and beampattern uncertainty, are used to model noise. Their simulation results indicate that their stochastic inversion algorithm is robust to reasonable levels of these types of noise, enabling the imaging of shrimp noise intensity on a seabed over an area of some 350 m² with a resolution of 3.5 m² and with a Root Mean Squared (rms) error below 20%, or approximately 0.8 dB.
The analysis of snapping shrimp noise illustrates that the modeling of a particular type of noise (i.e., shrimp noise) according to known mathematical models is known in the art of noise analysis and reduction. To people in search of shrimp beds, the goal may even be to enhance shrimp noise over other ambient aquatic noises. However, the modeling of shrimp noise is hardly applicable to a PC environment for a variety of reasons (other than that computers do not behave well when the internal electronics are shorted). Mainly, the shrimp noise does not come from within a small, substantially enclosed area; the shrimp noise follows a single mathematical model; and the signal of interest is not nearby and directed at the small, substantially enclosed area for processing.
It is also known in the art to use beamforming. A beamformer is a spatial filter that operates on the output of an array of sensors in order to enhance the amplitude of a coherent wavefront relative to background noise and directional interference. The so-called “pointing direction” is called the Maximum Response Angle (MRA), and can be arbitrarily chosen for the beams. The goal of beamforming is to sum multiple elements to achieve a narrower response in a desired direction (the MRA). That way, when a sound is heard in a given beam, the direction from which it came is known. Real implementations introduce effects such as nulls and sidelobes, which are not discussed herein. However, beamforming also does not tailor its technique to the type of noise characteristic of a PC enclosed environment, and the enclosure itself inhibits the notion of directional interference.
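The summing of multiple elements toward a chosen pointing direction can be sketched as a delay-and-sum beamformer. This is a minimal illustration under stated assumptions, not a description of any particular prior art system: the four-sensor array, the integer sample delays, the test tone and the helper names `delay_and_sum` and `power` are all hypothetical.

```python
import math

def delay_and_sum(channels, delays):
    """Average the channels after advancing each by its steering delay (in samples)."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

# Hypothetical scenario: a plane wave reaches each successive sensor
# of a 4-element array 2 samples later than the previous one.
num_sensors, step, n_samples = 4, 2, 200
source = [math.sin(2 * math.pi * 0.05 * n) for n in range(n_samples)]
channels = [
    [0.0] * (k * step) + source[: n_samples - k * step]
    for k in range(num_sensors)
]

# Steering delays matched to the arrival direction align the wavefront,
# so the channels add coherently.
on_target = delay_and_sum(channels, [k * step for k in range(num_sensors)])

# Steering elsewhere (here, broadside) leaves the channels misaligned,
# so they partially cancel.
off_target = delay_and_sum(channels, [0] * num_sensors)

print(power(on_target), power(off_target))
```

The choice of steering delays plays the role of the pointing direction: the output power is highest when the delays match the direction from which the wavefront actually arrives.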
Two types of noise known to affect cell phone operation are hybrid echoes and acoustic echoes. Hybrid echoes relate to delays in the electrical path between the microphone of the speaker and the earpiece of the receiver, and are inherent in designs involving 2-to-4 wire conversion. Acoustic echo is created by the loudspeaker in a phone. The sound comes out of the loudspeaker in the phone, bounces off the walls, ceiling and other objects in the room, and returns to the phone's microphone. It is thus known in the cell phone art to use a technique called echo cancellation to minimize the effects of these communication nuisances. However, these kinds of echoes are not endemic to a PC operating environment.
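One common way to realize echo cancellation is an adaptive filter that learns the echo path from the loudspeaker signal and subtracts its estimate from the microphone signal. The sketch below uses a normalized least-mean-squares (NLMS) update, which is an assumption made for illustration and is not named in the discussion above; the echo path coefficients, the step size and the function name `nlms_echo_canceller` are likewise hypothetical.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal."""
    weights = [0.0] * taps    # current estimate of the echo path
    history = [0.0] * taps    # recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate      # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # loudspeaker signal
echo_path = [0.6, 0.3, -0.2, 0.1]                           # hypothetical room response

# The microphone hears only the echo in this sketch (no near-end talker).
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

residual, weights = nlms_echo_canceller(far_end, mic)
print([round(w, 3) for w in weights])
```

In this noise-free setting the learned weights approach the assumed echo path and the residual decays toward zero. A practical canceller would also need to handle a near-end talker and a much longer echo path.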
In short, there is ambient noise (e.g., phone ringing, air conditioning, street noise, planes flying by, people talking, etc.) in a PC environment and there is PC noise (e.g., fans, disks, chassis rattle, etc.). A number of mechanisms already exist for addressing ambient noise in a target environment, such as spatial filtering, beamforming and echo cancellation algorithms, some of which are discussed herein; none of these algorithms, however, adequately addresses the problem of PC noise degrading the resolution of a recording of an input audio signal because noise predominates in the signal-to-noise ratio.
Accordingly, there is a great need for a mechanism or vehicle to improve the ability of the PC to process audio input accurately, making the PC a much better tool for speech recognition and real-time communication for home and business users. Such a tool would have practical use beyond the boundaries of the PC environment as well, to the extent that other environments mimic the characteristics of PC noise, such as some noisy industrial environments. It would be further desirable to recognize the particular kinds of noise that exist in a PC environment, so as to apply algorithms suited to the reduction of PC noise. It would be further desirable to develop mathematical models that efficiently model the type(s) of noise identified in a PC environment, so that such noise may be effectively reduced to improve the signal-to-noise ratio associated with the input of an audio signal to a PC environment. It would be still further desirable to provide such a mechanism at a low cost to the consumer.