Field of the Invention
Embodiments of the present disclosure relate to a method and apparatus for processing an audible signal to form a processed audible signal that has an improved signal-to-noise ratio.
Description of the Related Art
The popularity of, and reliance on, electronic devices such as smart phones, touch pads, PDAs, portable computers, and portable music players has increased dramatically in the past decade. Videotelephony and video conferencing devices have also become more popular in recent years, thanks in large part to the proliferation of high-speed Internet and price reductions in the supporting equipment. As the number of electronic devices and the reliance on these devices have increased, there has been a desire for these devices to receive and process an audible input signal from a user so that the audible input can be used to enable some desired task to be performed.
For years there has been a desire to construct machines that can recognize, process and/or transmit various types of audible inputs received from a human being. Although in recent years this goal has begun to be realized, currently available systems have not been able to accurately detect these audible inputs in environments where external noise is common or not well controlled. In most conventional microphone-containing devices that are configured to recognize and/or process various types of audible inputs, it is often hard for the audible input processing electronics (e.g., voice recognition hardware) to clearly separate the desired human speech from the unwanted noise. This inability to separate audible inputs from the surrounding noise is primarily due to the difficulties involved in extracting and identifying the individual sounds that make up human speech. These difficulties are exacerbated in noisy environments. Put simply, speech may be considered a sequence of basic sounds, called “phonemes,” produced by a human. One or more phonemes represent a word or a phrase. Thus, extraction of the particular phonemes contained within the received speech is necessary to achieve voice recognition, and this extraction is often extremely difficult in noisy environments.
Moreover, conventional voice or speech recognition hardware is typically limited to detecting speech within the lower end of the speech frequency range, such as between about 100 hertz (Hz) and about 3,000 Hz, due to limitations in the device's sampling frequency and the geometry of the microphone assemblies. Thus, a large amount of useful data is lost by these conventional designs, since they cannot detect speech throughout the full speech range, which extends between about 100 Hz and about 8,000 Hz, and they therefore lose the information found in the higher end of the speech range, between about 3,000 Hz and about 8,000 Hz.
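The sampling-frequency limitation described above follows from the Nyquist criterion: a sampled system can represent only frequencies up to (in practice, somewhat below) half its sampling rate. The following is a minimal sketch of that relationship. The function names and the two sampling rates are hypothetical and chosen only for illustration; the disclosure does not state the sampling rates of any particular conventional device.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled system can represent: half the sampling rate."""
    return sample_rate_hz / 2.0


def captured_band_hz(sample_rate_hz: float,
                     band_low_hz: float = 100.0,
                     band_high_hz: float = 8000.0):
    """Return the part of the speech band at or below the Nyquist limit.

    The default band edges (about 100 Hz to about 8,000 Hz) are the full
    speech range given in the text. Returns None if nothing is captured.
    """
    limit = nyquist_limit_hz(sample_rate_hz)
    if limit <= band_low_hz:
        return None
    return (band_low_hz, min(band_high_hz, limit))


# Hypothetical sampling rates, chosen only for illustration.
for rate in (6000.0, 16000.0):
    print(rate, captured_band_hz(rate))
```

Under these assumptions, a 6,000 Hz sampling rate captures only the 100 Hz to 3,000 Hz portion of the speech range, while a 16,000 Hz sampling rate is needed before the full range up to 8,000 Hz can be represented.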
As the popularity of voice recognition systems increases, many users utilize them in a variety of environments. Use of these various devices is common in a myriad of moderately noisy to excessively noisy environments, such as an office, conference room, airport, or restaurant. Several conventional methods for performing noise reduction already exist; however, many of them can be categorized as types of filtering. In the related art, speech and noise are acquired in the same input channel, where they reside in the same frequency band and may have similar correlation properties. Consequently, filtering will inevitably affect both the speech signal and the background noise signal. Distinguishing between voice and background noise signals is a challenging task. Speech components received by conventional electronic devices may be perceived as noise components and may be suppressed or filtered along with the noise components. While voice recognition technology is increasingly sophisticated, a clear separation of the voice component of an audio signal from the noise components, or in other words a high signal-to-noise ratio (SNR), is required for acceptable levels of accuracy in voice recognition or even, in some cases, for the delivery and reproduction of the received audio signal at a distant location.
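For reference, the signal-to-noise ratio mentioned above is commonly expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The following is a minimal sketch of that calculation. The function names and sample values are hypothetical, and the sketch assumes the speech and noise components are available separately, which, as noted above, is generally not the case in a single input channel.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples (mean of the squared values)."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical sample values, chosen only for illustration.
speech = [1.0, -1.0, 1.0, -1.0]   # mean power 1.0
noise = [0.1, -0.1, 0.1, -0.1]    # mean power about 0.01
print(round(snr_db(speech, noise), 1))
```

In this illustration the speech power is about one hundred times the noise power, which corresponds to an SNR of about 20 dB.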
Additionally, as the number of electronic devices and the reliance on these electronic devices have increased, there has been a desire for electronic devices that are untethered from conventional wall outlet types of power sources, thus allowing these electronic devices to be portable. However, the power supply in portable electronic devices is commonly limited by the finite energy storage capacity of a battery. The rate of energy consumption by the device determines how long the device can operate before the battery needs to be recharged or replaced. Therefore, it is desirable to find ways to reduce the power consumed by the portable device's electronic components, such as voice recognition elements, to improve the battery lifetime of the portable electronic devices.
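The relationship between energy consumption and operating time described above can be sketched as the stored energy divided by the average power draw. The function name and the numeric values below are hypothetical and chosen only for illustration; they do not describe any particular device.

```python
def runtime_hours(battery_capacity_wh: float, average_power_w: float) -> float:
    """Approximate operating time: stored energy (Wh) / average power draw (W)."""
    if average_power_w <= 0:
        raise ValueError("average power draw must be positive")
    return battery_capacity_wh / average_power_w


# Hypothetical values: a 10 Wh battery, before and after reducing
# the average power draw from 2.0 W to 1.25 W.
print(runtime_hours(10.0, 2.0))
print(runtime_hours(10.0, 1.25))
```

Under these assumed values, lowering the average draw from 2.0 W to 1.25 W extends the operating time from 5 hours to 8 hours on the same battery.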
Therefore, there is a need for an electronic device that solves the problems described above. Moreover, there is a need for a portable electronic device that is able to efficiently filter out unwanted noise from an audible input that is received from an audible source.