The present invention relates to digital processing of acoustic signals and, more particularly, to a method, apparatus and system for determining patterns and improving signal-to-noise ratio in acoustic signals.
Sound is generated by mechanical vibrations, which set up small oscillations of molecules of the physical medium in which the mechanical vibrations occur. The oscillations of molecules alter the distance between adjacent molecules of the medium, thereby also the local pressure therein. Specifically, when the distance between adjacent molecules becomes smaller, the pressure increases (compression) and when this distance becomes larger the pressure decreases (rarefaction). Thus, sound is a pressure wave propagating through the physical medium.
The velocity of sound depends on the density of the medium through which it propagates. Sound waves therefore travel fastest in solids, slower in liquids, slowest in air, and cannot propagate in vacuum. Sound vibrations extend from a few cycles per second (Hz) to millions of Hz. Human hearing is limited to a range of about 20 to 20,000 Hz. Other mammals can hear ultrasound; some, such as whales, approach 100,000 Hz.
The task of all hearing organs of a mammal is to analyze environmental sounds and transmit the results of that analysis to the brain, which interprets the hearing organs' output. All sensory organs have specialized sensory cells which convert an environmental signal into a type of neural code in the form of electrical energy transmitted to the brain. In the case of an acoustic environmental signal, it is the human auditory system which converts the pressure wave of the sound to the neural code.
The auditory system generally includes the external ear canal, the eardrum, the auditory ossicles, the cochlea, which includes the inner and outer hair cells, nerves and brain cells. The external ear canal and eardrum are called the outer ear, the eardrum and auditory ossicles are called the middle ear, and the cochlea and hair cells are called the inner ear.
The outer portion of the external ear reflects sound towards the ear canal, where the pressure waves are aligned so as to strike the eardrum at right angles. The middle ear bones generate a pressure increment and conduct sound from the eardrum to the cochlea present in the inner ear. The pressure increment is necessary to maximize the sound energy that gets to the inner ear. The inner ear serves as a spectrum analyzer which determines the amount of energy contained at the different frequencies that make up a specific sound. The cochlea includes membranes designed to be sensitive to different frequencies at different locations thereof. Each individual location transmits information to the brain, so that an increase in activity from one location is interpreted as an increased energy at the respective frequency. The human ear thus encodes the frequency information by mapping a spectral representation onto a spatial representation.
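The spectrum-analyzer role described above can be illustrated numerically. The following is a minimal sketch, not a cochlear model: it uses a plain discrete Fourier transform to measure the energy at each frequency of a signal, with each output index playing the part of one "location". The function name, the test signal and the bin numbers are assumptions chosen for illustration.

```python
import cmath
import math

def dft_energy(samples):
    """Return the energy |X[k]|^2 in each DFT bin k = 0 .. N/2 of a real signal."""
    n = len(samples)
    energies = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex exponential at bin frequency k.
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        energies.append(abs(x_k) ** 2)
    return energies

# Hypothetical test signal: a strong tone in bin 5 plus a weaker tone in bin 12.
N = 64
signal = [math.sin(2 * math.pi * 5 * i / N) + 0.5 * math.sin(2 * math.pi * 12 * i / N)
          for i in range(N)]

energy = dft_energy(signal)
# The two bins with the most energy, strongest first.
strongest = sorted(range(len(energy)), key=lambda k: energy[k], reverse=True)[:2]
```

In this sketch the position of a value in `energy` stands in for the position along the cochlea, so a larger value at one index corresponds to more energy at the respective frequency.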
Decades of extensive studies of cochlear functions have yielded a reasonable understanding of the acoustic-to-neural transduction of the inner ear. In recent years, significant progress has been made in understanding the contribution of the mammalian cochlear outer hair cells to normal auditory signal processing. The outer hair cells act as local amplifiers, which are metabolically activated. The motion of the outer hair cells is believed to dynamically change the basilar membrane mechanical response [to this end see, e.g., Dallos P., “Outer hair cells: The inside story,” Ann Otol. Laryngol., 1997, 106, 16-22]. Some psychoacoustical properties, such as suppression and combination tones on the one hand, and otoacoustic emission phenomena on the other hand, are the result of outer hair cell activity.
Models that describe the outer hair cell activity assume that the displacement of the outer hair cell cilia generates a force that acts on the basilar membrane. For further details see, e.g., Allen and Neely, “Micromechanical models of the cochlea,” Physics Today, 1992, 40-47; Mountain, D. C. and Hubbard, A. E., “A piezoelectric model for outer hair cell function,” J. Acoust. Soc. Am., 1994, 95: 350-354; Dallos P. and Evans B. N., “High-frequency motility of outer hair-cells and the cochlear amplifier,” Science, 1995, 267: 2006-2009; Rattay F., Gebeshuber and Gitter A. H., “The mammalian auditory hair cell: A simple electric circuit model,” J. Acoust. Soc. Am., 1998, 105:1558-1565; Dallos P., “Properties of voltage dependant somatic stiffness of cochlear outer hair cell,” JARO, 2000, 01:064-081; and Spector A. A, “On the mechanoelectrical coupling in the cochlear outer hair cell,” J. Acoust. Soc. Am., 2000, 107:1435-1441.
Hearing impairment occurs when any of the functions of the auditory system is diminished. The symptoms, as well as the possible methods of treatment, vary depending on which function is diminished and to what extent. Models based on outer hair cell activity predict normal behavior and degradation in performance due to outer hair cell loss [see, e.g., Kates J M., “A time domain digital cochlear model,” IEEE Trans. Signal Processing, 1991, 39:2573-2592; Goldstein J. L., “Exploring new principles of cochlear operation: bandpass filtering by the organ of corti and additive amplification in the basilar membrane,” Biophysics of Hair Cell Sensory System, edited by Duiffhuis H. Horst J. W., van Dijk P. and van Netten S. M., 1993, Singapore, World Scientific, 315-322; Carney L. H., “Spatiotemporal encoding of sound level: Models for normal encoding and recruitment of loudness,” Hearing Research, 1994, 76, 31-44; and Heinz et al., “Auditory nerve model for predicting performance limits of normal and impaired listeners,” JARO, 2001, 2:91-96].
Typically, hearing loss depends on the sound level and frequency. For example, a person suffering from hearing impairment may not be able to hear sound below a specific sound level threshold, and have a normal hearing capability above another specific sound level threshold. A person suffering from hearing impairment may also experience a reduced capability of hearing a certain range of frequencies.
Many hearing aid devices and methods have been developed over the years to compensate for hearing impairment. Hearing aids are utilized in a variety of auditory situations and must communicate acoustic stimuli to the user that are appropriate for the situation. For example, in street traffic the wearer wants an omni-directional sound perception for perceiving danger but would like to experience a directed sound perception in a conversation with a conversation partner. Moreover, low-noise telephoning should be possible for the hearing aid user with hard-wired, cordless or cellular telephones.
The simplest hearing aid device provides a natural sound perception when the gain is adjusted to the actual listening situation or sound environment, but requires continuously repeated adjustment of the gain to the actual situation, which makes operation of the hearing aid complicated and cumbersome. As a result, hearing aids of this type are frequently not adjusted to an optimum sound perception for the actual listening situation.
Also known are hearing aids having different hearing programs designed for responding to different auditory situations. In these hearing aids the user switches between the different operation modes, according to the particular auditory situation in which he is present. A typical hearing program is the telephone hearing program, where the acoustic signals that the microphone of the hearing aid picks up are filtered according to the spectrum of telephone signals in order to suppress unwanted ambient noises in other spectral ranges. High-quality hearing aid devices usually have a number of microphones that can be interconnected by a specific hearing program in order to achieve a directional effect.
Manual switching of the hearing aid between different operation modes causes discomfort to the user and may even be impossible, for example, with hearing aid devices which are located in the external ear or even exclusively in the auditory canal.
Other hearing aids have automatic gain control, which provides automatic adaptation to different sound environments and an improved sound perception, in particular at low sound levels. However, the performance of such devices is far from being sufficient. In particular, such automatic gain control devices typically provide a higher amplification of low sound levels, which are known to contain a substantial amount of noise, and hence cause serious discomfort to the user.
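The drawback described above can be made concrete with a small numerical sketch. It assumes a simple static compressor as a stand-in for automatic gain control; the function name, the knee level, the maximum gain, the compression ratio and the input levels are all illustrative assumptions and are not taken from any particular device.

```python
def agc_gain_db(input_level_db, knee_db=50.0, max_gain_db=30.0, ratio=3.0):
    """Gain in dB of a simple static compressor (all parameters are illustrative)."""
    if input_level_db <= knee_db:
        # Quiet inputs receive the full gain.
        return max_gain_db
    # Above the knee, each extra dB of input yields only 1/ratio dB of output.
    return max_gain_db - (input_level_db - knee_db) * (1.0 - 1.0 / ratio)

speech_level_db = 80.0   # hypothetical loud speech
noise_level_db = 40.0    # hypothetical quiet background noise

speech_gain = agc_gain_db(speech_level_db)
noise_gain = agc_gain_db(noise_level_db)

# Level difference between speech and noise before and after the gain stage.
input_margin_db = speech_level_db - noise_level_db
output_margin_db = (speech_level_db + speech_gain) - (noise_level_db + noise_gain)
```

Under these assumed values the quiet noise receives more gain than the loud speech, so the margin between the two shrinks from 40 dB at the input to about 20 dB at the output.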
Complicated systems and apparatuses which are capable of improving signal-to-noise ratio have also been developed. These technologies generally include a plurality of sensors which collect information from a plurality of directions. Assuming that the sound wave source (e.g., a speaker) has a well-defined location in space, the apparatus amplifies signals originating from one (or a few) directions and suppresses omni-directional signals.
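One common way to obtain such a directional response from several sensors is delay-and-sum processing. The sketch below is offered only as an illustration of the principle under simplifying assumptions (two microphones, pure tones, integer sample delays); it is not a description of any specific prior art apparatus, and all names and values are hypothetical.

```python
import math

def delay_and_sum(mic_a, mic_b, steer_delay=0):
    """Average two microphone signals after delaying mic_a by steer_delay samples."""
    n = len(mic_a)
    out = []
    for i in range(n):
        j = i - steer_delay
        a = mic_a[j] if 0 <= j < n else 0.0
        out.append(0.5 * (a + mic_b[i]))
    return out

def power(x):
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)

N, PERIOD, LAG = 400, 8, 4   # illustrative values; LAG is half of PERIOD

def tone(i):
    return math.sin(2 * math.pi * i / PERIOD)

# Frontal source: the wavefront reaches both microphones at the same time.
front_a = [tone(i) for i in range(N)]
front_b = [tone(i) for i in range(N)]
# Lateral source: the wavefront reaches mic_b LAG samples later than mic_a.
side_a = [tone(i) for i in range(N)]
side_b = [tone(i - LAG) for i in range(N)]

front_power = power(delay_and_sum(front_a, front_b))
side_power = power(delay_and_sum(side_a, side_b))
```

With the array steered to the front (`steer_delay=0`), the frontal tone adds coherently while the lateral tone, arriving half a period apart at the two microphones, cancels. The cancellation depends on the arrival-time difference, which is one reason such arrangements rely on sufficiently spaced sensors.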
For example, one such apparatus improves the signal-to-noise ratio using remote microphones and receivers to separate the speaker's voice from the background noise.
Another system is disclosed in U.S. Application No. 20030012391. In this system a front microphone receives an acoustical signal and generates an analog signal. A rear microphone also receives an acoustical signal and generates an analog signal. The two analog signals are converted into the digital domain and transmitted to a sound processor, which selectively modifies the signal characteristics and generates a processed signal. The processed signal is converted by a speaker to an acoustical signal which is directed into the ear canal of the user. A directional processor and a headroom expander optimize the gain applied to the acoustical signals and combine the amplified signals into a directionally-sensitive response.
Being based on a plurality of sufficiently spaced-apart sensors, the above systems and apparatuses are rather bulky and have to be carried separately by the user, thus causing discomfort and embarrassment to the user.
Besides hearing aids, the development and widespread deployment of digital communication systems have brought increased attention to the role of digital signal processing. Generally, digital processing methods are based on speech enhancement algorithms which improve some perceptual aspects of speech so that it may be better exploited by other processing algorithms, such as algorithms for classifying acoustic signals.
Speech enhancement algorithms have been applied to problems as diverse as correction of reverberation, pitch modification, rate modification, reconstruction of lost speech packets in digital networks, correction of speech produced by deep-sea divers breathing a helium-oxygen mixture and correction of speech that has been distorted due to pathological problems of the speaker. Algorithms for classifying acoustic signals are used in many applications. For example, acoustic pattern recognition devices, such as speech recognition devices, are embedded in electronic cards, which are designed to receive spoken commands or for identification.
Noise reduction, however, is probably the most important and most frequently encountered problem in speech enhancement and pattern recognition.
Generally, there are two types of speech recognizers. A first type performs certain operations when the user gives short commands, and a second type accepts dictated speech and converts the speech into text, which is displayed on a display device.
Most speech recognizers must be trained on the user's voice before they can recognize and interpret words, phrases and commands spoken by the user. Training a speech recognizer requires a user to speak certain words or phrases into the recognizer, usually many times, so that the speech recognizer can recognize the user's speech pattern. Later, when the user is using the speech recognizer, the speech recognizer compares the input voice signal with various stored speech templates to find the template that most resembles the input voice signal.
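The template comparison described above can be sketched as a nearest-template search. The example assumes that each utterance has already been reduced to a short fixed-length feature vector and that similarity is measured by Euclidean distance; the feature values, labels and function names are hypothetical, and practical recognizers use considerably richer features and matching schemes.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates):
    """Return the label of the stored template closest to the input features."""
    return min(templates, key=lambda label: distance(features, templates[label]))

# Hypothetical 4-value feature vectors stored during a training session.
templates = {
    "yes":  [0.9, 0.1, 0.4, 0.2],
    "no":   [0.2, 0.8, 0.1, 0.7],
    "stop": [0.5, 0.5, 0.9, 0.1],
}

spoken = [0.85, 0.15, 0.35, 0.25]   # a slightly perturbed "yes"
result = recognize(spoken, templates)
```

Because the search always returns the closest stored template, an input that has been strongly perturbed, for example by noise, may end up nearer to the wrong template.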
A user will generally “train” a speech recognizer in an environment that has relatively low interfering noise. Subsequently, most speech recognizers must be used in environments of low interfering noise. Otherwise, the speech recognizer will not be able to separate spoken words from background noise. Where speech recognizers are used in low noise environments, a fairly high rate of recognition is achieved. If the speech recognizer is trained in a location having a moderate, constant background noise, and subsequently used in an environment that has the same moderate, constant background noise, a high recognition rate is achieved.
However, when these speech recognizers are used in high noise environments with negative signal-to-noise ratios and environments where the noise present is different than the background noise present in the training session, the recognition rate falls to very low, unusable accuracy levels. Conventional speech recognizers attempt to estimate the characteristics of the surrounding noise and then determine the effects on the user's voice. Various techniques are incorporated to build statistical or parametric models of the noise which are subtracted from the sound signal. These models are very inaccurate and produce a low-quality output signal.
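The sensitivity of noise subtraction to the accuracy of the noise model can be illustrated with magnitude spectral subtraction, one well-known technique of this kind. The sketch below is a toy example under strong assumptions: the spectra are made-up values over five frequency bins, and speech and noise magnitudes are assumed to add directly. It is not the procedure of any particular recognizer.

```python
def spectral_subtract(noisy_mag, noise_estimate, floor=0.0):
    """Subtract an estimated noise magnitude from each frequency bin."""
    return [max(m - n, floor) for m, n in zip(noisy_mag, noise_estimate)]

# Hypothetical magnitude spectra over five frequency bins.
speech = [0.0, 4.0, 1.0, 0.0, 2.0]
noise = [1.0, 1.0, 1.0, 1.0, 1.0]
noisy = [s + n for s, n in zip(speech, noise)]   # assumes magnitudes simply add

# Subtraction with a noise model that matches the actual noise.
good = spectral_subtract(noisy, noise)
# Subtraction with a mismatched noise model, as when the noise differs from training.
bad = spectral_subtract(noisy, [2.0, 0.5, 2.0, 0.5, 2.0])
```

With the matching model the speech spectrum is recovered exactly in this idealized setting. With the mismatched model some speech bins are removed or reduced and some residual noise remains, which corresponds to the low-quality output described above.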
Moreover, even in those cases where a prior art device has a certain level of success in improving the signal-to-noise ratio of a noisy input, such a device fails to preserve the characteristics of a clean input signal. In other words, when the acoustic signal is produced in a quiet or low-noise background, prior art devices tend to distort the signal rather than improve it.
The above problems contrast sharply with nature's outstanding acoustic processing capability.
There is thus a widely recognized need for, and it would be highly advantageous to have, a method, device and apparatus for processing acoustic signals, which are based on physiological principles and devoid of the above limitations.