Acoustic signal coding and decoding, especially for data compression and noise reduction, and particularly with respect to the electronic transmission of speech signals, have been of much interest to inventors. Some recent inventions encode frequency and phase information as a function of time. An example is McAuley, et al., U.S. Pat. No. 4,885,790, issued Dec. 5, 1989. In general such systems encode too much information for optimal data compression.
Some innovators have endeavored to use knowledge of physiological processes as a guide to the design of acoustic devices. Modeling the vocal tract has produced several approaches, for example the class of systems known as CELP. In particular, Bertrand, U.S. Pat. No. 5,150,410, issued Sep. 22, 1992, discloses a voice coding system for encryption of remote conference voice signals which uses the code excited linear predictive (CELP) speech processing algorithm as the basis for analyzing and then reconstructing voice signals. Linear predictive methods prior to CELP often produced reconstructed speech which sounded unnatural or disturbed. See Atal et al., U.S. Pat. No. Re 32,580, reissued Jan. 19, 1988. On the other hand, personal observation suggests that CELP-10, for example, does not always deal well with signals on which high levels of noise are superimposed. Moreover, a major drawback of the CELP approach is that it requires a burdensome degree of "bookkeeping" calculations, even with recent progress due to Baras and Kao. In addition, since CELP is conceptually tied to the vocal tract, it has severe limitations for processing signals other than speech.
Recently the cochlear system has also drawn attention as a possible guide for new methods of handling audible signals. For example, Van Compernolle, U.S. Pat. No. 4,648,403, issued Mar. 10, 1987, discloses a system for stimulating the cochlear nerve endings in a hearing prosthesis using a deconvolution technique. Seligman, et al., U.S. Pat. No. 5,095,904, issued Mar. 17, 1992, discloses a prosthetic method of stimulating the auditory nerve fiber in profoundly deaf persons with several different pulsate signals representing energy in different acoustic energy bands to convey speech information. Allen et al., U.S. Pat. No. 4,905,285, issued Feb. 27, 1990, discloses signal processing based on analysis of auditory neural firing patterns. These inventions, however, do not exploit biophysical modeling of auditory physiological processes as a tool in signal processing.
Understanding and modeling of the processing of audible signals in the human, and more generally in the mammalian, auditory system have progressed significantly in the last decade. Application of this new knowledge to design of signal processing systems for audible signals, however, is in its infancy.
In the human auditory system an incoming acoustic signal produces a pattern of transverse displacements on the basilar membrane, which responds to frequencies between about 200 and about 20,000 Hz. Displacements for high frequencies occur at the basal end of the membrane and those for low frequencies occur at the wider apical end. In general an incoming signal causes a traveling wave of transverse displacements on the basilar membrane. The position of a particular displacement along the centerline of the membrane is functionally equivalent to a parameter called "scale" which we use in this invention.
Recent research, especially that of Yang, Wang, and Shamma, has shown that the cochlear response to these traveling waves can be modeled effectively as the response of a parallel bank of linear time-invariant acoustic filters. Generally the transfer function of each filter must have an amplitude of appropriate shape in the frequency domain, namely peaked asymmetrically around a characteristic frequency, with bandwidth increasing with frequency. See, e.g., Yang, Wang, Shamma; S. A. Shamma, R. Chadwick, J. Wilbur, J. Rinzel, and K. Moorish, "A Biophysical Model of Cochlear Processing: Intensity Dependence of Pure Tone Responses," J. Acoustical Society of America, 80:133-145 (1986). Fundamental considerations also suggest that the filters be causal, that is, not incorporate future information into present signals or predict future signals from past information. As we elaborate in the discussion of our invention, causality imposes constraints on the phase of the filters.
If the individual filter transfer functions have an appropriate shape relationship, the filters will be related by a simple wavelet dilation of a basic filter impulse function, which is the basis of a wavelet representation. Charles K. Chui, An Introduction To Wavelets (Academic Press 1992) [cited below as "Chui"]. The dilation is

$$D_s g(t) = s^{1/2}\, g(st) \qquad (1)$$

where $s$ is the scale parameter and $g$ is the impulse response whose Fourier transform $\hat{g}$ is the filter transfer function.
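The dilation can be illustrated numerically. The following is a minimal sketch, assuming the energy-preserving normalization in which the dilated response is sqrt(s) times g(st); the damped sinusoid `g` and the helper names `dilate` and `energy` are hypothetical and chosen only for illustration, not taken from the cited references. It checks that dilation leaves the energy of the impulse response essentially unchanged.

```python
import math

def g(t):
    # Hypothetical basic impulse response: a causal damped sinusoid.
    return math.exp(-t) * math.sin(2.0 * math.pi * t) if t >= 0.0 else 0.0

def dilate(func, s):
    # Return D_s func, where (D_s func)(t) = sqrt(s) * func(s * t).
    return lambda t: math.sqrt(s) * func(s * t)

def energy(func, t_max=40.0, dt=1e-3):
    # Riemann-sum approximation of the integral of func(t)^2 over [0, t_max].
    n = int(t_max / dt)
    return sum(func(k * dt) ** 2 for k in range(n)) * dt

g2 = dilate(g, 2.0)   # response compressed in time by a factor of 2
e1 = energy(g)        # energy of the basic response
e2 = energy(g2)       # energy of the dilated response
```

Under this normalization every filter in the bank carries the same energy, so differences between channel outputs reflect the signal rather than the filter gain.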
Shamma and coworkers in Yang, Wang, Shamma showed that the cochlear filter bank can be approximately modeled as a wavelet transform where the scale parameter is in one-to-one correspondence with location along the basilar membrane. Since we know that the number of nerve channels in the auditory system is finite, the number of equivalent cochlear filters in the filter bank is also finite, with the set of characteristic scales being denoted as the finite set $\{s_m\}$, where the notation $\{\,\}$ denotes a "set" of numbers.
The filter characteristic scales are typically exponentially related to a tuning parameter $a_0$, that is, $s_m = (a_0)^m$.
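As a brief illustration of the exponential relation between the scales, the sketch below builds a finite set of scales from a tuning parameter. The function name `characteristic_scales`, the index range, and the value of `a0` (four channels per octave) are assumptions made for the example only.

```python
def characteristic_scales(a0, m_min, m_max):
    # s_m = a0 ** m for m = m_min, ..., m_max (a finite set of scales).
    return [a0 ** m for m in range(m_min, m_max + 1)]

a0 = 2.0 ** 0.25                      # hypothetical tuning parameter
scales = characteristic_scales(a0, 0, 8)
# Adjacent scales differ by the constant factor a0.
ratios = [hi / lo for lo, hi in zip(scales, scales[1:])]
```

With this choice the scales double every four channels, so the channels are spaced uniformly on a logarithmic axis.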
The precise shape of the amplitude of the filter transfer function is critical for the effectiveness of auditory modeling. Investigation of the mammalian cochlea teaches that equivalent cochlear filters must have a sharply asymmetrical transfer function amplitude in the frequency domain, a shape often referred to as a "shark-fin" shape. R. R. Pfeiffer and D. O. Kim, "Cochlear Nerve Fiber Responses: Distribution Along the Cochlear Partition," J. Acoustical Society of America, 58:867-869 (1975). In particular, the rate of decay (roll-off) of the filter transfer function with respect to distance from its characteristic frequency must be very much higher on the high frequency side than on the low frequency side. The high frequency edges of the cochlear filters act as abrupt "scale delimiters." A pure sinusoidal tone stimulus creates a traveling wave response in the basilar membrane which dies out rapidly above a maximum scale. The filter bank equivalent is that the pure tone produces a response of each filter up to the appropriate scale and an abruptly diminishing response beyond that scale.
In a wavelet representation we identify the traveling wave displacements $W$ on the basilar membrane due to an incoming acoustic signal $f(t)$ with the wavelet transform $W_g f(t, s_m) \equiv f(t) * D_{s_m} g(t)$, where $g$ is the basic impulse response ($\hat{g}$, the Fourier transform of the impulse response, is referred to as the filter transfer function), "$*$" is convolution with respect to time, the $s_m$ are the finite number of scales characteristic of the specific filter bank, and $\{D_{s_m} g\}$ is the finite set of cochlear filter bank impulse responses. The entire filter bank produces a wavelet transform of the incoming signal $f$.
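The filter bank view of the wavelet transform can be sketched in a few lines. The example below is illustrative only: the damped sinusoid `g` is a hypothetical stand-in that does not have the asymmetric cochlear shape, the sqrt(s) normalization of the dilation is assumed, and the names `dilated_taps`, `causal_convolve`, and `wavelet_transform` are invented for the sketch. Each channel is a causal convolution of the signal with one dilated impulse response.

```python
import math

def g(t):
    # Hypothetical basic impulse response (causal damped sinusoid, peak near 1 Hz).
    return math.exp(-t) * math.sin(2.0 * math.pi * t) if t >= 0.0 else 0.0

def dilated_taps(s, dt, duration):
    # Samples of D_s g(t) = sqrt(s) * g(s * t) on the grid t = k * dt.
    return [math.sqrt(s) * g(s * k * dt) for k in range(int(duration / dt))]

def causal_convolve(f, h, dt):
    # y[n] = dt * sum_k h[k] * f[n - k]; uses only present and past samples of f.
    return [dt * sum(h[k] * f[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(f))]

def wavelet_transform(f, scales, dt, duration=8.0):
    # One output channel per scale: W[m][n] approximates (f * D_{s_m} g)(n * dt).
    return [causal_convolve(f, dilated_taps(s, dt, duration / s), dt)
            for s in scales]

dt = 0.01
scales = [1.0, 2.0, 4.0]
tone = [math.sin(2.0 * math.pi * 2.0 * n * dt) for n in range(600)]  # 2 Hz tone
W = wavelet_transform(tone, scales, dt)
# Mean squared response of each channel after the start-up transient (t >= 3 s).
energies = [sum(v * v for v in ch[300:]) / len(ch[300:]) for ch in W]
strongest = scales[energies.index(max(energies))]
```

For this stand-in filter the channel whose characteristic frequency matches the tone (scale 2) is expected to respond most strongly, which mirrors the correspondence between scale and position along the basilar membrane.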
The auditory nervous system does not receive the physiological equivalent of a wavelet transform directly, but rather transmits a substantially modified version of such a transform. It is known that in the next step of the auditory process, the equivalent of the output of each cochlear filter is transmitted by the velocity coupling between the cochlear membrane and the cilia of the hair cell transducers that initiate the electrical nervous activity by a shearing action on the tectorial membrane. Through this process the mechanical motion of the basilar membrane is converted to a receptor potential in the inner hair cells. A time derivative of the wavelet transform, $\frac{\partial}{\partial t} W_g f(t,s)$, models the velocity coupling well. (Ref. 1.) The extrema of the wavelet transform $W$ occur at the zero-crossings of the new function $\frac{\partial}{\partial t} W_g f(t,s)$.
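The relation between extrema of one channel and zero-crossings of its time derivative can be checked on sampled data. The following sketch uses central finite differences as an assumed discretization of the time derivative; the function names and the test signal (a phase-shifted sinusoid standing in for one channel of the transform) are hypothetical.

```python
import math

def time_derivative(w, dt):
    # Central differences in the interior, one-sided differences at the ends.
    n = len(w)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (w[i + 1] - w[i - 1]) / (2.0 * dt)
    d[0] = (w[1] - w[0]) / dt
    d[n - 1] = (w[n - 1] - w[n - 2]) / dt
    return d

def zero_crossings(d):
    # Indices i such that d changes sign between samples i and i + 1.
    return [i for i in range(len(d) - 1) if d[i] * d[i + 1] < 0.0]

def local_extrema(w):
    # Interior indices where w is a strict local maximum or minimum.
    return [i for i in range(1, len(w) - 1)
            if (w[i] - w[i - 1]) * (w[i + 1] - w[i]) < 0.0]

dt = 0.01
w = [math.sin(2.0 * math.pi * i * dt + 0.3) for i in range(200)]  # stand-in channel
crossings = zero_crossings(time_derivative(w, dt))
extrema = local_extrema(w)
```

On this test signal each detected sign change of the derivative falls at a sample where the channel output attains a local maximum or minimum.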
In the next step in the auditory process, the threshold and saturation that occur in the hair cell channels and the leakage of electrical current through the membranes of these cells modify the output signal. It is also known to model these two phenomena by applying an instantaneous sigmoidal non-linearity $R$, which can be of the form ##EQU3## to the coupled signal, followed by a low-pass filter with impulse response $h$. At this point, the model of the cochlear output $C_{h,R}(t,s)$ can be written as

$$C_{h,R}(t,s) = R\!\left(\frac{\partial}{\partial t} W_g f(t,s)\right) * h(t)$$

where "$*$" is again convolution with respect to time.
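The two hair cell stages can be sketched as a pointwise non-linearity followed by a smoothing filter. In the example below both choices are assumptions made for illustration: the logistic function stands in for the sigmoidal non-linearity (the specific form used in the source is not reproduced here), and a first-order leaky integrator with time constant `tau` stands in for the low-pass filter with impulse response h.

```python
import math

def sigmoid(x, gain=1.0):
    # Hypothetical logistic non-linearity, written to avoid overflow for large |x|.
    z = gain * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def leaky_lowpass(x, dt, tau):
    # One-pole recursion approximating convolution with h(t) = exp(-t / tau) / tau.
    alpha = dt / (tau + dt)
    y, out = 0.0, []
    for v in x:
        y += alpha * (v - y)
        out.append(y)
    return out

def cochlear_output(dw_dt, dt, tau=0.01, gain=1.0):
    # Apply the non-linearity sample by sample, then smooth in time.
    return leaky_lowpass([sigmoid(v, gain) for v in dw_dt], dt, tau)

silent = cochlear_output([0.0] * 2000, dt=0.001)      # zero velocity input
loud = cochlear_output([50.0] * 2000, dt=0.001)       # strongly positive input
```

With the logistic stand-in, a zero input settles at the midpoint 0.5 and a large positive input saturates near 1, which illustrates the threshold and saturation behavior described above.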
The human auditory nerve patterns produced by the cochlear output are then processed by the brain in ways that are incompletely understood. One processing model which has been studied with a view toward extracting the spectral pattern of the acoustic stimulus is the lateral inhibitory network (LIN). I. Morishita and A. Yajima, "Analysis and Simulation of Networks of Mutually Inhibiting Neurons," Kybernetik, 11:154-165 (1972). Scientifically LIN reasonably reflects proximate frequency channel behavior and is analytically tractable. The simplest model of LIN is as a partial derivative of the primitive cochlear output with respect to scale: $\frac{\partial}{\partial s} C_{h,R}(t,s)$.
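Because the filter bank has only finitely many scales, the derivative with respect to scale has to be approximated from neighboring channels. The sketch below uses a simple first difference between adjacent channels; this discretization, the function name `lin_scale_derivative`, and the sample values are assumptions for illustration only.

```python
def lin_scale_derivative(C, scales):
    # Approximate dC/ds between adjacent channels m and m + 1:
    # (C[m + 1][n] - C[m][n]) / (s[m + 1] - s[m]) for every time index n.
    return [
        [(hi - lo) / (scales[m + 1] - scales[m]) for lo, hi in zip(C[m], C[m + 1])]
        for m in range(len(scales) - 1)
    ]

scales = [1.0, 2.0, 4.0]
# Hypothetical cochlear outputs: one row per scale, one column per time sample.
C = [[0.25, 0.5],
     [1.0, 0.5],
     [1.0, 1.5]]
lin = lin_scale_derivative(C, scales)
```

The result has one fewer row than the input. Entries are large where the output changes sharply from one channel to the next and zero where neighboring channels agree.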
Prior work involving creation of such representations of acoustic signals and reconstruction of the original signal from the representation, such as that found in Ref. 1, achieved useful and interesting results. However, this work used generic methods not specifically tailored for acoustic processing, such as reconstruction by the method of alternating projections, a staple in many engineering applications. See, e.g., S. Mallat and S. Zhong, "Wavelet Transform Maxima and Multiscale Edges," in M. B. Ruskai, et al. (editors), Wavelets and Their Applications (Jones and Bartlett, Boston, 1992). It also did not encompass data compression other than that inherent in the wavelet representation itself and did not produce any known noise reduction results.
The current invention is directed to an improvement to this general approach which will enable the method and apparatus based on it to be used specifically for data compression and noise reduction in real time and near real time acoustic applications, for example, voice telephony. Specifically, this invention is a method of and apparatus for encoding audible signals with wavelet transforms in such a manner that an irregular sampling method of reconstruction back to the original signal is known to approximate the original signal with accuracy increasing exponentially with each iteration of the method. Empirically the method converges so rapidly that for many purposes the first reconstruction with no iterations is adequate. This invention is further directed to constructing an irregular sampling method of decoding accurately a wavelet transform representation using a substantially reduced sample of a full wavelet representation obtained by truncation, thereby enabling significant data compression. The invention is further directed to selection of partial representations for transmission and reproduction of signals representing audible sounds, especially speech, which, while retaining significant data compression, achieve a high degree of noise reduction which can be optimized by sacrificing some compression. Finally, the invention is directed to a method of reconstruction of wavelet representations of acoustic signals based on the theory of irregular sampling such that the method produces high quality reconstructions of acoustic signals with a very small number of iterations of the method.