This invention is directed to the perception and recognition of audio signal inputs and, more particularly, to a signal processing method and apparatus for providing a nonlinear frequency analysis of structured signals in a manner which more closely mimics the operation of the human ear and brain.
The use of an array of nonlinear oscillators to process an input audio signal is known in the art from U.S. Pat. No. 7,376,562, granted to Edward W. Large (Large).
The human ear has been modeled as a plurality of oscillators tuned to different frequencies. The brain processes the inputs from these oscillators by connecting oscillator pairs as needed to interpret sound inputs. Audio sounds naturally occurring in the world are complex signals; as a result, the developed human ear is a complex processor that makes use of these connections between oscillators. In nature, connections between oscillators change over time, and connection patterns are learned responses to repeated inputs. This learning results in an increase in synaptic efficiency between presynaptic cells and postsynaptic cells. It is also known from prior art modeling that a connection between two oscillators has both a strength (amplitude) and a natural phase.
It is generally known from Large to process signals using networks of nonlinear oscillators. Nonlinear resonance provides a wide variety of behaviors that are not observed in linear resonance (e.g., neural oscillations). Moreover, as in nature, oscillators can be connected into complex networks. FIG. 1 shows a typical architecture used to process acoustic signals. It consists of a network 100 of layers of one-dimensional arrays of nonlinear oscillators, called gradient-frequency nonlinear oscillator networks (GFNNs). In FIG. 1, GFNNs are arranged into processing layers to simulate auditory processing by the cochlea (102) at layer 1 (the input layer), the dorsal cochlear nucleus (DCN) (104) at layer 2, and the inferior colliculus (ICC) (106) at layer 3. From a physiological point of view, nonlinear resonance models outer hair cell nonlinearities in the cochlea, and phase-locked neural responses in the DCN and ICC. From a signal processing point of view, processing by multiple GFNN layers is not redundant; information is added at every layer due to nonlinearities.
More specifically, as illustrated in FIG. 2, an exemplary nonlinear oscillator system comprises a network 402 of nonlinear oscillators 405_1, 405_2, 405_3 . . . 405_N. An input stimulus layer 401 can communicate an input signal to the network 402 through a set of stimulus connections 403. In this regard, the input stimulus layer 401 can include one or more input channels 406_1, 406_2, 406_3 . . . 406_C. The input channels can include a single channel of multi-frequency input, two or more channels of multi-frequency input, or multiple channels of single-frequency input, as would be provided by a prior frequency analysis. The prior frequency analysis could include a linear method (Fourier transform, wavelet transform, or linear filter bank, methods that are well known in the art) or another nonlinear network, such as another network of the same type.
Assuming C input channels as shown in FIG. 2, the stimulus on channel 406_C at time t is denoted x_C(t), and each entry of the matrix of stimulus connections 403 may be analyzed as the strength of a connection from an input channel 406_C to an oscillator 405_N, for a specific resonance, as known from Large. Notably, the connection matrix can be selected so that the strength of one or more of these stimulus connections is equal to zero.
Referring again to FIG. 2, internal network connections 404 determine how each oscillator 405_N in the network 402 is connected to the other oscillators 405_N. As known from Large, these internal connections may be denoted as a matrix of complex-valued parameters, each describing the strength of the connection from one oscillator 405_M to another oscillator 405_N, for a specific resonance, as explained next.
As known from Large, signal processing by networks of nonlinear oscillators can be performed to broadly mimic the response of the ear. This is similar to signal processing by a bank of linear filters, but with the important difference that the processing units are nonlinear, rather than linear, oscillators. In this section, this approach is explained by comparing it with linear time-frequency analysis.
A common signal processing operation is frequency decomposition of a complex input signal, for example by a Fourier transform. Often this operation is accomplished via a bank of linear bandpass filters processing an input signal, x(t). For example, a widely used model of the cochlea is a gammatone filter bank (Patterson, et al., 1992). For comparison with our model, a generalization can be written as the differential equation

$$\dot{z} = z(\alpha + i\omega) + x(t) \tag{1}$$

where the overdot denotes differentiation with respect to time (for example, dz/dt), z is a complex-valued state variable, ω is radian frequency (ω=2πf, f in Hz), and α<0 is a linear damping parameter. The term x(t) denotes linear forcing by a time-varying external signal. Because z is a complex number at every time t, it can be rewritten in polar coordinates, revealing system behavior in terms of amplitude, r, and phase, φ. Resonance in a linear system means that the system oscillates at the frequency of stimulation, with amplitude and phase determined by system parameters. As stimulus frequency, ω0, approaches the oscillator frequency, ω, oscillator amplitude, r, increases, providing band-pass filtering behavior.
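The band-pass behavior of Equation 1 can be checked numerically. The following is a minimal sketch and not part of the original disclosure: it assumes a unit-amplitude complex sinusoid x(t) = exp(iω0·t) as the stimulus, a fourth-order Runge-Kutta integrator, and illustrative parameter values, and the function names are hypothetical. For this assumed stimulus, substituting z = A·exp(iω0·t) into Equation 1 gives a steady-state amplitude |A| = 1/sqrt(α² + (ω0 − ω)²), which is largest when ω0 = ω.

```python
import cmath
import math


def rk4_step(f, t, z, dt):
    """Advance dz/dt = f(t, z) by one fourth-order Runge-Kutta step."""
    k1 = f(t, z)
    k2 = f(t + dt / 2, z + dt / 2 * k1)
    k3 = f(t + dt / 2, z + dt / 2 * k2)
    k4 = f(t + dt, z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def linear_response(alpha, omega, omega0, dt=1e-3, t_end=20.0):
    """Integrate Equation 1 from z = 0 and return |z| at t_end.

    The stimulus x(t) = exp(i*omega0*t) is an assumption of this sketch.
    """
    def f(t, z):
        return z * complex(alpha, omega) + cmath.exp(1j * omega0 * t)

    z = 0j
    for k in range(int(round(t_end / dt))):
        z = rk4_step(f, k * dt, z, dt)
    return abs(z)


def predicted_amplitude(alpha, omega, omega0):
    """Steady-state |z| for the assumed unit-amplitude sinusoidal stimulus."""
    return 1.0 / math.hypot(alpha, omega0 - omega)


if __name__ == "__main__":
    alpha = -1.0                   # linear damping (illustrative value)
    omega = 2 * math.pi * 2.0      # oscillator tuned to 2 Hz
    for f0 in (1.0, 1.5, 2.0, 2.5, 3.0):
        omega0 = 2 * math.pi * f0
        simulated = linear_response(alpha, omega, omega0)
        predicted = predicted_amplitude(alpha, omega, omega0)
        print(f"{f0:.1f} Hz  simulated={simulated:.4f}  predicted={predicted:.4f}")
```

The simulated amplitude should track the closed-form prediction and peak at the 2 Hz tuning frequency. Because the system is linear, scaling the stimulus scales the response by the same factor.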
Recently, nonlinear models of the cochlea have been proposed to simulate the nonlinear responses of outer hair cells. It is important to note that outer hair cells are thought to be responsible for the cochlea's extreme sensitivity to soft sounds, excellent frequency selectivity and amplitude compression (e.g., Eguíluz, Ospeck, Choe, Hudspeth, & Magnasco, 2000). Models of nonlinear resonance that explain these properties have been based on the Hopf normal form for nonlinear oscillation, and are generic. Normal form (truncated) models have the form

$$\dot{z} = z(\alpha + i\omega + \beta|z|^2) + x(t) + \text{h.o.t.} \tag{2}$$

Note the surface similarities between this form and the linear oscillator of Equation 1. Again, ω is radian frequency, and α is still a linear damping parameter. However, in this nonlinear formulation, α becomes a bifurcation parameter which can assume both positive and negative values, as well as α=0. The value α=0 is termed a bifurcation point. β<0 is a nonlinear damping parameter, which prevents amplitude from blowing up when α>0. Again, x(t) denotes linear forcing by an external signal. The term h.o.t. denotes higher-order terms of the nonlinear expansion that are truncated (i.e., ignored) in normal form models. Like linear oscillators, nonlinear oscillators come to resonate with the frequency of an auditory stimulus; consequently, they offer a sort of filtering behavior in that they respond maximally to stimuli near their own frequency. However, there are important differences in that nonlinear models address behaviors that linear ones do not, such as extreme sensitivity to weak signals, amplitude compression and high frequency selectivity. The compressive gammachirp filterbank exhibits nonlinear behaviors similar to those of Equation 2, but is formulated within a signal processing framework (Irino & Patterson, 2006).
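The amplitude compression described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the method of the original disclosure: the higher-order terms are dropped, the stimulus is taken to be x(t) = F·exp(iωt) at the oscillator's own frequency, the oscillator sits at the bifurcation point α=0 with β=−1, and the function name and parameter values are hypothetical. Under these assumptions the amplitude obeys dr/dt = αr + βr³ + F, so at α=0 the steady state is r = (F/|β|)^(1/3), a cube-root compression of the input.

```python
import cmath
import math


def hopf_response(force, alpha=0.0, beta=-1.0, omega=2 * math.pi,
                  dt=0.005, t_end=100.0):
    """Integrate Equation 2 (h.o.t. dropped) and return |z| at t_end.

    The stimulus x(t) = force * exp(i*omega*t), applied at resonance,
    is an assumption of this sketch. Integration starts from z = 0.
    """
    def f(t, z):
        intrinsic = complex(alpha, omega) + beta * abs(z) ** 2
        return z * intrinsic + force * cmath.exp(1j * omega * t)

    z = 0j
    for k in range(int(round(t_end / dt))):
        t = k * dt
        # Classical fourth-order Runge-Kutta step.
        k1 = f(t, z)
        k2 = f(t + dt / 2, z + dt / 2 * k1)
        k3 = f(t + dt / 2, z + dt / 2 * k2)
        k4 = f(t + dt, z + dt * k3)
        z += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return abs(z)


if __name__ == "__main__":
    weak, strong = 0.01, 0.08      # stimulus amplitudes differing by 8x
    # Nonlinear oscillator at the bifurcation point (alpha = 0, beta = -1).
    nonlinear_gain = hopf_response(strong) / hopf_response(weak)
    # Linear oscillator for comparison (alpha = -1, beta = 0), as in Equation 1.
    linear_gain = (hopf_response(strong, alpha=-1.0, beta=0.0)
                   / hopf_response(weak, alpha=-1.0, beta=0.0))
    print(f"output ratio, nonlinear: {nonlinear_gain:.3f}")
    print(f"output ratio, linear:    {linear_gain:.3f}")
```

With these settings an eightfold increase in stimulus amplitude roughly doubles the nonlinear oscillator's response, while the linear oscillator's response grows eightfold. The same cube-root law means that the effective gain r/F grows without bound as F shrinks, which corresponds to the sensitivity to weak signals noted above.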
Large taught expanding the higher-order terms of Equation 2 to enable coupling among oscillators of different frequencies. This enables efficient computation of gradient frequency networks of nonlinear oscillators, representing an improvement to the technology. As known from applicant's copending application Ser. No. 13/016,713, the canonical model (Equation 3) is related to the normal form (Equation 2; see, e.g., Hoppensteadt & Izhikevich, 1997), but it has properties beyond those of Hopf normal form models because the underlying, more realistic oscillator model is fully expanded, rather than truncated. The complete expansion of higher-order terms produces a model of the form

$$\dot{z}_i = z_i\left(\alpha_i + i\omega_i + (\beta_{1i} + i\delta_{1i})|z_i|^2 + \frac{(\beta_{2i} + i\delta_{2i})\,\varepsilon|z_i|^4}{1 - \varepsilon|z_i|^2}\right) + c\,P(\varepsilon, x(t))\,A(\varepsilon, \bar{z}_i) \tag{3}$$

Equation 3 describes a network of n nonlinear oscillators, indexed by i. There are again surface similarities with the previous models. The parameters ω, α and β1 correspond to the parameters of the truncated model. β2 is an additional amplitude compression parameter, and c represents the strength of coupling to the external stimulus. Two frequency detuning parameters, δ1 and δ2, are new in this formulation, and make oscillator frequency dependent upon amplitude (see FIG. 3C). The parameter ε controls the amount of nonlinearity in the system. Most importantly, coupling to a stimulus is nonlinear and has a passive part, P(ε, x(t)), and an active part, A(ε, z̄), producing nonlinear resonances.
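The structure of Equation 3 can be made concrete by writing out its right-hand side for a single oscillator. The sketch below is illustrative only. This text does not define P and A, so the expressions used here, P(ε, x) = x/(1 − √ε·x) and A(ε, z̄) = 1/(1 − √ε·z̄), are assumptions of the sketch, and the copending application should be consulted for the actual coupling expressions. The function names are hypothetical. The expressions are meaningful only while ε|z|² < 1 and the denominators of the assumed P and A are nonzero.

```python
import math


def passive(eps, x):
    """Assumed passive coupling P(eps, x); not defined in this text."""
    return x / (1 - math.sqrt(eps) * x)


def active(eps, z_conj):
    """Assumed active coupling A(eps, z_bar); not defined in this text."""
    return 1 / (1 - math.sqrt(eps) * z_conj)


def canonical_rhs(z, x, alpha, omega, beta1, delta1, beta2, delta2, eps, c):
    """Right-hand side of Equation 3 for one oscillator with state z and input x."""
    r2 = abs(z) ** 2
    intrinsic = (complex(alpha, omega)
                 + complex(beta1, delta1) * r2
                 + complex(beta2, delta2) * eps * r2 ** 2 / (1 - eps * r2))
    return z * intrinsic + c * passive(eps, x) * active(eps, z.conjugate())


if __name__ == "__main__":
    z, x = 0.3 + 0.4j, 0.1 + 0j
    params = dict(alpha=-0.1, omega=2 * math.pi, beta1=-1.0, delta1=0.5,
                  beta2=-1.0, delta2=0.2, c=1.0)
    # With eps = 0 the fraction vanishes and the assumed coupling becomes c * x,
    # leaving a cubic oscillator of the same shape as Equation 2.
    print(canonical_rhs(z, x, eps=0.0, **params))
    # With eps > 0 the higher-order terms and nonlinear coupling contribute.
    print(canonical_rhs(z, x, eps=0.5, **params))
```

Setting ε=0 is a useful sanity check: the fifth-order fraction vanishes and, under the assumed forms of P and A, the coupling reduces to linear forcing c·x(t), so the model has the same form as the truncated model with a complex cubic coefficient β1 + iδ1.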
Equation 3 above is generally stated in terms of a time-varying input signal x(t). Here x(t) may be an input audio source signal, or it may be input from other oscillators in the same network or oscillators in other networks. Several cases of the latter are illustrated in FIG. 1, labeled as “internal coupling”, “afferent coupling” and “efferent coupling”. In such cases, x(t) results from the multiplication of a matrix of connection values with a vector of oscillator state variables, representing a gradient frequency neural network. Equation 3 accounts for these different inputs, but for ease of explanation includes a single generic input source, x(t). This system, and particularly the construction of the nonlinear coupling expressions, is described in detail in copending patent application Ser. No. 13/016,713.
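The matrix-vector form of the coupling input described above can be sketched as follows. This is a simplified illustration, not the coupling construction of the copending application: it assumes plain linear coupling (ignoring the "specific resonance" structure), builds each complex connection value from an assumed strength and phase, and uses hypothetical names throughout. A zero entry represents an absent connection.

```python
import cmath
import math


def coupled_input(connections, states):
    """Return x_n = sum over m of connections[n][m] * states[m], for each n."""
    return [sum(c * z for c, z in zip(row, states)) for row in connections]


if __name__ == "__main__":
    # State variables of three source oscillators (illustrative values).
    states = [0.5 + 0j, 0.2j, cmath.rect(0.3, math.pi / 4)]

    # Row n holds the connections into oscillator n. Each nonzero entry is
    # built from a strength (amplitude) and a phase; zeros mean "not connected".
    connections = [
        [0, cmath.rect(0.8, 0.0), 0],
        [cmath.rect(0.5, math.pi / 2), 0, cmath.rect(0.1, 0.0)],
        [0, 0, 0],
    ]

    for n, x_n in enumerate(coupled_input(connections, states)):
        print(f"input to oscillator {n}: {x_n:.3f}")
```

The same function covers internal, afferent and efferent coupling in this simplified picture: only the choice of source state vector and connection matrix changes.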
The Large method and system for modeling the behavior of a network of nonlinear oscillators better mimics the complexity of the ear's response to complex audio signals than the prior art linear models. However, it still suffers from the disadvantage that, unlike the auditory system, it cannot learn the connections between oscillator pairs; rather, information about the input audio signal must be known ahead of time to determine which connections among the oscillators would be the most significant. Large enables connection of oscillators within and between gradient frequency nonlinear oscillator networks, as illustrated in FIG. 1. However, it requires that connections be designed by hand to provide the desired behavior of a network. In short, the Large system is static, not dynamic, in its connection pattern.