1. Statement of the Technical Field
The present application relates generally to the perception and recognition of input signals and, more particularly, to a signal processing method and apparatus for providing a nonlinear frequency analysis of structured signals.
2. Description of the Related Art
Many well-known signal processing techniques are used to extract spectral features, separate signals from background sounds, and find periodicities at the time scale of music and speech rhythms. Typically, the extracted features are used to generate reference patterns (models) for certain identifiable sound structures, such as phonemes, musical pitches, or rhythmic meters.
Referring now to FIG. 1, a general signal processing system in accordance with the prior art is shown. The processing system will be described relative to acoustic signal processing, but it should be understood that the same concepts can be applied to processing of other types of signals. The processing system 100 receives an input signal 101. The input signal can be any type of structured signal such as music, speech or sonar returns.
Typically, an acoustic front end (not shown) includes a microphone or some other similar device to convert acoustic signals into analog electric signals having a voltage which varies over time in correspondence to the variation in air pressure caused by the input sounds. The acoustic front end also includes an analog-to-digital (A/D) converter for digitizing the analog signal by sampling the voltage of the analog waveform at a desired sampling rate and converting the sampled voltage to a corresponding digital value. The sampling rate is typically selected to be at least twice the highest frequency component in the input signal, so that the sampled signal represents that component without aliasing.
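The relationship between sampling rate and highest frequency component can be illustrated with a minimal sketch. It assumes NumPy is available; the function name dominant_frequency and the chosen tone and sampling rates are hypothetical and serve only as an illustration. A 3 kHz tone sampled at 8 kHz is recovered at 3 kHz, whereas the same tone sampled at 4 kHz appears at 1 kHz.

```python
import numpy as np

def dominant_frequency(tone_hz, fs, duration=1.0):
    """Sample a sine tone at rate fs and return the strongest frequency (Hz)
    found in the sampled data. Hypothetical helper for illustration only."""
    n = np.arange(int(fs * duration))           # sample indices
    x = np.sin(2.0 * np.pi * tone_hz * n / fs)  # sampled waveform
    spectrum = np.abs(np.fft.rfft(x))           # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs) # frequency of each bin in Hz
    return float(freqs[np.argmax(spectrum)])

# Sampling rate above twice the tone frequency: the tone is preserved.
well_sampled = dominant_frequency(3000.0, fs=8000)
# Sampling rate below twice the tone frequency: the tone aliases to fs - 3000.
aliased = dominant_frequency(3000.0, fs=4000)
```

In this sketch the aliased tone lands at 4000 - 3000 = 1000 Hz, which is indistinguishable from a true 1 kHz component once the signal has been digitized.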
In processing system 100, spectral features can be extracted in a transform module 102 by computing a wavelet transform of the acoustic signal. Alternatively, a sliding window Fourier transform may be used for providing a time-frequency analysis of the acoustic signals. Following the initial frequency analysis performed by transform module 102, one or more analytic transforms may be applied in an analytic transform module 103. For example, a “squashing” function (such as square root) may be applied to modify the amplitude of the result. Alternatively, a synchro-squeeze transform may be applied to improve the frequency resolution of the output. Transforms of this type are described in U.S. Pat. No. 6,253,175 to Basu et al. Next, a cepstrum may be applied in a cepstral analysis module 104 to recover or enhance structural features (such as pitch) that may not be present or resolvable in the input signal. Finally, a feature extraction module 105 extracts from the fully transformed signal those features which are relevant to the structure(s) to be identified. The output of this system may then be passed to a recognition system that identifies specific structures (e.g. phonemes) given the features thus extracted from the input signal. Processes for the implementation of each of the aforementioned modules are well-known in the art of signal processing.
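By way of illustration only, the first stages of such a processing chain can be sketched as follows. The sketch assumes NumPy, a Hann window, and hypothetical function names and frame sizes (stft_magnitude, squash, real_cepstrum, frame_len, hop); it stands in for modules 102 through 104 in a simplified form and is not taken from any of the cited references. The synchro-squeeze transform and the feature extraction module are not shown.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=256):
    """Sliding-window Fourier transform: magnitude per frame and frequency bin."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def squash(mag):
    """Square-root 'squashing' function that compresses amplitude."""
    return np.sqrt(mag)

def real_cepstrum(mag, eps=1e-10):
    """Real cepstrum: inverse transform of the log magnitude spectrum.
    eps (an assumed small constant) avoids taking log(0)."""
    return np.fft.irfft(np.log(mag + eps), axis=1)

# Usage with an assumed 1 kHz test tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2.0 * np.pi * 1000.0 * t)
mag = stft_magnitude(signal)    # time-frequency analysis
compressed = squash(mag)        # amplitude modification
cepstrum = real_cepstrum(mag)   # one cepstral vector per frame
```

With these assumed parameters each frequency bin spans 8000/1024 = 7.8125 Hz, so the 1 kHz tone peaks at bin 128 in every frame.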
Referring next to FIG. 2, a general beat detection system in accordance with the prior art is shown. As in FIG. 1, an acoustic signal 201 is digitally sampled, and (optionally) submitted to a frequency analysis module 202 as described previously. The resulting signal is then submitted to an onset detection module 203, which examines the time derivatives of the signal envelope to determine the initiation points of individual acoustic events, in a manner that is well known in the art of signal processing. The resulting onset signal is then submitted to an autocorrelation module 204, which determines the main time lag(s) at which event onsets are correlated in a manner that is well known in the art of signal processing. The foregoing technique is described in more detail in J. C. Brown, Determination of the meter of musical scores by autocorrelation, 94 JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1953-57 (1993). Alternatively, cross-correlation with a predetermined pulse train can produce a similar result as disclosed in U.S. Pat. No. 6,316,712 to Laroche. Finally, a structure identification module 205 determines the frequency and phase of the basic beat of the event sequence. Significantly, the foregoing system is mainly applicable to sequences whose tempo is steady, because a single frequency and phase is determined for an entire sequence.
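The onset detection and autocorrelation stages described above can be sketched, under simplifying assumptions, as follows. The sketch assumes NumPy, an envelope that has already been computed, and hypothetical function names (onset_strength, dominant_lag); it is a simplified illustration and not the procedure of the cited references. Because a single lag is returned for the whole sequence, the sketch shares the steady-tempo limitation noted above.

```python
import numpy as np

def onset_strength(envelope):
    """Half-wave rectified first difference of the signal envelope.
    Positive values mark increases in energy, i.e. candidate onsets."""
    diff = np.diff(envelope, prepend=envelope[0])
    return np.maximum(diff, 0.0)

def dominant_lag(onsets, min_lag, max_lag):
    """Lag (in samples) at which the onset signal is most strongly
    correlated with itself, searched over [min_lag, max_lag]."""
    x = onsets - onsets.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(ac[min_lag:max_lag + 1])])

# Assumed test envelope: sampled at 100 Hz, one decaying event every 0.5 s.
envelope_rate = 100
envelope = np.zeros(1000)
for start in range(10, 1000, 50):
    envelope[start:start + 5] = [1.0, 0.8, 0.6, 0.4, 0.2]

onsets = onset_strength(envelope)
lag = dominant_lag(onsets, min_lag=20, max_lag=200)
beat_period_s = lag / envelope_rate   # seconds per beat
tempo_bpm = 60.0 / beat_period_s      # beats per minute
```

For this assumed envelope the strongest lag is 50 samples, corresponding to a beat period of 0.5 second, or 120 beats per minute.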
Referring next to FIG. 3, a general beat tracking system is shown. An input signal 301 is presented as input to the system. The signal consists of onsets that can be determined in the manner described in the previous paragraph, or that can be extracted directly from a MIDI input signal, as is well known in the art. The onset signal is presented as input to a sparse bank of nonlinear oscillators 302, each of which has a distinct frequency. The relative oscillator frequencies are assumed to be known in advance, as is the base frequency. The frequency of the signal may change over time. The oscillator bank tracks changes in the phase and frequency of the input signal by adapting the phase and frequency of the oscillators in the oscillator bank. U.S. Pat. No. 5,751,899 to Large et al. describes a conventional beat tracking system of the prior art. An output signal 303 is then generated, either in the form of discrete beats (pulses) corresponding to the beat and metrical structure of the sequence or in the form of tempo change messages that describe changes in the tempo (frequency in beats per minute) of the sequence. The output signal can also be directly compared to the input signal (discrete events) to determine the correct musical notation (i.e. note durations) of the input events. Significantly, the applicability of this approach is limited to signals whose initial tempo and main frequency components are known in advance.
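The general idea of adapting phase and frequency to incoming onsets can be illustrated with a deliberately simplified sketch. It uses a single oscillator with linear phase and period correction, which is an assumption made for brevity; it is not the nonlinear oscillator bank of U.S. Pat. No. 5,751,899, and the function name track_beats and the gain values are hypothetical. Consistent with the limitation noted above, the sketch requires an initial period estimate that is already close to the true beat period.

```python
def track_beats(onset_times, initial_period, phase_gain=0.5, period_gain=0.2):
    """Follow a sequence of onset times (seconds) with one adaptive oscillator.
    Returns the period estimate after each onset following the first."""
    period = initial_period
    expected = onset_times[0] + period       # time of the next predicted beat
    periods = []
    for t in onset_times[1:]:
        # Skip predicted beats that have no onset near them.
        while t - expected > period / 2.0:
            expected += period
        error = t - expected                 # positive if the onset is late
        period += period_gain * error        # frequency (tempo) adaptation
        expected += phase_gain * error + period  # phase adaptation, next beat
        periods.append(period)
    return periods

# Assumed test sequence: 30 beats at 0.5 s, then 30 beats at 0.6 s.
onset_times = [0.0]
for interval in [0.5] * 30 + [0.6] * 30:
    onset_times.append(onset_times[-1] + interval)

# The initial estimate (0.52 s) is close to, but not equal to, the true period.
periods = track_beats(onset_times, initial_period=0.52)
period_before_change = periods[29]
period_after_change = periods[-1]
```

With the assumed gains, the period estimate settles near 0.5 second during the first half of the sequence and then follows the tempo change toward 0.6 second.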
The foregoing audio processing techniques have proven useful in many applications. However, they have not addressed some important problems. For example, these conventional approaches are not always effective for determining the structure of a time-varying input signal, because they do not effectively recover components that are not present or not fully resolvable in the input signal.