The invention relates to analysis of continuous wave data and, more particularly, to a method and apparatus for novel efficient transformation of complex continuous wave data into frequency, amplitude, and time spectral information, unique intelligent pattern recognition, and novel identification of individual events contained within complex continuous wave data.
The need for recognizing and identifying individual events contained within a continuous stream of wave data arises in a broad range of technologies such as the conversion of music into an interactive database of musical events, the recognition of potentially dangerous patterns in a continuous stream of wave data produced by medical monitoring devices, the recognition of specific objects in radar, sonar, laser, and other wave reflections by surveillance or navigational devices, and the like. Event recognition in this context means, in general, the separation and identification of specific wave patterns contained in complex continuous wave data.
Waves are a fundamental means of describing the movement of energy in the physical world. A wave consists of a travelling pattern of energy which fits within a certain distance, or wavelength. The pattern described within a wavelength is a cycle, and a stationary observer perceives the arrival of a repeating wave pattern as a certain number of cycles per second, or frequency. A waveform is defined as one cycle of a wave pattern. There are no theoretical limits to the frequency of a wave pattern, although the limits of current detection apparatus range from ultra low frequency waves (0.001 Hz) to hard cosmic rays (one thousand billion billion Hz, i.e., 10^21 Hz). Practitioners in the art will recognize that the invention can be applied to any wavelength capable of analysis using state-of-the-art detection and processing equipment.
It is known that a complex waveform can be completely described by its component sinewaves. Complex waves from multiple wave sources can be combined to form even more complex waves containing multiple wave events. In Digital Filters and Signal Processing, Kluwer Academic Publishers, Boston, 1986, Leland Jackson describes how waves of related frequencies combine. Component sinewaves of complex waveforms add together, cancel each other, attract each other, and modify each other through interference. Yet despite this complexity of wave combinations, a wave event generated by a particular source usually maintains its own identity, even when combined with many other simultaneous complex waves. The human perception of sound is the best example of this phenomenon. An instrument, such as a flute, is easily distinguished by an average listener from a second instrument, such as a timpani, even though both instruments are played simultaneously. Such human perception of sound waves suggests that separation and identification of waves of many frequencies should be possible.
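By way of illustration only, the statement that a complex waveform can be described by its component sinewaves may be sketched with a plain discrete Fourier transform. The sample rate, the two component frequencies, and the function name `dft_magnitudes` are assumptions chosen for the example and do not come from the works cited above.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the amplitude of each DFT bin (0 .. n/2) for real-valued samples."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Scale so a unit-amplitude sinewave reads about 1.0
        # (bins 0 and n/2 would need a factor of 1/n instead; both are ~0 here).
        mags.append(abs(acc) * 2 / n)
    return mags

SAMPLE_RATE = 64  # hypothetical samples per second
N = 64            # one second of data, so bin k corresponds to k Hz

# Complex wave: a 5 Hz sinewave at amplitude 1.0 plus a 12 Hz sinewave at 0.5
wave = [math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_RATE)
        for t in range(N)]

mags = dft_magnitudes(wave)
peaks = [k for k, m in enumerate(mags) if m > 0.1]
print(peaks)  # bins holding significant energy
```

In this idealized case, in which each component completes a whole number of cycles within the analysis window, the two components reappear as two isolated bins with their original amplitudes. Real wave data seldom aligns with the window so conveniently.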
Consequently, for many years researchers have studied the perception of sound as the quintessential guide toward understanding complex wave analysis and event recognition. Considerable technology exists in the field of acoustic wave analysis, and technology related to spectral analysis and pattern recognition exists in other fields. The use of acoustic wave analysis as illustrated by the preferred embodiment of the invention serves to demonstrate the invention is useful in many fields for spectral analysis and pattern recognition techniques. Since the complexity of acoustic patterns requires attention to a level of detail exceeding the detail required in many other fields, the application of the preferred embodiment to acoustic patterns serves as a comprehensive example of the practice of the invention.
In current practice for the analysis of continuous audio data, an instrument playing a note is considered a wave source, and the notes played by the instrument are considered the individual events. As time progresses, the notes played produce a continuous stream of complex wave data. The continuous wave data is digitally sampled at precise intervals and the resulting stream of digits is processed by a computer. Most analysis and event recognition techniques try to transform the continuous stream of sampled wave data into frequency, amplitude, and time spectra before attempting any form of pattern recognition. The goal of existing pattern recognition procedures is the identification of the pitch, amplitude, and timing of each note event.
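The general flow described above, from sampled data to frequency, amplitude, and time spectra, may be sketched as follows. This is a minimal illustration under assumed values (a 1024 Hz sample rate, 128-sample frames, and two synthetic notes), not a description of any particular prior art system. The names `dominant_component` and `note` are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 1024  # hypothetical samples per second
FRAME = 128         # samples per analysis frame (bin spacing = 8 Hz)

def note(freq_hz, amplitude, length):
    """Synthesize a sampled sinewave standing in for one note event."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(length)]

def dominant_component(frame, sample_rate):
    """Return (frequency in Hz, amplitude) of the strongest DFT bin in a frame."""
    n = len(frame)
    best_k, best_amp = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amp = abs(acc) * 2 / n
        if amp > best_amp:
            best_k, best_amp = k, amp
    return best_k * sample_rate / n, best_amp

# Continuous stream: a 64 Hz note followed by a quieter 96 Hz note
stream = note(64, 1.0, 512) + note(96, 0.5, 512)

# One (time, frequency, amplitude) row per frame
spectra = []
for start in range(0, len(stream), FRAME):
    freq, amp = dominant_component(stream[start:start + FRAME], SAMPLE_RATE)
    spectra.append((start / SAMPLE_RATE, freq, amp))

for row in spectra:
    print(row)
```

Each row gives the start time of a frame together with the pitch and amplitude found in it. The change of note appears as a change of frequency and amplitude between the fourth and fifth rows.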
James A. Moorer describes one type of complex continuous wave data analysis in his paper, "On the Segmentation and Analysis of Continuous Musical Sound by Digital Computer", CCRMA, Department of Music, Report No. STAN-M-3. Moorer uses comb filters to preprocess the music wave data and then refines the data with band-pass filters. Moorer relies on harmonic ratios for his preprocessing and pattern recognition procedures and constructs linked lists of all potential notes based on occurrences of events conforming to predetermined harmonic ratios. Moorer places extreme constraints on the structure and instrumentation of the music which can be analyzed since the optimum comb step depends on predictable and stable harmonics at integer harmonic ratios. However, real music rarely has predictable and stable harmonics and does not conform to the compositional constraints imposed by Moorer.
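The dependence of a comb filter on stable integer harmonics may be illustrated with a minimal sketch. It is not Moorer's procedure. It assumes a simple feedforward comb, y[n] = x[n] - x[n - D], and searches for the delay D giving the least output energy. The signal parameters and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # hypothetical samples per second
FUNDAMENTAL = 200   # Hz, i.e. a period of 40 samples

def comb_output_energy(samples, delay):
    """Mean energy of y[n] = x[n] - x[n - delay] over the valid range."""
    diffs = [samples[n] - samples[n - delay] for n in range(delay, len(samples))]
    return sum(d * d for d in diffs) / len(diffs)

def best_comb_delay(samples, min_delay, max_delay):
    """Return the delay (in samples) whose comb output has the least energy."""
    return min(range(min_delay, max_delay + 1),
               key=lambda d: comb_output_energy(samples, d))

def tone(partial_ratios, length=800):
    """Sum of partials at the given multiples of FUNDAMENTAL, halving in amplitude."""
    return [sum(0.5 ** i * math.sin(2 * math.pi * r * FUNDAMENTAL * n / SAMPLE_RATE)
                for i, r in enumerate(partial_ratios))
            for n in range(length)]

# Stable partials at integer ratios: the comb cancels the whole tone
harmonic = tone([1, 2, 3])
period = best_comb_delay(harmonic, 20, 60)
estimated_pitch = SAMPLE_RATE / period
harmonic_residual = comb_output_energy(harmonic, period)

# One partial at a non-integer ratio: no delay in range cancels everything
inharmonic = tone([1, 2.1])
inharmonic_residual = comb_output_energy(
    inharmonic, best_comb_delay(inharmonic, 20, 60))

print(period, estimated_pitch, harmonic_residual, inharmonic_residual)
```

With partials at integer ratios the comb output falls essentially to zero at the true period. With a partial at a non-integer ratio a clear residual remains at every delay tried, which is the kind of behavior that restricts the music such a method can analyze.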
The transformation of digitally sampled data into accurate frequency, amplitude, and time spectra is a difficult and computationally intensive process that has been the subject of considerable research. The difficulty lies in the fact that, for any single analysis step, the more accurate the frequency resolution becomes, the less accurate the time resolution becomes. The reverse condition is equally true. In "The Bounded-Q Frequency Transform", CCRMA, Department of Music, Report No. STAN-M-28, Kyle L. Kashima, et al., present a method of frequency analysis called the Bounded-Q Frequency Transform, which "lowpass-filters" and "sub-samples" input data and applies a Fast Fourier Transform [FFT] to the resulting data for each octave of output desired. Although the effective frequency resolution is good and the computational time is relatively low, the final results distort time to such a degree that they are not sufficiently accurate wherever the timing of events is important, as it is in music and most other fields.
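The trade-off between frequency resolution and time resolution can be made concrete with simple arithmetic. For a single transform over N samples taken at sample rate fs, the bin spacing is fs / N and the window spans N / fs seconds, so the product of the two is always 1. The sketch below assumes a 44100 Hz sample rate and a handful of window lengths purely as examples.

```python
SAMPLE_RATE = 44100  # hypothetical audio sample rate

def resolution(sample_rate, window_length):
    """Return (bin spacing in Hz, window duration in seconds) for one analysis step."""
    freq_res = sample_rate / window_length
    time_res = window_length / sample_rate
    return freq_res, time_res

table = [(n, *resolution(SAMPLE_RATE, n)) for n in (256, 1024, 4096, 16384)]
for n, df, dt in table:
    print(f"N={n:6d}  bin spacing={df:8.2f} Hz  window={dt * 1000:7.1f} ms")

# Example: separating two notes a semitone apart near 55 Hz needs a bin
# spacing no coarser than the gap between them, hence a long window.
semitone_gap_hz = 55 * (2 ** (1 / 12) - 1)
window_needed_s = 1 / semitone_gap_hz
print(f"gap={semitone_gap_hz:.2f} Hz  window needed={window_needed_s:.2f} s")
```

Each fourfold increase in window length sharpens the frequency spacing fourfold and blurs the timing by the same factor. Resolving low notes a semitone apart in this way calls for a window of roughly 0.3 seconds, which is long compared with the timing of many musical events.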
Other attempts to transform digitally sampled data into frequency, amplitude, and time spectra are described by Lawrence R. Rabiner, et al., in Theory and Application of Digital Signal Processing, Prentice-Hall Inc., Englewood Cliffs, N.J., 1975. The use of the FFT with a high degree of overlap is described, as well as the principle of a bank of band-pass filters. Rabiner's work does not recommend a bank of band-pass filters because of the tremendous computation time required. The inherent problem with the FFT technique recommended by Rabiner et al. is that its frequency spacing is not suited to most analysis needs: output from the FFT is linearly spaced and cannot readily be adapted to other frequency scales.
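The consequence of linear frequency spacing may be illustrated by counting how many FFT bins fall within each musical octave. The sketch assumes an 8192 Hz sample rate, a 1024-point transform, and octaves starting at 32 Hz. These values and the function name `bins_per_octave` are chosen only for the example.

```python
def bins_per_octave(sample_rate, fft_size, lowest_hz, octaves):
    """Count FFT bins whose center frequency falls in each octave [lo, 2*lo)."""
    spacing = sample_rate / fft_size
    counts = []
    for i in range(octaves):
        lo = lowest_hz * 2 ** i
        hi = 2 * lo
        count = sum(1 for k in range(fft_size // 2 + 1) if lo <= k * spacing < hi)
        counts.append((lo, hi, count))
    return counts

counts = bins_per_octave(sample_rate=8192, fft_size=1024, lowest_hz=32, octaves=7)
for lo, hi, count in counts:
    print(f"{lo:5d}-{hi:5d} Hz: {count:4d} bins")
```

Every octave contains twelve semitones, yet under these assumptions the lowest octave receives only 4 bins while the highest receives 256. The linear scale is therefore too coarse at low frequencies and needlessly fine at high frequencies for an analysis organized by pitch.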
John Chowning, et al., propose a study of continuous wave analysis and event recognition in their paper, "Intelligent Analysis of Composite Acoustic Signals," CCRMA, Department of Music, Report No. STAN-M-36. They propose using "simulated real-time problem solving heuristics" to determine strategies for allocating resources and controlling feedback loops, and propose a "system learning coprocessor" for parameter adjustment and various forms of pattern recognition. The multi-rate signal processing they propose is a form of the Bounded-Q Frequency Transform, so the proposed system would suffer from the same time distortion. Although this paper was only a proposal for continued funding, it does draw attention to the need for a process that can learn and retain a knowledge base of the facts learned.
In "Techniques for Note Identification in Polyphonic Music," CCRMA, Department of Music, Report No. STAN-M-29, Chris Chafe uses a moving-average technique to identify the beginnings of events prior to application of the Bounded-Q Frequency Transform. Although this technique can work well for some extremely simple forms of music, it does not perform well for dense or complex music. In addition to inheriting the weakness of the Bounded-Q Frequency Transform, Chafe ignores the unstable portion of the note event, thereby failing to utilize a tremendous portion of the vital data.
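The general idea of locating the beginnings of events with a moving average may be sketched as follows. This is not Chafe's algorithm. It simply smooths the absolute amplitude with a trailing average and reports the points where the smoothed envelope rises through a threshold. The window width, the threshold, the test signal, and the function names are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000  # hypothetical samples per second

def moving_average(values, width):
    """Trailing moving average; early samples average over what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= width:
            running -= values[i - width]
        out.append(running / min(i + 1, width))
    return out

def detect_onsets(samples, width, threshold):
    """Return sample indices where the smoothed envelope rises through threshold."""
    envelope = moving_average([abs(s) for s in samples], width)
    return [i for i in range(1, len(envelope))
            if envelope[i - 1] < threshold <= envelope[i]]

# Two well-separated 440 Hz notes with silence before and between them
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]
signal = silence + tone + silence + tone

onsets = detect_onsets(signal, width=80, threshold=0.2)
print([i / SAMPLE_RATE for i in onsets])  # onset times in seconds
```

For isolated notes such as these, each onset is reported a few milliseconds after the note actually begins. When notes overlap, as in dense or complex music, the overall envelope need not fall below the threshold between events, so a simple scheme of this kind can miss onsets entirely.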
Clearly, new and more comprehensive method and apparatus for complex continuous wave analysis and event recognition are needed to efficiently transform sampled wave data into frequency, amplitude, and time spectra without substantially distorting frequency or time. The method and apparatus should evaluate all aspects of spectral data without ignoring any portion of the data, despite any instability in the data, as well as readily adapt to a broad range of frequencies, timing, and analysis needs. The invention addresses these needs and provides an efficient solution to these problems.
The major advantages of the present invention are the speed, accuracy, flexibility, and consistency of the conversion of sampled wave data to frequency, amplitude, and time spectra and the extraordinary accuracy and flexibility of its event recognition. In addition, the capacity of the present invention to be configured to meet virtually any frequency and time requirements, as well as virtually any definable recognized wave event, will enable researchers in a broad range of scientific investigation to utilize the invention effectively.
The present invention accomplishes wave analysis and event recognition with a greater degree of efficiency and accuracy than is possible with any of the prior art. More specifically, by using the principles of the present invention, the conversion of wave data to frequency, amplitude, and time spectra is many times faster and more accurate than with the best of the prior art. In addition, the invention performs event recognition with a degree of accuracy and flexibility that effectively removes the restrictions imposed by the best attempts of the prior art. Furthermore, the invention is "virtual" in the sense that its frequency scales, time scales, amplitude scales, conversion procedures, pattern recognition procedures, and file structures can be configured for a variety of applications.