1. Field of the Invention
The present invention relates to an apparatus and a method for recognizing musical chords from an incoming musical sound waveform, and more particularly to such an apparatus and method in which a fractional duration of the sound wave is analyzed into a frequency spectrum having a number of level peaks and exhibiting a spectrum pattern, and a chord is then recognized based on the locations of those peaks in the spectrum pattern.
2. Description of the Prior Art
The prior art for recognizing chords by analyzing musical sound waves includes Marc Leman's approach. Starting from the concept that each chord is a pattern constituted by a combination of plural frequency components, this approach derives the information necessary for establishing a chord directly from the frequency spectrum (the distribution of energy levels of the respective frequency components) of the musical sound waveform under analysis. As a practical example of such a chord recognition method, a process utilizing a simple auditory model (usually referred to as "SAM") has been proposed, including the process steps shown in FIG. 16.
Referring to FIG. 16, the chord recognition process steps of the SAM method will be described briefly hereunder. The sound waveform of a musical tune (performance) is stored beforehand in the storage device of the analyzing system. The SAM method recognizes chords by reading out the wave sample data of one fraction (along the time axis) after another from the top of the stored wave, and recognizing a chord for each time fraction of the sound waveform. For example, step A reads out the data of a fractional piece of the musical sound wave (e.g. an amount corresponding to a time length of 400 milliseconds or so) from among the stored sound wave sample data as the subject of the analysis, and step B extracts the frequency components of the read-out fraction of the sound wave using FFT (Fast Fourier Transform) analysis to establish a frequency spectrum of the wave fraction. Then, step C folds (cuts and superposes) the frequency spectrum of the extracted frequency components, throughout the entire frequency range, on an octave span basis to create a superposed (combined) frequency spectrum over the frequency width of one octave, i.e. an octavally folded frequency spectrum, and locates several peaks exhibiting prominent energy levels in the octaval spectrum, thereby nominating peak frequency components. Step D then determines the tone pitches (chord constituting notes) corresponding to the respective peak frequency components and, utilizing a neural network, infers the chord (the root note and the type) based on the peak frequencies (i.e. the frequencies at which the spectrum exhibits peaks in energy level) and the intervals between those peak frequencies.
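The octave folding and peak nomination of step C, together with the pitch determination at the start of step D, can be sketched as follows. This is only an illustrative sketch and not the SAM implementation itself. It assumes that the FFT of step B has already produced a spectrum as (frequency, energy) pairs, that the folded octave is divided into twelve semitone-wide zones referenced to A4 = 440 Hz, and that the "prominent" peaks are simply the zones with the largest summed energy. The names fold_to_octave and prominent_notes and the sample spectrum are hypothetical, and the neural network inference of step D is not reproduced.

```python
import math

A4_HZ = 440.0  # fixed reference pitch, as in the SAM description
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fold_to_octave(spectrum, reference_hz=A4_HZ):
    """Fold (frequency_hz, energy) pairs into 12 semitone-wide zones.

    Assumption: one zone per semitone, centered on equal-tempered pitches.
    """
    profile = [0.0] * 12
    for freq_hz, energy in spectrum:
        if freq_hz <= 0:
            continue  # skip DC and invalid entries
        semitones_from_a4 = 12.0 * math.log2(freq_hz / reference_hz)
        # A is index 9 in NOTE_NAMES; modulo 12 superposes all octaves.
        zone = (round(semitones_from_a4) + 9) % 12
        profile[zone] += energy
    return profile


def prominent_notes(profile, count=3):
    """Return the names of the `count` zones with the largest energy."""
    ranked = sorted(range(12), key=lambda i: profile[i], reverse=True)
    return sorted(NOTE_NAMES[i] for i in ranked[:count])


# Hypothetical spectrum: C4, E4, G4 and the octave C5, plus weak noise.
example_spectrum = [
    (261.63, 1.0),
    (329.63, 0.8),
    (392.00, 0.9),
    (523.25, 0.5),
    (466.16, 0.05),
]
folded = fold_to_octave(example_spectrum)
nominated = prominent_notes(folded)
```

Here the C4 and C5 components are superposed into the same zone by the folding, which is the effect that step C relies on. Because every extracted component contributes to the folded spectrum, noise components accumulate in the zones as well.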
The SAM method, however, has some drawbacks as mentioned below.
(1) Since all of the frequency components extracted by the FFT process are used for recognizing a chord, the number of frequency components to be analyzed is large, and the amount of computation in each analyzing process is correspondingly large. Moreover, since frequency components in low and high frequency ranges that are not audible to the human ear are also included in the analysis, the accuracy of the analysis deteriorates.
(2) The frequency components that exhibit large energy levels are simply taken to be the peak frequency components. Such a determination may not be adequate, considering that fairly large noise frequency components may be included among the frequency components extracted from the sound wave data. For example, if a peak frequency component is determined within a frequency range that includes frequency components of similar energy levels, there is a high possibility that the peak frequency component is determined inadequately, which will lead to an erroneous recognition of the chord.
(3) In inferring note pitches from the peak frequency components, the note pitches are determined simply by taking the frequency component of 440 Hz as the reference for the A4 note. Therefore, where the pitches of all the tones in the musical tune to be analyzed deviate as a whole (i.e. are shifted in parallel), the note pitches will be erroneously inferred. A further disadvantage is that an overall pitch deviation may cause one peak area to fall across two adjacent frequency zones, so that two peak frequency components are extracted from a single actually existing tone, and the inference will be that two notes are sounded even though only one tone is actually sounded in those zones.
(4) Marc Leman's paper simply states that the chord is determined by using a neural network. Accordingly, it is not clear what kind of process is actually used for determining the chord type. Moreover, the behavior of the neural network cannot be explicitly controlled by a human, which leads to insufficient reliability for practical use.
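Drawback (3) can be illustrated numerically. The following sketch assumes semitone-wide frequency zones centered on equal-tempered pitches, which is an assumption made here for illustration. The function name note_zone, the two sample frequencies and the shifted reference of 453 Hz are all hypothetical. The sketch shows only how a fixed 440 Hz reference can split one tone across two zones; it does not describe how a suitable reference would be found.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_zone(freq_hz, reference_hz=440.0):
    """Return (note name, deviation in cents) for the nearest semitone zone.

    Assumption: zones are one semitone wide and centered on equal-tempered
    pitches derived from `reference_hz` as the A4 frequency.
    """
    semitones = 12.0 * math.log2(freq_hz / reference_hz)
    nearest = round(semitones)
    cents_off = 100.0 * (semitones - nearest)
    return NOTE_NAMES[(nearest + 9) % 12], cents_off


# Two neighboring components of ONE tone in a tune tuned sharp as a whole.
lower_hz, upper_hz = 450.0, 456.0

# With the fixed 440 Hz reference, the single tone straddles two zones.
fixed_lower = note_zone(lower_hz)[0]   # "A"
fixed_upper = note_zone(upper_hz)[0]   # "A#"

# With a (hypothetical) reference matching the tune, both fall in one zone.
shifted_lower = note_zone(lower_hz, reference_hz=453.0)[0]
shifted_upper = note_zone(upper_hz, reference_hz=453.0)[0]
```

With the 440 Hz reference, the two components lie roughly 39 and 62 cents above A4, on either side of the zone boundary at 50 cents, so two notes would be inferred from one tone.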