The invention relates to the analysis and understanding of acoustic spectra, particularly the spectra of musical sounds. More specifically, the invention relates to the extraction of pitch and timbre from the spectra of musical sounds.
Musicians and others (piano tuners, acousticians) are often concerned with evaluating musical sounds in real time, particularly with regard to pitch and tone quality (timbre). During training, rehearsal, and performance, both singers and instrumentalists make these evaluations continuously, and adjust their technique accordingly to improve the sound. Music teachers, orchestral conductors and choral directors make similar evaluations, and by gesture or verbal instruction indicate how performance should be improved.
In all of these endeavors, the human ear and brain are used to evaluate the sound. Although this mechanism is necessary during performance, and marvelous for judging the “higher” qualities of music such as “expressiveness”, it is hardly ideal for evaluating purely mechanical aspects of sound such as pitch and timbre, because human judgment is subjective. This problem is particularly acute for performing musicians, because the person evaluating the sound is busy producing it. Thus singers and instrumentalists often sing and play off-key while swearing they are in tune, or produce a poor tone quality (timbre) while imagining they are producing a good one.
The tendency to misjudge can be remedied by training, but in the absence of a teacher, in the hours practicing alone, there is typically no objective measure of pitch and timbre. Several techniques may be used, but each has limitations and drawbacks. For example, a keyboard instrument may be used intermittently to check pitch, but it does not give continuous pitch feedback to the musician, and says nothing about timbre. Alternatively, recording and playing back may be used to separate the process of sound production from that of sound evaluation, but this is tedious because it is not real time.
To solve these problems, a mechanism is needed to provide real-time visual feedback of pitch and timbre to the musician, based on objective and consistent measurements. Visual feedback is ideal because it does not interfere with the auditory feedback that the musician must ultimately use in performance. Rather, the visual feedback should help train the auditory system by showing the musician when pitch and tone quality are good. A personal-computer-based software tool would be ideal, since it is flexible, improves automatically as computer technology progresses, and avoids the cost of dedicated instrumentation.
To analyze sound, particularly musical sound, it is essential to begin, as the ear does, with a spectral analysis. All subsequent analysis, such as the extraction of pitch and timbre, depends on the spectral analysis. Yet, as shown below, it is at this fundamental level of spectral analysis that the prior art is deficient. The prior art's technique for doing spectral analysis is the Discrete Fourier Transform (DFT), and its efficient implementation known as the Fast Fourier Transform (FFT). See Numerical Recipes in C: The Art of Scientific Computing, William H. Press, Brian P. Flannery, et al., Cambridge University Press, 1988, ISBN 0-521-35465-X, pp. 403-418, which is herein incorporated by reference in its entirety.
To demonstrate the deficiency of the DFT, it is helpful to summarize some of the mathematics involved. Using the DFT, a signal g(t) (e.g. sound pressure as a function of time) is windowed by a windowing function W(t), such as the Welch Window, which is defined to be non-zero only over the time interval [0, Δt].
The windowed function
$$\hat{g}(t) \equiv g(t)\,W(t) \qquad (1)$$
is sampled at N discrete times in the interval [0, Δt], namely
$$t_n \equiv \frac{n}{S} = \frac{n}{N}\,\Delta t, \qquad n = 0, \ldots, N-1, \qquad (2)$$
where S is the sampling rate in Hz. Therefore the total time to measure the N-fold ensemble of samples is
$$\Delta t_{\text{meas}} = \frac{N}{S}. \quad \text{(Sound-measurement time)} \qquad (3)$$
Furthermore, using the DFT, the frequency content of ĝ(t) at frequency f, given by
$$\hat{G}(f) = \frac{1}{S} \sum_{n=0}^{N-1} \hat{g}(t_n)\, e^{2\pi i f t_n}, \qquad (4)$$
is evaluated only at certain discrete values of the frequency f, namely at the values
$$f_k \equiv \frac{S}{N}\,k, \qquad k = -\frac{N}{2}, \ldots, \frac{N}{2}. \qquad (5)$$
Therefore the frequency granularity of the DFT (the difference between two adjacent frequencies, f_{k+1} − f_k) is
$$\Delta f = \frac{S}{N}. \quad \text{(Frequency granularity for DFT)} \qquad (6)$$
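As an illustration only, equations (2), (4), (5) and (6) can be sketched in a few lines of Python. The function names, the 8000 Hz sampling rate, and the 440 Hz test tone are assumptions chosen for the example and do not come from the text; the samples are assumed to be already windowed, and the sum is evaluated directly rather than with an FFT.

```python
import cmath
import math


def dft_at_frequency(samples, sampling_rate, f):
    """Evaluate equation (4) at one frequency f (Hz) by direct summation.

    `samples` holds the already-windowed values g_hat(t_n), n = 0 .. N-1.
    """
    total = 0j
    for n, g_hat in enumerate(samples):
        t_n = n / sampling_rate  # sample time, equation (2)
        total += g_hat * cmath.exp(2j * math.pi * f * t_n)
    return total / sampling_rate


def dft_frequencies(num_samples, sampling_rate):
    """Return the frequencies f_k of equation (5); N is assumed even."""
    half = num_samples // 2
    return [sampling_rate * k / num_samples for k in range(-half, half + 1)]


# Hypothetical example values: S = 8000 Hz and N = 800 samples (0.1 s).
S, N = 8000, 800
freqs = dft_frequencies(N, S)
spacing = freqs[1] - freqs[0]  # frequency granularity, equation (6)

# A 440 Hz test tone falls exactly on one of the f_k for these values.
tone = [math.sin(2 * math.pi * 440 * n / S) for n in range(N)]
magnitude_on_tone = abs(dft_at_frequency(tone, S, 440.0))
magnitude_next_bin = abs(dft_at_frequency(tone, S, 450.0))
```

With these assumed values the spacing between adjacent f_k is S/N = 10 Hz, so a tone lying between 440 Hz and 450 Hz could not be located more precisely than that spacing from the f_k values alone.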
The deficiency of the DFT is summarized by equations (3) and (6), which together imply that
$$\Delta f \, \Delta t_{\text{meas}} = 1. \quad \text{(Property of Discrete Fourier Transform)} \qquad (7)$$
That is, with the DFT, it is impossible to achieve both a short sound-measurement time Δt_meas and a fine frequency granularity Δf. For example, if it is desired to have a short sound-measurement time of 0.1 seconds, then (7) implies that Δf must assume the rather coarse value of 10 Hz. Conversely, if a fine frequency granularity of 1 Hz is desired, Δt_meas must assume the large value of 1 second. In light of equation (7), it may be concluded that the DFT is inadequate for applications requiring both real-time data acquisition and precise spectral analysis in real time, because such applications require both small Δt_meas and small Δf.
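The trade-off in equation (7) can be checked numerically with a short sketch. The helper name and the 44100 Hz sampling rate are assumptions made for illustration; the text does not specify a sampling rate.

```python
def dft_granularity(num_samples, sampling_rate):
    """Return (delta_f in Hz, delta_t_meas in s) per equations (6) and (3)."""
    delta_f = sampling_rate / num_samples
    delta_t_meas = num_samples / sampling_rate
    return delta_f, delta_t_meas


SAMPLING_RATE = 44100  # Hz; an assumed value, not taken from the text

# Two snippet lengths matching the two cases discussed in the text:
# a 0.1 s measurement and a 1 s measurement.
for num_samples in (4410, 44100):
    delta_f, delta_t = dft_granularity(num_samples, SAMPLING_RATE)
    print(f"N={num_samples}: dt_meas={delta_t:.2f} s, "
          f"df={delta_f:.1f} Hz, product={delta_f * delta_t:.1f}")
```

Whatever values of N and S are chosen, the product of the two returned quantities is (S/N)(N/S) = 1, so shortening the measurement necessarily coarsens the granularity by the same factor.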
For example, in applications where musical sounds need to be measured and analyzed in real time, small Δt_meas is necessary to achieve the “real-time” objective. In particular, since fast musical notes are on the order of 80 to 100 milliseconds in length, the application demands
$$\Delta t_{\text{meas}} \le 0.1 \text{ seconds}. \qquad (8)$$
For the same type of application, small Δf (frequency granularity) is necessary to achieve accurate results in the computation of pitch. The frequency ratio between two musical notes a half-step apart on the equally tempered scale is
$$\frac{f_+}{f} = \sqrt[12]{2}, \qquad (9)$$
where f_+ is the upper of the two notes and f is the lower of the two notes. Thus the frequency difference between two notes a half-step apart is
$$\Delta f_{\text{halfstep}} = f \left( \sqrt[12]{2} - 1 \right). \qquad (10)$$
For example, at C131 (i.e. 131 Hz, a note in the middle of the range of a human baritone voice), Δf_halfstep is 7.8 Hz. Thus to achieve good pitch resolution of, say, an eighth of a half step, the application demands roughly
$$\Delta f \le 1 \text{ Hz}. \qquad (11)$$
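The arithmetic behind equations (9) through (11) can be reproduced with the following sketch. The function and constant names are illustrative assumptions; the one-eighth fraction and the 131 Hz note are the values used in the text.

```python
# Frequency ratio of one equal-tempered half step, equation (9).
HALF_STEP_RATIO = 2 ** (1 / 12)


def half_step_difference(f):
    """Frequency difference (Hz) between note f and the note a half step above it, equation (10)."""
    return f * (HALF_STEP_RATIO - 1)


def required_granularity(f, fraction=1 / 8):
    """Granularity (Hz) needed to resolve the given fraction of a half step at f."""
    return fraction * half_step_difference(f)


c131_half_step = half_step_difference(131.0)   # about 7.8 Hz
c131_granularity = required_granularity(131.0)  # just under 1 Hz
```

Because the half-step difference scales linearly with f, lower notes would call for an even finer granularity than the roughly 1 Hz figure obtained at 131 Hz.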
Thus, the requirements of such an application with regard to data-acquisition time and frequency granularity, typified by equations (8) and (11), are an order of magnitude more demanding than the capability (7) offered by the DFT. Therefore the DFT is inadequate for such applications, and any prior art that uses it is likewise inadequate. This inadequacy is not dependent on the speed of the computer used to implement the DFT; even if the computer were infinitely fast, the inadequacy would remain the same, because it is inherent in the DFT algorithm itself.
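The order-of-magnitude gap can be made explicit by multiplying the two requirements (8) and (11) and comparing the result with property (7). This is only a restatement of the equations already given, not an additional result:

```latex
\underbrace{\Delta f \,\Delta t_{\text{meas}} \;\le\; (1\ \text{Hz})(0.1\ \text{s}) \;=\; 0.1}_{\text{required by (8) and (11)}}
\qquad \text{versus} \qquad
\underbrace{\Delta f \,\Delta t_{\text{meas}} \;=\; 1}_{\text{delivered by the DFT, (7)}}
```

The required product is ten times smaller than the fixed product the DFT provides, which is the sense in which the requirements are an order of magnitude more demanding.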
Because the prior art is thus deficient in its ability to perform real-time data acquisition and finely-resolved spectral analysis simultaneously, it is therefore also deficient in its ability to perform accurate, real-time “note analysis”, wherein the pitch and timbre of the sound are extracted, since note analysis uses the output of spectral analysis as its starting point.
PC programs to acquire and analyze sounds using the DFT certainly exist, such as CoolEdit by Syntrillium Software and Spectrum Analysis by Sonic Foundry. However, these programs are not typically aimed at real-time applications, and make no attempt to extract pitch and timbre information. As such, they fail to provide useful information to a musician or other user requiring instantaneous, continuous feedback on the pitch and quality of live sound.
One PC program aimed specifically at musicians is Soloist by Ibis Software. This program provides nothing related to timbre feedback. Moreover it provides only a limited form of pitch feedback; for example, it cannot distinguish notes an octave apart. Furthermore the pitch feedback is not truly “real-time”; only one sound sample is analyzed per metronome beat.
An object of this invention is a system and method for analyzing the frequency spectrum of a signal in real time, particularly the spectrum of an acoustic signal having a musical nature, the method providing real-time means to identify the pitch and timbre of the musical note represented by the spectrum, and also providing means to visualize the pitch and timbre, thereby providing real-time visual feedback of musical sounds, particularly to singers and instrumental musicians.
The invention comprises a transducer, computer hardware, and software. The computer hardware may be a standard, IBM-compatible Personal Computer containing a waveform-input device, such as a Creative Labs' SoundBlaster™ or equivalent. The transducer (such as a microphone) converts a signal (such as sound waves) into a time-varying voltage. The waveform-input device periodically samples this voltage and digitizes each sample, thereby producing an array of N numbers in the memory of the computer that represent a small snippet of the signal measured over a time interval Δt_meas. Snippets are typically measured one after the other at a repetition rate that is inversely related to Δt_meas. The software, also stored in the memory of the computer, and executed using its central processing unit, includes a spectral-analysis process that analyzes the frequency content of each snippet and produces an associated spectrum. The software also includes a novel note-analysis process that analyzes the spectrum and extracts from it the pitch and timbre of the principal musical note contained therein. The process works for any spectrum, including cases where the fundamental frequency of the note is missing. The software further includes novel processes to visualize graphically the pitch and the timbre.
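The snippet-by-snippet acquisition described above might be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the snippets are assumed to be back-to-back and non-overlapping (the text says only that they are measured one after the other), and the spectral-analysis and note-analysis processes are not shown because this section does not describe them.

```python
def split_into_snippets(samples, snippet_length):
    """Yield consecutive, non-overlapping snippets of N = snippet_length samples.

    Trailing samples that do not fill a whole snippet are dropped
    (an assumption made for this sketch).
    """
    last_start = len(samples) - snippet_length
    for start in range(0, last_start + 1, snippet_length):
        yield samples[start:start + snippet_length]


def snippet_rate(snippet_length, sampling_rate):
    """Snippets per second for back-to-back snippets: S / N = 1 / delta_t_meas."""
    return sampling_rate / snippet_length


# Hypothetical stream of ten digitized samples, split into snippets of four.
stream = list(range(10))
snippets = list(split_into_snippets(stream, 4))
```

Under the back-to-back assumption, an assumed sampling rate of 44100 Hz with N = 4410 would give ten snippets per second, each covering Δt_meas = 0.1 seconds.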