1. Field of the Invention
This invention relates to a system for computerized determination of rhythmic beat from a musical excerpt, which is particularly useful in music playback systems such as "disc jockey" (DJ) equipment.
2. Discussion of the Related Art
Advances in high performance state-of-the-art digital signal processors (DSPs) have led to much research into training machines to listen and respond in the same manner as human listeners to music compositions. Beat counting has been an active research topic among engineering and music societies. This interest derives from the fact that beat counting provides a basis for automatic music transcription and adds dynamics to music playback systems such as DJ equipment. A good beat counting algorithm, upon which DSPs base their patterns, must be capable of extracting relevant beat information from the music and providing a digital output representing the beat which corresponds to that which would be perceived by a human musician.
Human listeners have little problem feeling the beat of most music excerpts. Information derived from the temporal changes of pitch and timbre, words, and the presence of drumbeats provides adequate cues easily discerned by the ears and brains of listeners. On the other hand, computers or DSPs cannot perceive such information without the application of complex processing techniques such as pitch extraction, speech recognition, and pattern matching. Even where these techniques can be implemented, they provide incomplete solutions. For example, pitch tracking is successful only on monophonic music; it fails otherwise. The same limitation exists for systems which track changes in timbre and words. Also, drum beat tracking is ineffective with respect to music pieces having no drums.
One improvement on the above systems is to treat all the above factors equally and to attempt to detect a consistent "change pattern" based on an assumption that most changes, which indicate the presence of beats, appear in music signals as onsets of energy modulation. With this technique, the beat counting, usually a cognition problem, is primarily based on onset searching and pattern matching in signal processing systems. With regard to processing of acoustical signals, a straightforward method of onset searching or detecting employs the "edge detection" technique commonly used in image processing systems. However, with the necessary high sampling rate and long beat period (on the order of a few hundred milliseconds), direct edge detecting is very time consuming. A filter bank implementation for reducing the computational complexity has been proposed by E. D. Scheirer in his article "Tempo and Beat Analysis of Acoustic Musical Signals," J. Acoust. Soc. Am. 103, 588, 1998 (incorporated by reference herein in its entirety). This method utilizes several filters to split the signal into different subbands and applies down sampling to reduce the total number of points needed for computation. Disadvantages are that filtering is itself time consuming and the subsequent processes must be carried out repeatedly for each band. Therefore, only modest reductions in processing requirements are achieved. While it has been demonstrated that the entire Scheirer algorithm is sufficiently fast to run within the computation time of, for example, a Digital Equipment Corporation Alpha 3000™, it is a tight fit. With greater functionality being demanded by the DJ market, a tight-fit real time algorithm is not adequate. An efficient beat counting algorithm should be capable of running in real time on a less powerful DSP, along with other tasks.
Edge detection generates a train of pulses coinciding with the locations of the onsets in the original acoustic signal. Based on this pulse train, a beat counter operates to determine the frequency of the pulse occurrences. There has been much research addressing this issue from the point of view of psychology and digital signal processing. Among the published algorithms are the autocorrelation algorithm and the resonator phase-locking algorithm.
The autocorrelation method is implied (although not directly used in beat counting) in an article by J. C. Brown, "Determination of the Meter of Musical Scores by Autocorrelation," J. Acoust. Soc. Am. 94, 1993 (incorporated by reference herein in its entirety). The concept underlying this method is the same as that used by a pitch extractor, except that the beat period is considered longer. The autocorrelation coefficients of the pulse train signal are calculated, and the lag associated with the greatest coefficient is considered the beat period.
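The autocorrelation approach described above can be illustrated with a minimal sketch. It is not taken from the Brown article; the function name, the lag range, and the synthetic pulse train are assumptions chosen for illustration, and the coefficient is the plain unnormalized sum of products.

```python
def beat_period_by_autocorrelation(pulses, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the greatest autocorrelation.

    `pulses` is an onset pulse train (hypothetical input); min_lag must be >= 1.
    """
    best_lag, best_coeff = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # r(lag) = sum over t of x[t] * x[t + lag]; zip stops at the shorter list.
        coeff = sum(a * b for a, b in zip(pulses, pulses[lag:]))
        if coeff > best_coeff:
            best_lag, best_coeff = lag, coeff
    return best_lag


# Hypothetical pulse train: one unit pulse every 25 samples.
pulses = [1.0 if t % 25 == 0 else 0.0 for t in range(400)]
period = beat_period_by_autocorrelation(pulses, min_lag=15, max_lag=40)
```

Every candidate lag requires a pass over the whole pulse train, which gives a sense of the computational cost attributed to this method.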
The resonator phase-locking method was first presented by E. Large, et al., "Resonance and the Perception of Musical Meter," Connection Science 6, 177, 1994 (incorporated by reference herein in its entirety). The concept underlying this technique derives from the Helmholtz resonators which have been used to determine the frequency of analog acoustic signals. The method passes the train of pulses coinciding with the onsets of energy modulation through each resonator of a set of digital resonators with different resonant frequencies. The resonator having maximum energy output is detected, and the frequency of the pulse train is determined by the resonant frequency of this resonator.
Both the autocorrelation and resonator phase-locking methods generate results whose accuracy depends on the parameter settings. A disadvantage of both methods is the computational complexity and cost. Moreover, none of the above methods has adequately addressed concerns with the stability of the beat counter when experiencing an abnormal rhythm change. In this regard, the only proposed solution has been to slow down the responding time, while averaging the result over a long time interval. This proposal has not produced good results. As a result of these problems, the above methods have been very limited in application.
As noted above, music playback systems such as DJ equipment require good performance and low costs. The cost of an algorithm is determined by memory requirements and, more importantly, computational complexity. As discussed above, "real time" is no longer a sufficient condition; because DJ audio equipment performs more than one function at a time (such as simultaneously performing beat counting and sound-effect-changing), a speed much faster than real time is needed. There is no foreseeable limit on how fast the algorithm should be.
It is an object of the present invention to provide a novel beat-counting algorithm with a high computation speed which is significantly faster than real time and which can be employed on such apparatus as DJ equipment, CD players and audio effect boxes and with automatic music transcription software.
It is another object of the present invention to provide a novel beat-counting system having the capability of reporting stabilized results. This feature is enabled by the present invention because of the fast speed of the algorithm which gives time for additional decision-making steps to be carried out before a BPM (beats per minute) decision is reported.
It is yet another object of the present invention to provide a novel beat-counting system which has the capability of operating on an acoustical signal rather than a MIDI (musical instrument digital interface) signal.
The algorithm according to the present invention is summarized as follows. An onset searching/pattern matching structure is employed with an efficient and reliable group-summing method that is conducted as a preprocessing step to reduce the sample points. The beat frequency searching algorithm is simplified based on a novel analogy with the beat perception mechanism of the human mind and ears. After a BPM is generated, a stability enhancement method is used to decide whether the BPM needs to be updated.
The goal of the algorithm of the present invention is to provide a beat counter which can be mounted on a CD player or an effect box for displaying beat count in real time. The algorithm includes five basic steps: down sampling, group summing, onset detecting, beat counting, and stability enhancing.
According to one aspect of the invention, there is provided a method of determining a rhythmic beat of a digital sound signal, comprising the steps of (a) down sampling the digital signal by a predetermined factor to produce a decimated signal comprising a plurality of first data points; (b) grouping the plurality of first data points into groups each comprising a predetermined number of the first data points of the decimated signal and summing absolute values of the data points in each of the groups to produce a group-summed signal comprising a plurality of second data points; (c) dividing the plurality of second data points of the onset peak train into a plurality of successive frames of uniform duration; (d) determining for each of the frames a threshold value and detecting, within each of the frames, peak profiles each comprising successive ones of the second data points having values greater than the threshold value; (e) detecting, within each of the peak profiles, a peak point having a greatest value among the successive ones of the second data points; and (f) determining a match between (i) the peak point and ones of the second data points located at least one of before and after the peak point and (ii) one of a plurality of unit data pulse sequences, having different periods, in accordance with an algorithm, wherein the rhythmic beat is determined to correspond to the period of the one of the unit pulse sequences.
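Steps (a) and (b) above can be sketched as follows. This is an illustration only: the function names, the decimation factor, and the group size are hypothetical, and the groups are assumed to be consecutive and non-overlapping, which the text does not state explicitly.

```python
def down_sample(signal, factor):
    """Step (a): keep every `factor`-th sample (plain decimation, no filtering)."""
    return list(signal[::factor])


def group_sum(decimated, group_size):
    """Step (b): sum absolute values over consecutive groups of `group_size` points.

    Assumes non-overlapping groups; a trailing partial group is dropped.
    """
    n_groups = len(decimated) // group_size
    return [
        sum(abs(v) for v in decimated[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]


# Hypothetical input: 12 samples, decimated by 2, then summed in groups of 3.
signal = [1, -1, -2, 2, 3, -3, -4, 4, 5, -5, -6, 6]
first_points = down_sample(signal, factor=2)            # first data points
second_points = group_sum(first_points, group_size=3)   # second data points
```

Both operations shrink the number of points that the later steps must process, here from 12 samples to 2 group sums.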
The threshold value may be defined by a relation (A+M)/2, where A is the average of the values of all of the second data points within one of the frames and M is the maximum of the values of all of the second data points within the one of the frames. Step (f) can comprise (i) calculating a function
$$\mathrm{Sum}_i(n) = x(M) + x(M+n) + x(M+2n) + \cdots + x(M-n) + x(M-2n)$$
where, for the first one of the frames, $x$ is a signal representing the onset peak train, $i$ is a selected peak point index, $M$ is a peak point position, and $n$ is a period of the unit pulse sequences, where $n$ ranges from a first integer number to a second integer number, (ii) calculating $\mathrm{Sum}(n) = \sum_i \mathrm{Sum}_i(n)$, and (iii) determining a value of $n = N$ resulting in a greatest sum $\mathrm{Sum}(n) = \sum_i \mathrm{Sum}_i(n)$, wherein the match is determined to exist with the one of the unit pulse sequences having a pulse period equal to $N$, and the rhythmic beat is determined to correspond to period $N$. The method may further comprise a check frame decision step (g) comprising: (i) with respect to a second frame of the plurality of successive frames which immediately succeeds the first one of the frames, performing a check frame decision processing by calculating a function $\mathrm{Sum}_i(n) = x(M) + x(M+n) + x(M+2n) + \cdots + x(M-n) + x(M-2n)$, where, for the second frame, $x$ is a signal representing the onset peak train, $i$ is a selected peak point index, $M$ is a peak point position, and $n$ is a period of the unit pulse sequences, where $n$ ranges from $n = N - L$ to $n = N + L$, where $L$ is an integer number which is less than the difference between the first and second integer numbers, calculating $\mathrm{Sum}(n) = \sum_i \mathrm{Sum}_i(n)$ and determining whether $N$ yields a peak in $\mathrm{Sum}(n)$ for the check frame processing of the second frame; (ii) if step (g)(i) determines that $N$ yields the peak in $\mathrm{Sum}(n)$ for the check frame processing of the second frame, the rhythmic beat for the first frame and for the second frame is determined to correspond to period $N$ and a third frame immediately succeeding the second frame is processed in accordance with step (g)(i), and (iii) if step (g)(i) determines that $N$ does not yield the peak in $\mathrm{Sum}(n)$, the rhythmic beat for the second frame is determined to correspond to period $N$ and the third frame is processed in accordance with step (g).
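The thresholding, peak detection, period matching, and check frame test described above can be sketched as follows. Several details are assumptions rather than statements of the source: the sum for each peak is taken over every in-frame position that lies a whole number of periods from the peak, "N yields a peak" is read as N giving the greatest sum within N-L to N+L, and all function names and the synthetic frame are hypothetical.

```python
def find_peak_points(frame):
    """Return the index of the largest point in each run above (A + M) / 2."""
    threshold = (sum(frame) / len(frame) + max(frame)) / 2.0
    peaks, run = [], []
    for idx, value in enumerate(frame):
        if value > threshold:
            run.append(idx)            # still inside a peak profile
        elif run:
            peaks.append(max(run, key=lambda j: frame[j]))
            run = []
    if run:                            # profile that reaches the frame end
        peaks.append(max(run, key=lambda j: frame[j]))
    return peaks


def pulse_sum(frame, peak, period):
    """Sum of x at the peak and at whole multiples of `period` before and after it."""
    start = peak % period              # earliest in-frame position aligned with the peak
    return sum(frame[start::period])


def best_period(frame, n_min, n_max):
    """Return the n in [n_min, n_max] with the greatest total over all peak points."""
    peaks = find_peak_points(frame)
    totals = {n: sum(pulse_sum(frame, p, n) for p in peaks)
              for n in range(n_min, n_max + 1)}
    return max(totals, key=totals.get)


def check_frame(frame, N, L):
    """Assumed check frame test: N is still the best period within [N - L, N + L]."""
    return best_period(frame, N - L, N + L) == N


# Hypothetical frame: a three-point peak profile centred every 20 points.
frame = [0.0] * 200
for b in range(5, 200, 20):
    frame[b - 1], frame[b], frame[b + 1] = 0.6, 1.0, 0.6

N = best_period(frame, n_min=15, n_max=30)
confirmed = check_frame(frame, N, L=3)
```

The check frame test searches only 2L+1 candidate periods instead of the full range, which is why it is cheap enough to run on every subsequent frame.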
According to another aspect of the invention, there is provided an apparatus for determining a rhythmic beat of a digital sound signal, the apparatus comprising (a) decimation means for down sampling the digital signal by a predetermined factor to produce a decimated signal comprising a plurality of first data points; (b) group summation means for grouping the plurality of first data points into groups each comprising a predetermined number of the first data points of the decimated signal and summing absolute values of the data points in each of the groups to produce a group-summed signal comprising a plurality of second data points; (c) means for dividing the plurality of second data points of the onset peak train into a plurality of successive frames of uniform duration; (d) determination means for determining for each of the frames a threshold value and for detecting, within each of the frames, peak profiles each comprising successive ones of the second data points having values greater than the threshold value; (e) detection means for detecting, within each of the peak profiles, a peak point having a greatest value among the successive ones of the second data points; and (f) match detection means for determining a match between (i) the peak point and ones of the second data points located at least one of before and after the peak point and (ii) one of a plurality of unit data pulse sequences, having different periods, in accordance with an algorithm, wherein the rhythmic beat is determined to correspond to the period of the one of the unit pulse sequences. The apparatus of the invention can include the same refinements described above with respect to the method of the present invention.