1. Field of the Invention
The present invention is directed generally to automated score following systems and methods, and, more particularly, to a stochastic score following system and method.
2. Description of the Background
Automated musical accompaniment systems are computer systems designed to accept a musical score as input and to provide real-time performance of the accompaniment in synchrony with one or more live soloists. Automated accompaniment systems must concurrently execute several tasks within the real-time constraints of musical performance. First, these systems must observe the soloists by detecting what they have performed. If the soloists' performances do not involve electronic instruments, observation will likely require some form of audio signal processing to extract relevant features, such as fundamental pitch. Second, accompaniment systems must track the soloists as they perform the score. Tracking often involves both identifying the soloists' current score position and estimating the soloists' tempo. Third, the systems must react to the soloists by tastefully performing the accompaniment, generally attempting to synchronize the accompaniment with the live performers. Finally, accompaniment systems must generate the actual sound for the accompaniment. Sound production is usually accomplished either by controlling audio synthesizers or by directly generating digital audio.
Several systems for accompanying a vocal performer have been previously described in Katayose et al., "Virtual Performer", Proc. of the 1993 Intl. Computer Music Conference, 1993, pp. 138-45; Inoue et al., "A Computer Music System for Human Singing", Proc. of the 1993 Intl. Computer Music Conference, 1993, pp. 150-53; Inoue et al., "Adaptive Karaoke System--Human Singing Accompaniment Based on Speech Recognition", Proc. of the 1994 Intl. Computer Music Conference, 1994, pp. 70-77; and Puckette, "Score Following Using the Sung Voice", Proc. of the 1995 Intl. Computer Music Conference, 1995, pp. 175-78. The first three systems accompany amateur vocalists performing pop music. The first two rely on pitch detection for tracking the performer, and the third applies speech processing techniques for vowel recognition. These three systems attempt to identify both the score position and the tempo of the performer, and to adjust the computer accompaniment in response. The fourth system was used to accompany a contemporary art piece written for computer and soprano. It relied on pitch detection and did not attempt to determine the tempo of the performer. Rather, it was designed for fast identification of soloist notes that were scored to coincide with computer-generated sounds.
The designers of these systems commonly report certain problems that complicate the tracking of a vocalist. These include variation of detected features, such as pitch, resulting from accidental and intentional actions on the part of performers. In addition, methods for pitch detection and vowel detection are generally not themselves error-free. Consequently, all of these systems incorporate heuristics or weighting schemes intended to compensate for mistakes made when features are directly matched against the score.
Thus, there is a need for a system and method for tracking a performer that is based upon a probabilistic description of the performer's score position. The system and method must use a variety of relevant information, including recent tempo estimates, features extracted from the performance, and elapsed time. Unlike previous systems and methods, such a system and method should not require subjective weighting schemes or heuristics and should use either formally derived or empirically estimated probabilities to describe the variation of the detected features and other relevant data. Furthermore, such a system and method should use such features even if they contribute varying degrees of information toward the estimation of score position.
In addition, there is a need for a score following model that can be efficiently implemented on low-end personal computers, so as to satisfy the real-time constraints imposed by musical accompaniment.
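By way of illustration only, the kind of probabilistic description of score position called for above may be sketched as a discrete distribution over score positions that is advanced using a tempo estimate and elapsed time, and then reweighted by a detected feature. The sketch below is not taken from any of the cited systems and is not a definitive implementation. All names (normalize, predict, update), the division of the score into one cell per note, the Gaussian spread, and the value of p_correct are assumptions made for the example; in practice such probabilities would be formally derived or empirically estimated.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("distribution has no probability mass")
    return [w / total for w in weights]


def predict(prior, tempo, elapsed, spread=1.0):
    """Advance the position distribution by roughly tempo * elapsed cells.

    Assumptions: tempo is in score cells per second, elapsed is in seconds,
    and uncertainty in the distance moved is modeled by a Gaussian of width
    `spread` (in cells). Mass that would move past either end of the score
    is dropped and the remainder is renormalized.
    """
    n = len(prior)
    expected_shift = tempo * elapsed
    predicted = [0.0] * n
    for src, p in enumerate(prior):
        if p == 0.0:
            continue
        for dst in range(n):
            deviation = (dst - src) - expected_shift
            predicted[dst] += p * math.exp(-0.5 * (deviation / spread) ** 2)
    return normalize(predicted)


def update(predicted, score_pitches, observed_pitch, p_correct=0.8):
    """Reweight each position by the likelihood of the detected pitch.

    Assumption: the detector reports the scored pitch with probability
    p_correct; any other report is given likelihood 1 - p_correct.
    """
    posterior = []
    for p, scored in zip(predicted, score_pitches):
        likelihood = p_correct if scored == observed_pitch else 1.0 - p_correct
        posterior.append(p * likelihood)
    return normalize(posterior)


# Hypothetical five-note score (MIDI pitch numbers), performer known to start at cell 0.
score = [60, 62, 64, 65, 67]
start = [1.0, 0.0, 0.0, 0.0, 0.0]

# One second elapses at an estimated tempo of two cells per second.
moved = predict(start, tempo=2.0, elapsed=1.0)

# A detected pitch of 65 then shifts belief toward the cell where 65 is scored.
posterior = update(moved, score, observed_pitch=65)
most_likely_cell = max(range(len(posterior)), key=posterior.__getitem__)
```

In this sketch the elapsed time and tempo alone favor the third cell, while the detected pitch moves the most likely position to the fourth cell. No hand-tuned weighting is involved; the relative influence of each source of information follows from the assumed probabilities.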