The present invention relates to methods of, and devices for, analysing music as it is being played in real time. Such devices display musical information derived from such an analysis with the information being displayed on a screen or some other device, and/or produce electrical outputs corresponding to the pitch, amplitude or other characteristic of the music being analysed. Such data is normally used to control music synthesisers, with the objective of playing synthesised sounds in synchronism with source music. For example, music played on a trumpet may be fed into such a device, which in turn feeds a synthesiser producing a piano-like sound with the result that the music played by the trumpet player will be reproduced as a piano sound accompaniment.
Such devices suffer from a major problem in that they have difficulty detecting musical gestures such as the onset of successive notes. The term "musical gestures" as used herein means the onset, or cessation, of individual notes comprising a musical performance or events of similar musical significance, for example the plucking, striking, blowing, or bowing of a musical instrument.
Traditional methods of detecting musical gestures have been based either upon the amplitude of the gesture or upon the pitch of the gesture. The detection of musical gestures based upon their amplitude uses either an amplitude threshold detector or a peak detector, or a combination of the two.
The prior art method of using a threshold detector is as follows:
When the amplitude of an incoming audio signal exceeds a preset level, the envelope of the synthetic tone is triggered. This prior art method has the disadvantage that, for almost all real musical tones which are used as input, the amplitude does not drop significantly between notes played in rapid succession. As a consequence, many of the new notes played into the device do not cause the desired corresponding new envelopes to be commenced in the synthesised timbre.
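By way of illustration only, the threshold method described above may be sketched as follows. The function name `threshold_onsets`, the sampled amplitude values and the threshold of 0.3 are hypothetical and are not taken from any particular prior art device. The sketch shows that two notes played without an intervening drop in level give rise to only one trigger.

```python
def threshold_onsets(envelope, threshold):
    """Return the sample indices at which the level rises above the threshold.

    A trigger is issued only on a transition from at-or-below the threshold
    to above it, which is the behaviour assumed here for the threshold detector.
    """
    onsets = []
    above = False
    for i, level in enumerate(envelope):
        if level > threshold and not above:
            onsets.append(i)
        above = level > threshold
    return onsets


# Hypothetical amplitude envelopes, each containing two notes.
# Legato: the level dips between the notes but stays above the threshold.
legato = [0.0, 0.6, 0.9, 0.7, 0.5, 0.8, 0.9, 0.6, 0.1]
# Detached: the level falls almost to silence between the notes.
detached = [0.0, 0.6, 0.9, 0.7, 0.1, 0.8, 0.9, 0.6, 0.1]

print(threshold_onsets(legato, 0.3))    # [1]     second note is missed
print(threshold_onsets(detached, 0.3))  # [1, 5]  both notes are detected
```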
With the prior art peak detection means, use is made of the fact that many real musical input tones have a much greater level when a new note is played. One difficulty with this arrangement is that many musical instruments which can be used to originate the audio input have amplitudes which rise very slowly when a new note is commenced. Such musical instruments include members of the string family where a bowing action is employed to articulate notes. Also, members of the brass and woodwind families can, when played by the instrumentalist according to certain techniques, exhibit slowly rising amplitudes. This makes it difficult to detect the peak quickly.
A further problem in this connection is that the synthetic envelope, which is commenced by the synthesiser, only begins to increase in amplitude after the peak of the input has been detected, by which time the amplitude of the input signal has already begun to decrease. Since the synthesiser is operating in real time, this means that the synthesiser is only starting a note when the input signal is decaying. This leads to an unacceptable delay between the envelope of the input signal and the envelope of the synthesised timbre, especially for musical inputs whose amplitudes take a very long time to peak (for example a bowed cello).
Another problem with peak detection is that when a musical input consists of notes played in very rapid succession, the peaks are seldom much larger than the immediately preceding amplitude and hence are difficult to detect and are easily missed.
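The two difficulties with peak detection noted above may be illustrated by the following minimal sketch. It assumes a simple detector which reports a peak one sample after the level begins to fall, and only if the peak exceeds the preceding trough by a minimum rise. The function name `peak_detections`, the parameter `min_rise` and the envelope values are hypothetical.

```python
def peak_detections(envelope, min_rise):
    """Return (peak_index, detected_index) pairs.

    A peak can only be confirmed once the following sample is seen, so
    detected_index is always peak_index + 1. Peaks that rise less than
    min_rise above the lowest level since the previous peak are ignored.
    """
    results = []
    trough = envelope[0]
    for i in range(1, len(envelope) - 1):
        trough = min(trough, envelope[i - 1])
        is_local_max = envelope[i] > envelope[i - 1] and envelope[i] >= envelope[i + 1]
        if is_local_max:
            if envelope[i] - trough >= min_rise:
                results.append((i, i + 1))
            trough = envelope[i]  # restart the trough search after each local maximum
    return results


# Hypothetical slowly rising note (for example, a bowed string) that starts at index 1.
bowed = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.7, 0.6]
# Hypothetical rapid passage: three notes, the later peaks barely exceed the preceding level.
rapid = [0.0, 0.9, 0.7, 0.8, 0.7, 0.8, 0.6]

print(peak_detections(bowed, 0.3))  # [(8, 9)]  detected 8 samples after the note began
print(peak_detections(rapid, 0.3))  # [(1, 2)]  the second and third notes are missed
```

In the first case the trigger is available only when the input is already decaying, and in the second case lowering `min_rise` to catch the later notes would make the detector correspondingly more sensitive to small fluctuations within a single note.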
Prior art methods of detecting musical gestures based upon pitch have always been relatively crude. In one prior art method, a new note (that is, a new synthesised envelope) is commenced by the synthesiser when the input pitch crosses some predefined boundary. This method is known as pitch quantisation. It has the effect of mapping all possible input pitches into a finite set of pitches (usually semitones) according to the member of the set to which the input pitch is closest. A substantial problem with this method is that if an input pitch is close to a boundary, any slight deviation of the input pitch can cross the boundary, thus generating new envelopes in the synthesised timbre where no real musical gesture existed in the input signal.
Furthermore, most musical inputs are capable of vibrato (that is, a low frequency pitch modulation), which can cross several semitone boundaries. This leads to a glissando effect in the synthesised timbre because of the creation of envelopes in the synthesised timbre which have no matching counterpart in the input signal. While this may be potentially musically interesting, it is generally speaking an undesirable and unwanted side effect.
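The boundary problem of pitch quantisation may be illustrated by the following sketch. It assumes equal-tempered semitones referenced to A4 = 440 Hz and numbered in the manner of MIDI note numbers, neither of which is stated in the description above. The function names and frequency values are hypothetical.

```python
import math


def nearest_semitone(freq_hz, ref_hz=440.0):
    """Map a frequency to the nearest semitone number (assumed: A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / ref_hz))


def quantised_triggers(freqs_hz):
    """Return the indices at which the quantised pitch changes, i.e. where a
    pitch-quantising detector would commence a new envelope."""
    notes = [nearest_semitone(f) for f in freqs_hz]
    return [i for i in range(1, len(notes)) if notes[i] != notes[i - 1]]


# A single held note wavering by about 1.5 Hz around the A4/A#4 boundary (about 452.9 Hz).
near_boundary = [452.0, 453.5, 452.0, 453.5, 452.0]
# A single held note wavering by a similar amount around the centre of A4.
near_centre = [440.0, 441.0, 439.0, 440.0]

print(quantised_triggers(near_boundary))  # [1, 2, 3, 4]  four spurious envelopes
print(quantised_triggers(near_centre))    # []            no spurious envelopes
```

The same small deviation therefore produces either no triggers or a trigger on every sample, depending only on where the input pitch happens to lie relative to a boundary.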
A further prior art method of detecting new notes based upon pitch is to generate a new envelope in the synthesised timbre only when the pitch detector has detected a pitched input signal as opposed to a pitchless or random input signal. The major disadvantage of this scheme is that two notes which are not separated by unpitched sounds do not cause a new synthesised envelope to be generated. For musical inputs from musical instruments which have a long reverberant sustained characteristic (such as those instruments which incorporate a resonant cavity in their physical construction for the purpose of amplifying the acoustic output of the primary vibrating mechanism, of which members of the string family are examples), notes are not separated by unpitched input and hence some envelopes which ought to have been generated by the synthesiser are not generated.
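A minimal sketch of this pitched/unpitched scheme is given below. It assumes that the pitch detector delivers one estimate per frame and reports `None` for a pitchless frame. That representation, the function name `pitched_onsets` and the frequency values are hypothetical.

```python
def pitched_onsets(pitch_track):
    """Return the frame indices at which the input changes from unpitched to pitched.

    pitch_track holds one pitch estimate in Hz per frame, or None where the
    pitch detector found no pitch (assumed representation).
    """
    onsets = []
    was_pitched = False
    for i, pitch in enumerate(pitch_track):
        is_pitched = pitch is not None
        if is_pitched and not was_pitched:
            onsets.append(i)
        was_pitched = is_pitched
    return onsets


# Two notes separated by an unpitched frame.
separated = [None, 220.0, 220.0, None, 247.0, 247.0]
# The same two notes on a resonant instrument: the first note rings into the second.
sustained = [None, 220.0, 220.0, 220.0, 247.0, 247.0]

print(pitched_onsets(separated))  # [1, 4]  both notes are detected
print(pitched_onsets(sustained))  # [1]     the change of pitch at index 4 is ignored
```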
In addition to detecting musical gestures, it is highly desirable that such synthesisers be able to detect the force with which a new note was played by a musician. The traditional prior art method of force detection is to record either the peak amplitude or the amplitude at the time at which the synthetic envelope is commenced. This information is then used to determine the magnitude of the synthetic envelope. In the first case, information about the force of playing is not available until the amplitude has peaked which, in the case of inputs having an amplitude rising only slowly, leads to an unacceptably long delay before an envelope and timbre, suitably modified according to the force of playing information, can be commenced by the synthesiser.
In the second case where the amplitude value at the time a new note is detected is used as a representation of the playing force, the prior art method suffers from a lack of resolution in level and tends not to be correlated with playing force in a repeatable way. As a consequence, different amplitude levels can occur for the same playing force. In particular, there is no direct and unique identification of playing force from raw amplitude readings.
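The two prior art force measures may be compared in the following sketch. It assumes that a threshold crossing is used as the trigger and, purely for the purpose of illustration, that two notes reaching the same peak level were played with the same force. The function name `force_estimates` and the envelope values are hypothetical.

```python
def force_estimates(envelope, threshold):
    """Return (amplitude_at_trigger, trigger_index, peak_amplitude, peak_index).

    The trigger is taken as the first sample above the threshold (assumed).
    """
    trigger_index = next(i for i, v in enumerate(envelope) if v > threshold)
    peak_index = max(range(len(envelope)), key=lambda i: envelope[i])
    return envelope[trigger_index], trigger_index, envelope[peak_index], peak_index


# Two hypothetical notes with the same peak level but slightly different attack shapes.
note_a = [0.0, 0.35, 0.7, 0.9, 0.6]
note_b = [0.0, 0.25, 0.6, 0.9, 0.6]

print(force_estimates(note_a, 0.3))  # (0.35, 1, 0.9, 3)
print(force_estimates(note_b, 0.3))  # (0.6, 2, 0.9, 3)
```

In this example the amplitude at the time of the trigger differs by a large factor between the two notes although their peaks are equal, while the peak itself is not available until two samples after the earlier trigger.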