1. Technical Field
The invention relates to musical instruments. More particularly, the invention relates to a voice-controlled electronic musical instrument.
2. Description of the Prior Art
Musical instruments have traditionally been difficult to play, thus requiring a significant investment of time and, in some cases money, to learn the basic operating skills of that instrument. In addition to frequent and often arduous practice sessions, music lessons would typically be required, teaching the mechanical skills to achieve the proper musical expression associated with that instrument, such as pitch, loudness, and timbre. In addition, a musical language would be taught so that the user would be able to operate the instrument to play previously written songs.
The invention relates to a hand-held music synthesizer whose output is controlled by the human voice, referred to herein as xe2x80x9cVocolo(trademark).xe2x80x9d The principles and features of the Vocolo were set forth in the patent application entitled xe2x80x9cVoice Controlled Electronic Musical Instrument,xe2x80x9d PCT Serial No. PCT/US00/13721, henceforth referred to as the xe2x80x9cReference Patent Application.xe2x80x9d Note that alternate names of the Vocolo(trademark) used in the reference document were xe2x80x9cHumHorn(trademark)xe2x80x9d and xe2x80x9cHumBand(trademark).xe2x80x9d The Vocolo is an electronic, voice-controlled musical instrument. It is in essence an electronic kazoo. The player hums into the mouthpiece, and the device imitates the sound of a musical instrument whose pitch and volume change in response to the player""s voice. The player is given the impression of playing the actual instrument and controlling it intimately with the fine nuances of his voice.
The evolution of musical instruments has been relatively slow, with few new musical-instrument products taking hold over the past several hundred years. The introduction of electronics-related technology, however, has had a significant impact on musical-instrument product development. The music synthesizer, for example, together with the piano keyboard interface/controller, has vastly expanded the number and variety of instrument sounds which can be produced by a person who has learned to play a single instrumentxe2x80x94that of piano or keyboards. The requirement remained, however, that for someone to operate a synthesizer, that person would have to learn at least some of the fundamentals of music expression associated with playing a piano.
Therefore, for those people who wanted to be able to express themselves musically, but had not learned to play an instrument, or wanted to be able to make many instrument sounds without learning how to play each instrument, there was still a significant time investment required to learn the skill, with no assurance that they could ever reach a level of proficiency acceptable to them.
In U.S. Pat. Nos. 3,484,530 and 3,634,596 there are disclosed systems for producing musical outputs from a memory containing recorded musical notes that can be stimulated by single note inputs through a microphone. The systems disclosed in these patents are reportedly able to detect pitch, attack, sustain, and decay as well as volume level and are able to apply these sensed inputs to the recorded note being played back. In effect, the systems are musical note to musical note converters that may be converted fast enough so that no lag can be detected by the listener or by the player. However, to achieve these capabilities, rather cumbersome and expensive electronic and mechanical means were suggested, which are not suited for portable or handheld instruments, but primarily intended for larger systems.
In the systems disclosed in the above patents, the memory is capable of containing discrete notes of the chromatic scale and respond to discrete input notes of the same pitch. The system is analogous to a keyboard instrument where the player has only discrete notes to choose from and actuates one by depressing that particular key. Other musical instruments give a player a choice of pitches between whole and half tone increments. For example, a violin can produce a pitch which is variable depending upon where the string is fretted or a slide trombone can cause a pitch falling in between whole and half tone increments. Both of these instruments produce an unbroken frequency spectrum of pitch. However, such prior art systems are not able to provide a continually varying pitch at the output in response to a continually varying pitch at the input, nor have they been able to produce a note timbre that realistically duplicates what a real instrument does as a function of pitch over the range of the instrument nor provide a note quality or timbre which realistically duplicates what a real instrument does as a function of degree of force at the input of an instrument.
A variety of other methods have been proposed to use the human voice to control a synthesizer, thus taking advantage of the singular musical expression mechanism which most people have. Virtually anyone who can speak has the ability to change musically expressive parameters such as pitch and loudness. One such method is described in R. Rupert, U.S. Pat. No. 4,463,650 (Aug. 7, 1984). In the Rupert device, real instrumental notes are contained in a memory with the system responsive to the stimuli of, what he refers to as xe2x80x9cmouth musicxe2x80x9d to create playable musical instruments that responds to the mouth music stimuli in real time. See, also, K. Obata, Input apparatus of electronic device for extracting pitch from input waveform signal, U.S. Pat. No. 4,924,746 (May 15, 1990).
Ishikawa, Sakata, Obara, Voice Recognition Interval Scoring System, European Pat. No. 142,935 (May 29, 1985), recognizing the inaccuracies of the singing voice xe2x80x9ccontemplates providing correcting means for easily correcting interval data scored and to correct the interval in a correcting mode by shifting cursors at portions to be corrected.xe2x80x9d In a similar attempt to deal with vocal inaccuracies, a device described by M. Tsunoo et al, U.S. Pat. No. 3,999,456 (Dec. 28, 1976) uses a voice keying system for a voice-controlled musical instrument which limits the output tone to a musical scale. The difficulty in employing either the Ishikawa or the Tsunoo devices for useful purposes is that most untrained musicians do not know which scales are appropriate for different songs and applications. The device may even be a detractor from the unimproved voice-controlled music synthesizer, due to the frustration of the user not being able to reach certain notes he desires to play.
In a related area, the concept of xe2x80x9cmusic-minus-onexe2x80x9d is the use of a predefined usually prerecorded musical background to supply contextual music around which a musician/user sings or plays an instrument, usually the lead part. This concept allows the user to make fuller sounding music, by playing a key part, but having the other parts played by other musicians. Benefits to such an experience include greater entertainment value, practice value and an outlet for creative expression.
M. Hoff, Entertainment and creative expression device for easily playing along to background music, U.S. Pat. No. 4,771,671 (Sep. 20, 1988) discloses an enhancement to the music minus-one concept, providing a degree of intelligence to the musical instrument playing the lead the voice-controlled music synthesizer, in this case so as not to produce a note which sounds dissonant or discordant relative to the background music. In addition, Hoff discloses a variation on the voice-controlled music synthesizer by employing correction. Rather than correcting the interval in an arbitrary manner, as suggested in the Tsunoo and Ishikawa patents, this device adjusts the output of the music synthesizer to one which necessarily sounds good to the average listener, relative to predefined background music. However, Hoff performs pitch correction only in the context of pre-programmed accompaniments, using the scale note suggested by the accompaniment nearest to the detected pitch. Hoff does not provide pitch correction in the absence of accompaniment, for example, the capability for the user to choose the scale to be used for the pitch correction or the capability to assign the currently detected pitch to the tonic of that scale. Various approaches to the process of pitch detection itself are known. For example, see M. Russ, Sound Synthesis and Sampling, Focal Press, 1996, p. 265, or L. Rabiner et. al., A Comparative Performance Study of Several Pitch Detection Algorithms, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-24, No. 5, October 1976, p. 399. According to Russ, the traditional general classifications for pitch detection are a) zero-crossing, b) auto-correlation, c) spectral interpretation.
Autocorrelation is currently probably the most popular method used commercially today for pitch detection. Three auto-correlation approaches that bear some resemblance to the present approach are, for example, S. Dame, Method and Device For Determining The Primary Pitch of A Music Signal, U.S. Pat. No. 5,619,004 (Apr. 8, 1997) and M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley, Average Magnitude Difference Function Pitch Extractor, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-22, No. 5 (October 1974). Hildebrand, H. A., Pitch Detection and Intonation Correction Apparatus and Method, U.S. Pat. No. 5,973,252 (Oct. 26, 999). F. Mekuria, Detection of periodicity information from an audio signal, U.S. Pat. No. 5,970,441 (Oct. 19, 1999) discloses a method of pitch detection emphasizing peaks in a (low pass) filtered audio signal.
A major drawback of all presently known systems that allow voice control of a musical instrument is that they require bulky enclosures and are presented in unfamiliar form factors, i.e. as imposing pieces of technical equipment. Thus, a user is unable to connect with such instruments in a natural way. Rather than playing a musical instrument, such devices give one the impression of operating a piece of machinery that, in most cases, is similar to operating a computer. This fact alone well explains the lack of commercial success and consumer acceptance these devices have found.
It would be advantageous to provide a voice-controlled musical instrument in a form factor similar to an actual instrument. It would be further advantageous if such form factor contributed to the ease of use of such instrument by providing a user with a simple method of operation. It would also be advantageous to provide a computationally efficient pitch detection technique for a voice-controlled electronic musical instrument, such that a reduced size form factor, as well as an economical price, could be achieved.
Given that no pitch detection method is perfect and that there will always be some errors, it would also be advantageous to provide means for reducing the errors and/or mitigating the effect of these errors on the sound quality of the instrument synthesis.
It would further be advantageous to provide features that allow the player to take advantage of the Vocolo""s unique style of control. For virtually any other musical instrument the hands are preoccupied with just playing the notes. With the Vocolo the hands are free to control nuances of the performance such as vibrato, volume (and tremelo), and timbre control. The voice of the player can also be used to control nuances as well, providing an arsenal for creating unique and powerful performances.
The Vocolo provides a visceral experience when held in the hands because its sound output can be felt through its body. To accentuate this attribute it would be advantageous to provide a special means for transmitting mechanical pulses through the body of the Vocolo that corresponds to a precise background rhythm.
The Vocolo can be a great tool for improvisation and for the creation of personal compositions. For this purpose, it would be advantageous to allow a player to xe2x80x9cjamxe2x80x9d by himself. That is, to be able to record a sequence of notes as a background accompaniment, and then be able to play along with this accompaniment.
The voice interface for the Vocolo lends itself well to gaming applications because it can recognize patterns in the pitch and timing of notes. Thus it would be advantageous to provide a means for vocal pattern recognition, as well as different ways to utilize such a capability for different kinds of games.
The invention relates to a hand-held music synthesizer whose output is controlled by the human voice, presently called the Vocolo. The Vocolo is an electronic, voice-controlled musical instrument. The player hums into the mouthpiece, and the device imitates the sound of a musical instrument whose pitch and volume change in response to the player""s voice.
The player is given the impression of playing the actual instrument and controlling it intimately with the fine nuances of his voice. The instrument can in principle be any music-producing sound source: a trumpet, trombone, saxophone, oboe, bassoon, clarinet, flute, piano, electric guitar, voice, whistle, i.e. virtually any source of sound.
The Reference Patent Application describes three primary software components of the Vocolo: the frequency-detection module, the loudness-tracking module, and the note-attack module. The frequency-detection module (FDM) identifies the frequency of the player""s voice. The chosen instrument is synthesized at the pitch determined by the FDM or at an offset from that pitch as desired by the player. The loudness-tracking component measures the loudness of the player""s voice, and this information is used then to set the volume of the synthesized sound. The note-attack module detects abrupt changes in the loudness of the player""s voice, which helps decide when the synthesized instrument should begin a new note.
One aspect of the present invention sets forth a refinement of the Vocolo hardware in the form of improved microphone interfaces. Alternative embodiments are also set forth, which comprise an electric drum for feeding back automatic background rhythm to the player, and a wiggle bar for expression control. Also disclosed are a smoother form of pitch discretization and a novel approach for mitigating pitch detection errors in the synthesis. Software methods for performance evaluation, sequence recording and playback, pitch smoothing, and novel use of the voice for expressive control, are also set forth.