1. Technical Field
The invention relates to musical instruments. More particularly, the invention relates to a voice-controlled electronic musical instrument.
2. Description of the Prior Art
Musical instruments have traditionally been difficult to play, thus requiring a significant investment of time and, in some cases money, to learn the basic operating skills of that instrument. In addition to frequent and often arduous practice sessions, music lessons would typically be required, teaching the mechanical skills to achieve the proper musical expression associated with that instrument, such as pitch, loudness, and timbre. In addition, a musical language would be taught so that the user would be able to operate the instrument to play previously written songs.
The evolution of musical instruments has been relatively slow, with few new musical-instrument products taking hold over the past several hundred years. The introduction of electronics-related technology, however, has had a significant impact on musical-instrument product development. The music synthesizer, for example, together with the piano keyboard interface/controller, has vastly expanded the number and variety of instrument sounds which can be produced by a person who has learned to play a single instrumentxe2x80x94that of piano or keyboards. The requirement remained, however, that for someone to operate a synthesizer, that person would have to learn at least some of the fundamentals of music expression associated with playing a piano.
Therefore, for those people who wanted to be able to express themselves musically, but had not learned to play an instrument, or wanted to be able to make many instrument sounds without learning how to play each instrument, there was still a significant time investment required to learn the skill, with no assurance that they could ever reach a level of proficiency acceptable to them.
In U.S. Pat. Nos. 3,484,530 and 3,634,596, there are disclosed systems for producing musical outputs from a memory containing recorded musical notes that can be stimulated by single note inputs through a microphone. The systems disclosed in these patents are reportedly able to detect pitch, attack, sustain, and decay as well as volume level and are able to apply these sensed inputs to the recorded note being played back. In effect, the systems are musical note to musical note converters that may be converted fast enough so that no lag can be detected by the listener or by the player. However, to achieve these capabilities, rather cumbersome and expensive electronic and mechanical means were suggested, which are not suited for portable or handheld instruments, but primarily intended for larger systems.
In the systems disclosed in the above patents, the memory is capable of containing discrete notes of the chromatic scale and respond to discrete input notes of the same pitch. The system is analogous to a keyboard instrument where the player has only discrete notes to choose from and actuates one by depressing that particular key. Other musical instruments give a player a choice of pitches between whole and half tone increments. For example, a violin can produce a pitch which is variable depending upon where the string is fretted or a slide trombone can cause a pitch falling in between whole and half tone increments. Both of these instruments produce an unbroken frequency spectrum of pitch. However, such prior art systems are not able to provide a continually varying pitch at the output in response to a continually varying pitch at the input, nor have they been able to produce a note timbre that realistically duplicates what a real instrument does as a function of pitch over the range of the instrument nor provide a note quality or timbre which realistically duplicates what a real instrument does as a function of degree of force at the input of an instrument.
A variety of other methods have been proposed to use the human voice to control a synthesizer, thus taking advantage of the singular musical expression mechanism which most people have. Virtually anyone who can speak has the ability to change musically expressive parameters such as pitch and loudness. One such method is described in R. Rupert, U.S. Pat. No. 4,463,650 (Aug. 7, 1984). In the Rupert device, real instrumental notes are contained in a memory with the system responsive to the stimuli of, what he refers to as xe2x80x98mouth musicxe2x80x99 to create playable musical instruments that responds to the mouth music stimuli in real time. See, also, K. Obata, Input apparatus of electronic device for extracting pitch from input waveform signal, U.S. Patent No. 4,924,746 (May 15, 1990).
Ishikawa, Sakata, Obara, Voice Recognition Interval Scoring System, European Pat. No. 142,935 (May 29, 1985), recognizing the inaccuracies of the singing voice xe2x80x9ccontemplates providing correcting means for easily correcting interval data scored and to correct the interval in a correcting mode by shifting cursors at portions to be corrected.xe2x80x9d In a similar attempt to deal with vocal inaccuracies, a device described by M. Tsunoo et al, U.S. Pat. No. 3,999,456 (Dec. 28, 1976) uses a voice keying system for a voice-controlled musical instrument which limits the output tone to a musical scale. The difficulty in employing either the Ishikawa or the Tsunoo devices for useful purposes is that most untrained musicians do not know which scales are appropriate for different songs and applications. The device may even be a detractor from the unimproved voice-controlled music synthesizer, due to the frustration of the user not being able to reach certain notes he desires to play.
In a related area, the concept of xe2x80x9cmusic-minus-onexe2x80x9d is the use of a predefined usually prerecorded musical background to supply contextual music around which a musician/user sings or plays an instrument, usually the lead part. This concept allows the user to make fuller sounding music, by playing a key part, but having the other parts played by other musicians. Benefits to such an experience include greater entertainment value, practice value and an outlet for creative expression.
M. Hoff, Entertainment and creative expression device for easily playing along to background music, U.S. Pat. No. 4,771,671 (Sep. 20, 1988) discloses an enhancement to the music minus-one concept, providing a degree of intelligence to the musical instrument playing the lead the voice-controlled music synthesizer, in this case so as not to produce a note which sounds dissonant or discordant relative to the background music. In addition, Hoff discloses a variation on the voice-controlled music synthesizer by employing correction. Rather than correcting the interval in an arbitrary manner, as suggested in the Tsunoo and Ishikawa patents, this device adjusts the output of the music synthesizer to one which necessarily sounds good to the average listener, relative to predefined background music. However, Hoff performs pitch correction only in the context of pre-programmed accompaniments, using the scale note suggested by the accompaniment nearest to the detected pitch. Hoff does not provide pitch correction in the absence of accompaniment, for example, the capability for the user to choose the scale to be used for the pitch correction or the capability to assign the currently detected pitch to the tonic of that scale.
Various approaches to the process of pitch detection itself are known. For example, see M. Russ, Sound Synthesis and Sampling, Focal Press, 1996, p. 265, or L. Rabiner et. al., A Comparative Performance Study of Several Pitch Detection Algorithms, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-24, No. 5, October 1976, p. 399. According to Russ, the traditional general classifications for pitch detection are a) zero-crossing, b) auto-correlation, c) spectral interpretation. Two auto-correlation approaches that bear some resemblance to the present approach are example, S. Dame, Method and Device For Determining The Primary Pitch of A Music Signal, U.S. Pat. No. 5,619,004 (8 Apr. 1997) and M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley, Average Magnitude Difference Function Pitch Extractor, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-22, No. 5 (October 1974).
A major drawback of all presently known systems that allow voice control of a musical instrument is that they require bulky enclosures and are presented in unfamiliar form factors, i.e. as imposing pieces of technical equipment. Thus, a user is unable to connect with such instruments in a natural way. Rather than playing a musical instrument, such devices give one the impression of operating a piece of machinery which, in most cases, is similar to operating a computer. This fact alone well explains the lack of commercial success and consumer acceptance these devices have found.
It would be advantageous to provide a voice-controlled musical instrument in a form factor that most nearly represents the actual instrument that the electronic instrument is to represent. It would be further advantageous if such form factor contributed to the ease of use of such instrument by providing a user with a simple method of operation. It would also be advantageous to provide a computationally efficient pitch detection technique for a voice-controlled electronic musical instrument, such that a reduced size form factor could be achieved.
The invention provides a voice-controlled musical instrument in a form factor that most nearly represents the actual instrument that the electronic instrument is to represent. Such form factor contributes to the ease of use of such instrument by providing a user with a simple method of operation. The invention also provides a computationally efficient pitch-detection technique for a voice-controlled electronic musical instrument.
The device described in this document is an electronic, voice-controlled musical instrument. It is in essence an electronic kazoo. The player hums into the mouthpiece, and the device imitates the sound of a musical instrument whose pitch and volume change in response to the player""s voice.
The player is given the impression of playing the actual instrument and controlling it intimately with the fine nuances of his voice. Significantly, the device is compact, self contained, and operated by the user with a simple set of controls. In such way, the invention overcomes many of the barriers to acceptance of such electronic instruments as were taught in the prior art. That is, the device is simple to operate and to hold while playing. Because the device is self contained, lightweight, and fully integrated, there are no exposed wires or connections to make between various components of a system which would detract from both the enjoyment of the device and the sense that the device is an electronic surrogate for the actual instrument that it physically represents. Because the device is provided in a dedicated form, e.g. as a horn, the user is drawn into the musical experience rather than distracted by the use of a microphone. Thus, voice operation of the device most nearly implies playing the actual instrument the device represents and creates the impression that the user is actually playing an instrument. Further, by taking the counterintuitive measure of severely restricting the user""s ability to alter operation of the device, the user interface is significantly simplified. This again imposes the form and operation of the actual instrument onto the device, such that the user may feel as though he is playing the instrument, even though he may not have the musical skill to operate the actual instrument. Because the device uses a unique pitch-detection scheme that is both computationally efficient and well suited for an integrated device, such as the voice-controlled electronic musical instrument herein disclosed, it is possible to provide both a compact, self-contained device and, significantly, a device that provides a high degree of musicality, thereby further enhancing the impression that the user is actually playing a musical instrument.
The instrument can in principle be any music-producing sound source: a trumpet, trombone, saxophone, oboe, bassoon, clarinet, flute, piano, electric guitar, voice, whistle, i.e. virtually any source of sound.
In its simplest configuration, the instrument resembles a kind of horn, and is for convenience called the HumHorn throughout this document. However, the shape and appearance of the instrument can be fashioned by the manufacturer to match the sound of any traditional instrument, if desired; or its shape can be completely novel. The functional requirements of the HumHorn""s physical design are only:
That it be hand-held;
That it have a mouthpiecexe2x80x94where the player""s voice enters;
That it have one or more speakersxe2x80x94where the sound is produced; and
That it have a bodyxe2x80x94where the electronics and batteries are stored and where finger-actuated controls can be placed.
Three primary software components of the HumHorn are the frequency-detection module, the loudness-tracking module, and the note-attack module.
The frequency-detection module (FDM) identifies the frequency of the player""s voice. It does this by analyzing the incoming sound wave and finding patterns of recurring shapes. This method is a highly computationally efficient and novel combination of auto-correlation and zero-crossing- or peak-based pitch detection. The chosen instrument is synthesized at the pitch determined by the FDM or at an offset from that pitch as desired by the player.
The loudness-tracking component measures the loudness of the player""s voice, and this information is used then to set the volume of the synthesized sound.
The note-attack module detects abrupt changes in the loudness of the player""s voice. This component helps decide when the synthesized instrument should begin a new note.