One of the newest forms of entertainment to become popular in Japan and the United States is karaoke. A karaoke machine typically comprises a stereo sound system and a large video monitor or television screen. A videotape or videodisc player is coupled to the video monitor to simultaneously play a music video while a musical song that lacks a vocal track is played on the stereo system. As the music video is played on the video monitor, the words of the song are displayed at the same time as they are to be sung. A microphone is also coupled to the stereo system so that a participant can sing the words of the song being played as the music video is shown.
Not surprisingly, the quality of such impromptu singing performances varies greatly depending on the singing ability of the participant. As a result, many people are hesitant to stand up and sing in front of a crowd of friends and/or hecklers. This hesitation is usually due to a perceived lack of talent on the part of the would-be participant. However, some people, despite words of encouragement, are simply not blessed with the ability to remain on pitch with a musical accompaniment being played. Therefore, a need exists for an entertainment system that can alter the pitch of the notes sung by a participant to correspond to the proper pitch of the song being played.
Prior to the present invention, inexpensive equipment has not been available to alter the pitch of a vocal signal in a way that sounds natural. While musical pitch shifters that can alter the pitch of a signal produced by a musical instrument such as a guitar or synthesizer have been well known for many years, such devices do not work well on vocal sounds.
In any periodic musical signal, there is always a fundamental frequency that determines the particular pitch of the signal, as well as numerous harmonics that give character to the musical note. It is the particular combination of the harmonic frequencies with the fundamental frequency that makes, for example, a guitar and a violin playing the same note sound different from one another. In a musical instrument such as a guitar, flute, saxophone or keyboard, as the notes played by the instrument vary, the spectral envelope containing the fundamental frequency and the harmonics expands or contracts correspondingly. Therefore, for musical instruments one can alter the pitch of a note by sampling sound from the instrument and playing the sampled sound back at a faster or slower rate, without the pitch-shifted notes sounding artificial. Although this method works well to shift the pitch of a note from a musical instrument, it does not work well for shifting the pitch of a vocal signal or sung note.
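The playback-rate approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample rate, the 220 Hz test tone, the linear interpolation, and the names `resample` and `estimate_frequency` are hypothetical choices, not part of any particular pitch shifter.

```python
import math

def resample(samples, ratio):
    """Read `samples` at `ratio` times the original rate (linear interpolation)."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        pos += ratio
    return out

def estimate_frequency(samples, sample_rate):
    """Estimate frequency from the spacing of upward zero crossings.

    Only meaningful for a simple tone with one upward crossing per period.
    """
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0.0 <= samples[i]]
    if len(crossings) < 2:
        return 0.0
    periods = len(crossings) - 1
    return periods * sample_rate / (crossings[-1] - crossings[0])

SAMPLE_RATE = 8000  # assumed sample rate in Hz
# One second of a 220 Hz test tone standing in for a sampled instrument note.
note = [math.sin(2.0 * math.pi * 220.0 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]

# Playing the samples back 1.5 times faster raises every frequency by 1.5.
shifted = resample(note, 1.5)
```

With these assumed values, `estimate_frequency(shifted, SAMPLE_RATE)` comes out close to 330 Hz. The shifted note is also shorter than the original by the same factor, because every frequency component, and hence the whole spectral envelope, is scaled together.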
In a vocal signal, there is typically a fundamental frequency that determines the pitch of a note an individual is singing, as well as a set of harmonic frequencies that add character and timbre to the note. In contrast with a musical instrument, as the pitch of a vocal signal varies, the spectral envelope of the harmonics retains the same shape but the individual frequency components that make up the spectral envelope may change in magnitude. Therefore, shifting the pitch of a vocal signal by sampling a note as it is sung and by playing back the sampled signal at a rate that is either faster or slower does not sound natural, because that method varies the shape of the spectral envelope. In order to alter the pitch of a vocal note in a way that sounds natural, a method is required for varying the frequency of the fundamental, while maintaining the overall shape of the spectral envelope.
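The difference between the two behaviors can be illustrated numerically. The sketch below is a toy model only: the single bell-shaped resonance at 700 Hz, its bandwidth, and all function names are hypothetical assumptions chosen to make the effect visible, not measurements of a real voice.

```python
import math

FORMANT_HZ = 700.0    # hypothetical vocal resonance (assumption)
BANDWIDTH_HZ = 150.0  # hypothetical resonance width (assumption)

def envelope(freq_hz):
    """Toy spectral envelope: one bell-shaped resonance."""
    return math.exp(-((freq_hz - FORMANT_HZ) / BANDWIDTH_HZ) ** 2)

def harmonics(f0_hz, count=30):
    """Harmonics of f0 as (frequency, amplitude) pairs shaped by the envelope."""
    return [(k * f0_hz, envelope(k * f0_hz)) for k in range(1, count + 1)]

def resampled(partials, ratio):
    """Playback-rate shift: frequencies scale, amplitudes travel with them."""
    return [(freq * ratio, amp) for freq, amp in partials]

def strongest(partials):
    """Frequency of the loudest partial, a rough marker of the envelope peak."""
    return max(partials, key=lambda p: p[1])[0]

ratio = 1.5
original = harmonics(100.0)
by_resampling = resampled(original, ratio)      # envelope stretched by the ratio
envelope_preserving = harmonics(100.0 * ratio)  # new harmonics, same envelope
```

In this toy model the loudest partial of `original` sits at the 700 Hz resonance. After resampling it moves to 1050 Hz, so the envelope has changed shape, whereas the envelope-preserving version keeps its loudest partial next to the resonance (at 750 Hz, the nearest harmonic of the new 150 Hz fundamental).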
The inventors have found that the method set forth in the article by K. Lent, "An Efficient Method for Pitch Shifting Digitally Sampled Sounds," Computer Music Journal, Volume 13, No. 4, Winter, pp. 65-71 (1989) (hereafter referred to as the Lent method), is particularly suited for use in shifting the pitch of a vocal signal because the method maintains the shape of the spectral envelope. However, the Lent method, as set forth in the referenced paper, is computationally complex and difficult to implement in real time with inexpensive computing equipment. Additionally, the Lent method requires that the fundamental frequency of a signal be known exactly. This requirement is a problem because vocal signals are difficult to analyze: the fundamental frequency of a given note when sung may vary considerably, so it is difficult for a pitch shifter to accurately determine the fundamental frequency. The Lent method does not address the problem of accurately determining the fundamental frequency of a complex vocal signal.
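The general idea of an envelope-preserving shift can be sketched as a pitch-synchronous overlap-add: short windowed segments centered on each pitch period are re-emitted at a different spacing. The code below is a rough illustration in that spirit and is not the implementation from the Lent paper. The Hann window, the two-period segment length, the nearest-mark selection rule, and the assumption that the period is already known exactly as an integer number of samples are all simplifications introduced here.

```python
import math

def hann(length):
    """Periodic Hann window; copies spaced length/2 apart sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]

def overlap_add_shift(samples, period, ratio):
    """Re-emit two-period windowed segments at a spacing of period / ratio.

    `period` is the known input pitch period in samples (assumed exact),
    and pitch marks are assumed to fall on multiples of `period`.
    """
    window = hann(2 * period)
    out = [0.0] * len(samples)
    new_period = period / ratio
    t = float(period)  # output position of the next segment centre
    while t < len(samples) - 2 * period:
        mark = int(round(t / period)) * period  # nearest input pitch mark
        centre = int(round(t))
        for n in range(2 * period):
            out[centre - period + n] += window[n] * samples[mark - period + n]
        t += new_period
    return out

PERIOD = 60  # assumed pitch period in samples
# Test signal: a fundamental plus two harmonics, exactly periodic in PERIOD.
voiced = [math.sin(2.0 * math.pi * n / PERIOD)
          + 0.5 * math.sin(2.0 * math.pi * 2 * n / PERIOD)
          + 0.3 * math.sin(2.0 * math.pi * 3 * n / PERIOD)
          for n in range(PERIOD * 40)]

unchanged = overlap_add_shift(voiced, PERIOD, 1.0)  # segments land where they came from
raised = overlap_add_shift(voiced, PERIOD, 1.5)     # segments every 40 samples
```

With a ratio of 1.0 the overlapping windows sum to one and the input is reproduced away from the edges. With a ratio of 1.5 the output repeats every 40 samples instead of every 60, so the fundamental rises, while each segment is copied without being stretched. The sketch also makes the dependence on an accurate pitch period visible: if `period` is wrong, the segments no longer line up with the waveform.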
Therefore, there exists a need for a method and apparatus for shifting the pitch of a vocal signal that can operate substantially in real time and be implemented with inexpensive computing equipment. This method and apparatus should be able to quickly analyze an input vocal signal and compare it to a Reference Note that corresponds to the "correct" pitch of the song being played. The method and apparatus should then shift the pitch of the input vocal signal so that it is on pitch with the Reference Note in a way that sounds natural.
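The comparison against a Reference Note reduces to a simple ratio once both pitches are expressed as frequencies. The following sketch assumes, purely for illustration, that the Reference Note is supplied as a MIDI note number in equal temperament with A4 at 440 Hz; the function names are hypothetical, and detecting the sung pitch is outside its scope.

```python
import math

def midi_to_hz(note_number):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)

def shift_ratio(input_hz, reference_hz):
    """Factor by which the sung fundamental must be scaled to match the reference."""
    return reference_hz / input_hz

def cents_off(input_hz, reference_hz):
    """Pitch error of the sung note in cents; negative means the singer is flat."""
    return 1200.0 * math.log2(input_hz / reference_hz)

reference = midi_to_hz(69)           # Reference Note: A4
sung = 425.0                         # hypothetical detected pitch, slightly flat
ratio = shift_ratio(sung, reference)
error = cents_off(sung, reference)
```

Here `ratio` is slightly above one, meaning the vocal signal must be shifted up, and `error` is negative because the hypothetical singer is flat.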