1. Field of the Invention
This invention relates generally to electronic audio apparatus and in particular to apparatus and methods that determine the pitch of a musical note produced by voice or instrument and shift the pitch of that note toward a standard pitch.
2. Description of the Prior Art
Pitch is a quality of sound relating to the frequencies of the energy involved. Some sounds are very complex and do not involve energy of specific frequencies. A vocalist and the majority of individual instruments have the most clearly defined quality of pitch. The sound-generating mechanism of these sources is a vibrating element (vocal cords, a string, an air column, etc.). The sound that is generated consists of energy at a frequency (called the fundamental) and energy at frequencies that are integer multiples of the fundamental frequency (called harmonics). These sounds have a waveform (pressure as a function of time) that is periodic.
Voices or instruments are out of tune when their pitch is not sufficiently close to standard pitches expected by the listener, given the harmonic fabric and genre of the ensemble. When voices or instruments are out of tune, the emotional qualities of the performance are lost. Correcting intonation, that is, measuring the actual pitch of a note and changing the measured pitch to a standard, solves this problem and restores the performance.
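By way of illustration only, the deviation of a measured pitch from a standard pitch can be expressed numerically. The following minimal sketch assumes twelve-tone equal temperament referenced to A4 = 440 Hz, which the text above does not specify (the appropriate standard depends on the harmonic fabric and genre). The function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; not stated in the source text


def nearest_standard_pitch(freq_hz, a4_hz=A4_HZ):
    """Return (standard_hz, cents_error) for the closest equal-tempered note.

    cents_error is positive when the measured pitch is sharp.
    """
    semitones = 12.0 * math.log2(freq_hz / a4_hz)  # distance from A4 in semitones
    nearest = round(semitones)                     # closest note on the assumed scale
    standard_hz = a4_hz * 2.0 ** (nearest / 12.0)
    cents_error = 100.0 * (semitones - nearest)
    return standard_hz, cents_error


def correction_ratio(freq_hz, a4_hz=A4_HZ):
    """Frequency ratio that would move the measured pitch onto the standard."""
    standard_hz, _ = nearest_standard_pitch(freq_hz, a4_hz)
    return standard_hz / freq_hz
```

Under these assumptions, a note measured at 450 Hz is about 39 cents sharp of A4, and a correction ratio of 440/450 would retune it.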
The purpose of the invention is to correct intonation errors of vocals or other soloists in real time in studio and performance conditions. The invention is incorporated or embodied in an apparatus which can also introduce vibrato into the note, if desired by the user. The solo input sound is processed by the apparatus and by the method of the invention to produce an output of that same sound, except that it is in tune, possibly with vibrato added. The apparatus of the invention changes the instantaneous pitch and introduces no distortion in the output according to the method of the invention.
Determining the pitch of a sound is equivalent to determining the period of repetition of the waveform. A commonly used reference point for determining the period of the waveform is its zero crossings. Zero crossings are used, for example, in the electronic tuning aid disclosed by Hollimon (U.S. Pat. No. 4,523,506). For a simple sine wave, the period is easily determined as the time interval between successive zero crossings of the signal in the same direction. More complex signals, however, can render the zero crossing approach unsuitable for period detection, because multiple zero crossings can occur within a single period.
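The limitation of zero crossings can be illustrated with a short sketch. It is not the Hollimon circuit; the period of 80 samples, the test signals, and the function names are assumptions chosen for the example. A half-sample phase offset keeps the crossings between samples.

```python
import math

PERIOD = 80  # samples per fundamental period (assumed for the example)


def upward_zero_crossings(samples):
    """Indices i where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0.0 <= samples[i]]


def crossing_intervals(samples):
    """Sample counts between successive upward zero crossings."""
    idx = upward_zero_crossings(samples)
    return [b - a for a, b in zip(idx, idx[1:])]


def phase(n, harmonic=1):
    """Phase of the given harmonic at sample n, offset by half a sample."""
    return 2.0 * math.pi * harmonic * (n + 0.5) / PERIOD


# A pure sine, and the same fundamental with a strong third harmonic.
sine = [math.sin(phase(n)) for n in range(4 * PERIOD)]
rich = [math.sin(phase(n)) + 1.5 * math.sin(phase(n, 3))
        for n in range(4 * PERIOD)]

print(crossing_intervals(sine))  # [80, 80]
print(crossing_intervals(rich))  # every interval is shorter than PERIOD
```

For the sine, each interval equals the true period. For the richer signal, three upward crossings fall within each period, so no single interval reports the period.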
Another common method of determining the period of the waveform is by using a peak detector circuit responsive to the time interval between peaks of the signal. Peak detection is used in the disclosure of Mercer (U.S. Pat. No. 4,273,023). As with zero crossing techniques, peak detection works well with a simple signal, such as a sine wave. When more complex signals are involved the accuracy of peak detection suffers, because multiple peaks of similar amplitude may occur.
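A comparable sketch shows how peaks of similar amplitude mislead a peak detector. It does not reproduce the Mercer circuit; the 80 percent acceptance level, the test signals, and the function names are assumptions made for illustration.

```python
import math

PERIOD = 80  # samples per fundamental period (assumed for the example)


def peak_indices(samples, fraction=0.8):
    """Local maxima at least `fraction` of the global maximum in height."""
    floor = fraction * max(samples)
    return [i for i in range(1, len(samples) - 1)
            if samples[i - 1] < samples[i] >= samples[i + 1]
            and samples[i] >= floor]


def peak_intervals(samples, fraction=0.8):
    """Sample counts between successive accepted peaks."""
    idx = peak_indices(samples, fraction)
    return [b - a for a, b in zip(idx, idx[1:])]


sine = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(4 * PERIOD)]

# Weak fundamental under a strong second harmonic: two peaks per period,
# with heights 1.1 and 0.9.
two_peaks = [0.1 * math.cos(2.0 * math.pi * n / PERIOD)
             + math.cos(4.0 * math.pi * n / PERIOD)
             for n in range(4 * PERIOD)]

print(peak_intervals(sine))       # [80, 80, 80]
print(peak_intervals(two_peaks))  # [40, 40, 40, 40, 40, 40]
```

The second signal repeats every 80 samples, but both peaks in each period pass the acceptance level, so the detector reports half the true period, an octave error.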
To overcome some of the problems of determining pitch encountered by zero crossing and peak detection techniques, methods have been developed using the portion of the signal that crosses a set threshold as the reference point for determining the period. For example, in the method and apparatus disclosed by Slepian et al. (U.S. Pat. No. 4,217,808), an automatic gain control device adjusts the positive and negative excursions of the signal to selected levels. Positive and negative thresholds are then established, equal to a percentage of the maximum excursion levels. The period is essentially defined as the time between a first upward crossing of the positive threshold by the signal and a second upward crossing of the positive threshold, separated in time by a downward crossing of the negative threshold. Establishing a threshold includes no provision for ensuring that the reference point will correspond to high-slope regions of the signal. Thus, the signal may be relatively low in slope at the threshold crossing, making the exact time of occurrence difficult to determine.
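The threshold scheme described above can be sketched as follows. This is a simplified reading of the description, not the Slepian et al. apparatus: the automatic gain control is replaced by taking the extreme values of the buffer, and the 50 percent threshold fraction and all names are assumptions.

```python
import math

PERIOD = 80  # samples per fundamental period (assumed for the example)


def threshold_periods(samples, fraction=0.5):
    """Periods between upward crossings of the positive threshold.

    An upward crossing is accepted only if a downward crossing of the
    negative threshold has occurred since the last accepted one.
    """
    pos_thr = fraction * max(samples)  # stands in for the AGC-set level
    neg_thr = fraction * min(samples)
    periods = []
    last_up = None  # index of the most recent accepted upward crossing
    armed = True    # True once the signal has gone below neg_thr
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev > neg_thr >= cur:
            armed = True
        elif armed and prev < pos_thr <= cur:
            if last_up is not None:
                periods.append(i - last_up)
            last_up = i
            armed = False
    return periods


def phase(n, harmonic=1):
    return 2.0 * math.pi * harmonic * (n + 0.5) / PERIOD


sine = [math.sin(phase(n)) for n in range(4 * PERIOD)]
rich = [math.sin(phase(n)) + 1.5 * math.sin(phase(n, 3))
        for n in range(4 * PERIOD)]

print(threshold_periods(rich))  # [80, 80, 80]
```

The interlock rejects the second positive excursion within each period of the richer signal. The sketch does nothing about the slope of the signal at the crossing, which is the weakness noted above.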
Because the timing of the reference points used to determine the period of the signal may be difficult to determine precisely, another technique computes an average period from a plurality of period measurements taken over a longer span of time as a way of improving accuracy. For example, the note analyzer disclosed by Moravec et al. (U.S. Pat. No. 4,354,418) establishes separate period data counts for a number of cycles of the signal and outputs a period that is an average of the period data counts produced. This system requires a stable pitch over a large number of periods to accurately determine pitch. That requirement is rarely met under the conditions in which intonation correction is needed, because the input pitch is not sufficiently stable.
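The effect of averaging can be seen in a small numerical sketch. The period counts below are invented for illustration and are not taken from the Moravec et al. disclosure; the function name is hypothetical.

```python
from statistics import mean


def averaged_period(period_counts):
    """Average of several per-cycle period measurements, in samples."""
    return mean(period_counts)


# Jittery measurements of a steady note: averaging removes the jitter.
steady = [80, 81, 79, 80, 80, 79, 81, 80]

# A note gliding upward in pitch: the period shortens every cycle.
gliding = [80, 79, 78, 77, 76, 75, 74, 73]

print(averaged_period(steady))   # 80
print(averaged_period(gliding))  # 76.5, while the latest period is 73
```

For the steady note the average is the true period. For the gliding note the average lags the most recent period by 3.5 samples, which is why averaging suits only inputs whose pitch holds still.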
Instead of using many period measurements over a long period of time, Gibson et al. (U.S. Pat. No. 4,688,464) adds more redundancy by making multiple estimates within a few cycles. The complexity of the measurements used by the Gibson method requires many checks and balances to ensure that false alarms and incorrectly identified pitches do not occur. In practice, this technique fails, yielding artifacts in the output.
All prior art techniques for determining the period of a waveform have a common failing: They all seek to determine some characteristic attribute(s) of the waveform and then determine the period of repetition of that attribute. All of these techniques eventually fail for the same reason: Noise in the waveform corrupts the computations or the waveform gradually changes shape, causing tracking to be lost, because an attribute being tracked is removed from the data.
Assuming that the input pitch is measured or is known, an apparatus and method can be provided to determine a pitch change from a standard and retune the input to that standard. A number of pitch shifting devices and techniques exist in the musical industry to do that. All of the prior methods are inadequate in achieving a high quality intonation correction. These techniques can be classified in two domains. The first are frequency domain methods which use Fast Fourier Transform (FFT) overlap-and-save algorithms. The second are time domain algorithms used by sampling synthesizers and harmony generators.
The FFT overlap-and-save algorithms are not high quality algorithms for pitch shifting for two reasons. First, they process sequences of data. Better quality pitch shifting occurs when longer sequences are used. The entire sequence can be shifted only by a constant pitch change. However, high quality intonation correction requires continuous changes in pitch. Hence there is a trade-off between continuity of shifting and sequence length. Second, input data windowing and subsequent window overlap cross-fade computations are non-ideal operations that introduce distortions in the output.
Existing time domain methods for pitch shifting in harmony generators work around the limitation of imprecise knowledge of the current period of the data. The method set forth in the article, Lent, K., "An Efficient Method for Pitch Shifting Digitally Sampled Sounds," Computer Music Journal, Volume 13, No. 4, Winter, pp. 65-71 (1989) (hereafter referred to as the Lent method) is a basic method used to resample data and maintain the shape of the spectral envelope. This method windows sections of the input data with windows one period in length and then recombines these windows with spacing of the new sampling period. The data is not resampled. Hence, a new fundamental period is defined, giving a perception of a new pitch. However, the amplitude spectrum is augmented, resulting in unnatural sounds.
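The windowing and re-spacing idea described above can be sketched loosely as follows. This is not a reproduction of the published Lent algorithm. The Hann window shape, the rule for choosing which input section to copy, the integer periods, and all names are assumptions. The window length defaults to one period, following the description above, and is exposed as a parameter.

```python
import math


def respace_grains(samples, period, new_period, window_len=None):
    """Window sections of the input and overlap-add them at a new spacing.

    The samples themselves are never resampled; only the spacing of the
    windowed sections changes from `period` to `new_period`.
    """
    if window_len is None:
        window_len = period  # one period, per the description above
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / window_len)
              for k in range(window_len)]
    out = [0.0] * len(samples)
    out_pos = 0
    while out_pos + window_len <= len(samples):
        # Copy the input section starting on the nearest period boundary.
        in_pos = round(out_pos / period) * period
        if in_pos + window_len > len(samples):
            break
        for k in range(window_len):
            out[out_pos + k] += window[k] * samples[in_pos + k]
        out_pos += new_period
    return out


PERIOD, NEW_PERIOD = 80, 60  # assumed example values, in samples
tone = [math.sin(2.0 * math.pi * n / PERIOD)
        + 0.5 * math.sin(4.0 * math.pi * n / PERIOD)
        for n in range(8 * PERIOD)]
shifted = respace_grains(tone, PERIOD, NEW_PERIOD)
```

Away from the ends of the buffer, the output repeats every 60 samples instead of every 80, so a new fundamental period is defined even though no sample has been interpolated.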
The windowing and window merging technique of the Lent method circumvents imprecise knowledge of the period of the data. Gibson et al. (U.S. Pat. No. 5,231,671) used the Lent method for pitch shifting in a harmony generator. Gibson uses a note's auto-correlation function as a one-time check for octave errors in initial estimates of the waveform period. Later, Gibson et al. (U.S. Pat. No. 5,567,901) added a data re-sampling step before the recombination step. However, this resampling does not completely compensate for shortcomings of the Lent method and is used more as a qualitative adjustment.
Sample-based synthesizers adjust the pitch of output by resampling or changing the sample rate of data being played back from the memory of the device. Although a large number of techniques are used by these devices, none of them relates to this problem. Conceptually, these devices, using a technique called looping, store an infinitely long sequence of samples that are played back as output.
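Looped playback with resampling can be sketched in a few lines. Linear interpolation is assumed here for simplicity. The text above does not say which interpolation such devices use, and the names are hypothetical.

```python
import math


def play_looped(loop, ratio, n_out):
    """Read a stored loop at `ratio` times its recorded rate.

    Positions wrap around the end of the loop, so the stored samples act
    as an endlessly repeating sequence. Values between stored samples
    are linearly interpolated.
    """
    out = []
    pos = 0.0
    n = len(loop)
    for _ in range(n_out):
        i = int(pos)
        frac = pos - i
        a = loop[i % n]
        b = loop[(i + 1) % n]
        out.append(a + frac * (b - a))
        pos += ratio
    return out


# One stored cycle of a sine wave, 80 samples long (assumed example data).
loop = [math.sin(2.0 * math.pi * n / 80) for n in range(80)]

octave_up = play_looped(loop, 2.0, 100)                    # period of 40 samples
semitone_up = play_looped(loop, 2.0 ** (1.0 / 12.0), 100)  # one semitone higher
```

A ratio of 2.0 reads every second stored sample and so halves the period. A ratio of 1.0 reproduces the loop unchanged.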