This invention relates generally to the field of signal processing, and more particularly, to a method and apparatus for pitch-shifting an information signal.
Pitch-shifting is the operation whereby the pitch of a signal (music, speech, audio or other information signal), is altered while its duration remains unchanged. Pitch shifting may be used in audio processing, such as in music synthesis, where the original pitch of musical sounds of a known duration may be shifted to form higher or lower pitched sounds of the same duration. For example, pitch-shifting can be used to transpose a song between keys or to change the sound of a person""s voice to achieve a desired special effect.
Typically, use of a phase-vocoder has always been a highly praised technique for time-scale modification of speech and audio signals. This is because the resulting signal is usually free of artifacts typically encountered in other time domain techniques. The standard way to carry out pitch-shifting using the phase-vocoder is to first perform a time-scale modification, then perform a time-domain sample rate conversion to obtain the resulting signal. For example, in order to raise the pitch of a signal by a factor of two while keeping its duration unchanged, one would use the phase-vocoder to time-expand the signal by a factor of two, leaving the pitch unchanged, and then down-sample the resulting signal by a factor of two, thereby restoring the original duration.
Unfortunately, using a phase-vocoder to perform pitch-shifting has several undesirable drawbacks. One drawback is that the processing cost per output sample is a function of the pitch modification factor. For example, if the modification factor is large, the number of mathematical operations increases correspondingly. The mathematical operations may also require complex functions, such as computing arctangents or phase unwrapping. Another drawback is that only one xe2x80x98linearxe2x80x99 pitch-shift modification can be performed at a time. This is true because the frequencies of all the components are multiplied by the same modification factor. As a result, more complex processes, like signal harmonizing or chorusing, cannot be implemented in one pass and therefore have high processing costs.
Given the limitations of the phase-vocoder, it is desirable to have a system that can perform processes like pitch-shifting in a computationally efficient manner. Such a system should also be capable of performing a variety of linear and non-linear pitch-shifting functions in a single pass. In doing so, special effects such as harmonizing and chorusing could be efficiently and easily implemented.
One aspect of the present invention solves the problems associated with pitch-shifting by providing a system for pitch-shifting signals in the frequency domain. This eliminates the expensive time domain resampling stage and allows the computational costs to become independent of the pitch modification factor. Unlike the prior art, the system does not require the calculation of arctangents nor phase unwrapping when modifying the phase in the frequency domain, thus achieving a significant reduction in the number of computations. For example, in one embodiment, the system supports a 50% overlap (as opposed to a 75% overlap in standard implementations), which cuts the computational cost by a factor of 2.
In an embodiment of the invention, a method is provided for pitch-shifting a signal by converting the signal to a frequency domain representation and then identifying a region in the frequency domain representation. The region being located at a first frequency location. Next, the region is shifted to a second frequency location to form a adjusted frequency domain representation. Finally, the adjusted frequency domain representation is transformed to a time domain signal representing the input signal with shifted pitch.