The present invention relates to signal processing and, more particularly, to prosody modification of a quasi-periodic signal.
Prosody modification is the adjustment of a quasi-periodic signal without affecting the timbre. Quasi-periodic signals include human speech, e.g., talking and singing, synthetic speech, and sounds from musical instruments, such as notes from woodwind, brass, or stringed instruments. Specific examples of prosody modification include adjusting the pitch of a quasi-periodic signal without affecting the timbre, for example, changing a sampled clarinet note from a C to a B while still sounding like a clarinet. Another purpose of prosody modification is to change the duration of a quasi-periodic signal without affecting either the pitch or the timbre.
Practical applications of prosody modification include adding emphasis to portions of a pre-recorded message and changing the duration of human dialog to fit a particular time slot, e.g., an advertising announcement or lip-syncing during postproduction of a movie or video. Prosody modification is also used to adjust the pitch of a singer or musical instrument, for example, to change the musical key, add vibrato, or correct for poor voice control. Speech synthesis requires prosody modification of short speech segments before concatenation to create words and longer messages.
One conventional approach to prosody modification is a pitch-synchronous overlap-and-add technique. U.S. Pat. No. 5,524,172 describes a conventional overlap-and-add system for modifying the prosody of speech synthesis segments, which are derived from human sounds sampled at a relatively low sampling rate of 16 kHz due to tight constraints in computation and storage costs. A series of original synchronization marks within the speech segment are indexed by sample number and saved in a memory. The duration of the speech segments is modified by time-warping the synchronization marks to produce a series of synthetic synchronization marks, also indexed by a sample number. Waveforms are extracted from the speech segment at the original synchronization mark using a symmetrical Hanning window, overlapped by shifting to the corresponding synthetic synchronization mark, and added to the output signal.
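The conventional scheme described above can be illustrated with a minimal sketch. It is not the code of the cited patent: the function names `hanning` and `overlap_add`, the fixed `half_width` parameter, and the handling of the signal edges are all assumptions made for this example. The synchronization marks are integer sample indices, as in the conventional approach.

```python
import math

def hanning(half_width):
    """Symmetric Hanning window with `half_width` samples on each side of its peak.
    Assumes half_width >= 1."""
    n = 2 * half_width + 1
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]

def overlap_add(signal, original_marks, synthetic_marks, half_width, out_len):
    """Extract a windowed waveform around each original mark and add it to the
    output at the corresponding synthetic mark (all marks are sample indices)."""
    window = hanning(half_width)
    out = [0.0] * out_len
    for src, dst in zip(original_marks, synthetic_marks):
        for j, w in enumerate(window):
            s = src - half_width + j   # index read from the input signal
            d = dst - half_width + j   # index written in the output signal
            if 0 <= s < len(signal) and 0 <= d < out_len:
                out[d] += w * signal[s]
    return out
```

Because every mark in this sketch is an integer, each waveform can only be placed on the sampling grid.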
Conventional overlap-and-add techniques introduce some noise into the signal, in the form of artificial jitter or harmonic mix-up, which is heard as a "fuzziness" or a reedy quality. Higher-pitched signals, such as women's voices, children's voices, singing voices, and most musical instrument notes, are especially affected. Moreover, conventional overlap-and-add systems have difficulty with signals involving rapid changes in pitch, for example, during music such as singing or playing musical instruments.
There exists a need for a prosody modification system and methodology that reduces the introduction of noise or fuzziness in its outputs. There is also a need for effectively modifying the prosody of signals without severely affecting the musicality or compromising the desired pitch, for example, in higher-pitched signals, such as women's voices, children's voices, singing voices, and most musical instrument notes, and in signals involving rapid changes in pitch.
One aspect of the present invention stems from the realization that an important source of errors in the output signal of conventional overlap-and-add systems is the rounding of the synchronization of the waveforms to intervals defined by the relatively low sampling rate. However, it is not desirable to increase the sampling rate, owing to the tight computational and storage constraints.
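The size of this rounding effect can be estimated with a short calculation. The sketch below is an illustration only: the function name `half_sample_error_cents` and the example pitches of 100 Hz and 400 Hz are assumptions, and the calculation simply asks how far the pitch moves, in cents, if one pitch period is lengthened by half a sampling interval.

```python
import math

def half_sample_error_cents(sample_rate_hz, pitch_hz):
    """Pitch deviation in cents when one pitch period is lengthened by half a sample."""
    period = sample_rate_hz / pitch_hz          # pitch period in samples
    return 1200.0 * math.log2((period + 0.5) / period)

low = half_sample_error_cents(16000, 100)    # period of 160 samples
high = half_sample_error_cents(16000, 400)   # period of 40 samples
```

At the 16 kHz sampling rate mentioned above, the shorter period of the higher pitch makes the same half-sample error a larger fraction of the period, which is consistent with higher-pitched signals being especially affected.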
Accordingly, one aspect of the present invention is a method and computer-readable medium bearing instructions for performing a prosody modification on a quasi-periodic signal, sampled at a sampling interval. A series of original synchronization marks is determined for the quasi-periodic signal, from which a series of synthetic synchronization marks is determined in accordance with the prosodic modification. Waveforms are extracted from the quasi-periodic signal around one of the original synchronization marks, and shifted to the one of the synthetic synchronization marks that corresponds to that original synchronization mark. The difference between the original synchronization mark and the synthetic synchronization mark is not an integral multiple of said sampling interval. One implementation of non-integral shifting is by resampling the quasi-periodic signal. The prosody-modified signal is then generated based on the shifted waveforms, for example, by overlap-and-add techniques.
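Non-integral shifting can be illustrated with a minimal sketch. It uses linear interpolation, which is only one simple way to resample and is an assumption of this example rather than the resampling method of the invention. The function name `fractional_shift` is likewise hypothetical, and samples outside the extracted waveform are assumed to be zero.

```python
import math

def fractional_shift(waveform, shift):
    """Delay `waveform` by `shift` samples, where `shift` need not be an integer.
    Uses linear interpolation; samples outside the input are treated as zero."""
    n = len(waveform)
    out = []
    for i in range(n):
        pos = i - shift              # input position that lands on output sample i
        k = math.floor(pos)          # nearest input sample at or below that position
        frac = pos - k               # fractional distance past sample k
        a = waveform[k] if 0 <= k < n else 0.0
        b = waveform[k + 1] if 0 <= k + 1 < n else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out
```

With this sketch, shifting a unit impulse by half a sample spreads it equally over the two neighboring samples, while an integer shift reproduces the ordinary sample-aligned case.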
Another aspect of the present invention stems from the realization that another source of errors in conventional overlap-and-add techniques is the use of symmetric windows in extracting waveforms around synchronization marks when the pitch is rapidly changing. The symmetric windows tend to either extract too little or too much of the waveform to be overlapped-and-added.
Accordingly, a method and computer-readable medium bearing instructions are provided for synthesizing a quasi-periodic signal from an original signal. A series of original synchronization marks is determined for the quasi-periodic signal, from which a series of synthetic synchronization marks is determined in accordance with the prosodic modification. Waveforms are extracted from around one of the original synchronization marks by applying an asymmetric filtering window and time-shifting the waveforms according to the original synchronization mark and a corresponding synthetic synchronization mark. The extracted, shifted waveforms are summed to synthesize the quasi-periodic signal. The filtering window may be defined as having a first half-width on one side of the original synchronization mark and a second half-width on another side of the original synchronization mark, in which the first half-width is different from the second half-width. In some implementations, the filtering window comprises two half-Hanning windows.
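A window built from two half-Hanning windows can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: the function name `asymmetric_hanning` is hypothetical, and the two half-widths are assumed to be whole numbers of samples, each at least one.

```python
import math

def asymmetric_hanning(left, right):
    """Window peaking at a synchronization mark, with `left` samples before the
    mark and `right` samples after it. Assumes left >= 1 and right >= 1."""
    # Rising half of a Hanning window: 0 at the left edge, approaching 1 at the mark.
    rising = [0.5 * (1.0 - math.cos(math.pi * i / left)) for i in range(left)]
    # Falling half: exactly 1 at the mark, 0 at the right edge.
    falling = [0.5 * (1.0 + math.cos(math.pi * i / right)) for i in range(right + 1)]
    return rising + falling

window = asymmetric_hanning(4, 8)   # peak at index 4, total length 13
```

One property of this construction is that when the second half-width of one window equals the first half-width of the next, the falling and rising halves sum to one between the two marks, even though each window is itself asymmetric.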
Additional needs, objects, advantages, and novel features of the present invention will be set forth in part in the description that follows, and in part, will become apparent upon examination or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.