One conventional method of varying the pitch of voiced sounds in artificial speech deletes samples in the low-energy portion of the pitch period waveform to shorten the pitch period, or inserts extra samples within or at the end of the waveform to lengthen it.
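As a rough illustration of truncation and extension, the following Python sketch finds the lowest-energy window of a single pitch period and either deletes samples there or inserts zero-valued samples there. The window length, the sum-of-squares energy measure, the use of zeros as inserted samples, and the function names `quietest_index` and `change_period_length` are hypothetical choices made for illustration. This is not the method as implemented by any particular system.

```python
from typing import List


def quietest_index(period: List[float], window: int = 8) -> int:
    """Return the start index of the lowest-energy window in one pitch period.

    Energy is taken as the sum of squared sample values (an assumed measure).
    """
    if not 1 <= window <= len(period):
        raise ValueError("window must be between 1 and len(period)")
    best_start, best_energy = 0, float("inf")
    for start in range(len(period) - window + 1):
        energy = sum(x * x for x in period[start:start + window])
        if energy < best_energy:  # strict '<' keeps the earliest quietest window
            best_start, best_energy = start, energy
    return best_start


def change_period_length(period: List[float], delta: int, window: int = 8) -> List[float]:
    """Shorten (delta < 0) or lengthen (delta > 0) a pitch period by abs(delta) samples.

    Deletion removes samples from the start of the quietest window; extension
    inserts zero-valued samples there (a hypothetical choice of filler).
    """
    start = quietest_index(period, window)
    if delta < 0:
        n = -delta
        if n > window:
            # Deleting more than the quiet window would cut into high-energy samples.
            raise ValueError("cannot delete more samples than the quiet window holds")
        return period[:start] + period[start + n:]
    return period[:start] + [0.0] * delta + period[start:]


# Usage: a toy period with a quiet stretch of five zero samples in the middle.
toy = [1.0] * 10 + [0.0] * 5 + [1.0] * 10
shorter = change_period_length(toy, -3, window=4)
longer = change_period_length(toy, 3, window=4)
print(len(toy), len(shorter), len(longer))  # 25 22 28
```

The deletion limit enforced in `change_period_length` reflects the same constraint the text describes. Only a quiet stretch can be cut or padded without disturbing the high-energy part of the waveform.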
This method is of limited applicability. To minimize distortion of the pitch period's spectral characteristics, the deletion (truncation) or insertion (extension) must be made at "quiet" points in the pitch period waveform, i.e. points at which little or no energy at the fundamental frequency and lower harmonics is present and any remaining energy takes at most the form of a low ripple. A male voice usually offers enough such points to accommodate substantial pitch variations, but a female voice offers much less leeway. The reason is that a female voice has many more pitch periods per unit time, each of which is much shorter (typically 100 samples vs. 250); consequently, adding or removing a given number of samples changes the pitch far more drastically. In any event, truncation or extension does alter the spectral characteristics (i.e. the combination of the fundamental frequency and its harmonics that makes up the pitch period waveform), and therefore introduces distortion if used to excess.
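The sensitivity argument can be checked with simple arithmetic. The fundamental frequency equals the sampling rate divided by the pitch-period length in samples, so changing an N-sample period by d samples scales the pitch by N/(N + d). The sketch below applies this relation to the typical 100-sample and 250-sample periods cited above. The 10-sample deletion and the function name `pitch_change_percent` are hypothetical.

```python
def pitch_change_percent(period_samples: int, delta_samples: int) -> float:
    """Percent change in fundamental frequency when a pitch period of
    period_samples samples is lengthened (delta > 0) or shortened (delta < 0).

    F0 = sample_rate / period_length, so the sample rate cancels and the
    ratio of new to old F0 is period_samples / (period_samples + delta_samples).
    """
    new_length = period_samples + delta_samples
    if new_length <= 0:
        raise ValueError("resulting period must contain at least one sample")
    return (period_samples / new_length - 1.0) * 100.0


# Deleting the same 10 samples from a typical female and a typical male period.
for n in (100, 250):
    print(n, round(pitch_change_percent(n, -10), 1))
# 100 11.1
# 250 4.2
```

The same 10-sample edit raises the pitch of a 100-sample period by about 11%, but that of a 250-sample period by only about 4%.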
Another method of varying the pitch changes the dialout rate of the waveform samples, i.e. the rate at which the stored samples are played out. This method also shortens or lengthens the duration of the pitch periods. Although it merely scales all the component frequencies of the waveform by the same factor, the resulting shift gives the speech an unnatural, "Mickey Mouse"-like quality.
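To make the contrast concrete, the following sketch models a dialout-rate change as multiplying every frequency component by the ratio of the new output rate to the original rate. The duration of an N-sample pitch period becomes N divided by the new rate. The component frequencies, rates, and the function name `change_dialout_rate` are hypothetical. The example shows that higher-frequency components, including those shaped by the vocal-tract resonances, move by the same factor as the fundamental. This uniform shift is generally regarded as the source of the unnatural quality.

```python
from typing import List, Tuple


def change_dialout_rate(
    components_hz: List[float], period_samples: int, original_rate: float, new_rate: float
) -> Tuple[List[float], float]:
    """Model playing stored samples out at new_rate instead of original_rate.

    Returns the scaled component frequencies and the new pitch-period
    duration in seconds. Every component is multiplied by the same ratio.
    """
    if original_rate <= 0 or new_rate <= 0:
        raise ValueError("rates must be positive")
    ratio = new_rate / original_rate
    scaled = [f * ratio for f in components_hz]
    period_seconds = period_samples / new_rate
    return scaled, period_seconds


# A 100-sample period stored at 10 kHz (F0 = 100 Hz), played out 25% faster.
components = [100.0, 200.0, 300.0, 800.0]  # hypothetical component frequencies in Hz
scaled, duration = change_dialout_rate(components, 100, 10_000.0, 12_500.0)
print(scaled)    # [125.0, 250.0, 375.0, 1000.0]
print(duration)  # 0.008
```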
A pitch change of more than about 20% by truncation or extension, or more than about 10% by changing the dialout rate, results in an unacceptable deterioration of speech quality. Yet natural prosodic pitch variations in real speech can be on the order of 40% in either direction from a norm.
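The gap between these limits and natural prosody can be expressed as a simple check. The sketch below treats the approximate 20% and 10% figures quoted above as hard thresholds, which is a simplification. It also computes the pitch-period length that a given percentage change in pitch would require. The names `acceptable_methods`, `required_period`, and `APPROX_LIMITS_PERCENT` are hypothetical.

```python
from typing import List

# Approximate limits quoted in the text, treated here as hard cutoffs.
APPROX_LIMITS_PERCENT = {
    "truncation/extension": 20.0,
    "dialout rate change": 10.0,
}


def acceptable_methods(pitch_change_percent: float) -> List[str]:
    """Methods whose approximate limit covers the requested pitch change."""
    magnitude = abs(pitch_change_percent)
    return [name for name, limit in APPROX_LIMITS_PERCENT.items() if magnitude <= limit]


def required_period(period_samples: float, pitch_change_percent: float) -> float:
    """Period length (in samples) that yields the requested percent change in F0."""
    factor = 1.0 + pitch_change_percent / 100.0
    if factor <= 0:
        raise ValueError("pitch cannot be lowered by 100% or more")
    return period_samples / factor


print(acceptable_methods(15.0))    # ['truncation/extension']
print(acceptable_methods(40.0))    # []
print(required_period(100, 25.0))  # 80.0
```

Under these assumptions, a 40% prosodic excursion falls outside both limits.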