1. Field of the Invention
The present invention is directed to a system for processing human speech and, more particularly, to a system that pitch-synchronously segments the human speech waveform into individual pitch waveforms which may be transformed, replicated, and concatenated to generate continuous speech with desired speech characteristics.
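As a rough illustration of the replicate-and-concatenate idea, the following minimal sketch slows an utterance by repeating each pitch waveform. It assumes the pitch boundaries are already known: the `pitch_marks` list, the function names, and the synthetic sine "speech" are hypothetical stand-ins, and the text above does not specify how segmentation or transformation is performed.

```python
import numpy as np

def segment_pitch_waveforms(speech, pitch_marks):
    """Cut the waveform into pieces between consecutive pitch marks.

    pitch_marks is assumed to be given (hypothetical input); finding the
    marks is not covered by this sketch.
    """
    return [speech[start:end]
            for start, end in zip(pitch_marks[:-1], pitch_marks[1:])]

def replicate_and_concatenate(pitch_waveforms, copies):
    """Repeat every pitch waveform `copies` times and join the results."""
    pieces = []
    for waveform in pitch_waveforms:
        pieces.extend([waveform] * copies)
    return np.concatenate(pieces)

# Synthetic stand-in for voiced speech: a sine with an 80-sample period.
period = 80
speech = np.sin(2 * np.pi * np.arange(800) / period)
pitch_marks = list(range(0, 801, period))  # assumed, evenly spaced marks

waveforms = segment_pitch_waveforms(speech, pitch_marks)
slowed = replicate_and_concatenate(waveforms, copies=2)  # half the utterance rate
```

Because each piece spans exactly one pitch period, repeating pieces lengthens the utterance without changing the pitch period itself.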
2. Description of the Related Art
The ability to alter speech characteristics is important in both military and civilian applications, given the increased use of synthesized speech in communication terminals, message devices, virtual-reality environments, and training aids. Currently, however, there is no known method capable of modifying the utterance rate, pitch period, or resonant frequencies of speech by operating directly on the original speech waveform.
Typical speech analysis and synthesis are based on a model that includes a vocal tract component, represented by an electrical filter, and a glottis component, represented by an excitation signal that is usually produced by an electrical signal generator feeding the filter. A goal of these models is to convert the complex speech waveform into a set of perceptually significant parameters. By controlling these parameters, speech can be generated with these models. To derive human speech model parameters accurately, both the model input (turbulent air from the lungs) and the model output (speech waveform) are required. In conventional speech models, however, model parameters are derived using only the model output because the model input is not accessible. As a result, the estimated model parameters are often inaccurate.
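The conventional model described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the excitation is taken to be an impulse train, the vocal tract filter is a single two-pole resonator, and the sample rate, resonant frequency, bandwidth, and all function names are hypothetical choices that do not come from the text.

```python
import numpy as np

def impulse_train(n_samples, pitch_period):
    """Glottis component: one unit pulse every `pitch_period` samples."""
    excitation = np.zeros(n_samples)
    excitation[::pitch_period] = 1.0
    return excitation

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Feedback coefficients [a1, a2] of a two-pole resonator (assumed form)."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    theta = 2 * np.pi * freq_hz / sample_rate        # pole angle
    return [-2 * r * np.cos(theta), r * r]

def all_pole_filter(excitation, a):
    """Vocal tract component: y[n] = x[n] - sum_k a[k] * y[n - k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * y[n - k]
        y[n] = acc
    return y

sample_rate = 8000                                # Hz, assumed
a = resonator_coeffs(500.0, 100.0, sample_rate)   # one assumed resonance
excitation = impulse_train(800, 80)               # 100 Hz pitch at 8 kHz
speech = all_pole_filter(excitation, a)           # model output
```

In this sketch the parameters (`pitch_period` and the coefficients `a`) are chosen directly. The difficulty noted above arises in the reverse direction, when such parameters must be estimated from `speech` alone without access to `excitation`.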
What is needed is a different way of representing speech, one that does not model speech as an electrical analog of the sound production mechanism.