Voice modeling has been an active field for some years for various purposes related to voice communications. Voice modeling typically decomposes voice signals into parameters which require less storage and bandwidth capacity for transmission. The resulting voice compression facilitates greater efficiency and more secure voice communications.
One of the voice models previously developed is referred to as the multiband excitation (MBE) model. This model takes discrete time intervals of a voice signal and separates each into the parameters of pitch, harmonic band energy, and the voiced or unvoiced state of each of the harmonic bands. Pitch is defined as the vibration frequency at the glottis, which excites the vocal tract to produce different sounds. The specific number of harmonic bands depends upon the pitch frequency and the bandwidth of the audible voice produced therefrom. Normally voice bandwidths up to 5 kHz are common because energy levels at higher frequencies drop off substantially. For this reason, voice models have typically been developed which function for bandwidths between 3.6 to 5.0 kHz. The voiced and unvoiced state of each of the harmonic bands is defined as the respective presence or absence of structured (non-Gausian) energy in the band. The voice model mentioned above is explained in greater detail in "Multiband Excitation Vocoder", Ph.D. Dissertation, Daniel W. Griffin, M.I.T. 1987.
Another area of interest for the application of voice modeling is that of synthesizing a person's voice via the transformation of the voice of a first talker into the voice of a different person or second talker. Whereas the MBE model compresses the amount of information required for the storage and transmission of voice signals, voice transformation takes the parameters estimated from one person's voice and transforms them to have the characteristics of another person's voice. The intent is to make the first talker sound like the second talker.