The Mixed Signals Products group of the Texas Instruments Semiconductor Division (SC/MSP) has an LPC (Linear Predictive Coding) synthesis semiconductor chip business with its family of TSP50C1X and MSP50C3X microprocessors. In LPC synthesis, a signal to be synthesized, such as a human voice or a sound effect such as an animal or bird sound, is first analyzed using linear predictive coding analysis to extract spectral, pitch, voicing and gain parameters. This analysis is done using a Speech Development Station 10, shown in FIG. 1, which is a workstation with a Texas Instruments SDS5000. The SDS5000 consists of two circuit boards 10a plugged into two side-by-side slots of a personal computer (PC). The PC includes a CPU, a display, and inputs 10b such as a keyboard, a mouse, a CD ROM drive and a floppy disk drive. The voice or sound to be synthesized is entered for analysis using one of the inputs, such as the CD ROM drive. The station also includes a speaker 10c coupled to the PC, so that the user doing the editing can listen to the sound as well as view the display generated by the SDS5000.
The analysis is typically done at a rate of 50-100 times per second. The display gives a time plot of the raw speech spectrum, pitch, energy level and LPC filter coefficients. These parameters may then be edited, if necessary, and quantized to a data rate of typically 1500-2400 bits/second. The data rate is kept low to reduce the memory needed to store the data in the product being created.
The foregoing analysis is performed off-line, and the LPC parameters are stored in the memory M of a synthesis product such as a talking toy or book 15 shown in FIG. 2. The book, for example, contains a microprocessor μP 17 coupled to a ROM memory M 19. When a button 20 is pressed, the coefficients for the sound corresponding to that button are taken from the memory, and the microprocessor processes this LPC model data to produce the sound. The resulting digital signal is converted to an analog signal and applied to a speaker S in the book or toy.
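The relationship between the quoted analysis rate, data rate and memory size can be illustrated with a little arithmetic. The following is a minimal sketch only; the function names and the 10-second phrase length are hypothetical and are not taken from the SDS5000 or from the chips described above.

```python
import math

def bits_per_frame(bit_rate, frames_per_second):
    """Average number of bits available to encode one analysis frame."""
    return bit_rate / frames_per_second

def rom_bytes(duration_s, bit_rate):
    """Bytes of memory needed to hold duration_s seconds of parameter data."""
    return math.ceil(duration_s * bit_rate / 8)

# Hypothetical 10-second phrase at the two ends of the quoted data-rate range.
low = rom_bytes(10, 1500)   # 1500 bits/second
high = rom_bytes(10, 2400)  # 2400 bits/second
frame_budget = bits_per_frame(2400, 50)  # 50 analyses per second
print(low, high, frame_budget)
```

Under these assumptions, the phrase occupies roughly 2 to 3 kilobytes, and each frame has only a few tens of bits in which to carry its pitch, gain and filter parameters.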
In many applications, it is desirable to synthesize not only speech, but also sound effects or musical instrument sounds. Some instruments can be modeled fairly well using the pitch-excited LPC model above, since their spectra consist of harmonically related partials shaped by a spectral envelope. However, percussion sounds, i.e. sounds created by striking or plucking a string or other object, often do not fit this model. The modes of vibration or partials (frequency components) created by striking a xylophone bar, for example, are related to the physical dimensions of the bar itself. This means that the modes are, in general, not related to each other by an integer multiple of some fundamental frequency. The pitch-excited LPC model is incapable of producing inharmonic tones, and thus it is not well suited to synthesizing such sounds.
The physical behavior of struck objects suggests that they can be modeled by a sum of sinusoids with exponentially decaying amplitudes. See A. H. Benade, Fundamentals of Musical Acoustics, Dover Publications, Inc. 1990. Examples of other work in this area include J. Laroche and J. L. Meillier, "Multichannel excitation/filter modeling of percussive sounds with application to the piano," IEEE Transactions on Speech and Audio Processing, Vol. 2, pp. 329-344, April 1994 in which a high order excitation/filter model is used to represent piano tones, and J. Laroche, "A new analysis/synthesis system of musical signals using Prony's method: Application to heavily damped percussive sounds," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 2053-2056, IEEE, April 1989, in which percussion sounds are created by explicit synthesis of time-varying exponentials.
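The sum-of-decaying-sinusoids model can be sketched in a few lines. This is an illustrative sketch and not the method of any of the cited works: the function name, the 8 kHz sample rate, the amplitudes and the decay rates are all assumed values, and the partial ratios 1 : 2.756 : 5.404 are the textbook transverse-mode ratios of an ideal uniform free-free bar, used here only as an example of inharmonic partials.

```python
import math

def decaying_partials(partials, sample_rate=8000, duration=0.5):
    """Sum of sinusoids with exponentially decaying amplitudes.

    partials: list of (frequency_hz, amplitude, decay_per_second) tuples.
    Returns a list of float samples.
    """
    n_samples = int(sample_rate * duration)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                       for f, a, d in partials))
    return out

# Assumed example: three inharmonic partials of an idealized bar.
f0 = 440.0
bar = [
    (1.000 * f0, 1.00, 6.0),   # lowest mode, slowest decay
    (2.756 * f0, 0.50, 12.0),  # not an integer multiple of f0
    (5.404 * f0, 0.25, 24.0),  # higher mode, decays fastest
]
samples = decaying_partials(bar)
```

Because the second and third partials are not integer multiples of the first, a single pitch-period excitation cannot reproduce them, which is the limitation of the pitch-excited LPC model noted above.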
One straightforward approach is to perform LPC analysis on the signal to be synthesized. The reflection coefficients must then be hand-edited to obtain good synthesized output. However, even with fine tuning, LPC analysis often does not give satisfactory results, because the LPC model is well suited to the human vocal tract but not to musical instruments.
Another way to generate musical notes in the synthesizer chip is to use the PCM mode, in which a sampled waveform is loaded directly into the D/A converter. This produces very high quality output but requires a large amount of memory for storing the samples. An alternative method is to generate sine waves at different frequencies for various tones. In this case, only one period of each sine wave needs to be stored, which reduces the data rate significantly. However, a drawback of this approach is that the output sounds very synthetic and does not resemble any musical instrument, due to the lack of harmonics.
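The memory trade-off between the PCM mode and the stored-period approach can be illustrated as follows. This is a sketch under stated assumptions: the 8 kHz sample rate, the 8-bit sample size and all function names are hypothetical and do not describe the actual chip modes.

```python
import math

def one_period(freq_hz, sample_rate=8000):
    """Store a single period of a sine wave as signed 8-bit sample values."""
    n = round(sample_rate / freq_hz)  # samples per period (assumed near-integer)
    return [round(127 * math.sin(2 * math.pi * k / n)) for k in range(n)]

def loop_tone(period, n_samples):
    """Play a tone by reading the stored period over and over."""
    return [period[i % len(period)] for i in range(n_samples)]

def pcm_bytes(duration_s, sample_rate=8000, bytes_per_sample=1):
    """Memory needed to store the same duration as raw PCM samples."""
    return int(duration_s * sample_rate * bytes_per_sample)

table = one_period(500.0)       # 16 stored samples for a 500 Hz tone
tone = loop_tone(table, 8000)   # one second of output
print(len(table), pcm_bytes(1.0))
```

With these assumed figures, one second of a 500 Hz tone needs 16 stored samples instead of 8000, but the looped output contains only a single frequency component, which is why it sounds synthetic.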
The TSP50C1X and MSP50C3X chips implement an all-pole lattice filter whose input can be a periodic pulse train, pseudo-random noise, or an excitation sequence stored in memory 19.
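The operation of an all-pole lattice filter driven by a pulse train can be sketched as follows. This is a generic textbook form in floating point, not the fixed-point implementation of the chips; the function names and the sign convention, A(z) = 1 + a_1 z^-1 + ... with the last coefficient of each stage equal to k_m, are assumptions.

```python
def pulse_train(period, n_samples):
    """Periodic unit-pulse excitation, one pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def lattice_all_pole(excitation, k):
    """Filter `excitation` through an all-pole lattice.

    k: reflection coefficients k_1..k_M, each with |k_m| < 1 for stability.
    """
    order = len(k)
    b = [0.0] * (order + 1)  # b[m] holds the delayed backward signal of stage m
    out = []
    for e in excitation:
        f = e  # forward signal entering the highest stage
        for m in range(order, 0, -1):
            f = f - k[m - 1] * b[m - 1]         # forward path through stage m
            b[m] = k[m - 1] * f + b[m - 1]      # backward path, used next sample
        b[0] = f
        out.append(f)
    return out

# Assumed example: a single-stage filter excited by one pulse.
response = lattice_all_pole([1.0, 0.0, 0.0, 0.0], [-0.5])
```

Replacing the single pulse with `pulse_train(period, n)` gives a voiced-style periodic output; a noise or stored excitation sequence would be passed in the same way.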
The LPC method models short-time segments of the speech signal as the response of an all-pole filter to an impulse input. A frame-by-frame analysis of windowed segments of 20-30 ms duration is often used, and the filter parameters are updated in time and interpolated during the synthesis process. For a review of LPC, see J. Makhoul's article entitled "Linear Prediction: A Tutorial Review," Proc. of IEEE, Vol. 63, pp. 561-580, April 1975.
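One common way to obtain the all-pole filter parameters for a single frame is the autocorrelation method with the Levinson-Durbin recursion. The sketch below is a generic textbook version, not the SDS5000 analysis; the function names and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M are assumptions, and windowing of the frame is omitted for brevity.

```python
def autocorr(x, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients from autocorrelation values r.

    Returns (a, ks, err): polynomial coefficients a[0..order] with a[0] = 1,
    reflection coefficients ks, and the final prediction error energy.
    """
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]  # update lower-order coefficients
        new_a[m] = k                        # last coefficient equals k_m
        a = new_a
        err *= (1.0 - k * k)                # error shrinks at every stage
    return a, ks, err

# Assumed example: autocorrelation of a first-order process, r[lag] = 0.5**lag.
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this example the recursion recovers a first-order model: the first reflection coefficient is -0.5 and the second is zero, so a higher order adds nothing. In a frame-by-frame analysis the same two functions would be applied to each 20-30 ms segment in turn.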