Accurate representations of speech have been demonstrated using harmonic models in which a sum of sinusoids is used for synthesis. An analyzer partitions speech into overlapping frames, applies a Hamming window to each frame, constructs a magnitude/phase spectrum, and locates individual sinusoids. The correct magnitude, phase, and frequency of the sinusoids are then transmitted to a synthesizer, which generates the synthetic speech. In an unquantized harmonic speech coding system, the resulting speech quality is virtually transparent in that most people cannot distinguish the original speech from the synthetic speech. The difficulty in applying this approach at low bit rates lies in the necessity of coding up to 80 harmonics. (The sinusoids are referred to herein as harmonics, although they are not always harmonically related.) Bit rates below 9.6 kilobits/second are typically achieved by incorporating pitch and voicing or by dropping some or all of the phase information. The result is synthetic speech differing in quality and robustness from the unquantized version.
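The analysis and synthesis steps described above can be sketched for a single frame as follows. This is a minimal illustration, not the method of any particular coder: the function names (`hamming`, `half_dft`, `analyze_frame`, `synthesize_frame`), the naive DFT, the simple local-maximum peak picker, and the limit of 80 peaks as a default are all assumptions made for the example, and the overlap between successive frames is omitted.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def half_dft(samples):
    """Naive O(N^2) DFT, bins 0..N/2 only (enough for a real-valued frame)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def analyze_frame(frame, sample_rate, max_peaks=80):
    """Window one frame and return its strongest spectral peaks.

    Each peak is (frequency_hz, amplitude, phase_rad), sorted by frequency.
    """
    n = len(frame)
    window = hamming(n)
    spectrum = half_dft([x * w for x, w in zip(frame, window)])
    mags = [abs(c) for c in spectrum]
    # A bin-centered sinusoid of amplitude A yields a peak of about A * sum(window) / 2.
    gain = sum(window) / 2.0
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:  # local maximum
            peaks.append((k * sample_rate / n, mags[k] / gain, cmath.phase(spectrum[k])))
    peaks.sort(key=lambda p: p[1], reverse=True)  # strongest first
    return sorted(peaks[:max_peaks])  # keep at most max_peaks, ordered by frequency


def synthesize_frame(peaks, n, sample_rate):
    """Rebuild n samples as a sum of cosines, one per transmitted sinusoid."""
    return [
        sum(a * math.cos(2 * math.pi * f * i / sample_rate + ph) for f, a, ph in peaks)
        for i in range(n)
    ]
```

In this sketch the triples returned by `analyze_frame` play the role of the unquantized magnitude, phase, and frequency parameters that would be sent to the synthesizer. A practical analyzer would typically use an FFT and interpolate between bins to refine each peak frequency.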
One prior art quantized harmonic speech coding arrangement is disclosed in R. J. McAulay and T. F. Quatieri, "Multirate sinusoidal transform coding at rates from 2.4 kbps to 8 kbps," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., vol. 3, pp. 1645-1648, April 1987. Parameters are determined at an analyzer to model the speech, and each parameter is quantized by choosing the closest one of a number of discrete values that the parameter can take on. This procedure is referred to as scalar quantization since each parameter is quantized individually. Although the McAulay arrangement generates synthetic speech of good quality, a need exists in the art for harmonic coding arrangements of improved speech quality.
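Scalar quantization as described above, in which each parameter is independently mapped to the closest of a set of discrete values, can be illustrated with a short sketch. The uniform spacing of the levels, the bit counts, and the names `uniform_levels` and `quantize` are assumptions chosen for the example; they are not taken from the cited McAulay and Quatieri paper.

```python
def uniform_levels(lo, hi, bits):
    """Return 2**bits evenly spaced reconstruction levels covering [lo, hi].

    Uniform spacing is an assumption; a real coder may use non-uniform levels.
    """
    count = 2 ** bits
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def quantize(value, levels):
    """Pick the level closest to value; return (index, level).

    Only the index would need to be transmitted; the synthesizer looks up
    the level in an identical table.
    """
    index = min(range(len(levels)), key=lambda i: abs(levels[i] - value))
    return index, levels[index]
```

For example, with `levels = uniform_levels(0.0, 1.0, 2)` the four levels are 0, 1/3, 2/3, and 1, and `quantize(0.4, levels)` selects index 1. Each parameter is handled on its own, which is what distinguishes scalar quantization from schemes that quantize several parameters jointly.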