The Mixed Excitation Linear Prediction (MELP) model was developed by the U.S. government's DOD Digital Voice Processing Consortium (DDVPC) (Supplee, Lynn M., Cohn, Ronald P., Collura, John S., McCree, Alan V., “MELP: The New Federal Standard at 2400 bps”, IEEE ICASSP-97 Conference, Munich, Germany, the content of which is herein incorporated by reference) as the next standard for narrowband secure voice coding. The new speech model represents a dramatic improvement in speech quality and intelligibility at the 2.4 kbps data rate. The algorithm performs well in harsh acoustic noise environments such as HMMWVs, helicopters, and tanks. The buzzy-sounding speech of the existing LPC10e speech model has been reduced to an acceptable level. The MELP model represents the next generation of speech processing in bandwidth-constrained channels.
The MELP model as defined in MIL-STD-3005 is based on the traditional LPC10e parametric model, but also includes five additional features. These are mixed excitation, aperiodic pulses, pulse dispersion, adaptive spectral enhancement, and Fourier magnitude scaling of the voiced excitation.
The mixed excitation is implemented using a five-band mixing model. The model can simulate frequency-dependent voicing strengths using a fixed filter bank. The primary effect of this multi-band mixed excitation is to reduce the buzz usually associated with LPC10e vocoders. Speech is often a composite of both voiced and unvoiced signals. MELP approximates this composite signal more closely than LPC10e's Boolean voiced/unvoiced decision.
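As a rough illustration of per-band mixing, the sketch below blends a pulse train and a noise sequence with one voicing strength per band. It is a frequency-domain stand-in for the fixed filter bank, not the filter bank itself; the band edges, the 8 kHz sampling rate, the linear `v` / `1 - v` weighting, and all function names are assumptions made for the example.

```python
import cmath
import math
import random

# Assumed band edges in Hz for a five-band split at an 8 kHz sampling rate.
BAND_EDGES_HZ = [0, 500, 1000, 2000, 3000, 4000]

def dft(x, inverse=False):
    """Direct O(n^2) DFT, used only to keep the sketch self-contained."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def band_index(freq_hz):
    """Map a frequency to one of the five assumed bands."""
    for b in range(5):
        if freq_hz < BAND_EDGES_HZ[b + 1]:
            return b
    return 4  # the Nyquist bin falls in the top band

def mixed_excitation(pulse, noise, voicing, fs=8000):
    """Blend pulse and noise spectra band by band using voicing strengths in [0, 1]."""
    n = len(pulse)
    pulse_spec, noise_spec = dft(pulse), dft(noise)
    mixed = []
    for k in range(n):
        # Bins k and n - k carry the same frequency, so they get the same weight.
        freq = min(k, n - k) * fs / n
        v = voicing[band_index(freq)]
        mixed.append(v * pulse_spec[k] + (1.0 - v) * noise_spec[k])
    return [c.real for c in dft(mixed, inverse=True)]

random.seed(0)
n = 64
pulse = [1.0 if t % 16 == 0 else 0.0 for t in range(n)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Strongly voiced low bands, mostly unvoiced high bands.
excitation = mixed_excitation(pulse, noise, [1.0, 1.0, 0.6, 0.3, 0.0])
```

Setting every voicing strength to 1 returns the pulse train and setting every strength to 0 returns the noise, which correspond to the two extremes of a Boolean voiced/unvoiced decision.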
The MELP vocoder can synthesize voiced speech using either periodic or aperiodic pulses. Aperiodic pulses are most often used during transition regions between voiced and unvoiced segments of the speech signal. This feature allows the synthesizer to reproduce erratic glottal pulses without introducing tonal noise.
Pulse dispersion is implemented using a fixed pulse dispersion filter based on a spectrally flattened triangle pulse. The filter is implemented as a fixed finite impulse response (FIR) filter. The filter has the effect of spreading the excitation energy within a pitch period. The pulse dispersion filter aims to produce a better match between original and synthetic speech in regions without a formant by having the signal decay more slowly between pitch pulses. The filter reduces the harsh quality of the synthetic speech.
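One way to obtain a filter of this kind is to take the DFT of a triangle pulse, set every magnitude to one while keeping the phase, and transform back. The sketch below does this under stated assumptions: the filter length of 65 taps and the asymmetric triangle shape (`rise`, `fall`) are hypothetical values chosen for illustration, not the coefficients of the standard.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(n^2) DFT, used only to keep the sketch self-contained."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def flattened_triangle(length=65, rise=12, fall=4):
    """Return FIR taps with a flat magnitude spectrum and the phase of a triangle pulse."""
    tri = [0.0] * length
    for t in range(rise):
        tri[t] = (t + 1) / rise            # rising edge up to 1.0
    for t in range(fall):
        tri[rise + t] = 1.0 - (t + 1) / fall  # falling edge back to 0.0
    spectrum = dft(tri)
    # Keep the phase, force unit magnitude; guard against an exactly zero bin.
    flat = [v / abs(v) if abs(v) > 1e-12 else 1.0 for v in spectrum]
    return [v.real for v in dft(flat, inverse=True)]

def fir(signal, taps):
    """Plain FIR convolution, truncated to the input length."""
    return [sum(taps[j] * signal[t - j] for j in range(len(taps)) if t - j >= 0)
            for t in range(len(signal))]

taps = flattened_triangle()
impulses = [1.0 if t % 80 == 0 else 0.0 for t in range(160)]
dispersed = fir(impulses, taps)
```

The taps have unit energy, yet no single tap reaches amplitude one, so each pitch pulse is spread in time without changing the magnitude spectrum. An asymmetric triangle is assumed here because a symmetric triangle has linear phase and would flatten to a single delayed impulse, which disperses nothing.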
The adaptive spectral enhancement filter is based on the poles of the Linear Predictive Coding (LPC) vocal tract filter and is used to enhance the formant structure in synthetic speech. The filter improves the match between synthetic and natural band pass waveforms, and introduces a more natural quality to the output speech.
The first ten Fourier magnitudes are obtained by locating the peaks in the Fast Fourier Transform (FFT) of the LPC residual signal. The information embodied in these coefficients improves the accuracy of the speech production model at the perceptually important lower frequencies. The magnitudes are used to scale the voiced excitation to restore some of the energy lost in the 10th order LPC process. This increases the perceived quality of the coded speech, particularly for males and in the presence of background noise.
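The peak-picking step can be sketched as follows. The search width around each harmonic, the unit-RMS normalization, and the function name are assumptions for illustration, and a direct DFT stands in for the FFT to keep the example self-contained.

```python
import cmath
import math

def fourier_magnitudes(residual, pitch_period, count=10):
    """Pick the spectral peak near each pitch harmonic of the residual."""
    n = len(residual)
    half_spectrum = [
        abs(sum(residual[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    spacing = n / pitch_period        # harmonic spacing measured in DFT bins
    reach = max(1, int(spacing / 2))  # assumed search half-width around a harmonic
    peaks = []
    for h in range(1, count + 1):
        centre = round(h * spacing)
        if centre > n // 2:
            break                     # harmonic lies above the Nyquist bin
        lo = max(1, centre - reach)
        hi = min(n // 2, centre + reach)
        peaks.append(max(half_spectrum[lo:hi + 1]))
    rms = math.sqrt(sum(p * p for p in peaks) / len(peaks))
    if rms == 0:
        return peaks
    return [p / rms for p in peaks]   # assumed normalization to unit RMS

# Synthetic residual: three harmonics of a 20-sample pitch period.
amplitudes = [1.0, 0.5, 0.25]
residual = [sum(a * math.cos(2 * math.pi * (h + 1) * t / 20)
                for h, a in enumerate(amplitudes))
            for t in range(80)]
magnitudes = fourier_magnitudes(residual, pitch_period=20, count=3)
```

For this synthetic input the recovered magnitudes keep the 4:2:1 ratio of the harmonic amplitudes, which is the information later used to scale the voiced excitation.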
MELP parameters are transmitted via vector quantization. Vector quantization is the process of grouping source outputs together and encoding them as a single block. The block of source values can be viewed as a vector, hence the name vector quantization. The input source vector is then compared to a set of reference vectors called a codebook. The vector that minimizes some suitable distortion measure is selected as the quantized vector. The rate reduction occurs as the result of sending the codebook index instead of the quantized reference vector over the channel.
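The codebook search described above can be sketched in a few lines. Squared error is assumed as the distortion measure, and the four-entry codebook is a made-up example rather than one of the MELP codebooks.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry closest to the input vector."""
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)),
                key=lambda i: squared_error(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical two-dimensional codebook with four reference vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0), (3.0, 1.0)]
index, codeword = quantize((0.9, 1.2), codebook)
# Only `index` is sent; with four entries it fits in 2 bits.
```

The decoder holds the same codebook and recovers the quantized vector by looking up `codebook[index]`.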
The vector quantization of speech parameters has been a widely studied topic in current research. For low-rate transmission of quantized data, efficient quantization of the parameters using as few bits as possible is essential. With a suitable codebook structure, both the memory and the computational complexity can be reduced. One attractive codebook structure is the multi-stage codebook as described in “Vector Quantization and Signal Compression” (Gersho A., Gray R. M., Vector Quantization and Signal Compression, Norwell, MA: Kluwer Academic Publishers, 1991, the content of which is hereby incorporated by reference). The codebooks presented in this paper are designed using the generalized Lloyd algorithm to minimize average weighted mean-squared error using the TIMIT speech database as training vectors.
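To illustrate why a multi-stage structure saves memory and search effort, the sketch below quantizes a vector with the first stage and then quantizes the remaining residual with the second. A greedy stage-by-stage search with unweighted squared error is assumed for simplicity, and the two small codebooks are hypothetical.

```python
def nearest(vector, codebook):
    """Index of the codebook entry with the smallest squared error."""
    return min(range(len(codebook)),
               key=lambda i: sum((x - y) ** 2 for x, y in zip(vector, codebook[i])))

def msvq_encode(vector, stages):
    """Quantize the vector stage by stage, passing the residual to the next stage."""
    residual = list(vector)
    indices = []
    for codebook in stages:
        i = nearest(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(indices, stages):
    """Reconstruct the vector as the sum of the selected codeword from each stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Hypothetical stages: a coarse codebook followed by a refinement codebook.
stages = [
    [(0.0, 0.0), (4.0, 4.0), (0.0, 4.0), (4.0, 0.0)],
    [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)],
]
indices = msvq_encode((3.2, 4.9), stages)
reconstructed = msvq_decode(indices, stages)
```

Two stages of four entries each store and search 8 codewords while addressing 16 distinct reconstruction points, which a single-stage codebook could only match by storing all 16.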
The generalized Lloyd algorithm consists of iteratively partitioning the training set into decision regions for a given set of centroids. New centroids are then re-optimized to minimize the distortion over each decision region. The generalized Lloyd algorithm is reproduced below from Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Comm., COM-28:84-95, January 1980, the content of which is hereby incorporated by reference.
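Independently of the published listing, the two alternating steps described above can be sketched in a few lines. This is an illustrative sketch only, not the Linde, Buzo, and Gray listing: it assumes unweighted squared error rather than the weighted measure used for the codebooks, and the function name, stopping rule, and toy training set are hypothetical.

```python
def lloyd(training, initial, iterations=20):
    """Alternate between partitioning the training set and recomputing centroids."""
    centroids = [list(c) for c in initial]
    for _ in range(iterations):
        # Partition step: assign each training vector to its nearest centroid.
        cells = [[] for _ in centroids]
        for v in training:
            j = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))
            cells[j].append(v)
        # Centroid step: the mean of a cell minimizes its squared-error distortion.
        updated = []
        for cell, old in zip(cells, centroids):
            if cell:
                updated.append([sum(col) / len(cell) for col in zip(*cell)])
            else:
                updated.append(old)  # keep an empty cell's centroid unchanged
        if updated == centroids:
            break                    # no centroid moved, so the partition is stable
        centroids = updated
    return centroids

training = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = lloyd(training, initial=[(1.0, 1.0), (8.0, 8.0)])
```

Each pass cannot increase the average distortion over the training set, so the loop settles on a locally optimal codebook that depends on the initial centroids.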