The present invention relates to representations of the spectrum of a signal. In particular, the present invention relates to reducing the size of data words needed to describe the spectral content of a signal.
In speech recognition, the speech signal is typically divided into frames and each frame is converted into a set of values that describe the spectral energy of the frame. These spectral values are then used to decode the speech signal to produce a sequence of words.
At times, it is desirable to transmit the spectral values from one computer to another to allow for distributed recognition of the speech signal, or to store the spectral values for later processing. One barrier to transmitting or storing these values is that for each frame there are often at least thirteen spectral values, and each spectral value is represented by a sixteen-bit word. This results in 26 bytes per frame. With a new frame being constructed every ten milliseconds, that is, 100 frames per second, 2.6 kilobytes of information must be transmitted for every second of speech.
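The arithmetic behind these figures can be checked with a short sketch. The constant names below are illustrative and do not come from the text; only the numeric values (thirteen values, sixteen bits, ten milliseconds) are taken from the paragraph above.

```python
# Figures stated in the text; the constant names are illustrative only.
VALUES_PER_FRAME = 13   # spectral values per frame
BITS_PER_VALUE = 16     # each value is a sixteen-bit word
FRAME_PERIOD_MS = 10    # a new frame every ten milliseconds

# 13 values * 16 bits = 208 bits = 26 bytes per frame
bytes_per_frame = VALUES_PER_FRAME * BITS_PER_VALUE // 8

# 1000 ms / 10 ms = 100 frames per second
frames_per_second = 1000 // FRAME_PERIOD_MS

# 26 bytes * 100 frames = 2600 bytes, i.e. 2.6 kilobytes per second
bytes_per_second = bytes_per_frame * frames_per_second

print(bytes_per_frame, frames_per_second, bytes_per_second)
```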
To reduce the amount of information that must be transmitted or stored, the prior art has used Vector Quantization, in which each combination of spectral values that can be generated for a frame is represented by a codeword in a codebook. The index for the codeword is then transmitted or stored in place of the spectral values. At the receiver, or when the index is retrieved for processing, the index is applied to a copy of the codebook to retrieve the codeword. The codeword is then used as the spectral vector.
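The encode and decode steps described above can be sketched as follows. This is a minimal illustration, not the method of any particular prior-art system: the text does not say how a codeword is selected, so the nearest-codeword search by squared Euclidean distance is an assumption, and the function names and the tiny three-dimensional codebook are hypothetical.

```python
def nearest_index(vector, codebook):
    """Return the index of the codeword closest to `vector`.

    Assumption: closeness is squared Euclidean distance; the text does
    not specify the selection rule.
    """
    def squared_distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))

    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def decode(index, codebook):
    """Look the index up in a copy of the codebook to recover the codeword."""
    return codebook[index]


# Hypothetical toy codebook: 4 codewords of 3 values each.
# A real spectral vector would have thirteen or more values.
CODEBOOK = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (4.0, 0.0, -2.0),
    (-3.0, 2.0, 5.0),
]

frame = (3.6, 0.3, -1.8)                 # spectral values for one frame
index = nearest_index(frame, CODEBOOK)   # only this index is sent or stored
reconstructed = decode(index, CODEBOOK)  # receiver uses the codeword as the vector

print(index, reconstructed)
```

Only the index crosses the channel or is written to storage; the receiver must hold an identical copy of the codebook, which is what drives the memory cost.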
Although Vector Quantization reduces the amount of data that must be transmitted or stored, it requires a large amount of memory to store all of the codewords. In fact, the codebook for the spectral values typically exceeds the amount of memory available on the computing device.
To overcome this, split-Vector Quantization has been used. In split-Vector Quantization, the spectral vector is divided into segments and a codeword is identified for each segment of the vector. For example, for a spectral vector of [C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12], C0 would constitute one segment, [C1,C2,C3,C4,C5,C6] would constitute a second segment, and [C7,C8,C9,C10,C11,C12] would constitute a third segment. Thus, three codewords would be used to describe each frame. Although more codewords are used for each frame, the number of possible codewords for each segment drops significantly under split-Vector Quantization, such that the size of the indices is greatly reduced.
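The segmentation in this example can be sketched as follows. The segment boundaries follow the text ([C0], [C1..C6], [C7..C12]); everything else is an assumption made for illustration: the function names, the nearest-codeword rule (squared Euclidean distance), and the very small per-segment codebooks are all hypothetical.

```python
# Segment boundaries from the example: [C0], [C1..C6], [C7..C12].
SEGMENTS = [(0, 1), (1, 7), (7, 13)]


def nearest_index(segment, codebook):
    """Index of the closest codeword (assumed: squared Euclidean distance)."""
    def squared_distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(segment, codeword))

    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def split_encode(vector, codebooks):
    """Return one codebook index per segment of the spectral vector."""
    return [
        nearest_index(vector[start:end], codebook)
        for (start, end), codebook in zip(SEGMENTS, codebooks)
    ]


def split_decode(indices, codebooks):
    """Concatenate the selected codewords back into a full spectral vector."""
    vector = []
    for index, codebook in zip(indices, codebooks):
        vector.extend(codebook[index])
    return vector


# Hypothetical toy codebooks, one per segment.
CODEBOOKS = [
    [[-1.0], [0.0], [1.0]],      # segment 1: C0 alone
    [[0.0] * 6, [1.0] * 6],      # segment 2: C1..C6
    [[0.0] * 6, [-1.0] * 6],     # segment 3: C7..C12
]

frame = [0.9] + [0.8] * 6 + [-0.2] * 6
indices = split_encode(frame, CODEBOOKS)     # three indices per frame
decoded = split_decode(indices, CODEBOOKS)

stored_codewords = sum(len(cb) for cb in CODEBOOKS)   # 3 + 2 + 2
representable_vectors = 1
for cb in CODEBOOKS:
    representable_vectors *= len(cb)                  # 3 * 2 * 2

print(indices, stored_codewords, representable_vectors)
```

In this toy setup, seven stored codewords can represent twelve distinct full vectors, because each segment is chosen independently. The same effect, at realistic codebook sizes, is what lets the per-segment codebooks stay small.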
However, even with the techniques provided by split-Vector Quantization, additional reductions in the amount of data transmitted or stored for a spectral representation of a speech signal are desired.