My invention relates to speech processing and, more particularly, to compression of speech patterns.
It is generally accepted that a speech signal requires a bandwidth of at least 4 kHz for reasonable intelligibility. In digital speech processing systems such as speech synthesizers, recognizers, or coders, the channel capacity needed for transmission or the memory required for storage of the digital elements of the full 4 kHz bandwidth waveform is very large. Many techniques have been devised to reduce the number of digital codes needed to represent a speech signal. Waveform coding techniques such as PCM, DPCM, delta modulation, or adaptive predictive coding result in natural sounding, high quality speech at bit rates between 16 and 64 kbps. The speech quality obtained from waveform coders, however, degrades as the bit rate is reduced below 16 kbps.
An alternative speech coding technique disclosed in U.S. Pat. No. 3,624,302 issued Nov. 30, 1971 to B. S. Atal and assigned to the same assignee utilizes a small number, e.g., 12-16, of slowly varying parameters which may be processed to produce a low distortion replica of a speech pattern. Such parameters, e.g., LPC or log area, generated by linear prediction analysis can be spectrum limited to 50 Hz without significant band limiting distortion. Encoding of the LPC or log area parameters generally requires sampling at a rate of twice the bandwidth and quantizing each resulting frame of LPC or log area parameters. Each frame of log area parameters, for example, can be quantized using 48 bits. Consequently, 12 log area parameters each having a 50 Hz bandwidth, and therefore sampled at 100 frames per second, result in a total bit rate of 4800 bits/sec.
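The 4800 bits/sec figure follows from sampling at twice the parameter bandwidth and multiplying by the bits spent on each frame. The short sketch below works through that arithmetic. It assumes that the 48 bits cover one whole frame of 12 parameters, an interpretation implied by the stated total rather than spelled out in the text, and the function and variable names are illustrative only.

```python
def parameter_bit_rate(bandwidth_hz: float, bits_per_frame: int) -> float:
    """Bit rate when frames are sampled at twice the parameter bandwidth."""
    frame_rate_hz = 2.0 * bandwidth_hz  # frames per second (twice the bandwidth)
    return frame_rate_hz * bits_per_frame


# Values taken from the text; treating 48 bits as the budget for a whole
# frame of 12 log area parameters is an assumption.
num_parameters = 12
bits_per_frame = 48
bandwidth_hz = 50.0

bit_rate = parameter_bit_rate(bandwidth_hz, bits_per_frame)
bits_per_parameter = bits_per_frame / num_parameters

print(f"frame rate: {2.0 * bandwidth_hz:.0f} frames/sec")
print(f"bit rate: {bit_rate:.0f} bits/sec")
print(f"average bits per parameter per frame: {bits_per_parameter:.0f}")
```

Under this assumption each parameter receives an average of 4 bits per frame, so the bit rate can be lowered only by reducing the frame rate or the bits per frame.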
Further reduction of bandwidth decreases the bit rate, but the resulting increase in distortion interferes with the intelligibility of speech synthesized from the lower bandwidth parameters. It is well known that sounds in speech patterns do not occur at a uniform rate, and techniques have been devised to take into account such nonuniform occurrences. U.S. Pat. No. 4,349,700 issued Sept. 14, 1982 to L. R. Rabiner et al. and assigned to the same assignee discloses arrangements that permit recognition of speech patterns having diverse sound patterns utilizing dynamic programming. U.S. Pat. No. 4,038,503 issued July 26, 1977 to Moshier discloses a technique for nonlinear warping of time intervals of speech patterns so that the sound features are represented in a more uniform manner. These arrangements, however, require storing and processing acoustic feature signals that are sampled at a rate corresponding to the most rapidly changing feature in the pattern. It is an object of the invention to provide an improved speech representation arrangement having reduced digital storage and processing requirements.