1. Field of the Invention
The present invention relates generally to the field of data compression of digitally encoded speech signals and, more particularly, to a system and a method for compressing the serial bit stream signals that are generated by delta modulation encoders, such as CVSD (continuously variable slope delta modulation) encoders, and by sampled clipped speech encoders.
2. Description of Related Art
A patent of interest for its teaching of analog to digital conversion of speech signals is U.S. Pat. No. 4,271,332 entitled "Speech Signal A/D Converter Using an Instantaneously-Variable Band Width Filter", by J. C. Anderson.
The signal technique described in this patent is denoted the MIMIC technique, which will be referred to in the Description of The Preferred Embodiment.
A patent of interest for its teaching of processing sampled clipped speech signals is U.S. Pat. No. 4,594,575 entitled "Improved Digital Processor for Speech Signals" by Avery et al. The type of processor described in this patent will be denoted SPFE, an acronym for Speech Processor Front-End. The acronym will be used in the Description of The Preferred Embodiment.
Machines that talk have been popular for many years, for they take on almost human characteristics in questioning and response.
Although many speech-synthesis products exist, their characteristics differ widely and are influenced by many factors such as speech-encoding methods, bit rate, and vocabulary preparation.
From the different sound and tonal inflections that accompany the spoken word, it is obvious that speech properties change during the transition from voiced to unvoiced speech. For example, there are large changes in peak amplitude and fundamental frequency. Because waveforms change little over short segments of speech, however, most speech-processing techniques that aim to achieve a low bit rate isolate such segments and process them as if they were short segments of sustained sound having fixed properties. This segmentation, which is usually referred to as framing, introduces a distortion in the reconstruction of speech. This distortion degrades the speech quality and intelligibility. For this reason and others, most of the low bit rate speech processors sound mechanical.
The need for deciding whether a given segment of a speech waveform should be classified as voiced speech, unvoiced speech, or silence (absence of speech) arises in many speech analysis systems. Most of the available methods work in conjunction with pitch analysis to decide what class the segment should fall into. There are two disadvantages in this approach to the voiced-unvoiced decision. First, the decision is based on a single feature: the degree of voice periodicity. Voiced speech is only approximately periodic; sudden changes in articulation and the idiosyncrasies of vocal cord vibrations can produce speech waveforms which are not periodic. In such cases, a feature such as the amplitude of the largest cepstral peak will fail to distinguish voiced speech from unvoiced. Second, the voiced-unvoiced decision is tied to the pitch detection, which may be acceptable for speech synthesis applications. For the proposed application, however, the linking of the voiced-unvoiced decision to pitch detection can result in unnecessary complexity as well as in poorer performance, particularly at the boundaries between voiced and unvoiced speech.
Pitch (i.e., fundamental frequency F.sub.0 and fundamental period T.sub.0) occupies a key position in the acoustic speech signal. The prosodic information of an utterance is predominantly determined by this parameter. The ear is more sensitive to changes of fundamental frequency than to changes of other speech signal parameters by an order of magnitude. The quality of voiced speech is essentially influenced by the quality and faultlessness of the pitch measurement. The importance of this parameter thus necessitates using a good and reliable measurement method. The prior art shows that pitch detection can roughly be divided into the following three broad categories:
(1) A group which utilizes principally the time-domain properties of speech signals;
(2) A group which utilizes principally the frequency-domain properties of speech signals; and
(3) A group which utilizes both the time and frequency domain properties of speech signals.

(1) The number of silent patterns (0000, 0101 or 1010) of SPFE, MIMIC, and CVSD respectively "X0".
(2) The number of one pulses in a frame "S1".
(3) The number of one bits in a frame "S".
(4) The width of the average one pulse "S/S1".
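The four frame measurements listed above (X0, S1, S and S/S1) can be illustrated with a minimal sketch. It is not the claimed implementation. The function name frame_features, the representation of a frame as a string of '0' and '1' characters, and the dictionary SILENT_PATTERNS are hypothetical names introduced here. The sketch further assumes that silent patterns are counted on aligned, non-overlapping 4-bit groups and that a "one pulse" is a run of consecutive one bits; the text does not state either convention.

```python
# Silent pattern per encoder type, as listed in the text
# (0000, 0101 or 1010 for SPFE, MIMIC and CVSD respectively).
SILENT_PATTERNS = {"SPFE": "0000", "MIMIC": "0101", "CVSD": "1010"}


def frame_features(bits, encoder):
    """Compute X0, S1, S and S/S1 for one frame.

    bits    -- frame as a string of '0'/'1' characters (assumed format)
    encoder -- one of the keys of SILENT_PATTERNS
    """
    pattern = SILENT_PATTERNS[encoder]
    n = len(pattern)

    # X0: silent patterns in the frame. Assumption: aligned,
    # non-overlapping n-bit groups are compared with the pattern.
    x0 = sum(
        1
        for i in range(0, len(bits) - n + 1, n)
        if bits[i:i + n] == pattern
    )

    # S: number of one bits in the frame.
    s = bits.count("1")

    # S1: number of one pulses. Assumption: a pulse is a run of
    # consecutive ones, counted at each 0 -> 1 transition (or at
    # the frame start if the first bit is a one).
    s1 = sum(
        1
        for i, b in enumerate(bits)
        if b == "1" and (i == 0 or bits[i - 1] == "0")
    )

    # S/S1: width of the average one pulse; 0.0 if the frame has no ones.
    width = s / s1 if s1 else 0.0

    return {"X0": x0, "S1": s1, "S": s, "S/S1": width}


if __name__ == "__main__":
    print(frame_features("0101011100001111", "MIMIC"))
```

For the 16-bit example frame, the aligned groups are 0101, 0111, 0000 and 1111, so under the stated assumptions one group matches the MIMIC silent pattern, the frame holds nine one bits in four pulses, and the average pulse width is 2.25 bits.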