This invention relates generally to voice processors used in communication receivers, and more specifically to a digital signal processor which processes data including voice messaging data that may have both voiced and unvoiced speech components.
Communications systems, such as paging systems, have had to compromise the length of messages, number of users and convenience to the user in order to operate the systems profitably. The number of users and the length of the messages were limited to avoid over crowding of the channel and to avoid long transmission time delays. The user""s convenience is directly affected by the channel capacity, the number of users on the channel, system features and type of messaging. In a paging system, tone only pagers that simply alerted the user to call a predetermined telephone number offered the highest channel capacity but were some what inconvenient to the users. Conventional analog voice pagers allowed the user to receive a more detailed message, but drastically limited the number of users on a given channel. Analog voice pagers, being real time devices, also had the disadvantage of not providing the user with a way of storing and repeating the message received. The introduction of digital pagers with numeric and alphanumeric displays and memories overcame many of the problems associated with the older pagers. These digital pagers improved the message handling capacity of the paging channel, and provided the user with a way of storing messages for later review. with voice announcements. In an attempt to provide this service over a limited capacity digital channel, various digital voice compression techniques and synthesis techniques have been tried, each with their own level of success and limitation. Voice compression methods, based on vocoder techniques, currently offer a highly promising technique for voice compression. Of the low data rate vocoders, the multi band excitation (MBE) vocoder is among the most natural sounding vocoder.
The vocoder analyzes short segments of speech, called speech frames, and characterizes the speech in terms of several parameters that are digitized and encoded for transmission. The speech characteristics that are typically analyzed include voicing characteristics, pitch, frame energy, and spectral characteristics. Vocoder synthesizers use these parameters to reconstruct the original speech by mimicking the human voice mechanism. In other words, vocoder synthesizers mimick the original speech by creating a waveform having the same pitch, frame energy parameters, voicing characteristics, and spectrum shape as the original speech wave form to provide natural sounding synthetic speech. Vocoder synthesizers model the human voice as an excitation source, controlled by the pitch and frame energy parameters followed by a spectrum shaping controlled by the spectral parameters.
The voicing characteristic describes the repetitiveness of the speech waveform. Speech consists of periods where the speech waveform has a repetitive nature and periods where no repetitive characteristics can be detected. The periods where the waveform has a periodic repetitive characteristic are said to be voiced. Periods where the waveform seems to have a totally random characteristic are said to be unvoiced. The voiced/unvoiced characteristics are used by the vocoder speech synthesizer to determine the type of excitation signal which will be used to reproduce that segment of speech.
Pitch defines the fundamental frequency of the repetitive portion of the voiced wave form. Pitch is typically defined in terms of a pitch period or the time period of the repetitive segments of the voiced portion of the speech wave forms. The speech waveform is a highly complex waveform and very rich in harmonics. The complexity of the speech waveform makes it very difficult to extract pitch information. Changes in pitch frequency must also be smoothly tracked for an MBE vocoder synthesizer to smoothly reconstruct the original speech. The human auditory process is very sensitive to changes in pitch and the perceived quality of the reconstructed speech is strongly effected by the accuracy of the pitch derived.
Frame energy is a measure of the normalized average RMS power of the speech frame. This parameter defines the loudness of the speech during the speech frame.
The spectral characteristics define the relative amplitude of the harmonics and the fundamental pitch frequency during the voiced portions of speech and the relative spectral shape of the noise-like unvoiced speech segments. The data transmitted defines the spectral characteristics of the reconstructed speech signal.
The human voice, during periods that are classified as voiced, has portions of the spectrum that are unvoiced. MBE vocoders produce natural sounding speech because speech waveforms, during a voiced period, are mixtures of voiced and unvoiced frequency bands. The speech spectrum is divided into a number of frequency bands and a determination is made for each band as to the voiced/unvoiced nature of each band. In conventional MBE vocoders, the band voiced/unvoiced information can be transmitted explicitly or can be determined at the synthesizer using tables which associate voicing patterns with spectral vectors, or other means. Transmission of explicit band voicing data increases the quantity of data that must be transmitted. The use of tables by the synthesizer occupies a substantial quantity of memory and can be less accurate than explicit data. Other methods may require excessive computational complexity and are inappropriate for low-power, inexpensive devices. Accordingly, what is needed for optimal utilization of a channel in a communication system, such as a paging channel in a paging system or a data channel in a non-real time one way or two way data communication""s system is an MBE synthesizer that accurately reproduces voice from compressed data, using methods which are well suited to implementation on low cost, low power devices, while maintaining acceptable speech quality.