1. Technical Field
The present invention relates in general to methods and systems for speech signal data manipulation and in particular to improved methods and systems for compressing digital data representations of human speech utterances. Still more particularly, the present invention relates to a method and system for compressing digital data representations of human speech utterances utilizing the repetitive nature of voiced sounds contained therein.
2. Description of the Related Art
Modern communications and information networks often require the use of digital speech, digital audio and digital video. Transmission, storage, conferencing and many other types of signal processing for information manipulation and display utilize these types of data. Basic to all such applications of traditionally analog signals are the techniques utilized to digitize those waveforms to achieve acceptable levels of signal quality for these applications.
A straightforward digitization of raw analog speech signals is, as those skilled in the art will appreciate, very inefficient. Raw speech data is typically sampled at anywhere from eight thousand samples per second to over forty-four thousand samples per second. Sixteen-to-eight-bit companding and Adaptive Differential Pulse Code Modulation (ADPCM) may be utilized to achieve a 4:1 reduction in data size; however, even utilizing such a compression ratio, the tremendous volume of data required to store speech signals makes voice-annotated mail, LAN-transmitted speech and personal-computer-based telephone answering and speaking software applications extremely cumbersome to utilize. For example, a one-page letter containing two kilobytes of digital data might have attached thereto a voice message of fifteen seconds duration, which may occupy 160 kilobytes of data. Multimedia applications of recorded speech are similarly hindered by the size of the data required and are typically confined to high-density storage media, such as CD-ROM.
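The data volumes discussed above may be illustrated with a short calculation. The following is a minimal sketch only; the function name `speech_bytes` and the choice of sixteen bits per sample are assumptions made for illustration and are not taken from the text, which does not state the sampling configuration behind its 160 kilobyte example.

```python
def speech_bytes(duration_s, sample_rate_hz, bits_per_sample, compression_ratio=1.0):
    """Approximate number of bytes needed to store a speech clip.

    All parameters are illustrative assumptions; compression_ratio=4.0
    models the 4:1 reduction mentioned for companding and ADPCM.
    """
    raw_bits = duration_s * sample_rate_hz * bits_per_sample
    return raw_bits / 8 / compression_ratio


# Fifteen seconds at the low end of the stated sampling range, uncompressed.
raw_low = speech_bytes(15, 8000, 16)

# The same clip after a 4:1 reduction.
compressed_low = speech_bytes(15, 8000, 16, compression_ratio=4.0)

# Fifteen seconds at the high end of the stated range, after a 4:1 reduction.
compressed_high = speech_bytes(15, 44100, 16, compression_ratio=4.0)

print(raw_low, compressed_low, compressed_high)
```

Under these assumed settings, a fifteen-second message occupies between roughly 60 and 330 kilobytes even after a 4:1 reduction, which is many times the size of the two-kilobyte letter to which it is attached.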
As a consequence of the large amounts of data required and the desirability of utilizing speech or digital audio within a data processing system, numerous techniques have been proposed for compressing the digital data representation of speech signals. For example, International Business Machines Corporation Technical Disclosure Bulletin, July 1981, pages 1017-1018, discloses a technique whereby compression, recording and expansion of asymmetrical speech waves may be accomplished. As described therein, the first cycle of each pitch period during a voiced sound period is utilized for compression and reconstruction of the speech. This technique is premised upon the observation that within most pitch periods the first one-fourth to one-fifth of the waveform is significantly larger in amplitude than subsequent portions of the waveform.
This first portion of the waveform is thought to contain nearly all of the frequency components that the remainder of the waveform contains, and consequently only a fractional portion of the waveform is utilized for compression and reconstruction. When an unvoiced sound is encountered in a speech signal processed with this technique, one of two procedures is utilized: either the unvoiced speech is digitized and stored in its entirety, or a single millisecond of sound is encoded along with the length of time that the unvoiced sound period lasts. During reconstruction, the single sampled pitch period is replicated at decreasing levels of amplitude for a period of time equal to the voiced sound. While this technique represents an excellent data compression and reconstruction method, it suffers from some loss of intelligibility.
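The prior-art approach described above may be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the cited bulletin: the function names, the one-fourth retention fraction as a default, and the geometric decay factor are all hypothetical, since the text says only that the stored portion is replicated "at decreasing levels of amplitude" without specifying how the amplitude decreases.

```python
def keep_first_portion(pitch_period, fraction=0.25):
    """Retain only the leading portion of one pitch period.

    fraction=0.25 reflects the 'first one-fourth' observation; the exact
    value is an assumption for illustration.
    """
    keep = max(1, int(len(pitch_period) * fraction))
    return pitch_period[:keep]


def reconstruct(stored, total_samples, decay=0.5):
    """Fill total_samples by repeating the stored portion with falling gain.

    The geometric decay is a hypothetical choice; the source does not
    specify the decay law.
    """
    out = []
    gain = 1.0
    while len(out) < total_samples:
        out.extend(gain * s for s in stored)
        gain *= decay
    return out[:total_samples]


# One pitch period of eight samples with a large leading excursion.
period = [1.0, -1.0, 0.4, -0.4, 0.2, -0.2, 0.1, -0.1]
stored = keep_first_portion(period)            # two samples are kept
rebuilt = reconstruct(stored, len(period))     # eight samples are regenerated
print(stored, rebuilt)
```

Only the retained samples and the duration need be stored, which is the source of the compression; the regenerated waveform merely approximates the original, consistent with the loss of intelligibility noted above.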
Other techniques utilize high sampling rates to faithfully reproduce the random noise aspects of unvoiced speech; however, these techniques require substantial levels of data and do not take into account the essential qualities which determine speech intelligibility.
In view of the above, it should be apparent that a need exists for a method and system which may be utilized to efficiently compress speech data and yet permit regeneration of that data without a substantial loss in speech intelligibility.