I. Field
The present invention generally relates to signal processing, and more particularly, to encoding and decoding of signals for storage and retrieval or for communications.
II. Background
In digital telecommunications, signals need to be coded for transmission and decoded for reception. Coding of signals is concerned with converting the original signals into a format suitable for propagation over the transmission medium. The objective is to preserve the quality of the original signals while consuming as little of the medium's bandwidth as possible. Decoding of signals involves the reverse of the coding process.
A known coding scheme uses the technique of pulse-code modulation (PCM). FIG. 1 shows a time-varying signal x(t), which can be a segment of a speech signal, for instance. The y-axis and the x-axis represent amplitude and time, respectively. The analog signal x(t) is sampled by a plurality of pulses 20. Each pulse 20 has an amplitude representing the signal x(t) at a particular time. The amplitude of each of the pulses 20 can thereafter be coded as a digital value for later transmission, for example.
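By way of illustration only, the sampling and digital coding of x(t) can be sketched as follows. The 8 kHz sampling rate, the 8-bit word length, the assumed amplitude range of [-1, 1], and the names pcm_encode and x are assumptions made for this example and are not taken from FIG. 1.

```python
import math

def pcm_encode(signal, sample_rate_hz, duration_s, bits=8):
    """Sample signal(t) at uniform instants and code each sample as a signed integer."""
    levels = 2 ** (bits - 1) - 1          # largest positive code, e.g. 127 for 8 bits
    num_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz            # sampling instant of pulse n
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to the assumed [-1, 1] range
        codes.append(int(round(amplitude * levels)))
    return codes

def x(t):
    """Hypothetical test signal: a 440 Hz tone at half of full scale."""
    return 0.5 * math.sin(2.0 * math.pi * 440.0 * t)

# 20 ms of signal at the assumed 8 kHz rate yields 160 coded pulses.
codes = pcm_encode(x, sample_rate_hz=8000, duration_s=0.02)
```

Each entry of codes corresponds to one pulse 20, with its amplitude expressed as a digital value.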
To conserve bandwidth, the digital values of the PCM pulses 20 can be compressed using a logarithmic companding process prior to transmission. At the receiving end, the receiver merely performs the reverse of the coding process mentioned above to recover an approximate version of the original time-varying signal x(t). Apparatuses employing the aforementioned scheme are commonly called A-law or μ-law codecs.
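The logarithmic companding step can be sketched using the continuous μ-law characteristic, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255. This is a simplified illustration that assumes amplitudes normalized to [-1, 1] and a plain 8-bit rounding step. Deployed codecs typically use a segmented approximation of this curve, which is not reproduced here, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding parameter commonly used for mu-law

def mu_law_compress(x, mu=MU):
    """Map an amplitude in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress, performed at the receiving end."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize(value, bits=8):
    """Round a value in [-1, 1] to a signed integer code (assumed word length)."""
    return int(round(value * (2 ** (bits - 1) - 1)))

def dequantize(code, bits=8):
    """Convert an integer code back to a value in [-1, 1]."""
    return code / (2 ** (bits - 1) - 1)

# A quiet sample is recovered more accurately with companding than without it.
quiet = 0.01
uniform_error = abs(dequantize(quantize(quiet)) - quiet)
companded = mu_law_expand(dequantize(quantize(mu_law_compress(quiet))))
companded_error = abs(companded - quiet)
```

In this sketch the recovered value is only an approximation of the original, and the companded path keeps the error small for low-amplitude samples at the same word length.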
As the number of users increases, there is a further practical need for bandwidth conservation. For instance, in a wireless communication system, a multiplicity of users may share a finite frequency spectrum. Each user is normally allocated only a limited portion of the bandwidth.
In the past decade or so, considerable progress has been made in the development of speech coders. A commonly adopted technique employs the method of code excited linear prediction (CELP). Details of CELP methodology can be found in the publications “Digital Processing of Speech Signals,” by Rabiner and Schafer, Prentice Hall, ISBN: 0132136031, September 1978; and “Discrete-Time Processing of Speech Signals,” by Deller, Proakis and Hansen, Wiley-IEEE Press, ISBN: 0780353862, September 1999. The basic principles underlying the CELP method are briefly described below.
Reference is now made again to FIG. 1. Using the CELP method, instead of digitally coding and transmitting each PCM sample 20 individually, the PCM samples 20 are coded and transmitted in groups. For instance, the PCM pulses 20 of the time-varying signal x(t) in FIG. 1 are first partitioned into a plurality of frames 22. Each frame 22 is of a fixed time duration, for instance 20 ms. The PCM samples 20 within each frame 22 are collectively coded via the CELP scheme and thereafter transmitted. Exemplary frames of the sampled pulses are PCM pulse groups 22A-22C shown in FIG. 1.
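The partitioning of the PCM samples into fixed-duration frames can be sketched as below. The 8 kHz sampling rate is an assumption for this example, as is the choice to drop a trailing partial frame. The name split_into_frames is hypothetical.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Partition a sample sequence into consecutive frames of fixed duration."""
    frame_len = sample_rate_hz * frame_ms // 1000   # 160 samples at 8 kHz and 20 ms
    last_start = len(samples) - frame_len
    # Samples left over after the last whole frame are dropped in this sketch.
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

# 500 samples yield three whole frames, analogous to pulse groups 22A-22C.
samples = list(range(500))
frames = split_into_frames(samples)
```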
For simplicity, take only the three PCM pulse groups 22A-22C for illustration. During encoding prior to transmission, the digital values of the PCM pulse groups 22A-22C are consecutively fed to a linear predictor (LP) module. The resultant output is a set of frequency values, also called an “LP filter” or simply a “filter,” which basically represents the spectral content of the pulse groups 22A-22C. The LP filter is then quantized.
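One common textbook way of deriving an LP filter from a frame is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. The choice of this particular method, the predictor order, and the function names are assumptions made for illustration. Quantization of the resulting filter is omitted.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of the frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k].

    Returns the coefficient list and the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                      # silent frame: nothing left to predict
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                       # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err

def lp_analysis(frame, order):
    """Return the LP coefficients and residual energy for one frame."""
    return levinson_durbin(autocorrelation(frame, order), order)

# A decaying exponential obeys x[n] = 0.9 * x[n - 1], so a second-order
# analysis should place nearly all of the prediction in the first coefficient.
frame = [0.9 ** n for n in range(200)]
coeffs, error = lp_analysis(frame, order=2)
```

The coefficients describe an all-pole filter whose frequency response approximates the spectral envelope of the frame.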
The LP module generates an approximation of the spectral representation of the PCM pulse groups 22A-22C. As such, during the predicting process, errors or residual values are introduced. The residual values are mapped to a codebook which carries entries of various combinations available for close matching of the coded digital values of the PCM pulse groups 22A-22C. The best-fitting values in the codebook are mapped. The mapped values are the values to be transmitted. The overall process is called time-domain linear prediction (TDLP).
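The computation of the residual values and their mapping to a codebook entry can be sketched as a nearest-neighbor search under a squared-error criterion. This is a deliberately simplified illustration: the tiny codebook, the error measure, and the function names are assumptions, and practical CELP coders typically select entries by an analysis-by-synthesis search rather than by direct matching of the residual.

```python
def lp_residual(frame, coeffs):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - k] for one frame."""
    residual = []
    for n in range(len(frame)):
        prediction = sum(a * frame[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(frame[n] - prediction)
    return residual

def search_codebook(target, codebook):
    """Return the index and squared error of the entry closest to the target."""
    best_index, best_error = -1, float("inf")
    for index, entry in enumerate(codebook):
        error = sum((t - c) ** 2 for t, c in zip(target, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error

# Hypothetical four-sample frame, first-order LP filter and three-entry codebook.
frame = [1.0, 0.9, 0.81, 0.729]
coeffs = [0.9]
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
residual = lp_residual(frame, coeffs)
index, error = search_codebook(residual, codebook)
```

Only the index of the selected entry, rather than the residual itself, then needs to be sent.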
Thus, using the CELP method in telecommunications, the encoder (not shown) merely has to generate the LP filters and the mapped codebook values. The transmitter needs only to transmit the LP filters and the mapped codebook values, instead of the individually coded PCM pulse values as in the A-law and μ-law encoders mentioned above. Consequently, a substantial amount of communication channel bandwidth can be saved.
The receiver also has a codebook similar to that in the transmitter. The decoder (not shown) in the receiver, relying on the same codebook, merely has to reverse the encoding process described above. Using the mapped codebook values together with the received LP filters, the time-varying signal x(t) can be recovered.
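The reversal performed by the decoder can be sketched as a codebook lookup followed by all-pole synthesis filtering, x[n] = e[n] + sum_k a[k] x[n - k]. The codebook contents, the first-order filter, and the function names below are assumptions chosen to keep the example small.

```python
def synthesize(excitation, coeffs):
    """Drive the all-pole LP filter with an excitation sequence."""
    output = []
    for n, e in enumerate(excitation):
        feedback = sum(a * output[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        output.append(e + feedback)
    return output

def decode_frame(codebook_index, coeffs, codebook):
    """Look up the received codebook entry and pass it through the LP filter."""
    return synthesize(codebook[codebook_index], coeffs)

# The receiver holds the same (hypothetical) codebook as the transmitter.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
recovered = decode_frame(codebook_index=1, coeffs=[0.9], codebook=codebook)
```

With a unit-impulse excitation and a coefficient of 0.9, the recovered frame is the decaying sequence 1.0, 0.9, 0.81, 0.729, up to floating-point rounding.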
Heretofore, many of the known speech coding schemes, such as the CELP scheme mentioned above, have been based on the assumption that the signals being coded are short-time stationary. That is, the schemes are based on the premise that the frequency contents of the coded frames are stationary and can be approximated by simple (all-pole) filters and some input representation for exciting the filters. The various TDLP algorithms used in arriving at the codebooks as mentioned above are based on such a model. Nevertheless, voice patterns among individuals can be very different. Non-human audio signals, such as sounds emanating from various musical instruments, are also distinguishably different from human speech.
Furthermore, in the CELP process as described above, to expedite real-time signal processing, a short time frame is normally chosen. More specifically, as shown in FIG. 1, to reduce algorithmic delays in the mapping of the values of the PCM pulse groups, such as 22A-22C, to the corresponding entries of vectors in the codebook, a short time window 22 is defined, for example 20 ms as shown in FIG. 1. However, the spectral or formant information derived from each frame is mostly common and can be shared among other frames. Consequently, the formant information is more or less repetitively sent through the communication channels, in a manner that is not in the best interest of bandwidth conservation.
Accordingly, there is a need to provide a coding and decoding scheme with improved preservation of signal quality, applicable not only to human speech but also to a variety of other sounds, and that further makes efficient use of channel resources.