Numerous sub-band speech coders are known, as are speech coder systems that use digital microprocessors to manipulate the digital energy level samples that normally occur in such systems. Reference may be had to the record of the 1977 IEEE International Conference on Acoustics, Speech and Signal Processing, May 9-11, 1977, pages 191-195, and to the proceedings of the IEEE Acoustics, Speech and Signal Processing Society conference of April 9-11, 1980, Vol. 1, pages 332-335. These show typical digital sub-band speech coder and decoder arrangements for multi-channel speech transmission.
In systems like those in the above-referenced publications, the subjective quality of a sub-band speech coder, as perceived by a listener at the receiver, depends heavily on how the bits available in the transmission medium are allocated to the individual frequency bands of the coder. A major earlier improvement was dynamic bit allocation, in which the available bits are distributed among the frequency bands according to the energy present in each band. This technique was extended to a variable bit rate system in which many speech coders share a common bit rate resource, i.e., a transmission channel, by assigning bits to all the bands of all the coders according to the energy in each frequency band relative to all other frequency bands.
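The shared-channel idea can be illustrated with a short Python sketch. It pools the sub-bands of every coder on the channel and hands out bits one at a time, each time to the band whose peak level, reduced by 6 dB for every bit it already holds, is currently highest. This greedy loop is one common way to realize an energy-based allocation and is an assumption here, not the method of the cited systems; the function name, the dB inputs, and the 5-bit cap (borrowed from the limit mentioned later in this description) are illustrative.

```python
import math

def pooled_greedy_allocation(peaks_db_by_coder, total_bits, max_bits=5):
    """Hypothetical sketch: share total_bits among all bands of all coders.

    peaks_db_by_coder: one list of per-band peak levels (dB) for each coder.
    Each step gives one bit to the band whose peak, less 6 dB per bit it
    already holds, is highest, so every bit is worth about 6 dB.
    """
    # Flatten every (coder, band) pair into one common pool.
    pool = [(c, b) for c, bands in enumerate(peaks_db_by_coder)
            for b in range(len(bands))]
    alloc = [[0] * len(bands) for bands in peaks_db_by_coder]
    for _ in range(total_bits):
        best, best_score = None, -math.inf
        for c, b in pool:
            if alloc[c][b] >= max_bits:
                continue  # this band is already at the word-length limit
            score = peaks_db_by_coder[c][b] - 6.0 * alloc[c][b]
            if score > best_score:
                best, best_score = (c, b), score
        if best is None:  # every band in the pool is full
            break
        alloc[best[0]][best[1]] += 1
    return alloc

print(pooled_greedy_allocation([[40, 30, 10, 0], [20, 20, 5, 0]], 8))
```

Because every band competes against every other band, a coder whose bands are quiet simply receives fewer bits from the common pool.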
The typical sub-band speech coder takes the 0 to 4 kilohertz speech spectrum and samples it, typically at a rate of 8,000 samples per second. Through filtering and sub-sampling, the speech spectrum is divided into sub-spectra, typically eight sub-bands of 500 Hertz width each. In such a system, depicted schematically in FIG. 1, incoming analog signals on analog line 1 are converted to a digital sample stream by the analog to digital converter 2. The samples are clocked out by the clock 4 over line 3, typically at an 8 kilohertz sampling rate, to a parallel filter bank 5.
The filter bank 5 divides the incoming digital stream into, typically, 8 frequency sub-bands spanning the spectrum from 0 to 4000 Hertz. Its output is thus eight individual channels, each carrying samples at a rate of 1000 samples per second, as shown schematically by the clock 6 controlling the output of the filter bank 5 over lines 7.
The peak quantizer 9 measures the individual frequency sub-band peaks and the overall peaks, which are used to normalize the signal samples within a time frame. Forward error correction and dynamic bit allocation are applied to the quantized peak values by the forward error correction generator 10 and by the dynamic bit allocation technique or algorithm, normally executed in a microprocessor and shown as the dynamic bit allocation section 11. The output of the filter bank 5 is then companded, or normalized in level, by the compander 8 and quantized to the number of bits allocated by section 11.
The output from the compander is typically a signal stream of approximately 13,000 bps. The forward error correction generator 10 produces an output stream of approximately 3000 bps that includes the peak quantizer data, so that a total data stream of approximately 16,000 bps is presented to the serializer 13 for transmission over the digital channel 14. This serial stream contains the actual companded signal samples, plus a side channel of information that indicates the bit allocation for each frequency sub-band, plus the forward error correction code.
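The rates above can be checked with a little arithmetic. The Python sketch below (function and parameter names are hypothetical) converts the approximate stream rates into bits per processing block, using the 16 millisecond block and the 1000 samples per second per band that FIG. 1 assumes elsewhere in this description.

```python
def block_bit_budget(sample_bps=13_000, side_bps=3_000, block_ms=16,
                     num_bands=8, band_rate=1_000):
    """Hypothetical helper: break the approximate FIG. 1 rates down per block."""
    total_bps = sample_bps + side_bps
    sample_bits = sample_bps * block_ms // 1000      # companded sample bits per block
    side_bits = side_bps * block_ms // 1000          # peak data plus FEC bits per block
    samples_per_band = band_rate * block_ms // 1000  # samples each band contributes
    return {
        "total_bps": total_bps,
        "sample_bits_per_block": sample_bits,
        "side_bits_per_block": side_bits,
        "samples_per_band": samples_per_band,
        # Bits shared among all bands at each sample instant.
        "bits_per_sample_instant": sample_bits / samples_per_band,
        "avg_bits_per_sample": sample_bits / (samples_per_band * num_bands),
    }

budget = block_bit_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

With these figures, each 16 millisecond block carries 16 samples per band, and the 208 sample bits amount to 13 bits per sample instant to be shared among the 8 bands, or 1.625 bits per sample on average.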
Referring to FIG. 1, which illustrates the prior art, the 0 to 4 kilohertz input spectrum on line 1 emerges from the A to D converter 2 sampled at, typically, 8000 samples per second. This is shown by the sample clock 4 controlling the output on line 3 from the analog to digital converter 2. Filtering and sub-sampling are performed in the parallel filter bank 5, which decimates the incoming series of samples covering the total spectrum into sub-spectra, typically 8. In the example given, the 0 to 4 kilohertz input spectrum is decimated into 8 sub-bands of 500 Hertz width each: the first band is the 0 to 0.5 kilohertz band, the second is the 0.5 to 1 kilohertz band, and so on. Each sub-band's time waveform is represented by a stream of 1000 samples per second at the output of the filter bank 5, as controlled by the clock 6. Other bandwidths are also used; 16 bands of 250 Hertz width each are not unusual, and unequal sub-band widths are occasionally employed.
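The band layout just described can be tabulated with a few lines of Python, assuming equal-width bands and critical sub-sampling (each band's output rate equal to the input rate divided by the number of bands). The function name is illustrative, and unequal band widths are not covered.

```python
def band_layout(num_bands=8, input_rate=8_000):
    """Return (low_hz, high_hz, output_rate) for equal-width, critically sampled bands."""
    nyquist = input_rate / 2            # 4000 Hz for 8000 samples per second
    width = nyquist / num_bands         # 500 Hz for 8 bands, 250 Hz for 16
    band_rate = input_rate / num_bands  # each band keeps 1/num_bands of the samples
    return [(k * width, (k + 1) * width, band_rate) for k in range(num_bands)]

for low, high, rate in band_layout():
    print(f"{low:6.0f} to {high:6.0f} Hz at {rate:.0f} samples/s")
```

Calling `band_layout(16)` gives the 250 Hertz variant, with each band running at 500 samples per second.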
The eight individual sub-band time waveforms are normally processed in time blocks 4 to 32 milliseconds long in a signal processor, typically a microprocessor. The illustration in FIG. 1 assumes a 16 millisecond block length. The peak quantizer 9 in FIG. 1 finds the peak magnitude of the signal in each sub-band within a given time block, i.e., a given series of samples. The individual sub-band peaks are then quantized logarithmically, typically with a resolution of 2 to 4 dB.
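A minimal Python sketch of the peak quantizer follows. It assumes a 3 dB step (within the 2 to 4 dB range mentioned above), peaks expressed in dB relative to a full-scale value of 1.0, and a small floor so that a silent band does not produce the logarithm of zero. The function names and the floor value are hypothetical.

```python
import math

def quantize_block_peaks(block, step_db=3.0, floor=1e-6):
    """Return one log-quantized peak index per sub-band for a time block.

    block: list of per-band sample lists covering one block (e.g. 16 ms).
    """
    indices = []
    for band in block:
        peak = max((abs(x) for x in band), default=0.0)
        peak = max(peak, floor)                # avoid log10(0) for silent bands
        level_db = 20.0 * math.log10(peak)     # peak magnitude in dB re 1.0
        indices.append(round(level_db / step_db))
    return indices

def dequantize_peak(index, step_db=3.0):
    """Recover the peak magnitude represented by a quantizer index."""
    return 10.0 ** (index * step_db / 20.0)

print(quantize_block_peaks([[0.5, -1.0, 0.25], [0.1, 0.05]]))
```

Only the integer indices need to be sent; the receiver recovers each peak to within half a step, here 1.5 dB.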
This peak information is then passed to the dynamic bit allocation means 11 and to the forward error correction coder 10, which adds error protection. The result is passed to the serializer 13, which multiplexes it with the bits representing the stream of samples coming from the parallel filter bank 5. Those samples have been reduced in the compander 8 to the number of bits assigned by the bit allocation technique practiced in box 11. The multiplexed side channel information informs the receiver of the specific bit allocation employed during the current 16 millisecond sample block.
In FIG. 1, the dynamic bit allocation function 11 assigns the available bandwidth bits for a given 16 millisecond block to the individual frequency sub-bands, normally at the rate of 1 bit for every 6 dB of peak signal.
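Written as an equation, and assuming N sub-bands that share B bits per sample instant, with L_k the quantized peak of band k in dB, the ideal (unrounded) rule of 1 bit per 6 dB can be expressed as:

```latex
b_k = \frac{B}{N} + \frac{L_k - \bar{L}}{6},
\qquad \bar{L} = \frac{1}{N}\sum_{j=1}^{N} L_j,
\qquad \sum_{k=1}^{N} b_k = B
```

Two bands whose peaks differ by 6 dB, roughly a factor of two in peak value since 20 log10 2 is about 6.02, differ by one bit. Nothing in the formula keeps b_k from being negative, fractional, or larger than any practical word length.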
A frequency sub-band with twice the peak value of a second sub-band (6 dB more) would get one more bit than the second, a band with four times the peak value (12 dB more) would get two more bits, and so forth. In practice, this ideal assignment cannot be achieved, since a fixed number of available bandwidth bits cannot be subdivided precisely in this manner among all the sub-bands. The actual process first computes an initial bit assignment, which may include very large, negative, and fractional values. These are then rounded to integers and limited to a minimum of 0 and a maximum of perhaps 5 bits. This usually yields the wrong total number of bits, so an iterative redistribution of bits is required. The whole process consumes considerable time and hardware and still provides less than ideal accuracy.
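The three steps just described (ideal assignment, rounding and limiting, iterative redistribution) might be sketched in Python as follows. The redistribution rule used here, which adds a bit where the rounded value falls furthest below the ideal and removes one where it lies furthest above, is an assumption, since the description does not say how the redistribution is done. Names and defaults are hypothetical.

```python
def allocate_bits(peaks_db, total_bits, max_bits=5, db_per_bit=6.0):
    """Hypothetical sketch of prior-art dynamic bit allocation for one block.

    peaks_db:   quantized peak level of each sub-band, in dB
    total_bits: bits to share among the bands at each sample instant
    Returns (ideal, bits): the real-valued first pass and the final integers.
    """
    n = len(peaks_db)
    if not 0 <= total_bits <= n * max_bits:
        raise ValueError("total_bits must lie between 0 and n * max_bits")
    mean_db = sum(peaks_db) / n
    # Step 1: ideal assignment, 1 bit per db_per_bit above or below the mean.
    # These values may be large, negative or fractional.
    ideal = [total_bits / n + (p - mean_db) / db_per_bit for p in peaks_db]
    # Step 2: round to integers and limit to the range 0..max_bits.
    bits = [min(max(round(x), 0), max_bits) for x in ideal]
    # Step 3: the total is now usually wrong, so move one bit at a time.
    while sum(bits) != total_bits:
        if sum(bits) < total_bits:
            # Add a bit where the integer value falls furthest below the ideal.
            open_bands = [k for k in range(n) if bits[k] < max_bits]
            k = max(open_bands, key=lambda j: ideal[j] - bits[j])
            bits[k] += 1
        else:
            # Remove a bit where the integer value lies furthest above the ideal.
            used_bands = [k for k in range(n) if bits[k] > 0]
            k = max(used_bands, key=lambda j: bits[j] - ideal[j])
            bits[k] -= 1
    return ideal, bits

ideal, bits = allocate_bits([40, 34, 28, 22, 16, 10, 4, -2], total_bits=13)
print(ideal)
print(bits)
```

For peaks spaced 6 dB apart and 13 bits per sample instant, the ideal values run from 5.125 down to -1.875; after rounding and limiting the total is 15, so two bits are taken back. Passing a `db_per_bit` greater than 6 gives the gentler allocation of less than 1 bit per 6 dB mentioned later.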
The sample compander and quantizer 8 uses the quantized peak information to compand, or normalize, the time waveform in each band. It then quantizes each sample in each frequency sub-band with the number of bits that the bit allocation technique has assigned to that sub-band. All of the information used for companding and for bit allocation is made available to the receiver, or demodulator, at the far end of the system so that it can reconstruct the sub-band time waveforms and pass them through reconstruction filters and digital to analog conversion to approximate the original 0 to 4 kilohertz input signal. The receiver end is not shown in FIG. 1 but is described in the IEEE International Conference on Acoustics, Speech and Signal Processing proceedings, Vol. 1, cited above.
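The companding and quantization step, and its inverse at the receiver, can be sketched in Python with a uniform mid-rise quantizer. The choice of quantizer, the clamping of samples that exceed the dequantized peak, the silence substituted for a 0-bit band, and the function names are all assumptions for illustration; the description does not specify the quantizer characteristic.

```python
import math

def compand_and_quantize(samples, peak, bits):
    """Normalize one band's block by its peak, then quantize to `bits` bits.

    Hypothetical sketch using a uniform mid-rise quantizer over [-1, 1).
    Returns integer codes; a band given 0 bits sends nothing.
    """
    if bits == 0:
        return []
    levels = 2 ** bits
    step = 2.0 / levels
    codes = []
    for x in samples:
        u = x / peak  # companding: the block now spans roughly [-1, 1]
        code = math.floor((u + 1.0) / step)
        # Clamp: the dequantized peak may be slightly below the true peak.
        codes.append(min(max(code, 0), levels - 1))
    return codes

def reconstruct(codes, peak, bits, length):
    """Receiver side: map codes to cell midpoints and undo the companding."""
    if bits == 0:
        return [0.0] * length  # band not sent; assume silence is substituted
    step = 2.0 / 2 ** bits
    return [peak * (-1.0 + (c + 0.5) * step) for c in codes]

block = [0.5, -1.0, 0.25]
codes = compand_and_quantize(block, peak=1.0, bits=2)
print(codes, reconstruct(codes, peak=1.0, bits=2, length=len(block)))
```

Any sample within plus or minus the peak is reconstructed to within half a quantization step, i.e., within peak / 2**bits, which is the quantization noise that the allocated bits must keep in check.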
In this process some degradation in speech quality will occur, because the available bits, i.e., the bandwidth assigned to this coder on the transmission system, may not be sufficient to reconstruct the input signal precisely.
The problems associated with this type of system are primarily those of bit allocation. The bit allocation technique described above approximates the required functions and tries to achieve an optimum signal to noise ratio for the fixed number of bits, or bandwidth, allowed to the coder. The first problem is that, because the number of bits per second is fixed, the signal to noise ratio varies greatly from one speech spectrum to another. Flat spectra will have very few bits assigned in each of their sub-bands, while sparse spectra will have many bits assigned to a few of the higher energy sub-bands and will thus yield high signal to noise ratios compared with the low signal to noise ratio of flat spectra. Secondly, it has been observed that humans do not perceive noise in a signal in proportion to the signal to noise ratio. Moreover, not all listeners hear alike, and human hearing does not follow any known equation or mathematical model. Experts evaluate the output quality of a speech coder by a subjective equivalent signal to noise ratio based on the perceived quality of the reconstructed speech, and their estimate of the overall signal to noise ratio usually differs dramatically from the actual, measured signal to noise ratio.
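The first problem can be made concrete with a deliberately simplified model, coded in Python below. It assumes that each band's signal power is proportional to its squared peak, that the quantization noise in a band is proportional to (peak × 2^-b)^2, and that bits follow the unconstrained 1-bit-per-doubling rule with no rounding or limits. These assumptions and the function name are illustrative only, and the model ignores everything about perception raised in this paragraph.

```python
import math

def ideal_allocation_snr_db(peaks, bits_per_instant):
    """Overall SNR (dB) of an idealized sub-band coder (hypothetical model).

    Signal power per band ~ peak**2; quantization noise per band
    ~ (peak * 2**-b)**2; bits follow b_k = B/N + log2(peak_k / geometric mean),
    with no rounding and no 0..5 limits.
    """
    n = len(peaks)
    log_gm = sum(math.log2(p) for p in peaks) / n  # log2 of the geometric mean
    bits = [bits_per_instant / n + math.log2(p) - log_gm for p in peaks]
    signal = sum(p ** 2 for p in peaks)
    noise = sum((p * 2.0 ** -b) ** 2 for p, b in zip(peaks, bits))
    return 10.0 * math.log10(signal / noise)

flat = ideal_allocation_snr_db([1.0] * 8, 13)
sparse = ideal_allocation_snr_db([8.0, 8.0] + [1.0] * 6, 13)
print(f"flat: {flat:.1f} dB, sparse: {sparse:.1f} dB")
```

With 13 bits per sample instant, a flat spectrum of eight equal peaks yields about 9.8 dB, while a spectrum whose energy sits in two bands about 18 dB above the rest yields about 17.5 dB, although both use exactly the same number of bits.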
Some partial solutions to these problems have been offered. Allocating bits at a rate of less than 1 bit per 6 dB of input peak signal has helped. Non-linear quantizers have also helped. Variable bit rate assignment techniques can help, provided one can determine how the bit rate assignment should vary. All of these attempts amount to guesses at how humans actually perceive sound quality, and all of them rely on some formula that is convenient or easily implemented rather than on what is truly needed.