The present invention relates to a transform coder for speech and audio signals which is useful for rates down to and below 1 bit/sample. In particular it relates to using perceptually-based bit allocation in order to vector quantize the frequency-domain representation of the input signal. The present invention uses a masking threshold to define the distortion measure which is used to both train codebooks and select the best codewords and coefficients to represent the input signal.
There is a need for bandwidth efficient coding of a variety of sounds such as speech, music, and speech with background noise. Such signals need to be efficiently represented (good quality at low bit rates) for transmission over wireless (e.g. cell phone) or wireline (e.g. telephony or Internet) networks. Traditional coders, such as code excited linear prediction (CELP) coders, are designed specifically for speech signals and achieve compression by utilizing models of speech production based on the human vocal tract. However, these traditional coders are not as effective when the signal to be coded is not human speech but some other signal such as background noise or music. These other signals do not have the same typical patterns of harmonics and resonant frequencies, nor the same set of characterizing features, as human speech. Moreover, the production of these other sounds cannot be described by mathematical models of the human vocal tract. As a result, traditional coders such as CELP coders often produce uneven and sometimes annoying results for non-speech signals. For example, many traditional coders code music-on-hold with annoying artifacts.
An object of the present invention is to provide a transform coder for speech and audio signals for rates down to near 1 bit/sample.
In accordance with an aspect of the present invention there is provided a method of transmitting a discretely represented frequency signal within a frequency band, said signal discretely represented by coefficients at certain frequencies within said band, comprising the steps of: (a) providing a codebook of codevectors for said band, each codevector having an element for each of said certain frequencies; (b) obtaining a masking threshold for said frequency signal; (c) for each one of a plurality of codevectors in said codebook, obtaining a distortion measure by the steps of: for each of said coefficients of said frequency signal (i) obtaining a representation of a difference between said coefficient and a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold to obtain an indicator measure; summing those obtained indicator measures which are positive to obtain said distortion measure; (d) selecting a codevector having a smallest distortion measure; (e) transmitting an index to said selected codevector.
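By way of illustration only, the distortion measure and codevector selection of steps (c) and (d) may be sketched as follows. The sketch assumes that the representation of the difference is the squared error and that the masking threshold is supplied per coefficient in the same units; the function names `masked_distortion` and `select_codevector` are hypothetical and do not appear in the method as described.

```python
import numpy as np

def masked_distortion(coeffs, codevector, threshold):
    """Sum of the positive parts of (error - masking threshold)."""
    coeffs = np.asarray(coeffs, dtype=float)
    codevector = np.asarray(codevector, dtype=float)
    threshold = np.asarray(threshold, dtype=float)
    # Assumption: squared error is used as the "representation of a difference".
    diff = (coeffs - codevector) ** 2
    # Reduce each difference by the masking threshold to get the indicator measure.
    indicator = diff - threshold
    # Only indicator measures that are positive (error above the mask) are summed.
    return float(np.sum(indicator[indicator > 0]))

def select_codevector(coeffs, codebook, threshold):
    """Return the index of the codevector with the smallest masked distortion."""
    distortions = [masked_distortion(coeffs, cv, threshold) for cv in codebook]
    return int(np.argmin(distortions))

if __name__ == "__main__":
    coeffs = [1.0, 2.0]
    codebook = [[0.0, 0.0], [1.0, 2.5], [1.0, 1.0]]
    threshold = [0.5, 0.5]
    # The index printed here is what would be transmitted in step (e).
    print(select_codevector(coeffs, codebook, threshold))
```

Under this measure, any error that falls below the masking threshold contributes nothing, so two codevectors whose errors are both fully masked are treated as equally good.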
In accordance with another aspect of the present invention there is provided a method of transmitting a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of: (a) grouping said coefficients into frequency bands; (b) for each band: providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band; obtaining a representation of energy of coefficients in said each band; selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy; selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an index to said selected codevector; (d) concatenating said selected codevector addresses; and (e) transmitting said concatenated codevector addresses and an indication of each said representation of energy.
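By way of illustration only, the per-band selection of an address set and the concatenation of addresses may be sketched as follows. The sketch assumes that the representation of energy is an integer level already obtained for each band, that the address set is the first entries of the codebook, and that a plain squared error stands in for the distortion measure; the constant `ENTRIES_PER_LEVEL` and the function names are hypothetical.

```python
import numpy as np

# Hypothetical proportionality constant: addressable entries per energy level.
ENTRIES_PER_LEVEL = 2

def address_set_size(energy_level, codebook_len):
    """Address set size grows in proportion to energy, capped at the codebook size."""
    return min(codebook_len, ENTRIES_PER_LEVEL * energy_level)

def address_bits(set_size):
    """Number of bits needed to address set_size entries (0 bits for 0 or 1 entries)."""
    return max(set_size - 1, 0).bit_length()

def encode_bands(bands, codebooks, energy_levels):
    """Return the concatenated address bits and the per-band energy levels."""
    bitstream = ""
    for coeffs, codebook, level in zip(bands, codebooks, energy_levels):
        codebook = np.asarray(codebook, dtype=float)
        size = address_set_size(level, len(codebook))
        if size == 0:
            continue  # no addresses allotted to this band
        candidates = codebook[:size]  # only the addressable portion is searched
        # Assumption: squared error stands in for the distortion measure.
        errors = np.sum((candidates - np.asarray(coeffs, dtype=float)) ** 2, axis=1)
        index = int(np.argmin(errors))
        nbits = address_bits(size)
        if nbits:
            bitstream += format(index, f"0{nbits}b")
    return bitstream, list(energy_levels)

if __name__ == "__main__":
    cb = [[0, 0], [1, 1], [2, 2], [3, 3]]
    print(encode_bands([[1.1, 0.9], [2.1, 1.9]], [cb, cb], [1, 2]))
```

Because the address length is derived only from the transmitted energy indication, no separate field is needed to tell the receiver how many bits each band occupies.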
In accordance with a further aspect of the invention, there is provided a method of receiving a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of: providing pre-defined frequency bands; for each band providing a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band; receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band; determining a length of address for each band based on said per band indication of a representation of energy; parsing said concatenated codevector addresses based on said address length determining step; addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
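By way of illustration only, the parsing of concatenated addresses at the receiver may be sketched as follows. The sketch assumes the same hypothetical rule at both ends, namely that an integer energy level per band fixes the address set size through the constant `ENTRIES_PER_LEVEL`, and that a band with no allotted addresses is reconstructed as zeros; none of these names or choices are taken from the method as described.

```python
import numpy as np

# Hypothetical constant; it must match the value used by the transmitter.
ENTRIES_PER_LEVEL = 2

def address_length(energy_level, codebook_len):
    """Return (address set size, address length in bits) for one band."""
    size = min(codebook_len, ENTRIES_PER_LEVEL * energy_level)
    return size, max(size - 1, 0).bit_length()

def decode_bands(bitstream, codebooks, energy_levels):
    """Parse the concatenated addresses and look up coefficients for each band."""
    pos = 0
    decoded = []
    for codebook, level in zip(codebooks, energy_levels):
        codebook = np.asarray(codebook, dtype=float)
        size, nbits = address_length(level, len(codebook))
        if size == 0:
            # Assumption: a band with no addresses is reconstructed as zeros.
            decoded.append(np.zeros(codebook.shape[1]))
            continue
        # A single addressable entry needs no bits and is always index 0.
        index = int(bitstream[pos:pos + nbits], 2) if nbits else 0
        pos += nbits
        decoded.append(codebook[index])
    return decoded

if __name__ == "__main__":
    cb = [[0, 0], [1, 1], [2, 2], [3, 3]]
    for band in decode_bands("110", [cb, cb], [1, 2]):
        print(band)
```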
A transmitter and a receiver operating in accordance with these methods are also provided.
In accordance with a further aspect of the present invention there is provided a method of obtaining a codebook of codevectors which span a frequency band discretely represented at pre-defined frequencies, comprising the steps of: receiving training vectors for said frequency band; receiving an initial set of estimated codevectors; associating each training vector with a one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors; partitioning said associated groups of vectors into Voronoi regions; determining a centroid for each Voronoi region; selecting each centroid vector as a new estimated codevector; repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and populating said codebook with said estimated codevectors resulting after a last iteration.
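By way of illustration only, the iterative training procedure may be sketched as follows. The sketch assumes a squared-error distortion and takes the arithmetic mean of each group as its centroid; a perceptual distortion measure could lead to a different centroid rule. The name `train_codebook`, the `max_iter` safety limit, and the choice to leave a codevector unchanged when no training vector is associated with it are assumptions of the sketch.

```python
import numpy as np

def train_codebook(training, initial, tol=1e-6, max_iter=100):
    """Refine estimated codevectors until they move by less than tol."""
    training = np.asarray(training, dtype=float)
    codebook = np.asarray(initial, dtype=float).copy()
    for _ in range(max_iter):  # max_iter is a safety limit, not part of the method
        # Associate each training vector with its smallest-distortion codevector.
        dist = np.sum((training[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        nearest = np.argmin(dist, axis=1)
        new_codebook = codebook.copy()
        for k in range(len(codebook)):
            members = training[nearest == k]
            if len(members):
                # Assumption: the centroid of a region is the mean of its members.
                new_codebook[k] = members.mean(axis=0)
        change = np.max(np.abs(new_codebook - codebook))
        codebook = new_codebook
        if change < tol:
            break
    return codebook

if __name__ == "__main__":
    vectors = [[0, 0], [0, 2], [10, 10], [10, 12]]
    print(train_codebook(vectors, [[1, 1], [9, 9]]))
```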
According to yet a further aspect of the invention, there is provided a method of generating an embedded codebook for a frequency band discretely represented at pre-defined frequencies, comprising the steps of: (a) obtaining an optimized larger first codebook of codevectors which span said frequency band; (b) obtaining an optimized smaller second codebook of codevectors which span said frequency band; (c) finding codevectors in said first codebook which best approximate each entry in said second codebook; (d) sorting said first codebook to place said codevectors found in step (c) at a front of said first codebook.
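By way of illustration only, steps (c) and (d) may be sketched as follows, given the two codebooks of steps (a) and (b). The sketch assumes squared error as the measure of best approximation and, where two entries of the second codebook would select the same codevector of the first, takes the next-best unused codevector; the function name `embed_codebook` and this tie handling are assumptions of the sketch.

```python
import numpy as np

def embed_codebook(large, small):
    """Reorder large so the codevectors best matching small come first."""
    large = np.asarray(large, dtype=float)
    small = np.asarray(small, dtype=float)
    chosen = []
    for entry in small:
        # Assumption: squared error decides which codevector best approximates.
        dist = np.sum((large - entry) ** 2, axis=1)
        for idx in np.argsort(dist):
            if int(idx) not in chosen:  # skip codevectors that are already taken
                chosen.append(int(idx))
                break
    rest = [i for i in range(len(large)) if i not in chosen]
    # Chosen codevectors go to the front; the remainder keep their original order.
    return large[chosen + rest]

if __name__ == "__main__":
    big = [[0, 0], [5, 5], [9, 9], [2, 2]]
    little = [[2.1, 2.0], [8.5, 9.0]]
    print(embed_codebook(big, little))
```

With the first codebook sorted this way, a short address reaches an approximation of the second codebook, while a longer address reaches the full first codebook.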
An advantage of the present invention is that it provides a high quality method and apparatus to code and decode non-speech signals, such as music, while retaining high quality for speech.