The present invention relates to the field of audio processing and audio coding, in particular to encoding and decoding of pulse positions of tracks in an audio signal.
Audio processing and/or coding has advanced in many ways. In audio coding, linear predictive coders play an important role. When encoding an audio signal, e.g. an audio signal comprising speech, linear predictive encoders usually encode a representation of the spectral envelope of the audio signal. To this end, linear predictive encoders may determine predictive filter coefficients to represent the spectral envelope of sound in encoded form. The filter coefficients may then be used by a linear predictive decoder to decode the encoded audio signal by generating a synthesized audio signal using the predictive filter coefficients.
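As a rough illustration of how predictive filter coefficients can be obtained and used, the following Python sketch computes them from the autocorrelation of a frame with the Levinson-Durbin recursion. It then applies the resulting all-pole synthesis filter 1/A(z) to an excitation. The sketch assumes the convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p, no windowing, and zero filter memory. The function names are hypothetical. Practical coders additionally window the signal and quantize the coefficients.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of the frame x (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p
    and the final prediction error energy (assumes r[0] > 0)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z) with zero initial state:
    y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


# Example: a decaying exponential is modelled well by a first-order predictor.
frame = [0.9 ** n for n in range(200)]
a, _ = levinson_durbin(autocorrelation(frame, 1), 1)
print(round(a[1], 6))  # -0.9
```

Feeding a unit impulse through `synthesize` with these coefficients reproduces the decaying exponential, which is the sense in which the decoder regenerates the spectral envelope from the coefficients.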
Important examples of linear predictive coders are ACELP coders (ACELP=Algebraic Code-Excited Linear Prediction coders). ACELP coders are widely used, for example, in USAC (USAC=Unified Speech and Audio Coding) and may have further fields of application, for example in LD-USAC (Low Delay Unified Speech and Audio Coding).
ACELP encoders usually encode an audio signal by determining predictive filter coefficients. To achieve better encoding, ACELP encoders determine a residual signal, also referred to as target signal, based on the audio signal to be encoded and on the already determined predictive filter coefficients. The residual signal may, for example, be a difference signal representing the difference between the audio signal to be encoded and the signal portions that are already represented by the predictive filter coefficients and, possibly, by adaptive filter coefficients resulting from a pitch analysis. The ACELP encoder then aims to encode the residual signal. For this, the encoder determines and encodes algebraic codebook parameters that represent the residual signal.
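A simplified sketch of the residual (target) computation is shown below. It assumes predictor coefficients given as a = [1, a1, ..., ap] of A(z) = 1 + a1·z^-1 + ... + ap·z^-p. The optional pitch contribution is modelled as a hypothetical adaptive vector `adaptive` scaled by a hypothetical gain `gain_pitch`. Real ACELP encoders compute the target in a perceptually weighted domain, which is omitted here.

```python
def lpc_residual(x, a):
    """Analysis filter A(z) with zero initial state:
    e[n] = x[n] + sum_{j>=1} a[j] * x[n - j]."""
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, len(a)) if n - j >= 0)
            for n in range(len(x))]


def target_signal(x, a, adaptive=None, gain_pitch=0.0):
    """Signal part left for the algebraic codebook: the LPC residual minus an
    optional scaled adaptive (pitch) contribution (simplified, unweighted)."""
    e = lpc_residual(x, a)
    if adaptive is None:
        return e
    return [en - gain_pitch * vn for en, vn in zip(e, adaptive)]


# A signal that the predictor models exactly leaves (almost) only its first sample.
residual = target_signal([0.9 ** n for n in range(8)], [1.0, -0.9])
```

Whatever the prediction cannot explain remains in `residual`. In an ACELP encoder, this remaining part is what the algebraic codebook has to represent.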
To encode the residual signal, algebraic codebooks are used. Usually, algebraic codebooks comprise a plurality of tracks, for example, four tracks each comprising 16 track positions. In such a configuration, a total of 4·16=64 sample positions can be represented by a respective algebraic codebook, for example, corresponding to the number of samples of a subframe of the audio signal to be encoded.
The tracks of the codebook may be interleaved such that track 0 of the codebook may represent samples 0, 4, 8, . . . , 60 of the subframe, track 1 samples 1, 5, 9, . . . , 61, track 2 samples 2, 6, 10, . . . , 62, and track 3 samples 3, 7, 11, . . . , 63. Each track may have a fixed number of pulses. Alternatively, the number of pulses per track may vary, e.g. depending on other conditions. A pulse may, for example, be positive or negative; its sign may, e.g., be indicated by +1 (positive pulse) or 0 (negative pulse).
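The interleaved track structure with 4 tracks of 16 positions amounts to a simple modulo mapping between subframe samples and track positions. The following minimal Python sketch illustrates it. The constant and function names are hypothetical.

```python
NUM_TRACKS = 4      # interleaved tracks per subframe
TRACK_LENGTH = 16   # positions per track, so 4 * 16 = 64 samples


def track_of(sample):
    """Map a subframe sample index (0..63) to (track, position on that track)."""
    return sample % NUM_TRACKS, sample // NUM_TRACKS


def sample_of(track, position):
    """Inverse mapping: (track, position) -> subframe sample index."""
    return position * NUM_TRACKS + track


def track_samples(track):
    """All subframe samples covered by one track, e.g. 0, 4, 8, ..., 60 for track 0."""
    return [sample_of(track, p) for p in range(TRACK_LENGTH)]


print(track_samples(1)[:3], track_samples(1)[-1])  # [1, 5, 9] 61
```

With this mapping, a position on a track needs only 4 bits (16 positions), although the pulse may land on any of the 64 samples of the subframe.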
When encoding the residual signal, a codebook configuration may be chosen that best represents the remaining signal portions of the residual signal. For this, the available pulses may be positioned at the track positions that best reflect the signal portions to be encoded. Moreover, it may be specified whether each pulse is positive or negative.
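To make the idea of choosing pulse positions and signs concrete, the Python sketch below uses a deliberately simple heuristic. On each interleaved track, it places the pulses at the positions with the largest residual magnitude and takes each sign from the residual. This is not the search used by actual ACELP coders, which evaluate correlation and energy terms of the filtered candidate codevectors. The sketch also never places two pulses at the same position. The function name and the dictionary output format are hypothetical.

```python
def place_pulses(target, pulses_per_track, num_tracks=4):
    """Heuristic pulse placement (not a real ACELP codebook search):
    on each interleaved track, choose the pulses_per_track positions with the
    largest |target| and take each pulse's sign from the target value.
    Returns {track: [(position, sign), ...]} with positions in ascending order."""
    pulses = {}
    for t in range(num_tracks):
        samples = target[t::num_tracks]        # track t holds samples t, t+4, t+8, ...
        ranked = sorted(range(len(samples)), key=lambda p: abs(samples[p]), reverse=True)
        chosen = sorted(ranked[:pulses_per_track])
        pulses[t] = [(p, 1 if samples[p] >= 0 else -1) for p in chosen]
    return pulses


target = [0.0] * 64
target[8], target[20] = 3.0, -2.0   # track 0, positions 2 and 5
print(place_pulses(target, 2)[0])   # [(2, 1), (5, -1)]
```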
On the decoder side, an ACELP decoder would first decode the algebraic codebook parameters and may also decode the adaptive codebook parameters. To determine the algebraic codebook parameters, the ACELP decoder may determine the plurality of pulse positions for each track of an algebraic codebook. Moreover, the ACELP decoder may also decode whether a pulse at a track position is a positive or a negative pulse. Based on this information, the ACELP decoder usually generates an excitation signal. The ACELP decoder then applies the predictive filter coefficients to the excitation signal to generate a synthesized audio signal, which forms the decoded audio signal.
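The decoder steps can be sketched in Python as follows. The sketch builds the algebraic codevector from decoded pulse positions and signs and combines it with an adaptive (pitch) contribution. It then filters the resulting excitation with 1/A(z). The gains `gain_pitch` and `gain_code`, the pulse dictionary format, and all function names are assumptions for illustration. The track layout follows the 4 × 16 interleaving described above.

```python
def algebraic_codevector(pulses, num_tracks=4, track_length=16):
    """Build the subframe codevector from decoded pulses
    {track: [(position, sign), ...]}; pulses at the same position add up."""
    c = [0.0] * (num_tracks * track_length)
    for track, track_pulses in pulses.items():
        for position, sign in track_pulses:
            c[position * num_tracks + track] += sign
    return c


def excitation(adaptive, codevector, gain_pitch, gain_code):
    """Excitation = gain_pitch * adaptive contribution + gain_code * algebraic contribution."""
    return [gain_pitch * v + gain_code * c for v, c in zip(adaptive, codevector)]


def synthesize(u, a):
    """Synthesis filter 1/A(z), a = [1, a1, ..., ap], zero initial state."""
    y = []
    for n, un in enumerate(u):
        y.append(un - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


c = algebraic_codevector({0: [(2, 1)], 3: [(15, -1)]})
u = excitation([0.0] * 64, c, gain_pitch=0.0, gain_code=1.0)
speech = synthesize(u, [1.0, -0.5])
```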
In ACELP, pulses on a track are generally encoded as follows. If the track is of length 16 and the number of pulses on this track is one, then we can encode the pulse by its position (4 bits) and sign (1 bit), totaling 5 bits. If the track is of length 16 and the number of pulses is two, then the first pulse is encoded by its position (4 bits) and sign (1 bit). For the second pulse, we need to encode the position only (4 bits): by choosing which of the two pulses is encoded first, we can always arrange that the second pulse has the same sign as the first pulse if it is to the left of the first pulse or at the same position as the first pulse, and the opposite sign if it is to the right of the first pulse. In total, we therefore need 9 bits to encode 2 pulses. Compared with encoding the pulses separately, with 5 bits each, we thus save 1 bit for every pair of pulses.
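The 5-bit and 9-bit formats described above can be sketched in Python as follows. The bit layout (sign bit first, then the 4-bit position or positions) and the function names are hypothetical choices for illustration. The sketch implements the sign rule in relative form. The second pulse takes the sign of the first pulse if it lies to its left or at the same position, and the opposite sign if it lies to its right. Every sign combination can therefore be represented by choosing which pulse is encoded first. Two pulses of opposite sign at the same position would cancel and are rejected.

```python
def sign_bit(sign):
    """1 bit: 0 for a positive pulse, 1 for a negative pulse."""
    return 0 if sign > 0 else 1


def encode_one(pos, sign):
    """5 bits: sign bit followed by a 4-bit position (0..15)."""
    return (sign_bit(sign) << 4) | pos


def decode_one(code):
    return code & 0xF, -1 if code >> 4 else 1


def encode_two(p, q):
    """9 bits for two pulses (position, sign): sign of the first pulse,
    4-bit position of the first pulse, 4-bit position of the second pulse."""
    if p[0] == q[0]:
        if p[1] != q[1]:
            raise ValueError("opposite-sign pulses at one position cancel out")
        first, second = p, q
    else:
        lo, hi = sorted((p, q))  # lo is the left pulse, hi the right pulse
        # Same signs: encode the left pulse second; opposite signs: the right one.
        first, second = (hi, lo) if lo[1] == hi[1] else (lo, hi)
    return (sign_bit(first[1]) << 8) | (first[0] << 4) | second[0]


def decode_two(code):
    s1 = -1 if code >> 8 else 1
    p1, p2 = (code >> 4) & 0xF, code & 0xF
    s2 = s1 if p2 <= p1 else -s1  # relative sign rule
    return sorted([(p1, s1), (p2, s2)])


code = encode_two((3, -1), (7, 1))
print(code, decode_two(code))  # 311 [(3, -1), (7, 1)]
```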
To encode more than two pulses, we can encode the pulses pair-wise and, if the number of pulses is odd, encode the last pulse separately. For example, for a track with 5 pulses, we would then need 9+9+5=23 bits. With 4 tracks, 4×23=92 bits would be needed to encode a subframe of length 64 with 5 pulses per track. It would, however, be highly desirable to reduce the number of bits further.
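The bit counts above follow from a short formula. For 4-bit positions, pair-wise coding needs 9 bits per pair plus 5 bits for a leftover single pulse. A minimal Python sketch of this arithmetic follows. The function names and the `position_bits` parameter are hypothetical.

```python
def pairwise_bits(num_pulses, position_bits=4):
    """Bits per track with pair-wise coding: (2 * position_bits + 1) bits per pair,
    plus (position_bits + 1) bits for a leftover single pulse."""
    pairs, single = divmod(num_pulses, 2)
    return pairs * (2 * position_bits + 1) + single * (position_bits + 1)


def separate_bits(num_pulses, position_bits=4):
    """Bits per track when every pulse gets its own position and sign bit."""
    return num_pulses * (position_bits + 1)


for n in range(1, 6):
    print(n, pairwise_bits(n), separate_bits(n))
print(4 * pairwise_bits(5))  # 92
```

For a 5-pulse track, the pair-wise scheme saves 2 bits over separate coding (23 versus 25 bits), i.e. one bit per pair.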
It would be highly desirable to provide an apparatus for encoding and a corresponding apparatus for decoding with improved encoding and decoding concepts that represent pulse information using fewer bits. This would, for example, reduce the transmission rate needed for transmitting a correspondingly encoded audio signal, and it would also reduce the storage needed for storing such an encoded audio signal.