The invention relates to an encoding apparatus for processing an input signal and to a decoding apparatus for processing an encoded signal. The invention also relates to corresponding methods and to a computer program.
A central part of speech and audio codecs is their perceptual model, which describes the relative perceptual importance of errors in different elements of the signal representation. In practice, the perceptual models consist of signal-dependent weighting factors which are used in the quantization of each element. For optimal performance, it would be desirable to use the same perceptual model at the decoder. Because the perceptual model is signal-dependent, however, it is not known in advance at the decoder, and audio codecs therefore generally transmit this model explicitly, at the cost of increased bit consumption.
The era of the Internet of Things (IoT) is approaching, and the next generation of speech and audio coders should embrace it. The design goals of IoT systems, however, fit poorly with the classic design of speech and audio coders, so that a larger redesign of the coders is necessary.
Firstly, state-of-the-art speech and audio coders such as AMR-WB, EVS, USAC and AAC consist of intelligent and complex encoders and relatively simple decoders [1-4]. Since IoT should support distributed low-complexity sensor nodes, however, the encoders should advantageously be simple.
Secondly, since the sensor nodes are encoding the same source signal, applying the same quantization at each sensor node would represent over-coding and potentially a serious loss in efficiency. In particular, since the perceptual model should be more or less the same at every node, transmitting it from every node is almost pure over-coding.
Conventional speech and audio coding methods consist of three parts:
1. a perceptual model which specifies the relative impact of errors in different parameters of the codec,
2. a source model which describes the range and likelihood of different inputs and
3. an entropy coder which utilizes the source model to minimize perceptual distortion [5].
Further, the perceptual model can be applied in either of two ways:
1. Signal parameters can be weighted according to the perceptual model, such that all parameters can then be quantized with the same accuracy. The perceptual model is then transmitted to the decoder such that the weighting can be undone.
2. The perceptual model can alternatively be applied as an evaluation model, such that the synthesis outputs of different quantizations are compared, weighted by the perceptual model, in an analysis-by-synthesis iteration. Though the perceptual model need not be transmitted here, this approach has the disadvantage that the quantization cells are not regularly shaped, which reduces coding efficiency. More importantly, however, to find the optimal quantization, a computationally complex brute-force search over different quantizations has to be used.
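The two ways of applying the perceptual model can be illustrated with a minimal sketch. It is not taken from any of the cited codecs: the function names, the weights, the step size and the candidate index range are all hypothetical, and the synthesis step of the second approach is reduced to plain dequantization. In a real codec, the synthesis couples the parameters, so the search cannot be simplified per parameter as it could be here.

```python
import itertools

# Illustrative sketch only: all names and numeric values are hypothetical.

def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Map integer indices back to reconstruction levels."""
    return [i * step for i in indices]

def encode_weighted(signal, weights, step):
    """Approach 1: weight the parameters, then quantize all with one step."""
    weighted = [w * x for w, x in zip(weights, signal)]
    return quantize(weighted, step)

def decode_weighted(indices, weights, step):
    """Approach 1 decoder: needs the same weights to undo the weighting."""
    return [y / w for y, w in zip(dequantize(indices, step), weights)]

def perceptual_error(signal, reconstruction, weights):
    """Squared error, weighted by the perceptual model."""
    return sum((w * (x - r)) ** 2
               for x, r, w in zip(signal, reconstruction, weights))

def encode_analysis_by_synthesis(signal, weights, step, candidates):
    """Approach 2: brute-force search over every candidate index vector.

    The weights are used only at the encoder, so they need not be
    transmitted, but the number of evaluations grows exponentially
    with the number of parameters.
    """
    best, best_err, tried = None, float("inf"), 0
    for indices in itertools.product(candidates, repeat=len(signal)):
        err = perceptual_error(signal, dequantize(indices, step), weights)
        tried += 1
        if err < best_err:
            best, best_err = list(indices), err
    return best, tried

signal = [0.9, -0.4, 0.2]     # hypothetical signal parameters
weights = [4.0, 1.0, 0.5]     # hypothetical perceptual weights
step = 0.5                    # hypothetical quantization step

indices = encode_weighted(signal, weights, step)
decoded = decode_weighted(indices, weights, step)
abs_indices, evaluations = encode_analysis_by_synthesis(
    signal, weights, step, candidates=range(-2, 3))
```

In the first approach the encoder performs one multiplication and one rounding per parameter, but `decode_weighted` cannot run without `weights`. In the second approach the decoder only calls `dequantize`, whereas the encoder in this toy setting already evaluates 5^3 = 125 candidate vectors for three parameters.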
Since the analysis-by-synthesis approach thus leads to a computationally complex encoder, it is not a viable alternative for IoT. Therefore, the decoder needs access to the perceptual model. However, as noted above, explicit transmission of the perceptual model (or, equivalently, an envelope model of the signal spectrum) is not desirable because it lowers coding efficiency.
The object of the invention is to present a way to recover the perceptual model at the decoder from the transmitted signal without side-information concerning the perceptual model.