The present invention relates to an audio signal synthesizer for generating a synthesis audio signal, an audio signal encoder and a data stream comprising an encoded audio signal.
Natural audio coding and speech coding are the two major classes of codecs for audio signals. Natural audio coders are commonly used for music or arbitrary signals at medium bit rates and generally offer wide audio bandwidths. Speech coders are essentially limited to speech reproduction but can operate at very low bit rates. Wideband speech provides a major subjective quality improvement over narrowband speech: increasing the bandwidth improves not only the naturalness of speech but also speaker recognition and intelligibility. Wideband speech coding is therefore an important issue for the next generation of telephone systems. Furthermore, owing to the tremendous growth of the multimedia field, it is desirable to transmit music and other non-speech signals at high quality over telephone systems, and to store and transmit such signals, for example, for radio/TV or other broadcast systems.
To drastically reduce the bit rate, source coding can be performed using split-band perceptual audio codecs. These natural audio codecs exploit perceptual irrelevancy and statistical redundancy in the signal. If exploiting these alone is not sufficient to meet the given bit rate constraints, the sample rate is reduced. It is also common to decrease the number of quantization levels, allowing occasional audible quantization distortion, and to degrade the stereo field through joint stereo coding or parametric coding of two or more channels. Excessive use of such methods results in annoying perceptual degradation. To improve coding performance, bandwidth extension methods such as spectral band replication (SBR) are used to efficiently generate the high-frequency signals in an HFR (high frequency reconstruction) based codec.
In the process of replicating the high-frequency signals, a certain transformation may, for example, be applied to the low-frequency signals, and the transformed signals are then inserted as high-frequency signals. This process is also known as patching, and different transformations may be used. The MPEG-4 Audio standard uses only one patching algorithm for all audio signals. Hence, it lacks the flexibility to adapt the patching to different signals or coding schemes.
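The patching idea can be illustrated with a minimal sketch that works on one frame of subband values. It assumes a hypothetical layout in which subbands `[0, crossover)` hold the decoded low band and subbands `[crossover, num_bands)` must be regenerated; the names `patch`, `translate`, and `Transform` are illustrative only and are not taken from MPEG-4 Audio. The transformation is passed in as a function, so a plain frequency translation (copy-up) can be swapped for another patching rule.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a transformation maps low-band subband values to the
# values that will be inserted into the high band.
Transform = Callable[[Sequence[complex]], List[complex]]


def translate(low_band: Sequence[complex]) -> List[complex]:
    """Plain frequency translation (copy-up): the low band is reused unchanged."""
    return list(low_band)


def patch(spectrum: Sequence[complex], crossover: int, num_bands: int,
          transform: Transform = translate) -> List[complex]:
    """Fill subbands [crossover, num_bands) from a transformed copy of [0, crossover).

    The transformed low band is repeated as often as needed to reach num_bands.
    """
    if not 0 < crossover <= len(spectrum):
        raise ValueError("crossover must lie inside the decoded low band")
    if num_bands < crossover:
        raise ValueError("num_bands must not be smaller than crossover")
    low = list(spectrum[:crossover])
    source = transform(low)
    if not source:
        raise ValueError("transform produced no values to insert")
    # Insert the transformed values above the crossover, cycling through them.
    high = [source[(k - crossover) % len(source)] for k in range(crossover, num_bands)]
    return low + high
```

For example, `patch([1, 2, 3, 4], 2, 6)` copies the two low bands upward twice, while passing `transform=lambda b: list(reversed(b))` applies a different (purely illustrative) transformation. A codec tied to a single fixed `transform` cannot make this choice per signal.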
On the one hand, the MPEG-4 standard provides sophisticated processing of the regenerated high band, in which many important SBR parameters are applied. These parameters are the spectral envelope data, the noise floor data describing the noise to be added to the regenerated spectral portion, information for the inverse filtering tool, which adapts the tonality of the regenerated high band to that of the original high band, and additional spectral band replication processing data, such as data on missing harmonics. This well-established processing of the replicated spectrum, which is obtained by patching consecutive bandpass signals within the filterbank domain, has proven efficient: it provides high quality and can be implemented with reasonable resources in terms of processing power, memory, and power consumption.
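How envelope data and noise floor data can act on a patched band may be sketched as follows. This is a simplified illustration under stated assumptions, not the MPEG-4 SBR gain calculation: energies are taken as mean squared values per band, and the transmitted band energy `E` is assumed to be split between the patched component, `E / (1 + q)`, and added noise, `E * q / (1 + q)`, where `q` is a hypothetical per-band noise-floor ratio. Inverse filtering and missing-harmonics handling are omitted, and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def envelope_gains(patched_energy: Sequence[float], target_energy: Sequence[float],
                   noise_ratio: Sequence[float], eps: float = 1e-12) -> List[Tuple[float, float]]:
    """Per-band (signal gain, noise amplitude) so patched + noise energy matches the envelope.

    All energies are mean squared values per band. Assumed split: patched part
    gets E / (1 + q), noise gets E * q / (1 + q). eps guards zero-energy bands.
    """
    if not len(patched_energy) == len(target_energy) == len(noise_ratio):
        raise ValueError("all per-band sequences must have the same length")
    result = []
    for e_cur, e_ref, q in zip(patched_energy, target_energy, noise_ratio):
        signal_energy = e_ref / (1.0 + q)
        noise_energy = e_ref * q / (1.0 + q)
        result.append((math.sqrt(signal_energy / (e_cur + eps)), math.sqrt(noise_energy)))
    return result


def adjust_band(samples: Sequence[float], gain: float, noise_amp: float,
                rng: random.Random) -> List[float]:
    """Scale one band's samples and add unit-variance Gaussian noise scaled by noise_amp."""
    return [gain * x + noise_amp * rng.gauss(0.0, 1.0) for x in samples]
```

For a patched band with energy 4.0, a target envelope energy of 8.0, and `q = 1.0`, the sketch keeps the patched signal at unit gain and adds noise with amplitude 2.0, so that the expected total energy equals the target.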
On the other hand, patching takes place in the same filterbank as the further processing of the patched signal, so the patching operation is tightly coupled to the further processing of its result. Therefore, implementing different patching algorithms is problematic in this combined approach.
WO 98/57436 discloses transposition methods used in spectral band replication, which are combined with spectral envelope adjustment.
WO 02/052545 teaches that signals can be classified as either pulse-train-like or non-pulse-train-like and, based on this classification, proposes an adaptive switched transposer. The switched transposer runs two patching algorithms in parallel, and a mixing unit combines both patched signals depending on the classification (pulse train or non-pulse train). The actual switching between or mixing of the transposers is performed in an envelope-adjusting filterbank in response to envelope and control data. Furthermore, for pulse-train-like signals, the base band signal is transformed into a filterbank domain, a frequency translation is performed, and the envelope of the translated result is adjusted. This is a combined patching/further processing procedure. For non-pulse-train-like signals, a frequency domain transposer (FD transposer) is provided, and its output is then transformed into the filterbank domain, in which the envelope adjustment is performed. In one alternative, this procedure therefore combines patching and further processing in the same filterbank; in the other alternative, it places a frequency domain transposer outside the filterbank in which the envelope adjustment takes place. This mixed structure is problematic with respect to flexibility and implementation possibilities.
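The classification-dependent mixing of two patched signals can be sketched as a weighted combination. The crest-factor test below (peak divided by RMS of a time-domain frame) and its thresholds `lo` and `hi` are hypothetical stand-ins chosen only for illustration; they are not the classifier described in WO 02/052545. A weight of 1.0 selects the pulse-train patch, 0.0 selects the other patch, and intermediate values mix them; hard switching is the special case of weights restricted to 0.0 or 1.0.

```python
import math
from typing import List, Sequence


def crest_factor(frame: Sequence[float]) -> float:
    """Peak-to-RMS ratio of a time-domain frame; impulsive frames score high."""
    if not frame:
        raise ValueError("frame must not be empty")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    peak = max(abs(x) for x in frame)
    return peak / rms if rms > 0 else 0.0


def pulse_train_weight(frame: Sequence[float], lo: float = 2.0, hi: float = 4.0) -> float:
    """Hypothetical soft classification in [0, 1], ramping linearly from lo to hi."""
    c = crest_factor(frame)
    return min(1.0, max(0.0, (c - lo) / (hi - lo)))


def mix_patches(pulse_patch: Sequence[float], other_patch: Sequence[float],
                weight: float) -> List[float]:
    """Blend two patched band vectors: weight 1.0 -> pulse-train patch only."""
    if len(pulse_patch) != len(other_patch):
        raise ValueError("patched signals must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(pulse_patch, other_patch)]
```

A single impulse in a 16-sample frame has a crest factor of 4.0 and is classified as fully pulse-train-like, whereas a constant frame has a crest factor of 1.0 and selects the other patch.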