Schemes in which the original audio information above a certain frequency is replaced by Gaussian noise or manipulated lowband information are collectively referred to as High Frequency Reconstruction (HFR) methods. Apart from noise insertion or non-linearities such as rectification, prior-art HFR methods generally use so-called copy-up techniques to generate the highband signal. These techniques mainly employ broadband linear frequency shifts, i.e. translations, or frequency-inverted linear shifts, i.e. foldings. Prior-art HFR methods have primarily been intended to improve speech codec performance. Recent developments in highband regeneration using perceptually accurate methods have, however, made HFR methods successfully applicable to natural audio codecs as well, for coding music or other complex programme material (PCT patent [WO 98/57436]). Under certain conditions, simple copy-up techniques have also been shown to be adequate for coding complex programme material. They have been shown to produce reasonable results for intermediate-quality applications, in particular for codec implementations where the computational complexity of the overall system is severely constrained.
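The difference between the two copy-up variants can be illustrated on a toy magnitude spectrum stored as a list of bins. The following is a minimal sketch, not taken from any particular codec: the function name `copy_up`, the bin layout and the `width` parameter are hypothetical, and a real HFR system operates on filterbank subbands and follows the copy-up with spectral envelope adjustment.

```python
def copy_up(spectrum, crossover, width, mode="translate"):
    """Fill bins [crossover, crossover + width) of a magnitude spectrum from the lowband.

    mode="translate": bin crossover - width + i -> crossover + i (linear shift by `width` bins)
    mode="fold":      bin crossover - 1 - i     -> crossover + i (mirror about the crossover)
    """
    if not 0 < width <= crossover or crossover + width > len(spectrum):
        raise ValueError("need 0 < width <= crossover and crossover + width <= len(spectrum)")
    if mode not in ("translate", "fold"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = list(spectrum)
    for i in range(width):
        # Source bins always lie below the crossover, so reading from the
        # unmodified input is equivalent to reading from `out`.
        src = crossover - width + i if mode == "translate" else crossover - 1 - i
        out[crossover + i] = spectrum[src]
    return out


lowband = [1, 2, 3, 4, 0, 0, 0, 0]  # bins 0-3 carry content, bins 4-7 are empty
print(copy_up(lowband, crossover=4, width=4, mode="translate"))  # [1, 2, 3, 4, 1, 2, 3, 4]
print(copy_up(lowband, crossover=4, width=4, mode="fold"))       # [1, 2, 3, 4, 4, 3, 2, 1]
```

Translation preserves the order of the copied bins, while folding reverses it, so a rising spectral slope in the lowband becomes a falling one in the highband.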
The human voice and most musical instruments generate quasi-stationary tonal signals that emerge from oscillating systems. According to Fourier theory, any periodic signal may be expressed as a sum of sinusoids with frequencies f, 2f, 3f, 4f, 5f, etc., where f is the fundamental frequency. These frequencies form a harmonic series. Tonal affinity refers to the relations between the perceived tones or harmonics. In natural sound reproduction, tonal affinity is controlled by, and determined by, the type of voice or instrument used. The general idea of HFR techniques is to replace the original high-frequency information with information created from the available lowband and subsequently to apply spectral envelope adjustment to this information. Prior-art HFR methods create highband signals in which tonal affinity is often uncontrolled and impaired. These methods generate non-harmonic frequency components, which cause perceptual artifacts when applied to complex programme material. In the coding literature such artifacts are referred to as “rough” sounding, and listeners perceive them as distortion.
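Why a linear translation produces non-harmonic components can be seen directly on partial frequencies: adding the same shift to every harmonic k·f keeps the spacing f but moves every partial off the series unless the shift is an integer multiple of f. The sketch below works on frequencies only (no amplitudes, no filterbank); the helper names, the 220 Hz fundamental and the 1000 Hz shift are arbitrary choices for illustration.

```python
def harmonics(f0, count):
    """Partials of a harmonic series: f0, 2*f0, ..., count*f0 (Hz)."""
    return [k * f0 for k in range(1, count + 1)]


def offset_from_harmonic(freq, f0):
    """Distance (Hz) from freq to the nearest integer multiple of f0."""
    r = freq % f0
    return min(r, f0 - r)


def translate(partials, shift):
    """Copy-up by linear translation: every partial moves up by the same shift (Hz)."""
    return [f + shift for f in partials]


lowband = harmonics(220, 8)          # 220 ... 1760 Hz
shifted = translate(lowband, 1000)   # 1000 Hz is not a multiple of 220 Hz
print(shifted[:3])                                       # [1220, 1440, 1660]
print([offset_from_harmonic(f, 220) for f in shifted])   # [100, 100, 100, 100, 100, 100, 100, 100]

# A shift of 880 Hz = 4 * 220 Hz would keep every partial on the series.
print([offset_from_harmonic(f, 220) for f in translate(lowband, 880)])  # [0, 0, 0, 0, 0, 0, 0, 0]
```

In a codec the shift is set by the crossover frequency and band widths, not by the (unknown, time-varying) fundamental, so in general the translated partials land off the harmonic series.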
Sensory dissonance (roughness), as opposed to consonance (pleasantness), appears when nearby tones or partials interfere. Dissonance theory has been explained by several researchers, amongst others Plomp and Levelt [“Tonal Consonance and Critical Bandwidth”, R. Plomp, W. J. M. Levelt, JASA, Vol. 38, 1965]. It states that two partials are considered dissonant if their frequency difference is within approximately 5 to 50% of the bandwidth of the critical band in which they are situated. The scale used for mapping frequency to critical bands is called the Bark scale; one Bark corresponds to a frequency distance of one critical band. For reference, the function
$$z(f) = \frac{26.81}{1 + \frac{1960}{f}} - 0.53 \quad [\text{Bark}] \qquad (1)$$
can be used to convert from frequency (f) to the Bark scale (z). Plomp states that the human auditory system cannot discriminate two partials if they differ in frequency by less than approximately five percent of the critical band in which they are situated, or equivalently, if they are separated by less than 0.05 Bark. On the other hand, if the distance between the partials is more than approximately 0.5 Bark, they are perceived as separate tones.
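Equation (1) and the two approximate limits can be combined into a small helper that labels a pair of partials. This is a sketch under two assumptions: f is in Hz, and the approximate 0.05 and 0.5 Bark limits are treated as hard thresholds, which simplifies the gradual behaviour described by Plomp and Levelt. The function names and labels are hypothetical.

```python
def hz_to_bark(f):
    """Equation (1): z(f) = 26.81 / (1 + 1960 / f) - 0.53, with f in Hz (f > 0)."""
    return 26.81 / (1.0 + 1960.0 / f) - 0.53


def classify_pair(f1, f2):
    """Label two partials by their Bark distance, using 0.05 and 0.5 Bark as hard limits."""
    dz = abs(hz_to_bark(f1) - hz_to_bark(f2))
    if dz < 0.05:
        return "not discriminated"
    if dz <= 0.5:
        return "rough"
    return "separate tones"


print(round(hz_to_bark(1000), 2))  # 8.53
print(classify_pair(1000, 1005))   # not discriminated (about 0.03 Bark apart)
print(classify_pair(1000, 1050))   # rough (about 0.29 Bark apart)
print(classify_pair(1000, 1200))   # separate tones (about 1.12 Bark apart)
```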
Dissonance theory partly explains why prior-art methods perform unsatisfactorily. A set of consonant partials translated upwards in frequency may become dissonant. Moreover, in the crossover regions between instances of translated bands and the lowband, the partials can interfere, since they may not be within the limits of acceptable deviation according to the dissonance rules.
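The crossover effect can be reproduced with a partial-level toy model: partials below the crossover are kept, a band just below it is translated upwards, and adjacent partials are checked against the 0.05 to 0.5 Bark range. All values are hypothetical and chosen to make the effect visible (a 1000 Hz fundamental, a 5050 Hz crossover, a 2100 Hz band width); amplitudes and the full shape of the roughness curve are ignored.

```python
def hz_to_bark(f):
    """Equation (1), with f in Hz."""
    return 26.81 / (1.0 + 1960.0 / f) - 0.53


def copy_up_partials(partials, crossover, width):
    """Keep partials below the crossover (the rest is discarded, as in the codec)
    and add copies of those in [crossover - width, crossover), translated up by `width` Hz."""
    low = [f for f in partials if f < crossover]
    high = [f + width for f in low if f >= crossover - width]
    return sorted(low + high)


def rough_pairs(partials, lo=0.05, hi=0.5):
    """Adjacent partial pairs whose Bark distance lies in the dissonant range [lo, hi]."""
    return [
        (a, b)
        for a, b in zip(partials, partials[1:])
        if lo <= hz_to_bark(b) - hz_to_bark(a) <= hi
    ]


original = [1000 * k for k in range(1, 8)]  # harmonic partials 1000 ... 7000 Hz
regenerated = copy_up_partials(original, crossover=5050, width=2100)
print(regenerated)               # [1000, 2000, 3000, 4000, 5000, 5100, 6100, 7100]
print(rough_pairs(original))     # []
print(rough_pairs(regenerated))  # [(5000, 5100)]
```

Every adjacent pair in the original series is more than 0.5 Bark apart, but the lowest translated partial (5100 Hz) lands about 0.11 Bark above the highest lowband partial (5000 Hz), i.e. inside the dissonant range.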
WO 98/57436 discloses performing frequency transposition by multiplication with a transposition factor M. Consecutive channels of an analysis filterbank are frequency-translated to synthesis filterbank channels that are spaced apart by two intermediate reconstruction-range channels when the transposition factor M is 3, or by one reconstruction-range channel when M equals 2. Alternatively, amplitude and phase information from different analyser channels can be combined: the amplitude signals are connected such that the magnitudes of consecutive analysis filterbank channels are frequency-translated to the magnitudes of subband signals associated with consecutive synthesis channels, while the phases of the subband signals from the same channels are frequency-transposed by the factor M.
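The channel spacing and the phase operation described above can be sketched as follows. This is an illustration only, not the implementation of WO 98/57436: it assumes analysis and synthesis channels of equal bandwidth with channel 0 at DC, uses hypothetical function names, and for simplicity takes magnitude and phase from the same complex subband sample.

```python
import cmath


def target_channel(k, m):
    """Synthesis channel fed by analysis channel k under transposition by m
    (assumes equal channel bandwidths and channel 0 at DC)."""
    return m * k


def transpose_sample(x, m):
    """Keep the magnitude of complex subband sample x and multiply its phase by m."""
    return cmath.rect(abs(x), m * cmath.phase(x))


print([target_channel(k, 3) for k in (4, 5, 6)])  # [12, 15, 18]: two channels in between
print([target_channel(k, 2) for k in (4, 5, 6)])  # [8, 10, 12]: one channel in between

# A subband tone advancing 0.2 rad per sample becomes one advancing 0.6 rad per sample.
tone = [cmath.exp(1j * 0.2 * n) for n in range(30)]
out = [transpose_sample(x, 3) for x in tone]
print(all(abs(y - cmath.exp(1j * 0.6 * n)) < 1e-9 for n, y in enumerate(out)))  # True
```

Because `cmath.phase` returns a wrapped angle, the phase multiplication only reproduces the intended tone when M is an integer: a wrap of 2π then becomes a wrap of 2πM, which leaves the complex sample unchanged.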
It is an object of the present invention to provide a concept for obtaining an envelope-adjusted and frequency-translated signal by high-frequency spectral reconstruction, and a concept for decoding using high-frequency spectral reconstruction, that result in higher-quality reconstruction.
This object is achieved by a method in accordance with claims 1 and 13 or 23, an apparatus according to claims 19 and 20, or a decoder according to claim 21.