This application is the national phase under 35 U.S.C. § 371 of prior PCT International Application No. PCT/IB98/00893, which has an International filing date of Jun. 9, 1998 and which designated the United States of America.
In source coding systems, digital data is compressed before transmission or storage to reduce the required bitrate or storage capacity. The present invention relates to a new method and apparatus for the improvement of source coding systems by means of Spectral Band Replication (SBR). Substantial bitrate reduction is achieved while maintaining the same perceptual quality or, conversely, an improvement in perceptual quality is achieved at a given bitrate. This is accomplished by means of spectral bandwidth reduction at the encoder side and subsequent spectral band replication at the decoder, whereby the invention exploits new concepts of signal redundancy in the spectral domain.
Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. Natural audio coding is commonly used for music or arbitrary signals at medium bitrates, and generally offers wide audio bandwidth. Speech coders are basically limited to speech reproduction but can on the other hand be used at very low bitrates, albeit with low audio bandwidth. Wideband speech offers a major subjective quality improvement over narrow band speech. Increasing the bandwidth not only improves intelligibility and naturalness of speech, but also facilitates speaker recognition. Wideband speech coding is thus an important issue in next generation telephone systems. Further, due to the tremendous growth of the multimedia field, transmission of music and other non-speech signals at high quality over telephone systems is a desirable feature.
A high-fidelity linear PCM signal is very inefficient in terms of bitrate versus the perceptual entropy. The CD standard dictates 44.1 kHz sampling frequency, 16 bits per sample resolution and stereo. This equals a bitrate of 1411 kbit/s. To drastically reduce the bitrate, source coding can be performed using split-band perceptual audio codecs. These natural audio codecs exploit perceptual irrelevancy and statistical redundancy in the signal. Using the best codec technology, approximately 90% data reduction can be achieved for a standard CD-format signal with practically no perceptible degradation. Very high sound quality in stereo is thus possible at around 96 kbit/s, i.e. a compression factor of approximately 15:1. Some perceptual codecs offer even higher compression ratios. To achieve this, it is common to reduce the sample-rate and thus the audio bandwidth. It is also common to decrease the number of quantization levels, allowing occasionally audible quantization distortion, and to employ degradation of the stereo field, through intensity coding. Excessive use of such methods results in annoying perceptual degradation. Current codec technology is near saturation and further progress in coding gain is not expected. In order to improve the coding performance further, a new approach is necessary.
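The bitrate and compression figures quoted above follow from simple arithmetic, which the following Python sketch reproduces. The variable names are illustrative only; the CD parameters and the 96 kbit/s figure are taken from the text.

```python
# Linear PCM bitrate for the CD format described in the text.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2  # stereo

pcm_bitrate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

# Codec bitrate quoted in the text for very high stereo quality.
codec_bitrate_kbps = 96

compression_factor = pcm_bitrate_kbps / codec_bitrate_kbps
data_reduction = 1 - codec_bitrate_kbps / pcm_bitrate_kbps

print(f"PCM bitrate:        {pcm_bitrate_kbps:.1f} kbit/s")
print(f"Compression factor: {compression_factor:.1f}:1")
print(f"Data reduction:     {data_reduction:.0%}")
```

The computed values (about 1411 kbit/s, about 14.7:1 and about 93%) agree with the rounded figures of 1411 kbit/s, 15:1 and approximately 90% given above.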
The human voice and most musical instruments generate quasistationary signals that emerge from oscillating systems. According to Fourier theory, any periodic signal may be expressed as a sum of sinusoids with the frequencies f, 2f, 3f, 4f, 5f etc. where f is the fundamental frequency. The frequencies form a harmonic series. A bandwidth limitation of such a signal is equivalent to a truncation of the harmonic series. Such a truncation alters the perceived timbre, tone colour, of a musical instrument or voice, and yields an audio signal that will sound "muffled" or "dull", and intelligibility may be reduced. The high frequencies are thus important for the subjective impression of sound quality.
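The effect of band-limiting on a harmonic series can be illustrated with a short Python sketch. The fundamental of 440 Hz and the two bandwidths (16 kHz and 4 kHz) are hypothetical example values, not figures from the invention.

```python
def harmonic_series(fundamental_hz, max_hz):
    """Return the harmonics f, 2f, 3f, ... that do not exceed max_hz."""
    count = int(max_hz // fundamental_hz)
    return [n * fundamental_hz for n in range(1, count + 1)]

# Hypothetical example values: a 440 Hz tone at two audio bandwidths.
full = harmonic_series(440.0, 16_000.0)      # wideband signal
truncated = harmonic_series(440.0, 4_000.0)  # after band-limiting to 4 kHz

# Harmonics removed by the bandwidth limitation.
lost = [f for f in full if f not in truncated]

print(len(full), len(truncated), len(lost))
```

In this example the bandwidth limitation keeps the first 9 harmonics and removes the remaining 27, which is the truncation of the harmonic series referred to above.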
Prior art methods are mainly intended for improvement of speech codec performance and in particular intended for High Frequency Regeneration (HFR), an issue in speech coding. Such methods employ broadband linear frequency shifts, non-linearities or aliasing [U.S. Pat. No. 5,127,054] generating intermodulation products or other non-harmonic frequency components which cause severe dissonance when applied to music signals. Such dissonance is referred to in the speech coding literature as "harsh" and "rough" sounding. Other synthetic speech HFR methods generate sinusoidal harmonics that are based on fundamental pitch estimation and are thus limited to tonal stationary sounds [U.S. Pat. No. 4,771,465]. Such prior art methods, although useful for low-quality speech applications, do not work for high quality speech or music signals. A few methods attempt to improve the performance of high quality audio source codecs. One uses synthetic noise signals generated at the decoder to substitute noise-like signals in speech or music previously discarded by the encoder ["Improving Audio Codecs by Noise Substitution" D. Schultz, JAES, Vol. 44, No. 7/8, 1996]. This is performed within an otherwise normally transmitted highband on an intermittent basis when noise signals are present. Another method recreates some missing highband harmonics that were lost in the coding process ["Audio Spectral Coder" A. J. S. Ferreira, AES Preprint 4201, 100th Convention, May 11-14, 1996, Copenhagen] and is again dependent on tonal signals and pitch detection. Both methods operate on a low duty-cycle basis, offering comparatively limited coding or performance gain.
The present invention provides a new method and an apparatus for substantial improvements of digital source coding systems and more specifically for the improvements of audio codecs. The objective includes bitrate reduction or improved perceptual quality or a combination thereof. The invention is based on new methods exploiting harmonic redundancy, offering the possibility to discard passbands of a signal prior to transmission or storage. No degradation is perceived if the decoder performs high quality spectral replication according to the invention. The discarded bits represent the coding gain at a fixed perceptual quality. Alternatively, more bits can be allocated for encoding of the lowband information at a fixed bitrate, thereby achieving a higher perceptual quality.
The present invention postulates that a truncated harmonic series can be extended based on the direct relation between lowband and highband spectral components. This extended series resembles the original in a perceptual sense if certain rules are followed: First, the extrapolated spectral components must be harmonically related to the truncated harmonic series, in order to avoid dissonance-related artifacts. The present invention uses transposition as a means for the spectral replication process, which ensures that this criterion is met. It is however not necessary that the lowband spectral components form a harmonic series for successful operation, since new replicated components, harmonically related to those of the lowband, will not alter the noise-like or transient nature of the signal. A transposition is defined as a transfer of partials from one position to another on the musical scale while maintaining the frequency ratios of the partials. Second, the spectral envelope, i.e. the coarse spectral distribution, of the replicated highband, must reasonably well resemble that of the original signal. The present invention offers two modes of operation, SBR-1 and SBR-2, that differ in the way the spectral envelope is adjusted.
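The difference between a transposition, as defined above, and a linear frequency shift of the kind used in the prior art can be illustrated with a minimal Python sketch. The partial frequencies, the factor of 2 and the 500 Hz offset are hypothetical example values, and the helper names are illustrative only.

```python
def transpose(partials_hz, factor):
    """Multiply every partial by the same factor, preserving frequency ratios."""
    return [factor * f for f in partials_hz]

def frequency_shift(partials_hz, offset_hz):
    """Add a constant offset to every partial (linear shift, for comparison)."""
    return [f + offset_hz for f in partials_hz]

def is_harmonic_of(fundamental_hz, partials_hz, tol=1e-9):
    """True if every partial is an integer multiple of the fundamental."""
    return all(
        abs(f / fundamental_hz - round(f / fundamental_hz)) < tol
        for f in partials_hz
    )

# Hypothetical lowband: the first four harmonics of a 200 Hz fundamental.
lowband = [200.0, 400.0, 600.0, 800.0]

transposed = transpose(lowband, 2)         # 400, 800, 1200, 1600 Hz
shifted = frequency_shift(lowband, 500.0)  # 700, 900, 1100, 1300 Hz

print(is_harmonic_of(200.0, transposed))   # transposed partials fit the series
print(is_harmonic_of(200.0, shifted))      # shifted partials do not
```

The transposed partials remain integer multiples of the 200 Hz fundamental and keep the ratios between partials, whereas the shifted partials fall between the harmonics, which is the source of the dissonance discussed in connection with the prior art.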
SBR-1, intended for the improvement of intermediate quality codec applications, is a single-ended process which relies exclusively on the information contained in a received lowband or lowpass signal at the decoder. The spectral envelope of this signal is determined and extrapolated, for instance using polynomials together with a set of rules or a codebook. This information is used to continuously adjust and equalise the replicated highband. The present SBR-1 method offers the advantage of post-processing, i.e. no modifications are needed at the encoder side. A broadcaster will gain in channel utilisation or will be able to offer improved perceptual quality or a combination of both. Existing bitstream syntax and standards can be used without modification.
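As an illustration of the envelope extrapolation in SBR-1, the following Python sketch fits a first-degree polynomial to lowband envelope levels and evaluates it at highband frequencies. This is a sketch under stated assumptions, not the method of the invention: the text mentions polynomials together with a set of rules or a codebook, whereas this example uses only a least-squares straight line, a linear frequency axis and levels in dB. All numeric values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate_envelope(low_khz, low_db, high_khz):
    """Extend the lowband envelope to the highband band centres."""
    slope, intercept = fit_line(low_khz, low_db)
    return [slope * f + intercept for f in high_khz]

# Hypothetical lowband envelope: band centres in kHz and levels in dB.
lowband_khz = [1.0, 2.0, 3.0, 4.0]
lowband_db = [-10.0, -16.0, -22.0, -28.0]

# Estimated levels for two replicated highband bands.
highband_db = extrapolate_envelope(lowband_khz, lowband_db, [5.0, 6.0])
print(highband_db)
```

With these example values the envelope falls by 6 dB per kHz, so the estimated highband levels continue that slope. The replicated highband would then be equalised towards these levels.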
SBR-2, intended for the improvement of high quality codec applications, is a double-ended process where, in addition to the transmitted lowband signal according to SBR-1, the spectral envelope of the highband is encoded and transmitted. Since the variations of the spectral envelope have a much lower rate than the highband signal components, only a limited amount of information needs to be transmitted in order to successfully represent the spectral envelope. SBR-2 can be used to improve the performance of current codec technologies with no or minor modifications of existing syntax or protocols, and as a valuable tool for future codec development.
Both SBR-1 and SBR-2 can be used to replicate smaller passbands of the lowband when such bands are shut down by the encoder as stipulated by the psychoacoustic model under bit-starved conditions. This results in improvement of the perceptual quality by spectral replication within the lowband in addition to spectral replication outside the lowband. Further, SBR-1 and SBR-2 can also be used in codecs employing bitrate scalability, where the perceptual quality of the signal at the receiver varies depending on transmission channel conditions. This usually implies annoying variations of the audio bandwidth at the receiver. Under such conditions, the SBR methods can be used successfully in order to maintain a constantly high bandwidth, again improving the perceptual quality.
The present invention operates on a continuous basis, replicating any type of signal contents, i.e. tonal or non-tonal (noise-like and transient signals). In addition, the present spectral replication method creates a perceptually accurate replica of the discarded bands from available frequency bands at the decoder. Hence, the SBR method offers a substantially higher level of coding gain or perceptual quality improvement compared to prior art methods. The invention can be combined with such prior art codec improvement methods; however, no performance gain is expected due to such combinations.
The SBR-method comprises the following steps:
encoding of a signal derived from an original signal, where frequency bands of the signal are discarded and the discarding is performed prior to or during encoding, forming a first signal,
during or after decoding of the first signal, transposing frequency bands of the first signal, forming a second signal,
performing spectral envelope adjustment, and
combining the decoded signal and the second signal, forming an output signal.
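The four steps listed above can be sketched in Python on a toy signal represented as a mapping from partial frequency to amplitude. This is an illustrative sketch only: the representation, the function names, the 2 kHz cutoff, the transposition factor of 2 and the single-band energy used as envelope information (in the manner of SBR-2) are all assumptions, and no actual codec is involved.

```python
import math

def band_energy(partials, lo_hz, hi_hz):
    """Sum of squared amplitudes of partials with lo_hz < f <= hi_hz."""
    return sum(a * a for f, a in partials.items() if lo_hz < f <= hi_hz)

def encode(original, cutoff_hz):
    """Step 1: discard the band above cutoff_hz, forming the first signal."""
    return {f: a for f, a in original.items() if f <= cutoff_hz}

def transpose_bands(first, factor, cutoff_hz, max_hz):
    """Step 2: transpose partials; keep those landing in the discarded band."""
    return {
        factor * f: a
        for f, a in first.items()
        if cutoff_hz < factor * f <= max_hz
    }

def adjust_envelope(second, target_energy):
    """Step 3: scale the replicated band to the target band energy."""
    current = sum(a * a for a in second.values())
    gain = math.sqrt(target_energy / current)
    return {f: gain * a for f, a in second.items()}

def combine(first, second):
    """Step 4: combine the decoded lowband and the replicated highband."""
    return {**first, **second}

# Hypothetical original: harmonics of 500 Hz up to 4 kHz, amplitudes 1/n.
original = {n * 500.0: 1.0 / n for n in range(1, 9)}
CUTOFF_HZ, MAX_HZ, M = 2000.0, 4000.0, 2

first = encode(original, CUTOFF_HZ)
target = band_energy(original, CUTOFF_HZ, MAX_HZ)  # envelope side information
second = adjust_envelope(transpose_bands(first, M, CUTOFF_HZ, MAX_HZ), target)
output = combine(first, second)

print(sorted(output))
```

In this toy example the lowband passes through unchanged and the replicated highband carries the same band energy as the discarded band. With a single transposition factor of 2 only the even harmonics (3 kHz and 4 kHz) are recreated, so the sketch demonstrates the order of the steps rather than a perceptually complete replica.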
The passbands of the second signal may be set not to overlap or partly overlap the passbands of the first signal, and may be set in dependence of the temporal characteristics of the original signal and/or the first signal, or transmission channel conditions. The spectral envelope adjustment is performed based on estimation of the original spectral envelope from said first signal or on transmitted envelope information of the original signal.
The present invention includes two basic types of transposers: multiband transposers and time-variant pattern search prediction transposers, having different properties. A basic multiband transposition may be performed according to the present invention by the following:
filtering the signal to be transposed through a set of N ≥ 2 bandpass filters with passbands comprising the frequencies [f1, . . . , fN] respectively, forming N bandpass signals,
shifting the bandpass signals in frequency to regions comprising the frequencies M [f1, . . . , fN] where M ≠ 1 is the transposition factor, and
combining the shifted bandpass signals, forming the transposed signal.
Alternatively, this basic multiband transposition may be performed according to the invention by the following:
bandpass filtering the signal to be transposed using an analysis filterbank or transform of such a nature that real- or complex-valued subband signals of lowpass type are generated,
connecting an arbitrary number of channels k of said analysis filterbank or transform to channels Mk, M ≠ 1, in a synthesis filterbank or transform, and
forming the transposed signal using the synthesis filterbank or transform.
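The connection of analysis channels k to synthesis channels Mk can be sketched in Python using a plain block DFT as the transform. This is a minimal sketch under stated assumptions: a practical filterbank would process overlapping windowed blocks, and the phase adjustments of the improved multiband transposition are omitted. The block length of 32, the test tone in channel 3 and the factor M = 2 are hypothetical example values.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT (analysis transform)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spec):
    """Naive inverse DFT (synthesis transform)."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def transpose_block(x, factor):
    """Connect analysis channel k to synthesis channel factor * k."""
    n = len(x)
    analysis = dft(x)
    synthesis = [0j] * n
    for k in range(1, n // 2):
        target = factor * k
        if target < n // 2:  # stay below the Nyquist channel
            synthesis[target] += analysis[k]
            # Mirror channel keeps the output real-valued.
            synthesis[n - target] += analysis[k].conjugate()
    return [v.real for v in idft(synthesis)]

N = 32
tone = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]  # channel 3
out = transpose_block(tone, 2)

magnitudes = [abs(c) for c in dft(out)]
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
print(peak_bin)
```

For the example tone, the energy found in analysis channel 3 reappears in synthesis channel 6, i.e. at twice the original frequency.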
An improved multiband transposition according to the invention incorporates phase adjustments, enhancing the performance of the basic multiband transposition.
The time-variant pattern search prediction transposition according to the present invention may be performed by the following:
performing transient detection on the first signal,
determining which segment of the first signal to use when duplicating/discarding parts of the first signal depending on the outcome of the transient detection,
adjusting statevector and codebook properties depending on the outcome of the transient detection, and
searching for synchronisation points in the chosen segment of the first signal, based on the synchronisation point found in the previous synchronisation point search.
The SBR methods and apparatuses according to the present invention offer the following features:
1. The methods and apparatuses exploit new concepts of signal redundancy in the spectral domain.
2. The methods and apparatuses are applicable to arbitrary signals.
3. Each harmonic set is individually created and controlled.
4. All replicated harmonics are generated in such a manner as to form a continuation of the existing harmonic series.
5. The spectral replication process is based on transposition and creates no or imperceptible artifacts.
6. The spectral replication can cover multiple smaller bands and/or a wide frequency range.
7. In the SBR-1 method, the processing is performed at the decoder side only, i.e. all standards and protocols can be used without modification.
8. The SBR-2 method can be implemented in accordance with most standards and protocols with no or minor modifications.
9. The SBR-2 method offers the codec designer a new powerful compression tool.
10. The coding gain is significant.
The most attractive application relates to the improvement of various types of low bitrate codecs, such as MPEG 1/2 Layer I/II/III [U.S. Pat. No. 5,040,217], MPEG 2/4 AAC, Dolby AC-2/3, NTT Twin VQ [U.S. Pat. No. 5,684,920], AT&T/Lucent PAC etc. The invention is also useful in high quality speech codecs such as wide-band CELP and SB-ADPCM G.722 etc. to improve perceived quality. The above codecs are widely used in multimedia, in the telephone industry, on the Internet as well as in professional applications. T-DAB (Terrestrial Digital Audio Broadcasting) systems use low bitrate protocols that will gain in channel utilisation by using the present method, or improve quality in FM and AM DAB. Satellite S-DAB will gain considerably, due to the excessive system costs involved, by using the present method to increase the number of programme channels in the DAB multiplex. Furthermore, for the first time, full bandwidth audio real-time streaming over the Internet is achievable using low bitrate telephone modems.