The present invention relates to the processing of audio or image signals and, in particular, to the encoding or decoding of audio or image signals in the presence of transients.
Contemporary frequency-domain speech/audio coding schemes based on overlapping FFTs or the modified discrete cosine transform (MDCT) offer some degree of adaptation to non-stationary signal characteristics. The general-purpose codecs standardized in MPEG, namely MPEG-1 Layer 3 (better known as MP3), MPEG-4 (HE-)AAC [1] and, most recently, MPEG-D xHE-AAC (USAC), as well as the Opus/Celt codec specified by the IETF [2], allow a frame to be coded using one of at least two different transform lengths: one long transform of length M for stationary signal passages, or 8 short transforms of length M/8 each. In the MPEG codecs, switching from long to short transforms and from short to long transforms (also known as block switching) necessitates asymmetrically windowed transition transforms, namely a start window and a stop window, respectively. These transform shapes, along with other known prior-art shapes, are depicted in FIG. 16. It should be noted that the linear overlap slopes shown there are merely illustrative; the exact slope shape varies. Possible window shapes are given in the AAC standard [1] and in section 6 of [3].
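As a minimal sketch of the start (long-to-short) transition window mentioned above, the following code assumes AAC-style sine slopes, a long transform length M = 1024 and 8 short transforms, so that the short slope spans M/8 = 128 samples; the function names are hypothetical. With these parameters the window consists of 1024 rising samples, 448 samples of constant value 1, a 128-sample falling short slope and 448 zeros. The usage lines check that the long slope is power-complementary (Princen-Bradley condition) with the falling half of the preceding long sine window.

```python
import numpy as np

def sine_rise(L):
    """Rising half of a sine window whose slope is L samples long."""
    n = np.arange(L)
    return np.sin(np.pi / (2 * L) * (n + 0.5))

def start_window(M, n_short=8):
    """Long-to-short transition (start) window of length 2*M with sine slopes.

    Assumption: AAC-like layout, short slope length equal to M / n_short.
    """
    s = M // n_short            # short slope length (128 for M = 1024)
    flat = (M - s) // 2         # constant part of value 1 (448 for M = 1024)
    return np.concatenate([
        sine_rise(M),           # long slope, overlaps the previous long window
        np.ones(flat),          # constant part, belongs to no overlap slope
        sine_rise(s)[::-1],     # short slope, overlaps the first short window
        np.zeros(flat),         # zero tail
    ])

M = 1024
w = start_window(M)
prev_long_fall = sine_rise(M)[::-1]   # right half of the preceding long sine window
print(len(w), np.allclose(w[:M] ** 2 + prev_long_fall ** 2, 1.0))  # prints: 2048 True
```

Under the same assumptions, the matching stop window is simply the time reverse of this window.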
Since the current frame has to be coded with a start transition transform whenever the upcoming frame is to be coded with short transforms by an MPEG encoder, an encoder implemented according to one of the above-mentioned MPEG standards evidently necessitates at least one frame length of look-ahead. In low-delay communication applications, however, it is desirable to minimize or even avoid this additional look-ahead. To this end, two modifications to the general-purpose coding paradigm have been proposed. One, which was adopted e.g. in Celt [2], is to reduce the overlap of the long transform to that of the short transform so that asymmetric transition windows can be avoided. The other modification, which is used e.g. in the MPEG-4 (Enhanced) Low Delay AAC coding schemes, is to disallow switching to shorter transforms and instead rely on a Temporal Noise Shaping (TNS) coding tool [4] operating on the long-transform coefficients to minimize the temporal spread of the coding error around transients.
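The one-frame look-ahead requirement can be illustrated with a simplified block-switching decision. The sketch below assumes a hypothetical per-frame transient flag (as produced by some transient detector) and, as a simplification, codes a transient-free frame lying between two short-transform frames with short transforms too, instead of using a combined stop/start window. The window type of frame k depends on the flag of frame k + 1, which is why the encoder has to see one frame ahead.

```python
def window_sequence(transient):
    """Assign MPEG-style window types from per-frame transient flags.

    transient[k] is True if frame k must be coded with eight short transforms.
    Deciding frame k reads transient[k + 1], i.e. one frame of look-ahead.
    """
    seq = []
    for k, is_transient in enumerate(transient):
        prev_short = k > 0 and seq[k - 1] == "EIGHT_SHORT"
        next_short = k + 1 < len(transient) and transient[k + 1]  # look-ahead
        if is_transient or (prev_short and next_short):
            seq.append("EIGHT_SHORT")  # simplification: no combined stop/start window
        elif next_short:
            seq.append("START")        # must be chosen before the transient frame
        elif prev_short:
            seq.append("STOP")
        else:
            seq.append("LONG")
    return seq

print(window_sequence([False, False, True, False, False]))
# prints: ['LONG', 'START', 'EIGHT_SHORT', 'STOP', 'LONG']
```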
Furthermore, like xHE-AAC, Low Delay AAC allows the use of two frame overlap widths—the default 50% overlap for stationary input, or a reduced overlap (similar to the short overlap of the transition transforms) for non-stationary signals. The reduced overlap effectively limits the time extension of a transform and, thus, its coding error in case of coefficient quantization.
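The effect of a reduced overlap on the time extension of a transform can be sketched with a symmetric low-overlap MDCT window. The code below is an illustration under assumptions (sine slopes, M = 1024, a reduced overlap of 128 samples, hypothetical function name), not the Low Delay AAC window definition. A window of length 2M whose slopes are ov samples long is non-zero over only M + ov samples, so at an assumed 48 kHz sampling rate the support shrinks from 2048 samples (about 42.7 ms) to 1152 samples (24 ms), while the window still satisfies the Princen-Bradley condition w[n]^2 + w[n + M]^2 = 1.

```python
import numpy as np

def low_overlap_window(M, ov):
    """Symmetric MDCT window of length 2*M with sine slopes only ov samples long."""
    if ov > M or (M - ov) % 2:
        raise ValueError("ov must not exceed M and M - ov must be even")
    z = (M - ov) // 2                                    # zeros at each end
    rise = np.sin(np.pi / (2 * ov) * (np.arange(ov) + 0.5))
    return np.concatenate([np.zeros(z), rise, np.ones(M - ov), rise[::-1], np.zeros(z)])

M = 1024
for ov in (M, 128):  # default 50 % overlap versus an assumed reduced overlap
    w = low_overlap_window(M, ov)
    # non-zero support (time extension) and Princen-Bradley check
    print(ov, np.count_nonzero(w), np.allclose(w[:M] ** 2 + w[M:] ** 2, 1.0))
# prints: 1024 2048 True
#         128 1152 True
```

With ov = M the function reduces to the ordinary sine window, so both overlap widths share one code path.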
U.S. patent 2008/0140428 A1 assigned to Samsung Electronics Co., as well as U.S. Pat. Nos. 5,502,789 and 5,819,214 assigned to Sony Corp., disclose signal-adaptive window or transform size determining units. However, the transformer units controlled by said window or transform size determining units operate on QMF or LOT sub-band values (implying that the described systems both employ cascaded filter banks or transforms) as opposed to working directly on the time-domain full-band input signal as in the present case. Moreover, 2008/0140428 A1 describes no details about the shape or control of the window overlap, and in 5,819,214 the overlap shapes follow from, i.e. are the result of, the output of the transform size determining unit, which is the opposite of what an embodiment of the present invention proposes.
U.S. patent 2010/0076754 A1 assigned to France Telecom follows the same motivation as the present invention, namely being able to perform transform length switching in communication coding scenarios to improve the coding of transient signal segments, and to do so without extra encoder look-ahead. However, whereas said document reveals that the low-delay objective is achieved by avoiding transform-length transition windows and by post-processing the reconstructed signal in the decoder (disadvantageously, by amplifying parts of the decoded signal and thus also the coding error), the present invention proposes a simple modification of the transition window of a conventional system, to be introduced below, such that additional encoder look-ahead can be minimized and special (risky) decoder post-processing can be avoided.
The transition transform to which an inventive modification is to be applied is the start window described in two variants in U.S. Pat. No. 5,848,391 assigned to Fraunhofer-Gesellschaft e. V. and Dolby Laboratories Licensing Corp. as well as, in a slightly different form, in U.S. patent 2006/0122825 A1 assigned to Samsung Electronics Co. FIG. 16 shows these start windows and reveals that the difference between Fraunhofer/Dolby's windows and Samsung's window is the presence of a non-overlapping segment, i.e. a region of the window having a constant maximum value which does not belong to any overlap slope. The Fraunhofer/Dolby windows exhibit such a “non-overlapping part having a length”; the Samsung windows do not. It can be concluded that an encoder with the least amount of additional look-ahead but using conventional transform switching can be realized by employing Samsung's transition window approach. With such transforms, a look-ahead equal to the overlap width between the short transforms suffices to fully switch from long to short transforms early enough before a signal transient.
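To put the two look-ahead requirements into numbers, the following sketch assumes a frame (long transform) length M = 1024, a 48 kHz sampling rate and short transforms overlapping by M/8 = 128 samples; these values are illustrative assumptions and are not taken from the cited documents. Per the text above, a conventional start window requires one frame of look-ahead, whereas a Samsung-type start window without a non-overlapping part requires only the overlap width between the short transforms.

```python
def lookahead_ms(samples: int, fs: int = 48000) -> float:
    """Convert a look-ahead given in samples into milliseconds."""
    return 1000.0 * samples / fs

M = 1024                 # assumed long transform (frame) length
short_overlap = M // 8   # assumed overlap width between short transforms

conventional = lookahead_ms(M)              # start window decided one frame ahead
samsung_type = lookahead_ms(short_overlap)  # transition window without flat part
print(f"{conventional:.2f} ms vs {samsung_type:.2f} ms")  # prints: 21.33 ms vs 2.67 ms
```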
Further conventional technology can be found in WO 90/09063, in “Coding of audio signals with overlap block transform and adaptive window functions”, Frequenz, vol. 43, September 1989, pages 2052 to 2056, or in AES Convention Paper 4929, “MPEG-4 Low Delay Audio Coding based on the AAC Codec”, E. Allamanche et al., 106th Convention, 1999.
Nonetheless, depending on the length of the short transform, the look-ahead can remain fairly large, which is undesirable in low-delay applications. FIG. 17 illustrates the block switching performance in the worst-case input situation, namely the presence of a sudden transient at the start of the look-ahead region, which in turn begins at the end of the long slope, i.e. the overlap region between the frames. In the prior-art approaches, at least one of the two depicted transients reaches into the transition transform. In a lossy coding system utilizing an encoder without additional look-ahead (an encoder which does not “see the transient coming”), this condition causes temporal spreading of the coding error up to the beginning of the long slope, and even when TNS is used, pre-echo noise is thus likely to be audible in the decoded signal.
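The temporal spreading of the coding error can be made concrete with a single-block experiment. The sketch below assumes a sine-windowed MDCT over 2N = 512 samples, a click in the last quarter of the block, a uniform quantizer with a step of 0.25 applied only to this block's coefficients, and hypothetical function names; it is a simplified illustration, not a model of the codecs discussed above. Because every MDCT basis function spans the whole window, the quantization error reappears across the entire window support, including its first half (the long slope overlapping the previous frame), well before the click.

```python
import numpy as np

def mdct_basis(N):
    """MDCT cosine kernel of shape (N, 2N), so that X = C @ block."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def block_coding_error(N, click_pos, step):
    """Time-domain error caused by quantizing one windowed MDCT block."""
    w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window
    C = mdct_basis(N)
    x = np.zeros(2 * N)
    x[click_pos] = 1.0                    # the transient
    X = C @ (w * x)                       # windowed MDCT
    Xq = step * np.round(X / step)        # uniform scalar quantizer
    return w * (C.T @ (Xq - X)) / N       # windowed IMDCT of the quantization error

N = 256
click = 7 * N // 4                        # click in the last quarter of the block
err = block_coding_error(N, click, step=0.25)
print(f"error energy before the click: {np.sum(err[:click] ** 2):.3e}")
print(f"error energy in the long slope (first half): {np.sum(err[:N] ** 2):.3e}")
```

Since only this block is quantized, err is also the total error after overlap-add with unquantized neighbouring blocks.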
The two previously mentioned look-ahead work-arounds have their disadvantages. On the one hand, reducing the long-transform overlap by a factor of up to 8, as done in the Celt coder, severely limits the efficiency (i.e. coding gain, spectral compaction) on stationary, especially highly tonal, input material. On the other hand, prohibiting short transforms, as in (Enhanced) Low Delay AAC, reduces codec performance on strong transients with durations much shorter than the frame length, often leading to audible pre- or post-echo noise even when TNS is used.
Thus, the conventional window sequence determination procedures are sub-optimum with respect to flexibility, due to the restricted window lengths; with respect to delay, due to the minimum transient look-ahead periods they necessitate; with respect to audio quality, due to pre- and post-echoes; with respect to efficiency, due to the potential need for additional pre-processing with functionalities beyond windowing with certain windows; and with respect to flexibility and efficiency, due to the potential need to change the frame/block raster in the presence of a transient.