In the subband coding strategy employed in spectral expansion technologies, such as the SBR technology, it is important to properly segment a signal frame both in a time direction and in a frequency direction, so that low-energy areas are not forced to share the same average energy value as the large-energy areas. Such sharing would in turn lead to erroneous amplification at a decoder, which is a common source of audible artifacts.
An objective of audio coding is to transform a digitized audio stream into a compressed bitstream at an audio encoder, so that as high fidelity to original source as possible is retained after the bitstream is processed at the decoder. One popular way of compression is shown in FIG. 1, which shows a typical audio coding system including an encoder and a decoder. A module 1000 divides an audio signal in time domain into consecutive frames, a module 1010 transforms each frame of the audio signal into frequency domain, and a module 1020 quantizes a spectrum up to a certain frequency (known as a bandwidth) at the encoder. FIG. 2 is a typical time/frequency grid representation used in the audio coding. One possible way for the module 1010 to transform the audio signal into frequency domain is the time/frequency grid approach as shown in FIG. 2, where a filterbank is employed to split an audio signal into multiple subbands, each representing a portion of the signal within a narrow frequency range in time domain. At the decoder, the audio spectrum is de-quantized by the module 1030 and inversely transformed by the module 1040 back into audio frames. The audio frames are then appropriately assembled by the module 1050 to form a continuous audio stream.
As the bitrate (number of bits per second) of coding decreases, more of the bandwidth of the audio signal to be transmitted has to be sacrificed by not coding the high-frequency portion, as it is deemed not as perceptually important as the low-frequency portion. The consequence is that some high-frequency tones, and harmonics of the low-frequency tones, are lost. FIG. 3 is a graph illustrating how limitation of bandwidth owing to bitrate considerations causes a loss of some high-frequency tones and harmonics. FIG. 3 illustrates the above band-limiting operation, where 2020 indicates the resultant bandwidth of the coded audio.
An objective of the bandwidth expansion is to recover the high-frequency portions by coding them using very few additional bits. One example of such a technique is the Spectral Band Replication (SBR) method (disclosed in International Patent Publication No. WO98/57436), which is now an MPEG standard (ISO/IEC 14496-3, 2001 AMD1). FIG. 4 is a diagram illustrating a possible encoder of a subband coding scheme for the bandwidth expansion. FIG. 4 illustrates one possible encoder structure for the SBR method that is relevant to the present invention. Firstly, an audio signal is band-split into N subbands by N subband filters at an analysis filterbank 3010, each capturing a part of the signal's frequency spectrum. The N signals produced by the filters are decimated to remove redundancy. A bandwidth expansion coder 3020 extracts some information from the filter outputs so that, at a decoder, the information can be applied to the low-frequency subbands to expand the bandwidth of the audio signal. The bandwidth expansion information is then multiplexed at a bitstream multiplexer 3030 with the output of a core codec 3000, which encodes the audio signals of the low-frequency subbands, to form a bitstream. A nominal SBR frame consists of L outputs from each subband filter.
FIG. 5 is a diagram illustrating a decoder of the subband coding scheme for the bandwidth expansion. FIG. 5 illustrates the decoder for the SBR method that is relevant to the present invention. Firstly, a bitstream is de-multiplexed at 4000 into a core audio bitstream and a bandwidth expansion bitstream. A core audio decoder 4010 decodes the core audio bitstream to produce a band-limited audio signal in time domain. The band-limited audio signal is then band-split into M subbands by M subband filters of an analysis filterbank 4020. Higher-frequency subbands are synthesized using the bandwidth expansion information at this subband level. The new higher-frequency subbands, as well as the lower-frequency subbands, are up-sampled and assembled by an N-filter synthesis filterbank 4040 to output a final bandwidth-expanded signal.
The output from the analysis filterbank 3010 can be viewed as the time/frequency grid representation of the audio signal as shown in FIG. 2. As a part of the bandwidth expansion information, the time/frequency grid representation is to be divided first in a time direction into ‘time segments’ and then in a frequency direction into ‘frequency bands’. For each frequency band, its average energy is computed, quantized and coded. This process is known as spectral envelope coding. More specifically, in the spectral envelope coding, the audio signal is represented by distribution of the average energy in each segment indicated two-dimensionally by a time axis and a frequency axis. FIG. 6 illustrates such a segmentation process, and is fully described in International Patent Publication No. WO01/26095A1. In FIG. 6, 5010 depicts segmentation in a time direction, and 5020 depicts segmentation in a frequency direction. At the decoder, data generated by this process is used to shape the energy of the synthesised high-frequency bands, so that it takes on the same energy envelope as the original audio signal. Without proper segmentation, low-energy areas would be forced to share the same average energy value as the large-energy areas. This would in turn lead to erroneous amplification at the decoder, which is a common source of audible artefacts.
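The averaging step of spectral envelope coding, and the effect of poor segmentation on it, can be sketched as follows. This is a minimal illustration only, not the method of the cited publications: the function name `envelope_energies`, the layout of `grid` (time slots by subbands), and the half-open border convention are all assumptions made for the example, and quantization and coding of the averages are omitted.

```python
def envelope_energies(grid, time_borders, freq_borders):
    """Average energy of each (time segment, frequency band) cell.

    grid[t][k] is the (possibly complex) subband sample at time slot t and
    subband k. Borders are ascending index lists; consecutive entries
    delimit a half-open range [start, end). All names are hypothetical.
    """
    envelope = []
    for t0, t1 in zip(time_borders, time_borders[1:]):
        row = []
        for k0, k1 in zip(freq_borders, freq_borders[1:]):
            total = sum(abs(grid[t][k]) ** 2
                        for t in range(t0, t1)
                        for k in range(k0, k1))
            row.append(total / ((t1 - t0) * (k1 - k0)))
        envelope.append(row)
    return envelope


# Two quiet time slots followed by two loud ones, two subbands each.
grid = [[0.1, 0.1], [0.1, 0.1], [1.0, 1.0], [1.0, 1.0]]

# One time segment: quiet and loud areas share a single average (about 0.505).
coarse = envelope_energies(grid, [0, 4], [0, 2])

# Two time segments: each area keeps its own average (about 0.01 and 1.0).
fine = envelope_energies(grid, [0, 2, 4], [0, 2])
```

With the single segment, a decoder shaping the quiet area to the shared average would amplify it by roughly a factor of fifty in energy, which is the erroneous amplification described above.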
Each SBR frame is partitioned in a time direction into time segments using ‘borders’. The prior art describes a method of using ‘fixed’ and ‘variable’ borders to achieve effective spectral envelope coding. FIG. 7 is a diagram showing border relationships between four frame types. Referring to FIG. 7, fixed borders 6060, 6070 and 6100 coincide with borders 6010, 6020 and 6050 of nominal SBR frames, whereas variable borders 6080 and 6090 of a current frame are allowed to encroach into the next nominal SBR frame. A start border and an end border of the ‘variable SBR frame’ can each be either a fixed border or a variable border. If the start border and the end border are both fixed borders, the variable SBR frame coincides with the nominal SBR frame. The end border of the current SBR frame automatically becomes the start border of the next SBR frame.
Between the start border and the end border, the SBR frame is further partitioned into several time segments by intermediate borders according to the prior art. If the start border and the end border are both fixed borders, the SBR frame is partitioned into uniform time segments. This is known as a FIXFIX frame in the prior art (i.e., a FIX border as the start border and a FIX border as the end border). FIG. 8 is a diagram showing the FIXFIX frame with fixed start and end borders. As shown in FIG. 8, 7010 is the start border and 7020 is the end border. If a transient detector finds a transient region in the current SBR frame, the end border of that frame becomes a ‘variable’ border, which must be located at or beyond the start border of the next nominal SBR frame.
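The uniform partitioning of a FIXFIX frame can be illustrated with a short sketch. The function name `uniform_borders`, the use of integer time-slot indices, and the frame length of 16 slots in the example are assumptions for illustration and are not taken from the prior art.

```python
def uniform_borders(start, end, num_segments):
    """Borders that split [start, end] into num_segments near-equal segments.

    start and end are the fixed borders, given as integer time-slot indices
    (an assumption of this sketch). Returns num_segments + 1 borders.
    """
    if num_segments < 1 or end <= start:
        raise ValueError("need at least one segment and end > start")
    length = end - start
    return [start + (i * length) // num_segments
            for i in range(num_segments + 1)]


# A hypothetical 16-slot frame split into four uniform time segments.
borders = uniform_borders(0, 16, 4)   # [0, 4, 8, 12, 16]
```

Because both borders are fixed and the spacing is uniform, only the number of segments would need to be signalled for such a frame.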
FIG. 9 is a diagram showing a FIXVAR frame with a fixed start border, a variable end border greater than the nominal SBR frame border, and some intermediate borders specified relative to the end border or each other. The FIXVAR frame has a fixed border as the start border 8010 and a variable border as the end border 8050. Intermediate borders 8020, 8030 and 8040 are specified relative to one another or the variable border, where d0, d1, d2 and the like are relative border distances. According to FIG. 9, the first relative distance d0 must start with the variable border. Subsequent relative distances start with the previously determined intermediate borders.
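The derivation of the intermediate borders of a FIXVAR frame from the relative distances can be sketched as follows. The function name `fixvar_borders`, the integer time-slot positions, and the numeric values in the example are hypothetical; only the rule that d0 is measured from the variable border and each later distance from the previously determined border is taken from the description above.

```python
def fixvar_borders(start_border, end_border, rel_distances):
    """Borders of a FIXVAR frame, listed in ascending order.

    rel_distances = [d0, d1, ...]: d0 is measured back from the variable
    end border, and each later distance is measured back from the
    previously determined intermediate border.
    """
    borders = [end_border]
    position = end_border
    for d in rel_distances:
        position -= d
        if position <= start_border:
            raise ValueError("intermediate border at or before the start border")
        borders.append(position)
    borders.append(start_border)
    return borders[::-1]


# Fixed start border at slot 0, variable end border at slot 18 (two slots
# past an assumed nominal border at 16), with d0 = 2, d1 = 4, d2 = 4.
borders = fixvar_borders(0, 18, [2, 4, 4])   # [0, 8, 12, 16, 18]
```

Anchoring the distances at the variable border places the finest control over segment positions near that border, which is where the transient that caused it is expected.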
Since the end border of the current SBR frame automatically becomes the start border of the next SBR frame, it is possible for an SBR frame to have two variable borders in case of transient behaviors in successive SBR frames. FIG. 10 is a diagram showing a VARVAR frame with a variable start border, a variable end border greater than the nominal SBR frame border, and some intermediate borders specified relative to the two variable borders or each other. For the VARVAR frame, the intermediate borders can be specified as relative to either one of the variable borders. In FIG. 10, an intermediate border 9020 is relative to the start border 9010, whereas intermediate borders 9030, 9040, and 9050 are relative to each other or the variable end border 9060.
Finally, if the transient detector cannot find any transient in the current SBR frame, but the frame begins with a variable border, the frame still adopts a fixed border as its end border. This is the final frame class introduced in the prior art. FIG. 11 is a diagram showing a VARFIX frame with a variable start border, a fixed end border, and some intermediate borders specified relative to the start border or each other. In FIG. 11, 10010 is the variable start border and 10050 is the fixed end border. 10020, 10030 and 10040 constitute the intermediate borders progressively derived from d0, d1 and d2.
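The selection among the four frame classes described above can be summarized in a short sketch. The function names and the representation of the transient detector output as a list of booleans are assumptions for illustration; the sketch also assumes that the first frame of a sequence starts on a fixed border.

```python
def frame_class(start_is_variable, transient_found):
    """Name of the frame class from its start and end border types.

    The end border is variable exactly when a transient is found in the
    current frame, as described in the text above.
    """
    start = "VAR" if start_is_variable else "FIX"
    end = "VAR" if transient_found else "FIX"
    return start + end


def classify_sequence(transient_flags):
    """Frame classes for successive frames, given per-frame transient flags."""
    classes = []
    start_is_variable = False  # assumption: the first frame starts on a fixed border
    for found in transient_flags:
        classes.append(frame_class(start_is_variable, found))
        # The end border of this frame becomes the start border of the next.
        start_is_variable = found
    return classes


# No transient, then transients in two successive frames, then none.
classes = classify_sequence([False, True, True, False, False])
# ['FIXFIX', 'FIXVAR', 'VARVAR', 'VARFIX', 'FIXFIX']
```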
To reduce bit consumption, the relative border distances between the intermediate borders and the variable border can only take on a few pre-determined sizes.
After a plurality of time segments have been marked with the above-described borders, each time segment, delimited by two borders, is divided in a frequency direction into frequency bands. Exact spectral borders are derived using criteria that are irrelevant to the present invention. FIGS. 12A and 12B are diagrams showing border relationships between high-resolution time segments and low-resolution time segments. FIGS. 12A and 12B show the border relationship between a high-resolution division and a low-resolution division, which are the two possible resolutions. Borders of the low-resolution division are alternate borders of the high-resolution division.
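The relationship between the two resolutions can be sketched as follows. The function name `low_resolution_borders` and the example subband indices are hypothetical. The handling of a high-resolution division with an odd number of bands (here, the last border is always retained) is an assumption of this sketch and is not stated in the description above.

```python
def low_resolution_borders(high_res_borders):
    """Low-resolution borders taken as alternate high-resolution borders.

    high_res_borders is an ascending list of frequency-band borders.
    Assumption of this sketch: the first and last borders are always kept,
    so both divisions cover the same frequency range.
    """
    low = high_res_borders[::2]            # new list; the input is not modified
    if low[-1] != high_res_borders[-1]:
        low.append(high_res_borders[-1])
    return low


# Four high-resolution bands collapse into two low-resolution bands.
low = low_resolution_borders([32, 34, 36, 38, 40])   # [32, 36, 40]
```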