The invention relates to coding and decoding of stereo audio spectral values, and particularly to indication of the fact that stereo intensity coding is active.
The most advanced audio coding and decoding processes, operating e.g. to the MPEG Layer 3 standard, can compress the data rate of digital audio signals e.g. by a factor of twelve without markedly lowering their quality.
Apart from a great coding gain in the individual channels, e.g. the left channel L and right channel R, the relative redundancy and irrelevance of the two channels are also utilised in the case of stereo. The known methods which have already been used are the so-called MS stereo process (MS=mid-side) and the intensity stereo process (IS process).
The MS stereo process, which is known in the art, substantially utilises the relative redundancy of the two channels: a sum of the two channels and a difference between them are calculated and then transmitted as modified channel data for the left and right channel respectively. That is to say, the MS stereo process has a precisely reconstructing action.
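The sum/difference principle described above may be sketched as follows. This is a minimal illustration only, not the standardised implementation: the function names are hypothetical, and the factor 1/2 is an assumed normalisation, since the text specifies only that a sum and a difference are formed. Quantisation is left out, so the reconstruction shown is exact.

```python
def ms_encode(left, right):
    """Form mid (scaled sum) and side (scaled difference) channels.

    The 1/2 scaling is an assumption made for this sketch.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover the left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

Encoding and then decoding returns the original spectral values, which is what is meant by a precisely reconstructing action.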
Unlike the MS stereo process, the intensity stereo process chiefly makes use of stereo irrelevance. It should be mentioned in connection with stereo irrelevance that the spatial perception of the human hearing system depends on the frequency of the audio signals perceived. At low frequencies both magnitude information and phase information in the two stereo signals are evaluated by the human hearing system, whereas perception of high frequency components is based mainly on analysis of the energy-time envelopes of both channels. Thus, at high frequencies, the exact phase information in the signals in both channels is not relevant to spatial perception. This feature of human hearing is utilised to make use of the stereo irrelevance for further data reduction of audio signals by the intensity stereo process.
As the human hearing system cannot resolve precise local information at high frequencies, it is possible to transmit a joint energy envelope for both channels instead of two separate stereo channels L, R, from an intensity frequency limit defined in the encoder. In addition to the joint energy envelope, roughly quantised direction information is also transmitted as side information.
As a channel is only partially transmitted when intensity stereo coding is used, the saving of bits may be up to 50%. It should be noted however that the IS process does not have a precisely reconstructing action in the decoder.
In the IS process hitherto employed in the MPEG standard, Layer 3, the fact that the IS process is active in a block of stereo-audio spectral values is indicated by a so-called mode_extension_bit, and each block has a mode_extension_bit assigned to it.
A theoretical representation of the known IS process is given in FIG. 1. Stereo-audio spectral values for a channel L 10 and a channel R 12 are totalled at a summation point 14 to obtain an energy envelope I=Li+Ri for the two channels. Li and Ri here represent the stereo-audio spectral values of the respective channels L and R in any scale factor band. As already mentioned, use of the IS process is only permissible above a certain IS frequency limit, in order to avoid inserting coding errors into the stereo-audio spectral values coded. The left and right channels therefore have to be coded separately within a range from 0 Hz to the IS frequency limit. The IS frequency limit as such is determined in a separate algorithm which does not form part of the invention. From this frequency limit upwards the encoder codes the total signal of the left channel 10 and right channel 12, formed at the summation point 14.
Scaling information 16 for channel L and scaling information 18 for channel R are necessary for decoding in addition to the energy envelope, i.e. the total signal of the left and right channels, which may e.g. be transmitted in the coded left channel. Scale factors for the left and right channels are transmitted in the intensity stereo process as implemented e.g. in MPEG Layer 2. However it should be mentioned here that, in the IS process in MPEG Layer 3 for IS-coded stereo-audio spectral values, intensity direction information is transmitted only in the right channel, and the spectral values are decoded again with this information as explained below.
The scaling information 16 and 18 is transmitted as side information in addition to the coded spectral values of channel L and channel R. A decoder delivers decoded audio signal values in a decoded channel L′ 20 and a decoded channel R′ 22, and the scaling information 16 for channel L and 18 for channel R is multiplied by the decoded stereo-audio spectral values for the respective channels in an L multiplier 24 and an R multiplier 26, as a means of decoding the originally coded stereo-audio spectral values.
Before IS coding is applied above a certain IS frequency limit, or MS coding below that limit, the stereo audio spectral values for each channel are grouped into so-called scale factor bands. The bands are adapted to the perception properties of the hearing system. Each band may be amplified with an additional factor, the so-called scale factor, which is transmitted as side information for the particular channel and which constitutes part of the scaling information 16 and 18 in FIG. 1. These factors shape the interfering noise introduced by quantisation in such a way that it is “masked” in psycho-acoustic terms and thus becomes inaudible.
FIG. 2a shows a format of the coded right channel R, used e.g. in an MPEG Layer 3 audio coding process. Any further mention of intensity stereo coding will relate to the MPEG Layer 3 standard process. The individual scale factor bands 28, into which the stereo audio spectral values are grouped, are shown diagrammatically in the first line of FIG. 2a. In FIG. 2a these bands are shown equal in width purely for clarity; in practice their widths will not be equal, owing to the psycho-acoustic properties of the hearing system.
The second line of FIG. 2a contains coded stereo audio spectral values sp, which are non-zero below an IS frequency limit 32; the stereo audio spectral values in the right channel above the IS frequency limit are set to zero (zero_part) and are denoted nsp (nsp=zero spectrum).
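Locating the start of the zero part in the right channel may be sketched as follows. This is an illustration under simplifying assumptions: the function name is hypothetical, and the search works on individual spectral values, whereas an actual decoder would typically resolve the limit at the granularity of scale factor bands.

```python
def zero_part_start(right_spectrum):
    """Return the index at which the trailing run of zeros begins.

    Values before this index belong to the separately coded range;
    values from this index onwards form the zero part (nsp).
    """
    limit = len(right_spectrum)
    while limit > 0 and right_spectrum[limit - 1] == 0:
        limit -= 1
    return limit
```

Isolated zeros below the limit do not affect the result, because only the trailing run of zeros is considered.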
The third line of FIG. 2a contains part of the side information 34 for the right channel. The part of the information 34 shown firstly comprises the scale factors skf for the range below the IS frequency limit 32 and the direction information rinfo 36 for the range above the frequency limit. The direction information is used to ensure rough local resolution of the IS coded frequency range in the intensity stereo process. Thus the direction information rinfo 36, also referred to as intensity positions (is_pos), is transmitted in the right channel instead of the scale factors. It should be mentioned again that the scale factors skf corresponding to the scale factor bands 28 are still present in the right channel below the IS frequency limit. The intensity positions 36 indicate the perceived stereo imaging position (the ratio of left to right) of the signal source within the respective scale factor bands 28. In each band 28 above the IS frequency limit the decoded values of the stereo audio spectral values transmitted are scaled by the MPEG Layer 3 process, with the following scaling factors kL for the left channel and kR for the right one:
kL = is_ratio/(1 + is_ratio)    (1)
and
kR = 1/(1 + is_ratio)    (2)
The equation for is_ratio is as follows:
is_ratio = tan(is_pos·π/12)    (3)
The value is_pos is quantised with 3 bits, only the values from 0 to 6 being valid position values. The left and right channels can be derived from the I signal (I=Li+Ri) in the following two equations:
Li = I·is_ratio/(1 + is_ratio) = I·kL    (4)
Ri = I·1/(1 + is_ratio) = I·kR    (5)
Li and Ri are the intensity stereo decoded stereo audio spectral values. It should be mentioned here that the left channel format is analogous to the right channel format shown in FIG. 2a, except that the combined spectrum I=Li+Ri rather than the zero spectrum is to be found above the IS frequency limit 32 in the left channel, and that ordinary scale factors are present rather than direction information is_pos for the left channel. The transition from non-zero quantised spectral values to zero values in the right channel can implicitly indicate the IS frequency limit to the decoder in the MPEG Layer 3 standard.
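The decoder-side scaling of equations (1) to (3) may be sketched as follows, with kL applied to the left channel and kR to the right channel. This is an illustrative sketch, not the standardised decoder: the function names are hypothetical, and the treatment of is_pos = 6, where the tangent is unbounded, as the limiting case kL = 1, kR = 0 is an assumption made here.

```python
import math


def is_scale_factors(is_pos):
    """Return (kL, kR) for a valid intensity position 0..6."""
    if not 0 <= is_pos <= 6:
        raise ValueError("is_pos must lie in the range 0..6")
    if is_pos == 6:
        # Assumption: tan(pi/2) is unbounded, so use the limit values.
        return 1.0, 0.0
    is_ratio = math.tan(is_pos * math.pi / 12)
    return is_ratio / (1 + is_ratio), 1 / (1 + is_ratio)


def is_decode_band(intensity, is_pos):
    """Scale the joint signal I of one scale factor band into left and right."""
    k_left, k_right = is_scale_factors(is_pos)
    left = [k_left * value for value in intensity]
    right = [k_right * value for value in intensity]
    return left, right
```

Since kL + kR = 1 for every valid position, the two decoded channels of a band always add up to the transmitted joint signal I.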
The transmitted channel L is thus calculated in the encoder as the sum of the left and right channels, and the direction information transmitted may be defined by the following equation:
is_pos = nint[arctan(EL/ER)·12/π]    (6)
The nint[x] function represents the “nearest whole number” function, EL and ER being the energy in the respective scale factor bands of the left and right channels. This formulation of the encoder/decoder gives an approximate reconstruction of signals in the left and right channels.
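The encoder-side calculation for one scale factor band, following equation (6), may be sketched as follows. The sketch rests on assumptions not stated in the text: the band energy is taken as the sum of squared spectral values, nint rounds halves upwards, and a silent right channel (ER = 0) is mapped to the angle π/2 by means of atan2. The function names are hypothetical.

```python
import math


def nint(x):
    """Nearest whole number; halves are rounded upwards (an assumption)."""
    return math.floor(x + 0.5)


def is_encode_band(left, right):
    """Return the joint signal I and the intensity position for one band."""
    intensity = [l + r for l, r in zip(left, right)]
    # Assumption: band energy is the sum of squared spectral values.
    energy_left = sum(v * v for v in left)
    energy_right = sum(v * v for v in right)
    # For non-negative energies atan2 equals arctan(EL/ER) and stays
    # defined when ER is zero.
    angle = math.atan2(energy_left, energy_right)
    is_pos = nint(angle * 12 / math.pi)
    return intensity, is_pos
```

With equal energy in both channels the sketch yields the centre position 3; a band present only in the left channel yields 6 and one present only in the right channel yields 0.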
As already mentioned, in known audio coding processes the stereo audio spectral values are grouped into the scale factor bands, the bands being adapted to the perception properties of the hearing system. In the audio coding process to the MPEG Layer 3 standard these bands are divided into exactly three regions, the purpose being to group ranges with the same signal statistics. This is advantageous for redundancy reduction by means of the known Huffman coding, which now takes place. For each of these regions of scale factor bands 28 one table is selected from a plurality of Huffman tables, namely the table with which Huffman coding gives the greatest gain from redundancy reduction. The table is indicated in the bit stream of coded data by means of a 5-bit value for each region. There are 30 different tables, tables 4 and 14 being blank.
The non-backward compatible NBC coding process, which is currently being standardised, differs from the MPEG Layer 3 standard audio coding process inter alia in the fact that the bit stream syntax for that process does not allow exactly three regions of scale factor bands only; instead, any number of so-called “sections” may be present and may have any number of scale factor bands. By analogy with the previously described process in MPEG Layer 3, a section has an appropriate Huffman table out of a plurality of such tables allocated to it in order to obtain maximum redundancy reduction, and that table will then be used for decoding. In an extreme case a section may e.g. comprise only one scale factor band. However this is unlikely to happen in practice, as far too much side information would then be required. In the NBC process there are altogether twelve Huffman code book numbers, which are transmitted as 4-bit values. Thus one of the twelve existing code book numbers can be selected.
The problem of the invention is to provide methods of coding and decoding stereo audio spectral values, where information relevant to coding and decoding is indicated with minimum use of side information.
In accordance with a first aspect of the present invention, this problem is solved by a method of coding stereo audio spectral values, comprising the following steps: grouping the stereo audio spectral values in scale factor bands with which scale factors are associated; forming sections, each comprising at least one scale factor band; coding the stereo audio spectral values within at least one section with a code book, allocated to the at least one section, out of a plurality of code books to each of which a number is assigned, the number of the code book used being transmitted as side information to the coded stereo audio spectral values, wherein at least one additional code book number is provided, which does not refer to a code book but shows information relevant to the section to which it is assigned, and one section has either a code book number or the at least one additional code book number assigned to it, without affecting the amount of side information.
In accordance with a second aspect of the present invention, this problem is solved by a method of decoding coded stereo audio spectral values which have side information, comprising the following steps: detecting a code book number on the basis of the side information for each section of the coded stereo audio spectral values; decoding the stereo audio spectral values of a section, the code book number of which refers to a corresponding code book, using that code book; and decoding the stereo audio spectral values of another section with a code book number which does not refer to a code book but shows information relevant to the section to which it is assigned, in accordance with the information shown.
The invention is based on the realization that additional code book numbers which are not used to refer to code books may indicate other information relevant to a section. The “additional” code book numbers are the numbers which do not refer to code books. By 4-bit coding twelve different code book numbers, the numbers 13, 14 and 15 become to some extent freely available to contain other information. In a preferred embodiment of the invention two (no. 14 and no. 15) of the three (no. 13, no. 14 and no. 15) additional code book numbers are used to refer firstly to intensity coding present in a section, and secondly to the mutual phase position of IS-coded stereo audio spectral values in two stereo channels.
The as yet unused additional code book number 13 may be used to refer to an adaptive Huffman coding.
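The way a decoder might interpret the 4-bit code book number of a section may be sketched as follows. This is an illustration only. The names are hypothetical, and two points are assumptions not fixed by the text above: which of the numbers 14 and 15 denotes in-phase and which denotes out-of-phase intensity coding, and the treatment of every value below 13 as an ordinary code book number.

```python
from dataclasses import dataclass
from typing import Optional

ADAPTIVE_HUFFMAN = 13         # additional number, possible future use
INTENSITY_OUT_OF_PHASE = 14   # assumed assignment of the phase position
INTENSITY_IN_PHASE = 15       # assumed assignment of the phase position


@dataclass
class SectionInfo:
    code_book: Optional[int]   # None if the number refers to no code book
    intensity: bool            # True if the section is IS coded
    in_phase: Optional[bool]   # phase position, only for IS sections
    adaptive_huffman: bool     # True for additional number 13


def interpret_code_book_number(number):
    """Map a 4-bit code book number to the information it conveys."""
    if not 0 <= number <= 15:
        raise ValueError("code book number must fit into 4 bits")
    if number == INTENSITY_IN_PHASE:
        return SectionInfo(None, True, True, False)
    if number == INTENSITY_OUT_OF_PHASE:
        return SectionInfo(None, True, False, False)
    if number == ADAPTIVE_HUFFMAN:
        return SectionInfo(None, False, None, True)
    # Assumption: all remaining values refer to ordinary code books.
    return SectionInfo(number, False, None, False)
```

Each section carries a single 4-bit value whether it names a code book or conveys the additional information, so the amount of side information is unchanged.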