Modern audio encoding and decoding methods, which operate in accordance with the MPEG layer 3 standard, for example, are capable of compressing the data rate of audio signals, e.g. by a factor of 12, without noticeably degrading their quality. In order to achieve such a high data rate reduction, an audio signal is sampled, whereby a sequence of discrete-time samples is obtained. As is known in the art, the sequence of discrete-time samples is windowed in order to obtain windowed blocks of time samples. A block of time-windowed samples is then transformed to the frequency domain by means of a filter bank, a modified discrete cosine transform (MDCT) or another suitable device, in order to obtain spectral values which, as a whole, represent the audio signal, i.e. the time section determined by the block of discrete-time samples, in the frequency domain. Usually, time blocks which overlap by 50% are produced and transformed to the frequency domain by means of an MDCT whereby, due to the specific properties of the MDCT, 1024 discrete-time samples, for example, always lead to 1024 spectral values.
It is known that the receptivity of the human ear depends on the momentary spectrum of the audio signal itself. This dependency is captured in the so-called psycho-acoustic model, by means of which it has been possible for quite some time to calculate masking thresholds depending on the momentary spectrum. Masking means that a specific tone or spectral component is hidden in case an adjacent spectral range, for example, has relatively high energy. This fact of masking is utilized in order to quantize the spectral values present after the transformation as coarsely as possible. The aim is therefore to avoid audible interference in the decoded audio signal on the one hand and to use as few bits as possible on the other hand in order to encode or, in this case, to quantize the audio signal. The interference introduced by quantization, i.e. the quantization noise, is intended to be below the masking threshold and, therefore, to be inaudible. In accordance with known methods, the spectral values are classified into so-called scale factor bands, which should correspond to the critical bands, i.e. frequency groups, of the human ear. Spectral values in a scale factor band are multiplied by a scale factor in order to carry out overall scaling of the spectral values of that scale factor band. The scale factor bands scaled by the scale factor are then quantized, whereupon quantized spectral values are produced. It is understood that grouping in scale factor bands is not critical. However, it is used in the MPEG layer 3 standard and in the MPEG 2 AAC standard (AAC=advanced audio coding).
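The trade-off between bit consumption and quantization noise described above can be illustrated with a minimal sketch. It assumes a plain uniform quantizer (rounding to the nearest integer after scaling); the function names, the example band and the scale factors are hypothetical, and the actual quantizers of MPEG layer 3 and MPEG 2 AAC differ in detail.

```python
def quantize_band(spectral_values, scale_factor):
    """Scale all values of one scale factor band, then round to integers.
    Assumption: a uniform quantizer, used here only for illustration."""
    return [round(v * scale_factor) for v in spectral_values]


def dequantize_band(quantized, scale_factor):
    """Undo the scaling; the rounding error remains as quantization noise."""
    return [q / scale_factor for q in quantized]


def max_error(original, reconstructed):
    """Largest absolute quantization error within the band."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


band = [0.82, -0.31, 0.05, 0.47]      # hypothetical spectral values of one band

coarse = quantize_band(band, 2.0)     # small integers, cheap to encode
fine = quantize_band(band, 16.0)      # larger integers, more bits needed

coarse_error = max_error(band, dequantize_band(coarse, 2.0))
fine_error = max_error(band, dequantize_band(fine, 16.0))

print(coarse, coarse_error)
print(fine, fine_error)
```

With this quantizer the error per value is bounded by half a quantization step, i.e. by 0.5 divided by the scale factor. An encoder would pick, per band, a scale factor that is just large enough to keep this noise under the masking threshold.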
An essential aspect of data reduction lies in entropy encoding of the quantized spectral values, which follows quantization. Huffman encoding is usually used for entropy encoding. A Huffman code is understood to mean a code with a variable length, i.e. the length of the code word for a value to be encoded depends on the probability of occurrence thereof. Logically, the most probable character is assigned the shortest code, i.e. the shortest code word, so that very good redundancy reduction can be achieved by means of Huffman encoding. An example of a generally known code with a variable length is the Morse code.
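The relationship between probability of occurrence and code word length can be made concrete with a short sketch. It builds a Huffman code for a hypothetical list of quantized values using the textbook construction; the helper name `build_huffman_code` and the sample data are assumptions for illustration and are unrelated to the fixed code tables of any standard.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)   # two least probable subtrees
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical quantized spectral values: zero is by far the most frequent.
quantized = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = build_huffman_code(quantized)
lengths = {s: len(c) for s, c in code.items()}

for symbol in sorted(code, key=lambda s: lengths[s]):
    print(symbol, code[symbol])
```

In this example the most frequent value receives a one-bit code word, while the rarest values receive three bits each.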
In the field of audio encoding, Huffman codes are used for encoding the quantized spectral values. A modern audio encoder, which works, for example, in accordance with the MPEG 2 AAC standard, uses different Huffman code tables for encoding the quantized spectral values, which Huffman code tables are assigned to the spectrum by certain criteria on a section-by-section basis. In this process, 2 or 4 spectral values are always encoded together in one code word.
One difference between the method in accordance with MPEG 2 AAC and the method in accordance with MPEG layer 3 is that different scale factor bands, i.e. different spectral values, are grouped into any number of spectral sections. With AAC, one spectral section includes at least four spectral values, but preferably more than four spectral values. The entire frequency range of the spectral values is therefore divided up into adjacent sections, with one section representing one frequency band such that all sections together cover the entire frequency range which is covered by the spectral values after the transformation thereof.
As in the MPEG layer 3 method, each section is assigned a so-called “Huffman table” from a plurality of such tables in order to achieve a maximum redundancy reduction. In the bit stream of the AAC method, which usually comprises 1024 spectral values, the Huffman code words for the spectral values are arranged in ascending order of frequency. The information on the table used in each frequency section is transferred in the side information. This situation is shown in FIG. 6.
FIG. 6 shows the exemplary case where the bit stream includes 10 Huffman code words. In case each code word is formed from one spectral value, 10 spectral values may be encoded here. However, usually 2 or 4 spectral values are jointly encoded by one code word, which is why FIG. 6 shows a part of the encoded bit stream which includes 20 or 40 spectral values. In the case where each Huffman code word includes 2 spectral values, the code word designated by No. 1 represents the first two spectral values, with the length of code word No. 1 being relatively short, which means that the values of the first two spectral values, i.e. of the two lowest frequency coefficients, occur relatively frequently. The code word bearing the No. 2, however, is relatively long, which means that the values of the 3rd and 4th spectral coefficients in the encoded audio signal are relatively rare, which is why they are encoded with a relatively large number of bits. Further, it is apparent from FIG. 6 that the values represented by the code words with the numbers 3, 4 and 5, i.e. the spectral coefficients 5 and 6, 7 and 8, and 9 and 10, respectively, also occur relatively frequently, since the length of the individual code words is relatively short. The same applies to the code words bearing the numbers 6 to 10.
As has already been mentioned, it is clearly apparent from FIG. 6 that the Huffman code words for the encoded spectral values are arranged in the bit stream in a linearly ascending manner with regard to the frequency in case a bit stream which is produced by a known encoding apparatus is considered.
One major drawback of Huffman codes, in the case of faulty channels, is error propagation. It may be assumed, for example, that code word No. 2 in FIG. 6 is interfered with. There is a certain, not low, probability that the length of this wrong code word No. 2 is also modified, so that it differs from the correct length. In case, in the example of FIG. 6, code word No. 2 has been modified in its length due to an interference, it is no longer possible for a decoder to determine the starts of the code words 3 to 10, i.e. of almost the entire audio signal represented. This means that all other code words following the code word which has been interfered with can no longer be correctly decoded, since it is not known where these code words start, and since an incorrect starting point was selected due to the error.
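The effect can be reproduced with a minimal sketch. The code table `TABLE`, the value sequence and the helper names are hypothetical and chosen only for illustration; one code word per spectral value is assumed. A single flipped bit in the second code word changes its apparent length, after which the decoder assigns the following values to the wrong spectral positions.

```python
# Hypothetical prefix-free code table (not taken from any standard).
TABLE = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}


def encode(values, table):
    """Concatenate the code words in ascending order of frequency."""
    return "".join(table[v] for v in values)


def decode(bits, table):
    """Read bit by bit and emit a value whenever a code word is complete."""
    inverse = {cw: v for v, cw in table.items()}
    values, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            values.append(inverse[current])
            current = ""
    return values


def flip_bit(bits, index):
    """Simulate a transmission error at one bit position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


values = [0, 2, 0, 1, 0, -1, 0, 0, 1, 0]
stream = encode(values, TABLE)

# Damage the first bit of code word No. 2 (bit index 1 in this example).
damaged = flip_bit(stream, 1)
decoded = decode(damaged, TABLE)

print(values)
print(decoded)
```

In this example the damaged stream decodes to eleven values instead of ten, so every value after the error is assigned to the wrong spectral line even though the remaining bits were transmitted correctly.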
As a solution to the problem of error propagation, European Patent No. 0 612 156 proposes that a part of the code words of variable lengths be arranged in a raster and that the remaining code words be distributed in the remaining gaps, so that the start of a code word which is arranged at a raster point can be more easily found without full decoding or in the case of an incorrect transmission.
It is true that the known method provides some remedy for error propagation by means of rearranging the code words. For some code words, a fixed location in the bit stream is agreed upon, whereas the remaining gaps are available for the remaining code words. This does not cost any additional bits, but prevents, in the case of an error, error propagation among the rearranged code words.
German Patent Application 19 747 119.6-31, which was published after the filing date of the present application, proposes that not just any code words be located at raster points, but that code words which are significant from a psycho-acoustic point of view, i.e. code words for spectral values which make a significant contribution to the audio signal, be located at raster points. A data stream with code words of variable lengths, such as is produced by such an encoder, is shown in FIG. 5. As in FIG. 6, the data stream also includes 10 code words, with the priority code words being shaded. The first priority code word is located such as to start at a first raster point 100, the second priority code word is located such as to start at a second raster point 101, the third priority code word is located such as to start at a third raster point 102, the fourth priority code word is located such as to start at a fourth raster point 103 and the fifth priority code word is located such as to start at a fifth raster point 104. A first segment 105 is defined by the raster points 100 and 101. Similarly, a second segment 106, a third segment 107, a fourth segment 108 and a final segment 109 are defined. It is shown in FIG. 5 that the first two segments 105 and 106 have a different length from the two segments 107 and 108 and yet a different length from the final segment 109. Non-priority code words 6, 7, 8, 9 and 10 are then entered in the data stream following the priority code words such that the data stream is filled up, so to speak. As is shown in FIG. 5, in the post-published method, the non-priority code words are consecutively inserted in the raster after the priority code words have been written. Specifically, the non-priority code word No. 6 is entered following the priority code word 1. The space still remaining in the segment 105 is filled up with the following non-priority code word 7, with the remainder of the non-priority code word 7, i.e. 7b, being written in the next free space, i.e. in the segment 107, directly following the priority code word 3. The same procedure is followed for the non-priority code words 8 to 10.
The advantage of the post-published method illustrated in FIG. 5 is that the priority code words 1 to 5 are protected against error propagation, since their starting points coincide with raster points and are therefore known.
In case, for example, the priority code word 2 has been damaged in transmission, it is very likely in the prior art shown in FIG. 6 that a decoder will not be able to decode any of the remaining code words 3 to 10 correctly. In the method shown in FIG. 5, however, the next code word, i.e. priority code word 3, starts at the raster point 102 such that the decoder will, at any rate, find the correct start of code word 3. Therefore, in the method shown in FIG. 5, no sequence error whatsoever will occur, and only priority code word No. 2 will be damaged. Consequently, this method provides effective protection for priority code words which are located at raster points.
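The raster arrangement and its effect on priority code words can be sketched as follows. This is a simplified illustration and not the method of the cited application: the code words, the segment lengths and the function names are hypothetical, and the segment lengths are assumed to add up exactly to the total number of bits so that no additional bits are needed.

```python
from itertools import accumulate


def arrange_in_raster(priority, non_priority, segment_lengths):
    """Start each priority code word at a raster point and pour the
    non-priority bits, in order, into the space left in each segment."""
    total = sum(map(len, priority)) + sum(map(len, non_priority))
    assert len(priority) == len(segment_lengths)
    assert sum(segment_lengths) == total      # no extra bits are spent
    filler = "".join(non_priority)
    stream = ""
    for code_word, segment_length in zip(priority, segment_lengths):
        free = segment_length - len(code_word)
        assert free >= 0                      # code word must fit its segment
        stream += code_word + filler[:free]
        filler = filler[free:]
    raster_points = [0] + list(accumulate(segment_lengths))[:-1]
    return stream, raster_points


def flip_bit(bits, index):
    """Simulate a transmission error at one bit position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


# Hypothetical code words 1 to 5 (priority) and 6 to 10 (non-priority).
priority = ["10", "1110", "0", "110", "10"]
non_priority = ["0", "110", "10", "0", "1111"]
segment_lengths = [5, 5, 4, 4, 5]

stream, raster_points = arrange_in_raster(priority, non_priority, segment_lengths)

# Damage the first bit of priority code word No. 2.
damaged = flip_bit(stream, raster_points[1])

# Priority code word No. 3 is still found at its raster point.
third_intact = damaged.startswith(priority[2], raster_points[2])
print(raster_points, third_intact)
```

Because the decoder knows the raster points in advance, it can restart decoding at each of them regardless of what happened in the preceding segment. The non-priority bits poured into the gaps enjoy no such anchor, which is the limitation discussed next.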
However, there is no effective protection for non-priority code words. Referring to FIG. 5, if the non-priority code word No. 6 is damaged such that the decoder assumes, as an incorrect code word No. 6, a code word which is one bit shorter, it is also no longer possible to correctly decode code word No. 7, since the last bit of the correct code word No. 6 is interpreted as being the start of the next code word No. 7. Therefore, an error in code word No. 6 means that, with a very high probability, it will no longer be possible, due to a sequence error, to correctly decode any of the code words following it, even in case they have not been adversely affected by a transmission error.
DE 691 26 565 T2 relates to a method for transmitting codes of variable lengths. By this method, a data stream is produced in which, starting from the start of the data stream, code words of variable lengths are written in a first direction up to a certain point in the data stream. In order to increase error robustness, however, the data stream is not written entirely in one direction, but merely up to the predetermined point. From the end of the data stream, the remainder of the code words of variable lengths is then written in the opposite direction of writing up to the predetermined point, so that a data stream results whose first half comprises code words which are written in the forward direction and whose second half comprises code words which are written in the backward direction.
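The two directions of writing can be illustrated with a short sketch. It follows only the description given above; the code table, the split point and the function names are hypothetical, and details such as how the predetermined point is signalled to the decoder are assumptions.

```python
# Hypothetical prefix-free code table (not taken from any standard).
TABLE = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}


def decode(bits, table):
    """Read bit by bit and emit a value whenever a code word is complete."""
    inverse = {cw: v for v, cw in table.items()}
    values, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            values.append(inverse[current])
            current = ""
    return values


def write_bidirectional(values, split, table):
    """Write the first `split` code words forward from the start and the
    remaining ones backward from the end of the data stream."""
    forward = "".join(table[v] for v in values[:split])
    backward = "".join(table[v] for v in values[split:])
    # Reading the stream from its end backward yields `backward` in order.
    return forward + backward[::-1], len(forward)


def read_bidirectional(stream, point, table):
    """Decode forward up to `point` and backward from the end down to it."""
    return decode(stream[:point], table) + decode(stream[point:][::-1], table)


def flip_bit(bits, index):
    """Simulate a transmission error at one bit position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


values = [0, 2, 0, 1, 0, -1, 0, 0, 1, 0]
stream, point = write_bidirectional(values, 5, TABLE)

# An error in the forward half leaves the backward half decodable.
damaged = flip_bit(stream, 1)
backward_half = decode(damaged[point:][::-1], TABLE)
print(read_bidirectional(stream, point, TABLE), backward_half)
```

In this sketch an error in the forward half cannot propagate beyond the predetermined point, because the backward half is decoded independently from the known end of the data stream.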
U.S. Pat. No. 5,852,469 relates to encoding and decoding systems for code words with variable lengths and code words with specified lengths. For code words with specified lengths, synchronous positions are provided in the data stream, the distance between which is equal to the length of the code words of specified lengths. The code words are then entered into the data stream such that they all start at a synchronous position. For code words of variable lengths, a data stream with a start and an end, however without synchronous positions, is provided in order to enter code words of variable lengths in the forward direction, starting from the start of the data stream up to a certain position behind the center of the data stream. Starting from the end of the data stream up to the predetermined position in the center, code words of variable lengths are then entered in the opposite direction of writing.