Modern audio coding or decoding methods, which operate according to the standard MPEG layer 3 for example, are capable of compressing the data rate of audio signals by a factor of 12 for example without causing any noticeable deterioration in the quality of these signals. To obtain such a high data rate reduction an audio signal is sampled, resulting in a sequence of discrete-time samples. As is known in this branch of technology, this sequence of discrete-time samples is windowed using suitable window functions to obtain windowed blocks of temporal samples. A block of temporal windowed samples is then transformed into the frequency domain by means of a filter bank, a modified discrete cosine transform (MDCT) or some other suitable method to obtain spectral values which together represent the audio signal, i.e. the temporal section which consists of the block of discrete-time samples, in the frequency domain. Normally temporal blocks which overlap by 50% are generated and are transformed into the frequency domain by means of an MDCT. Because of the special properties of the MDCT, 1024 discrete-time samples for example always result in 1024 spectral values.
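By way of illustration only, the block formation and transform described above can be sketched as follows. The sine window, the function names `sine_window`, `mdct` and `analyse`, and the very small block length are assumptions made for this sketch and are not taken from either standard. The sketch merely shows that blocks of 2N windowed samples which overlap by 50% each yield N spectral values, so that every N new discrete-time samples result in N spectral values.

```python
import math

def sine_window(length):
    """Sine window of the given length (an assumed, commonly used choice)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """MDCT of a block of 2N windowed samples; returns N spectral values."""
    two_n = len(block)
    n_half = two_n // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]

def analyse(samples, n_half):
    """Form blocks of 2*n_half samples that overlap by 50% and transform each."""
    window = sine_window(2 * n_half)
    spectra = []
    # The block start advances by n_half samples, i.e. by half a block.
    for start in range(0, len(samples) - 2 * n_half + 1, n_half):
        block = [samples[start + n] * window[n] for n in range(2 * n_half)]
        spectra.append(mdct(block))
    return spectra

# Hypothetical test signal: 64 samples of a sine tone, N = 8 instead of 1024.
samples = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
spectra = analyse(samples, n_half=8)
```

With N = 1024 the same functions would take blocks of 2048 samples and deliver 1024 spectral values per block, although a practical coder would use a fast algorithm rather than this direct summation.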
It is known that the receptivity of the human ear depends on the momentary spectrum of the audio signal itself. This dependence is reflected in the so-called psychoacoustic model. Using this model it has long been possible to calculate masking thresholds in dependence on the momentary spectrum. Masking means that a particular tone or spectral portion is rendered inaudible when e.g. a neighbouring spectral region has a relatively high energy. This phenomenon of masking is exploited so as to quantize the post-transform spectral values as coarsely as possible. The aim, therefore, is to avoid audible disturbances in the decoded audio signal while using as few bits as possible to code, or here to quantize, the audio signal. The disturbances introduced by quantization, i.e. the quantization noise, should lie below the masking threshold and thus be inaudible. In accordance with known methods the spectral values are therefore subdivided into so-called scale factor bands, which should reflect the frequency groups of the human ear. The spectral values in a scale factor band are multiplied by a scale factor so as to scale the spectral values of that scale factor band as a whole. The scale factor bands scaled with the scale factor are then quantized, producing quantized spectral values. Grouping into scale factor bands is of course not essential. This procedure is, however, used in the standard MPEG layer 3 and in the standard MPEG-2 AAC (AAC = Advanced Audio Coding).
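The scaling and quantization of scale factor bands can be pictured with the following minimal sketch. It assumes a plain rounding quantizer, hypothetical band edges and hypothetical scale factors; it is not the quantizer prescribed by MPEG layer 3 or MPEG-2 AAC, and no masking threshold is calculated here. It only shows that a smaller scale factor leads to a coarser quantization of the band concerned.

```python
def quantize_bands(spectral_values, band_edges, scale_factors):
    """Scale each band by its scale factor, then round to integers."""
    quantized = []
    for band, scale in enumerate(scale_factors):
        start, stop = band_edges[band], band_edges[band + 1]
        quantized.extend(round(v * scale) for v in spectral_values[start:stop])
    return quantized

def dequantize_bands(quantized, band_edges, scale_factors):
    """Undo the scaling; the rounding error remains as quantization noise."""
    restored = []
    for band, scale in enumerate(scale_factors):
        start, stop = band_edges[band], band_edges[band + 1]
        restored.extend(q / scale for q in quantized[start:stop])
    return restored

# Hypothetical data: 8 spectral values in 2 scale factor bands of 4 values each.
spectral_values = [10.2, -7.9, 3.1, 0.4, 6.3, -4.2, 2.6, 1.1]
band_edges = [0, 4, 8]
scale_factors = [1.0, 0.25]  # the second band is quantized more coarsely

quantized = quantize_bands(spectral_values, band_edges, scale_factors)
restored = dequantize_bands(quantized, band_edges, scale_factors)
```

In this sketch the quantization error in a band is at most 0.5 divided by the scale factor of that band, so the scale factor would in practice be chosen such that this error stays below the masking threshold of the band.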
A very important aspect of data reduction is the entropy coding of the quantized spectral values resulting from quantization. A Huffman coding is normally used for this. A Huffman coding entails variable-length coding, i.e. the length of the code word for a value to be coded depends on the probability of this value occurring. As is logical the most probable symbol is assigned the shortest code, i.e. the shortest code word, so that very good redundancy reduction can be achieved with Huffman coding. An example of a universally known variable-length coding is the Morse alphabet.
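The relationship between probability and code word length can be illustrated by the following sketch, which builds a Huffman table from symbol counts. The function names and the sample data are hypothetical; the standards mentioned use predefined code tables and do not construct a table per signal in this way.

```python
import heapq
from collections import Counter

def build_huffman_table(symbols):
    """Return a dict mapping each symbol to its code word (a bit string)."""
    counts = Counter(symbols)
    # Heap entries: (weight, tie breaker, {symbol: partial code word}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing one bit to each.
        w0, _, t0 = heapq.heappop(heap)
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, table):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(table[s] for s in symbols)

# Hypothetical quantized values: 0 is the most frequent, 2 the rarest.
values = [0] * 8 + [1] * 4 + [-1] * 3 + [2]
table = build_huffman_table(values)
bits = encode(values, table)
```

For these 16 values the most frequent value 0 receives a code word of 1 bit and the rare values -1 and 2 receive code words of 3 bits, giving 28 bits in total compared with 32 bits for a fixed 2-bit code.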
In audio coding Huffman codes are used to code the quantized spectral values. A modern audio coder which operates e.g. according to the standard MPEG-2 AAC uses different Huffman code tables, which are assigned to the spectrum according to particular criteria on a sectional basis, to code the quantized spectral values. Here 2 or 4 spectral values are always coded together in one code word.
One way in which the method according to MPEG-2 AAC differs from the MPEG layer 3 method is that different scale factor bands, i.e. different spectral values, are grouped into an arbitrarily large number of spectral sections. In AAC a spectral section contains at least four spectral values, preferably more than four spectral values. The whole frequency range of the spectral values is thus divided up into adjacent sections, where one section represents a frequency band, so that all the sections together cover the whole frequency range which is spanned by the post-transform spectral values.
To achieve a maximum redundancy reduction, a so-called Huffman table, one of a number of such tables, is assigned to each section as in the MPEG layer 3 method. In the bit stream of the AAC method, which normally has 1024 spectral values, the Huffman code words for the spectral values are now in an ascending frequency sequence. The information on the table used in each frequency section is transmitted in the side information. This situation is shown in FIG. 2.
In the case chosen to serve as an example in FIG. 2 the bit stream comprises 10 Huffman code words. If each code word were formed from one spectral value, 10 spectral values would be coded here. Usually, however, 2 or 4 spectral values are coded together in one code word, so that FIG. 2 represents a part of the coded bit stream comprising 20 or 40 spectral values. In the case where each Huffman code word comprises 2 spectral values, the code word referenced by the number 1 represents the first two spectral values. The length of this code word is relatively short, meaning that the values of the first two spectral values, i.e. of the two lowest frequency coefficients, occur relatively often. The code word with the number 2, on the other hand, is relatively long, meaning that the values of the third and fourth spectral coefficients occur relatively infrequently in the coded audio signal, which is why they are coded with a relatively large number of bits. It can also be seen from FIG. 2 that the values represented by the code words with the numbers 3, 4 and 5, i.e. the spectral coefficients 5 and 6, 7 and 8, and 9 and 10, also occur relatively frequently, since the length of the individual code words is relatively short. Similar considerations apply to the code words with the numbers 6–10.
As has already been mentioned, it is clear from FIG. 2 that the Huffman code words for the coded spectral values are arranged in linearly ascending order in the bit stream from the point of view of the frequency in the case of a bit stream which is generated by a known coding device.
A big disadvantage of Huffman codes in the case of error-afflicted channels is the error propagation. If it is assumed e.g. that the code word number 2 in FIG. 2 is disturbed, there is a not insignificant probability that the length of this erroneous code word number 2 will also be changed and will thus differ from the correct length. If, in the example of FIG. 2, the length of the code word number 2 has been changed by a disturbance, it is no longer possible for a decoder to determine where the code words 3–10 start, i.e. almost the whole of the represented audio signal is affected. Thus all the other code words following the disturbed code word cannot be decoded properly either, since it is not known where these code words start and since a false starting point was chosen because of the error.
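The effect can be reproduced with the following small sketch. The code table `TABLE`, the values and the position of the disturbed bit are hypothetical and do not correspond to FIG. 2; the sketch merely shows how one disturbed bit can change the length of a code word and thereby shift all subsequent decoded values.

```python
# Hypothetical prefix code: frequent values get short code words.
TABLE = {0: "0", 1: "10", 2: "110", 3: "111"}
INVERSE = {code: value for value, code in TABLE.items()}

def encode(values):
    """Concatenate the code words in ascending frequency order."""
    return "".join(TABLE[v] for v in values)

def decode(bits):
    """Decode sequentially; each code word start depends on all previous ones."""
    values, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in INVERSE:
            values.append(INVERSE[buffer])
            buffer = ""
    return values

def flip_bit(bits, position):
    """Return the bit string with the bit at the given position inverted."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]

values = [0, 3, 0, 1, 0, 2, 0, 0, 1, 0]
bits = encode(values)

# Disturb the first bit of the fourth code word ("10" becomes "00").
received = decode(flip_bit(bits, 5))
```

In this sketch the disturbed code word "10" is read as two code words "0" and "0", so the decoder obtains 11 values instead of 10 and every value after the disturbance is assigned to the wrong spectral position, although only a single bit was changed.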
As a solution to the problem of error propagation European patent No. 0612156 proposes that some of the code words of variable length should be arranged in a raster and the other code words should be assigned to the remaining gaps so that the start of a code word can be more easily identified without complete decoding or in the event of a faulty transmission.
The known method provides a partial remedy for error propagation by re-sorting the code words. A fixed place in the bit stream is reserved for some code words and the spaces which are left can be occupied by the remaining code words. This entails no extra bits, but prevents error propagation among the re-sorted code words in the event of an error.
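The principle of such a re-sorting can be pictured with the following simplified sketch. It is not the procedure of European patent No. 0612156: the code table, the raster distance `SLOT`, the choice of the first code words as those placed on raster points, and the assumption that the decoder knows the number of raster points and of values are all made up for this illustration.

```python
TABLE = {0: "0", 1: "10", 2: "110", 3: "111"}  # hypothetical prefix code
INVERSE = {code: value for value, code in TABLE.items()}
SLOT = 3  # assumed raster distance, not shorter than the longest code word

def pack(values, slot=SLOT):
    """Start the first code words on raster points, fill gaps with the rest."""
    words = [TABLE[v] for v in values]
    total = sum(len(w) for w in words)  # frame length: no extra bits
    n_slots = min(len(words), total // slot)
    frame = [None] * total
    for i in range(n_slots):
        for j, bit in enumerate(words[i]):
            frame[i * slot + j] = bit
    rest = iter("".join(words[n_slots:]))
    for pos in range(total):
        if frame[pos] is None:
            frame[pos] = next(rest)
    return "".join(frame), n_slots

def read_word(bits, start):
    """Decode one code word beginning at start; return (value, length)."""
    buffer = ""
    for bit in bits[start:]:
        buffer += bit
        if buffer in INVERSE:
            return INVERSE[buffer], len(buffer)
    return None, len(buffer)

def unpack(frame, n_slots, n_values, slot=SLOT):
    """Read the raster code words first, then the code words in the gaps."""
    used = [False] * len(frame)
    values = []
    for i in range(n_slots):
        value, length = read_word(frame, i * slot)
        values.append(value)
        for j in range(length):
            used[i * slot + j] = True
    rest = "".join(b for b, u in zip(frame, used) if not u)
    pos = 0
    while len(values) < n_values and pos < len(rest):
        value, length = read_word(rest, pos)
        values.append(value)
        pos += length
    return values

values = [0, 3, 0, 1, 0, 2, 0, 0, 1, 0]
frame, n_slots = pack(values)

# Disturb the first bit of the code word on the fourth raster point.
disturbed = frame[:9] + ("1" if frame[9] == "0" else "0") + frame[10:]
received = unpack(disturbed, n_slots, len(values))
```

In this sketch the frame has the same 16 bits as the sequentially written bit stream. After the disturbance the value on the fourth raster point is wrong, but the value on the fifth raster point is still decoded correctly because its starting position is fixed; only the code words placed in the gaps remain exposed to the error.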
The decisive parameter for the efficiency of the known method is how the raster is defined in practice, i.e. how many raster points are needed, the raster distance between the raster points, etc. However, European patent 0612156 does not go beyond the general proposition that a raster should be used to curtail error propagation; there are no details as to how the raster should be efficiently structured so as to achieve error-tolerant, and at the same time efficient, coding.
EP-A-0 717 503 discloses a digital coding and decoding method in which discrete-time samples of a music signal are transformed into the frequency domain, whereupon the spectral values which are obtained are quantized and then entropy coded. The entropy coding delivers a certain number of code words of variable length, some of which are arranged in a raster while the others are inserted in the remaining spaces in the raster.
EP-A-0 492 537 relates to an information recording device for video and audio information in which information is divided up into small blocks of pixels, each containing a plurality of pixels, whereupon each small block is converted into orthogonal components by means of an orthogonal transformation. The orthogonal components are then coded using a code having code words of variable length. Some of the coded code words are written into a first memory. If a code word has more bits than are provided for by the first memory, the remaining bits of this code word are written into another memory.