The present invention relates to a concept for entropy encoding and to a corresponding concept for decoding entropy-encoded information words. In particular, the present invention relates to error-safe entropy encoding and corresponding decoding of audio signals.
Modern audio coding and decoding methods, operating for example according to the standard MPEG layer 3, are capable of compressing the data rate of audio signals e.g. by a factor of 12, without notably deteriorating the quality thereof. For obtaining such high data reduction, an audio signal is sampled, thereby obtaining a sequence of time-discrete sampling values. As is known in the art, this sequence of time-discrete sampling values is windowed by means of suitable window functions, so as to obtain windowed blocks of time sampling values. A block of windowed time sampling values then is transformed to the frequency domain by means of a filter bank, a modified discrete cosine transform (MDCT) or another suitable means, for obtaining spectral values which in total represent the audio signal, i.e. the time window established by the block of time-discrete sampling values, in the frequency domain. Usually, time blocks with an overlap of 50% are produced and transformed to the frequency domain by means of an MDCT so that, due to the specific properties of the MDCT, for example 1024 time-discrete sampling values always result in 1024 spectral values.
It is known that the receptivity of the human ear is dependent on the instantaneous spectrum of the audio signal itself. This dependency is noted in the so-called psychoacoustic model, by means of which it has been possible for quite some time to calculate masking thresholds in accordance with the instantaneous spectrum. Masking means that a specific sound or spectral component is concealed, for example, if an adjacent spectral region is of relatively high energy. This fact of masking is exploited for quantizing the spectral values present after the transform as roughly as possible. Therefore, endeavors are being made to avoid audible disturbances in the again decoded audio signal on the one hand and to utilize as few bits as possible for coding, or in the instant case quantizing, the audio signal on the other hand. The disturbances introduced by quantization, i.e. the quantization noise, should be below the masking threshold and thus should be inaudible. In accordance with known methods, a classification of the spectral values to so-called scale factor bands is carried out, which are supposed to correspond to the frequency groups of the human ear. Spectral values within a spectral value group are multiplied by a scale factor in order to scale spectral values of a scale factor band in total. The scale factor bands scaled by the scale factor then are quantized, whereupon quantized spectral values are formed. Of course, grouping into scale factor bands is not decisive. However, it is employed in the standard MPEG layer 3 and the standard MPEG-2 AAC (AAC=Advanced Audio Coding).
A very essential aspect of data reduction consists in entropy encoding of the quantized spectral values, which takes place after quantizing. For entropy encoding, Huffman coding is usually employed. Huffman coding is understood to be a variable length coding, i.e. the length of the code word for a value to be coded is dependent on the occurrence probability thereof. The most probable symbol logically has the shortest code . . . [ . . . ] the code employed is an unsymmetrical fixed length code, into which a symmetrical variable length code is mixed such that a specific number of bits of a fixed length code word is followed by a bit of a symmetrical variable length code word. The symmetrical variable length code words merely serve to provide for error robustness and do not carry useful information. On the receiver side, the symmetrical variable length code words are first extracted and analyzed with respect to transmission errors.
What is disadvantageous with respect to this mixed code is the fact that it is not possible to ascertain errors occurring in the fixed length code words, as only the symmetrical variable length code words are examined. On the other hand, disturbance-free fixed length code words can be identified as being error-inflicted if the associated variable length code words contain disturbances.
U.S. Pat. No. 5,488,616 is concerned with a system for providing reversible variable length codes. To this end, an asymmetrical reversible code is produced from a non-reversible variable length code, which is produced in provisional manner only. The non-reversible variable length code furthermore is converted to a symmetrical reversible code. A selection means selects either the asymmetrical reversible code or the symmetrical reversible code as output signal. The symmetrical reversible code is represented by a complete code tree in which all branches are concluded either by symmetrical code words or by branching points, with these branching points in turn being concluded by a symmetrical code word or leading to further branching points. The code tree contains exclusively symmetrical code words.
EP 0 732 855 A2 discloses a system of coding and/or decoding video images using variable length code words. The coder comprises a first coder having a code word table for source symbols in a region of source symbols, with this code table containing variable length code words assigned to source symbols. The source symbols that can be coded by variable length code words of the first code table have a relatively high probability of occurrence. A source symbol for which there is no code word from the first code table is input to a second coder having a code table with fixed length code words, in order to assign a fixed length code word to the source symbol. In addition thereto, an escape code is placed upstream and downstream of the fixed length code word, with said escape code being taken from the code table of the first coder having variable length code words. The variable length code words of the first coder are reversible code words, whereas the code words of the second coder are of fixed length. This produces a single data stream consisting of reversible variable length code words and of escape codes, with a fixed length code word being arranged between two escape codes each. This data stream can be decoded both in forward and in backward direction, with a decoder, upon coming across an escape code, recognizing the group of bits following the escape code as a fixed length code word, since the decoder has information on the number of bits in the group, i.e. on the length of the fixed length code words.
It is the object of the present invention to provide a concept for entropy encoding of information words and for decoding entropy-encoded information words which permits improved error recognition when the entropy-encoded information words are transmitted via an error-prone channel, while nevertheless providing the best possible coding efficiency.
This object is met by a device for entropy encoding according to claim 1, by a device for decoding according to claim 10, by a method for entropy encoding according to claim 19 and by a method for decoding according to claim 20.
The present invention is based on the finding that only those information words can be transmitted in effectively error-robust manner which are coded by reversible, e.g. symmetrical code words. Only reversible code words permit forward and backward decoding of a sequence of code words that is unequivocally associated with a sequence of information words. In contrast to the Huffman code, which has unsymmetrical code words, but is nearly optimum for reasons of data compression, a symmetrical code has higher redundancy. This redundancy can be advantageously utilized for error recognition. However, in order not to sacrifice too much compression gain for obtaining error-safeness, not all information words are coded by means of symmetrical code words according to the present invention, but only those information words that are within a specific region of information words. Information words lying outside the region are not coded by means of the symmetrical code, but can be Huffman-coded according to a preferred embodiment of the present invention. Thus, a compromise is made between error-robustness on the one hand and data compression on the other hand.
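The forward and backward decodability of symmetrical code words can be illustrated by a minimal sketch. The five-entry table `RVLC` and the function names below are assumptions chosen for illustration only and are not taken from the present description; any prefix-free set of palindromic code words would serve equally.

```python
# Hypothetical 5-entry table of symmetrical (palindromic) code words.
# The table is prefix-free, and because every word reads the same in
# both directions it is suffix-free as well.
RVLC = {0: "0", 1: "101", -1: "111", 2: "1001", -2: "11011"}
DECODE = {word: value for value, word in RVLC.items()}


def encode(values):
    """Concatenate the code words of all values into one bit string."""
    return "".join(RVLC[v] for v in values)


def decode_forward(bits):
    """Read left to right, emitting a value whenever a code word completes."""
    values, word = [], ""
    for bit in bits:
        word += bit
        if word in DECODE:
            values.append(DECODE[word])
            word = ""
    if word:
        raise ValueError("bit stream ends inside a code word")
    return values


def decode_backward(bits):
    """Read right to left; palindromic words allow reuse of the same table."""
    return decode_forward(bits[::-1])[::-1]


values = [0, 1, -2, 2, -1, 0]
bits = encode(values)
print(bits)
print(decode_forward(bits) == values, decode_backward(bits) == values)
```

In this sketch both decoding directions recover the same sequence of information words from an undisturbed bit stream, which a Huffman code with unsymmetrical code words would in general not permit.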
Another important aspect for the size of the region of information words coded by symmetrical code words is the fact that a short code, i.e. a small code table, is desirable for error localization. The size of the region implicitly determines the length of the longest code word, since with increasing number of code words in the table the length of the valid code words increases as well.
Error localization, according to the invention, is carried out in that a decoder recognizes invalid, i.e. non-reversible, code words and concludes therefrom that a transmission error is present here, as such a code word by definition was not produced in the coder. The probability that a disturbance leads to an invalid code word is highest when only a small number of code words is present. If a very large number of code words exists, the probability of a disturbance resulting in an invalid code word becomes increasingly smaller, since the invalid code words become increasingly longer as well.
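The recognition of invalid code words may be sketched as follows, again under the assumption of a small hypothetical table (`RVLC`, `decode_checked` and the test streams are illustrative names and values, not part of the present description). The decoder stops as soon as the bits read so far cannot be the beginning of any valid code word. The sketch also shows that a disturbance which turns one valid code word into another valid code word remains unnoticed.

```python
# Hypothetical table of symmetrical code words (illustration only).
RVLC = {0: "0", 1: "101", -1: "111", 2: "1001", -2: "11011"}
DECODE = {word: value for value, word in RVLC.items()}
# Every bit pattern that is the beginning of some valid code word.
PREFIXES = {w[:i] for w in DECODE for i in range(1, len(w) + 1)}

# Kraft sum below 1 means unused code space, i.e. redundancy that
# shows up as invalid code words.
kraft = sum(2.0 ** -len(w) for w in DECODE)


def decode_checked(bits):
    """Return (values, error_pos); error_pos is None for a clean stream,
    otherwise the index at which the unparseable word begins."""
    values, word, start = [], "", 0
    for i, bit in enumerate(bits):
        word += bit
        if word in DECODE:
            values.append(DECODE[word])
            word, start = "", i + 1
        elif word not in PREFIXES:
            return values, start          # invalid code word recognized
    return values, (start if word else None)


clean = "10010101111"        # code words for 2, 0, 1, -1
disturbed = "11010101111"    # second bit flipped by the channel
print(decode_checked(clean))
print(decode_checked(disturbed))
# Flipping the middle bit of "101" yields the valid word "111":
print(decode_checked("01110"))
```

Values decoded shortly before the stop position may themselves already be affected by the disturbance, so the stop position only bounds the location of the error and does not pinpoint it.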
The method according to the invention is advantageous in particular in such cases in which the information words to be coded are substantially within a region, and information words are outside this region with little probability only. The smaller this region, the fewer symmetrical code words are necessary and the better the error detection, which could be increased by the addition of artificial invalid code words. Thus, it is attempted to select the region of the information words coded by symmetrical code words as small as possible in the sense of efficient error localization, but to select it nevertheless so large that the information words are within this region with great probability and are coded symmetrically, in order to provide for an in total sufficient error robustness.
A preferred use of the present invention consists in entropy encoding of scale factors of a transformation-encoded audio signal, since with this use, seen statistically, 98% of the scale factor values occurring are within a manageable region that can be coded by symmetrical code words that are not yet of excessive length. If an information word outside of this region is to be entropy-encoded, an additional value is transmitted which is referred to as “escape”. The escape value preferably is Huffman coded and transmitted separately from the symmetrically coded scale factors in the audio bit stream.
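The escape mechanism may be sketched as follows. The sketch rests on several assumptions that are not stated in the passage above: the region is taken to be -`LIMIT` to +`LIMIT`, the boundary code words signal an escape, the escape value is the amount by which the magnitude exceeds `LIMIT`, and the escape values are kept in a plain list instead of being Huffman coded. The table and all names are hypothetical.

```python
LIMIT = 2  # assumed boundary of the symmetrically coded region
# Hypothetical table of symmetrical code words for -LIMIT..+LIMIT.
RVLC = {0: "0", 1: "101", -1: "111", 2: "1001", -2: "11011"}
DECODE = {word: value for value, word in RVLC.items()}


def encode(values):
    """Return (bits, escapes): symmetrical code words plus separate escape values."""
    words, escapes = [], []
    for v in values:
        if abs(v) >= LIMIT:
            sign = 1 if v > 0 else -1
            words.append(RVLC[sign * LIMIT])   # boundary word marks an escape
            escapes.append(abs(v) - LIMIT)     # excess is transmitted separately
        else:
            words.append(RVLC[v])
    return "".join(words), escapes


def decode(bits, escapes):
    """Rebuild the values, consuming one escape value per boundary word."""
    values, word = [], ""
    pending = iter(escapes)
    for bit in bits:
        word += bit
        if word in DECODE:
            v = DECODE[word]
            if abs(v) == LIMIT:
                v += (1 if v > 0 else -1) * next(pending)
            values.append(v)
            word = ""
    if word:
        raise ValueError("bit stream ends inside a code word")
    return values


values = [0, 1, -5, 2, -1, 7]
bits, escapes = encode(values)
print(bits, escapes)
print(decode(bits, escapes) == values)
```

With this arrangement the table of symmetrical code words stays small regardless of how large the rare out-of-region values become.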
The sense of entropy encoding according to the invention thus consists in being able, despite a relatively small RVLC table, to cover a large region of code words with good error recognition properties. The coding efficiency hardly suffers in the preferred application mentioned, since escape-encoded values occur only rarely there.
The application of the present invention to the scale factors of a transformation-encoded audio signal is advantageous in particular since even small disturbances in the scale factors due to a non-ideal channel lead to strongly audible disturbances because, as is known, a scale factor weights several spectral lines in multiplicative manner. Since, furthermore, the scale factors, as compared to the coded spectral values, make up only a relatively small part of the entire bit quantity, protection of the scale factors by a redundant code does not result in a considerable additional expenditure of bits. This slight additional expenditure is more than justified by the error-safeness of the scale factors which, as compared to their bit quantity, may introduce by far higher disturbances into an audio signal.
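The multiplicative weighting can be made concrete by a small sketch. The gain law `2 ** (sf / 4)`, the band width of four lines and all sample values are assumptions made for illustration; the passage above states only that one scale factor weights several spectral lines.

```python
def dequantize(quantized, scale_factors, band_width):
    """Weight every spectral line by the gain of its scale factor band.
    The gain law 2 ** (sf / 4) is an assumption for this sketch."""
    return [q * 2.0 ** (scale_factors[i // band_width] / 4)
            for i, q in enumerate(quantized)]


def lines_changed(a, b):
    """Count spectral lines that differ between two reconstructions."""
    return sum(1 for x, y in zip(a, b) if x != y)


quantized = [3, -1, 2, 4, 1, 1, -2, 5]   # two bands of four lines each
scale_factors = [8, 4]
reference = dequantize(quantized, scale_factors, 4)

# One disturbed scale factor affects every line of its band.
bad_sf = dequantize(quantized, [9, 4], 4)
# One disturbed spectral value affects a single line only.
bad_line = dequantize([3, -1, 2, 4, 1, 2, -2, 5], scale_factors, 4)

print(lines_changed(reference, bad_sf), lines_changed(reference, bad_line))
```

Under these assumptions a single disturbed scale factor alters four spectral lines, whereas a single disturbed spectral value alters one.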
However, the present invention is not restricted to entropy encoding and decoding of scale factors, but is advantageous in all situations where information words are to be coded which are within a region with high probability, such that one can make do with relatively short symmetrical code words without great loss in efficiency, and in which values outside said region can be encoded by escape sequences.