In the transmission of image and video data across data networks, the image data must be compressed, because the quantity of data to be transmitted is extremely large in view of the transmission rates available. Known compression algorithms include lossless as well as lossy algorithms. The discrete cosine transform (DCT), used in combination with quantization, forms a lossy method for removing redundant information from the image data. The DCT factors (coefficients) are computed for basic image units (for example, 8×8 pixels), and an image block is represented as a matrix consisting of these factors. After the DCT, the factors are arranged by zig-zag scanning into an order suitable for lossless Huffman coding. Huffman coding is one example of variable length coding (VLC), which reduces the statistical redundancy remaining after the DCT and the quantization. The principle of variable length coding is to represent frequently occurring symbols with shorter codewords and less frequently occurring symbols with longer codewords. To begin the coding, the source data is processed by arranging the symbols according to their probability of occurrence. The symbols with the lowest probability are combined recursively into one symbol, whereby a tree structure is formed (example shown in FIG. 1a) that allocates each symbol its own codeword by defining a bit to represent each branch. It can be seen from the code tree of FIG. 1a that the symbol with the lowest probability of occurrence has the longest codeword. Table 1b shows the probability of occurrence P(x), the codeword (code) and the entropy H(x) for each symbol. The method yields the shortest average codeword length, and the compression is very close to the entropy of the original source; the coding is therefore efficient in general. Consequently, variable length coding is an essential element in many standards relating to image processing.
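The recursive combination of the least probable symbols described above can be illustrated with a minimal Python sketch. The symbol names and probabilities are hypothetical and are not taken from FIG. 1a or Table 1b; the function name huffman_code is likewise an assumption made for illustration.

```python
import heapq
import math
from itertools import count


def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} mapping."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate source with a single symbol
        return {s: "0" for s in probs}
    while len(heap) > 1:
        # Combine the two least probable groups into one node of the tree.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical source, chosen so that the result is easy to check by hand.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)       # 1.75 bits
entropy = -sum(p * math.log2(p) for p in probs.values())    # 1.75 bits
```

For this particular source the average codeword length equals the entropy, because every probability is a power of one half; for other sources the average length is in general somewhat larger than the entropy.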
As their name indicates, the codewords of variable length coding vary in length. If the starting and ending points of a codeword cannot be detected directly from the bit stream, a recursive dependency develops between the codewords: the start of each codeword is known only once the length of the preceding codeword has been determined. This makes the decoding more difficult. Because of this dependency, the codewords must be found in the bit stream, that is, their locations must be determined, before the variable length decoding (VLD) can take place. In the decoding process, it is thus necessary to determine the location and the length of each codeword, as well as the codeword itself and the corresponding symbol.
The VLD implementations can be characterized according to the processing of the bit stream in the following way. In bit serial processing, the bit stream is processed bit by bit in series. Thus, the overall incoming bit stream is processed at a constant input rate. As the codewords have variable lengths, the overall decoding of the codeword also takes a variable time, resulting in a variable output rate.
In bit parallel processing, several bits of the incoming bit stream are processed in parallel, either at a constant input rate or at a variable input rate. When the input rate is kept constant, the output rate remains variable, because of the variable length of the codewords. To achieve a constant output rate, the incoming bit stream must be processed at a variable rate. In other words, to secure a constant output rate, the input must contain a number of bits corresponding to the product of the desired output rate and the longest codeword length. For example, if two symbols are to be output at each cycle and the longest codeword is 16 bits, 32 bits of input must be available at each cycle.
In a way similar to the characterization based on the processing of the bit stream, the parallelism of the decoders can also be described by the number of codewords decoded (or symbols obtained at the output) simultaneously. When the decoding is performed one codeword at a time, the decoding can be characterized as symbol-serial. Correspondingly, when several symbols are obtained at a time as a result of the decoding, the decoding is symbol-parallel.
The decoding can be implemented with a tree-based algorithm, which reverses the construction of the Huffman tree: the encoded incoming bit stream is compared with the binary tree, starting from the root and proceeding one bit at a time until a leaf is reached, at which point the whole codeword has been determined. With the tree model, efficient decoding is only obtained with short codewords. However, the high demands of real-time processing require that a corresponding output rate also be achieved with long codewords. Furthermore, because of its bit-serial form, the tree model is not suitable for the simultaneous decoding of several symbols, i.e. for symbol-parallel processing, owing to the dependency between the codewords when a single bit stream is processed.
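The bit-serial tree walk can be sketched in Python as follows. The code book and the function names build_tree and decode_bit_serial are hypothetical and serve only to illustrate the principle; the step counter shows that the time spent per symbol equals the length of its codeword.

```python
def build_tree(codebook):
    """Turn {symbol: codeword} into nested dicts; each leaf holds a symbol."""
    root = {}
    for symbol, word in codebook.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol
    return root


def decode_bit_serial(bits, codebook):
    """Walk the tree one bit per step and emit a symbol at each leaf."""
    root = build_tree(codebook)
    node, symbols, steps = root, [], 0
    for bit in bits:
        steps += 1
        node = node[bit]
        if not isinstance(node, dict):  # leaf reached: codeword complete
            symbols.append(node)
            node = root                 # restart from the root
    if node is not root:
        raise ValueError("bit stream ended inside a codeword")
    return symbols, steps


# Hypothetical code book, not taken from any standard.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}
symbols, steps = decode_bit_serial("0101100111", codebook)
# symbols == ['a', 'b', 'c', 'a', 'd'], steps == 10 (one step per input bit)
```

Each symbol costs as many steps as its codeword has bits, which is why the output rate of this approach drops for long codewords.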
To improve the processing capacity, parallelism has been increased by processing several bits in parallel. This is implemented by collecting a sufficient quantity of data (the maximum codeword length) in an input buffer, from the beginning of which the codewords are detected and decoded. In symbol-serial decoding, a quantity of data corresponding to the maximum code length is accumulated in the input buffer, to ensure that a symbol is found at each cycle. The problem here is that bits are left over at the end of the buffer which cannot be utilized in the current cycle. Symbol-parallel decoding (or multi-symbol decoding) poses the design challenge that the apparatus or system rapidly becomes very complex.
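A bit-parallel, symbol-serial decoder of this kind can be sketched in Python with a lookup table addressed by a window of the maximum codeword length. The code book and the names build_lookup and decode_symbol_serial are hypothetical; the sketch assumes a complete prefix code, so that every possible window begins with some codeword.

```python
def build_lookup(codebook):
    """Map every k-bit window to (symbol, length), k = longest codeword."""
    k = max(len(w) for w in codebook.values())
    table = {}
    for symbol, word in codebook.items():
        pad = k - len(word)
        # All windows that begin with this codeword decode to its symbol.
        for tail in range(2 ** pad):
            suffix = format(tail, "b").zfill(pad) if pad else ""
            table[word + suffix] = (symbol, len(word))
    return table, k


def decode_symbol_serial(bits, codebook):
    """Decode exactly one symbol per cycle from a k-bit input window."""
    table, k = build_lookup(codebook)
    total = len(bits)
    padded = bits + "0" * k  # zero padding keeps the last window full
    pos, symbols, cycles = 0, [], 0
    while pos < total:
        symbol, length = table[padded[pos:pos + k]]
        if pos + length > total:
            raise ValueError("bit stream ended inside a codeword")
        symbols.append(symbol)
        pos += length  # the unused k - length bits wait for the next cycle
        cycles += 1
    return symbols, cycles


# Hypothetical code book, not taken from any standard.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}
symbols, cycles = decode_symbol_serial("0101100111", codebook)
# symbols == ['a', 'b', 'c', 'a', 'd'], cycles == 5 (one cycle per symbol)
```

When a 1-bit codeword is decoded from a 3-bit window, two bits of the window remain unused in that cycle, which is the leftover problem mentioned above.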
C. T. Hsieh and S. P. Kim, in “A concurrent memory-efficient VLC decoder for MPEG applications”, IEEE Trans. Consumer Electron., vol. 42, no. 3, pp. 439-446, August 1996, present a concurrent decoding algorithm in which several short codewords are decoded in parallel in the same cycle. The decoder utilizes the probability distribution of variable length coding, according to which a bit sequence of the incoming bit stream is more likely to contain short codewords than long ones. For example, a 16-bit sequence of incoming data may, in the worst case, contain eight 2-bit codewords whose decoding, one symbol at a time, would require eight decoding cycles. The aim of the decoding algorithm disclosed in the publication is to accelerate the decoding process by detecting two or more shorter codewords simultaneously; in other words, the method is used to detect combinations of codewords. For example, in decoding with two levels, the first level of the tree model corresponds to the first codeword to be decoded, and the second level corresponds to the second codeword to be decoded. Thus, the aim is to decode two codewords simultaneously. The length of the first codeword may vary from 2 to k bits. If the length of the first codeword is 2 bits, the length of the second codeword that may be decoded in parallel is 2 to k2 bits. If, on the other hand, the length of the first codeword is 3 bits, the length of the second codeword that may be decoded simultaneously is 2 to k3 bits, and so on. As the first level comprises all the codes, k is equal to the maximum length in the code book, that is, the length of the longest codeword. On the second level, shorter (more probable) codes are preferred according to the basic property of the VLC. Because of this, the value of ki is in the range 0≦ki≦k.
The basic procedure in a two-level decoding process starts from the root of the tree model. The bit sequences of each branch in the tree are tested substantially simultaneously against all the possible codewords on each level. If more than one suitable branch is found on the second level of the tree, the path selected is the one that is continuous from the root onwards, that is, the one for which the bit stream is continuous without breaks. The decoding process may end on the first level, if none of the branches of the next level gives a continuous path from the root. This means that the second codeword cannot be decoded. The codeword length obtained from each level is used to advance the input stream to the next data, from which the decoding process starts again. The same decoding method can also be used with more levels. A codeword detection block based on the maximum likely bit pattern (MLBP) returns three values to complete the decoding cycle: group flags to determine the residue in the group, the length by which the bit stream is shifted to the beginning of the next codeword, and a group code for the memory. The symbol is retrieved from the random access memory according to the residue and the group.
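The two-level principle can be illustrated with the following Python sketch. It is not the MLBP-based detection block of the cited publication: the group flags, group code and memory lookup are omitted, the code book and window size are hypothetical, and the candidate codewords are tested in a loop, whereas hardware would test them substantially simultaneously.

```python
def decode_two_level(bits, codebook, window_size):
    """Decode up to two codewords per cycle from a fixed-size window."""
    inverse = {word: symbol for symbol, word in codebook.items()}
    if window_size < max(len(w) for w in inverse):
        raise ValueError("window shorter than the longest codeword")

    def match(window):
        """Return (symbol, length) of the codeword starting the window."""
        for n in range(1, len(window) + 1):
            if window[:n] in inverse:
                return inverse[window[:n]], n
        return None

    pos, cycles = 0, []
    while pos < len(bits):
        window = bits[pos:pos + window_size]
        first = match(window)              # first level
        if first is None:
            raise ValueError("no codeword found in window")
        out, shift = [first[0]], first[1]
        second = match(window[shift:])     # second level, continuous path
        if second is not None:
            out.append(second[0])
            shift += second[1]
        cycles.append(out)                 # symbols produced in this cycle
        pos += shift                       # lengths control the next window
    return cycles


# Hypothetical code book and window size, chosen for illustration only.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}
cycles = decode_two_level("0101100111", codebook, window_size=4)
# cycles == [['a', 'b'], ['c', 'a'], ['d']]
```

In this example five symbols are obtained in three cycles. A cycle yields only one symbol when the bits remaining in the window after the first codeword do not contain a complete second codeword.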