1. Field of the Invention
The present invention relates to variable-length encoding and decoding of data. In particular, the present invention relates to the parallelization of such decoding.
2. Description of the Related Art
Variable length coding is coding that assigns code-words with different lengths to different symbols of an alphabet. The counterpart of variable length coding is fixed length coding, which assigns a code-word with the same length to each symbol. For instance, if communication is performed using an alphabet consisting of four symbols A1, A2, A3, and A4, these symbols may each be encoded with two bits using a code C such that C(A1)=“00”, C(A2)=“01”, C(A3)=“10”, and C(A4)=“11”. Let us now assume that the input data sequence to be encoded is
        input-data={A1, A2, A4, A1, A3, A1, A1}.
Then the data after encoding by the fixed length code C will be
        C(input-data)=“00 01 11 00 10 00 00”.
(In this example, the spaces between the code-words of particular symbols are for better intelligibility only and do not belong to the encoded sequence.) The length of the encoded data is given by the number of coded symbols, which in the above case is 7, multiplied by the length of the code-words, which in our case is equal to two for all code-words. The resulting coded stream therefore has a length of 14 bits.
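The fixed length example above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `encode_fixed` is hypothetical and not part of the described prior art.

```python
# Fixed length code C from the example: every symbol maps to a 2-bit code-word.
C = {"A1": "00", "A2": "01", "A3": "10", "A4": "11"}


def encode_fixed(symbols, code):
    """Concatenate the code-words of the symbols (no separators in the stream)."""
    return "".join(code[s] for s in symbols)


input_data = ["A1", "A2", "A4", "A1", "A3", "A1", "A1"]
encoded = encode_fixed(input_data, C)

# 7 symbols times 2 bits per code-word gives 14 bits.
print(encoded, len(encoded))  # 00011100100000 14
```

Because every code-word has the same length, the position of the k-th code-word in the stream is known in advance (bit 2k in this example), which is what makes fixed length codes trivial to split among several decoders.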
Variable length coding is also called entropy coding if the length of the code-words assigned to particular symbols of an alphabet is determined based on the occurrence probabilities of those symbols. In particular, the most probable symbols are encoded with code-words having the shortest possible length. According to Shannon's theorem, the optimal code length $W_s$ in bits for a symbol s with probability $P_s$ is given by $W_s = -\log_2 P_s$.
Thus, entropy encoders typically try to assign to each symbol of an alphabet a code-word with a length proportional to the negative logarithm of the symbol probability. In order to design such an entropy code, the probability distribution of the source has to be known or assumed. The distribution of a source generating symbols from an alphabet is given by the probability of occurrence of each symbol of the alphabet (also known as an a priori probability). If the probability of occurrence of all symbols is equal, entropy coding is equivalent to fixed length coding, which means that employing a variable length code does not, in general, improve the coding gain. Thus, entropy coding only makes sense for non-uniformly distributed sources.
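The relation between symbol probability and ideal code-word length can be checked numerically. The following sketch is illustrative only: the function name `optimal_lengths` and the skewed distribution are assumptions chosen for the example and do not appear in the prior art discussed here.

```python
import math


def optimal_lengths(probabilities):
    """Ideal code-word length in bits, -log2(P), for each symbol."""
    return {s: -math.log2(p) for s, p in probabilities.items()}


# Uniform source: every symbol is equally likely.
uniform = {"A1": 0.25, "A2": 0.25, "A3": 0.25, "A4": 0.25}
# Hypothetical non-uniform source, chosen so that the lengths are whole bits.
skewed = {"A1": 0.5, "A2": 0.25, "A3": 0.125, "A4": 0.125}

print(optimal_lengths(uniform))  # every symbol: 2.0 bits
print(optimal_lengths(skewed))   # 1.0, 2.0, 3.0 and 3.0 bits
```

For the uniform source every ideal length equals two bits, which is exactly the fixed length code for a four-symbol alphabet. Only the non-uniform source yields code-words of different lengths.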
The advantage of entropy coding is that for non-uniformly distributed sources it may reduce the amount of data necessary for coding an input data stream without degrading its quality. This means that entropy coding is invertible and the encoded data may be completely restored by the inverse entropy decoding, provided that no error has occurred in the coded bit stream during its transmission or storage. For non-uniformly distributed sources, entropy coding may provide a considerable coding gain. Therefore, entropy coding has been widely used in a very broad spectrum of applications, such as text compression, for instance, zip or gzip; image compression, for instance, JPEG, PNG, SVG, TIFF; or video compression, such as MPEG2, MJPEG, H.264/MPEG-4 AVC, etc.
One of the most popular entropy coding methods is Huffman coding. A Huffman code is designed for a given alphabet of symbols based on their occurrence probabilities, as illustrated in FIG. 5. In this example, an alphabet of four symbols A1, A2, A3, and A4 is used for transmitting the data, similarly to the above example regarding the fixed length code. The a priori probabilities of occurrence of these symbols are P(A1)=0.4, P(A2)=0.35, P(A3)=0.2, and P(A4)=0.05. Construction of a Huffman code for a given symbol alphabet is performed by constructing a binary tree as follows. Each symbol s of the alphabet 510 is assigned a probability of occurrence P(s) 520. As long as there is a plurality of nodes which are not part of the binary tree, in each step of the Huffman code construction, the two nodes with minimum probability are joined into a common node, and the new common node is assigned a probability equal to the sum of the probabilities of the joined nodes. In the first step, the nodes corresponding to symbols A3 and A4 are joined and a new node with probability 0.25 is created. In the second step, the new node is joined with the node corresponding to symbol A2 and assigned the probability of 0.6. In the last step, this node is joined with the node corresponding to symbol A1, resulting in a common root node of the binary tree. The code-words of the so-constructed Huffman code are then determined by marking each edge of the binary tree with a binary value 530. For instance, the upper edge is assigned a binary value of 0 and the lower edge is assigned a binary value of 1. By reading the binary values on the edges from the root towards each symbol s, a coding table 500 is specified.
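The tree construction described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The function name `build_huffman` is hypothetical, and the rule for labelling edges is an assumption: at each join, the node containing the symbol listed earlier in the alphabet takes the edge marked 0 (the "upper" edge) and the other node takes the edge marked 1.

```python
import heapq


def build_huffman(probabilities):
    """Build a code table {symbol: code-word} from {symbol: probability}.

    Assumed edge labelling: at each join, the node holding the symbol that
    appears earlier in the alphabet gets edge 0, the other node gets edge 1.
    """
    # Heap entries: (probability, position of earliest symbol, partial table).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two nodes with minimum probability.
        p1, i1, t1 = heapq.heappop(heap)
        p2, i2, t2 = heapq.heappop(heap)
        upper, lower = (t1, t2) if i1 < i2 else (t2, t1)
        # Prepend the edge label, since the tree grows from the leaves upward.
        merged = {s: "0" + w for s, w in upper.items()}
        merged.update({s: "1" + w for s, w in lower.items()})
        heapq.heappush(heap, (p1 + p2, min(i1, i2), merged))
    return heap[0][2]


P = {"A1": 0.4, "A2": 0.35, "A3": 0.2, "A4": 0.05}
table = build_huffman(P)
print(table)  # {'A1': '0', 'A2': '10', 'A3': '110', 'A4': '111'}
```

With a different edge labelling the code-words would differ bit for bit, but their lengths (1, 2, 3, and 3 bits) would stay the same.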
The coding table 500 is then used for encoding the input data. Let us assume the same sequence of input symbols as in the example presented above for the fixed length coding:
        input-data={A1, A2, A4, A1, A3, A1, A1}.
Using the code-word table 500, representing Huffman code H, the coded bit stream is given by
        H(input-data)=“0 10 111 0 110 0 0”.
(In this example, the spaces between the code-words of particular symbols are for better intelligibility only and do not belong to the encoded sequence.) The length of this encoded data is 12 bits, which is less than the 14 bits needed for encoding the same symbol sequence with the fixed length code. The construction of the Huffman code using the binary tree ensures that the resulting code is a so-called prefix code, which can be uniquely decoded.
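The bit count and the prefix property can be verified with a short sketch. The table `H` below is an assumption: FIG. 5 is not reproduced here, so the code-words are taken from the decoding walk-through of this example (“0” for A1, “10” for A2, “110” for A3, “111” for A4).

```python
# Assumed contents of coding table 500 (see lead-in above).
H = {"A1": "0", "A2": "10", "A3": "110", "A4": "111"}
P = {"A1": 0.4, "A2": 0.35, "A3": 0.2, "A4": 0.05}

input_data = ["A1", "A2", "A4", "A1", "A3", "A1", "A1"]
bitstream = "".join(H[s] for s in input_data)
print(bitstream, len(bitstream))  # 010111011000 12

# Prefix property: no code-word is the beginning of another code-word.
words = list(H.values())
is_prefix_code = not any(
    a != b and b.startswith(a) for a in words for b in words
)

# Average code-word length in bits per symbol for the given probabilities.
average_length = sum(P[s] * len(H[s]) for s in H)
print(is_prefix_code, round(average_length, 2))  # True 1.85
```

The average of 1.85 bits per symbol is below the two bits per symbol of the fixed length code, which is where the coding gain for this source comes from.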
In general, variable length coding has to be decoded serially in order to determine the respective code-words and the boundaries between them. For instance, in the data stream H(input-data) encoded by the Huffman code, the respective code-words are identified during the decoding. The first binary symbol is 0. According to the coding table 500, there is only a single code-word starting with 0, which is the code-word corresponding to the symbol A1. Consequently, the first binary symbol 0 corresponds to the code-word “0” for the symbol A1. The next binary value of the encoded bitstream is 1. There are three code-words in the coding table 500 starting with binary symbol 1. Therefore, the next binary symbol 0 in the encoded bitstream also belongs to the same code-word. Since there is only one code-word in the table 500 which starts with the binary sequence “10”, the second and the third binary symbols of the encoded data are identified as the code-word for symbol A2. In a similar way, the rest of the encoded data stream is parsed into the code-words “111”, “0”, “110”, “0”, and “0” corresponding to symbols A4, A1, A3, A1, and A1 of the input data sequence. As can be seen from this example, entropy decoding is inherently serial. The start of the next code-word cannot be identified before the previous code-words have been decoded. Consequently, an entropy decoding procedure such as Huffman decoding cannot be easily parallelized.
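The serial parsing described above can be sketched as a simple bit-by-bit decoder. The function name `decode_serial` is hypothetical, and the table `H` is assumed to match coding table 500 as used in the walk-through.

```python
# Assumed contents of coding table 500.
H = {"A1": "0", "A2": "10", "A3": "110", "A4": "111"}


def decode_serial(bits, code):
    """Decode a prefix-coded bit string one binary symbol at a time."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        # A code-word boundary is known only once the bits read so far
        # match a complete code-word.
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bitstream ends inside a code-word")
    return symbols


decoded = decode_serial("010111011000", H)
print(decoded)  # ['A1', 'A2', 'A4', 'A1', 'A3', 'A1', 'A1']
```

The loop carries state (`current`) from one bit to the next, so the position of each code-word depends on all preceding bits. This dependency is what prevents a straightforward split of the bitstream among several decoders.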
However, parallelization of decoding is an essential means of balancing power against performance in the employed computing systems. Especially in embedded systems, multi-core architectures are often deployed in order to achieve low power solutions. The main challenge in the programming of multi-core architectures is parallelization, that is, separating the task to be performed into sub-tasks that may be performed in parallel and possibly independently of each other. In applications using entropy coding for compressing data, such as text compression tools or audio, image, or video compression algorithms, the entropy coding represents a bottleneck to an effective parallelization. Nevertheless, image and video compression and decompression including entropy coding are, in particular, applications where parallelization is necessary in order to allow real time and low power encoding and decoding, especially for portable devices operating on batteries or accumulators.
FIG. 6 illustrates an example of a JPEG baseline encoder 600. An input image is first subdivided into blocks of 8×8 pixels. Each block is transformed by a discrete cosine transform (DCT) 610, and each transformed block of transformation coefficients is further quantized 620 and serialized by means of a zig-zag scan. The DC component is encoded using differential pulse code modulation (DPCM) 630 and the AC components are further encoded by run length coding (RLC) 640. The so encoded DC and AC components are finally encoded by an entropy encoder 650, which encodes the input symbols into code-words according to a coding table 660. The JPEG standard utilizes Huffman coding. Information about the coding table used may be added to the encoded bitstream 670 in order to allow the updating of the coding table and thus an adaptation of the entropy code to possibly varying statistics of the source, for instance, for different images.
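Two of the steps preceding the entropy encoder, the zig-zag serialization and the DPCM of the DC components, can be sketched as follows. The function names `zigzag_order` and `dpcm` and the DC values are hypothetical, and the sketch assumes that the DPCM predictor starts at zero. It is not taken from a particular JPEG implementation.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Anti-diagonals (constant row + column) are visited in increasing order.
    # Odd diagonals run with increasing row, even ones with increasing column.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def dpcm(dc_values):
    """Replace each DC value by its difference from the previous DC value."""
    previous, differences = 0, []  # assumption: predictor starts at zero
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences


order = zigzag_order()
print(order[:6])           # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(dpcm([52, 55, 50]))  # [52, 3, -5]
```

The zig-zag scan places low-frequency coefficients first, and DPCM turns slowly varying DC values into small differences. Both steps skew the symbol statistics, which is what the subsequent entropy encoder exploits.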
The performance gain that can be achieved by the use of a multi-core processor is strongly dependent on the software algorithms and their implementation. In particular, the possible gains are limited by the fraction of the software that can be parallelized to run on multiple cores simultaneously.
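This limit is commonly formalized as Amdahl's law, which is not named in the cited prior art and is used here only as an illustration. The function name `amdahl_speedup`, the parallel fraction of 0.5, and the core counts are hypothetical values, not measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Overall speed-up when only a fraction of the run time is parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical example: half of the run time can be spread over the cores.
print(amdahl_speedup(0.5, 4))     # 1.6
print(amdahl_speedup(0.5, 1000))  # just under 2.0
```

Even with a very large number of cores the speed-up in this example stays below 2, because the serial half of the work is unaffected. A serial entropy decoding stage plays the role of that serial fraction.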
In order to parallelize JPEG decoding, for instance, the inverse discrete cosine transform (IDCT) and the color conversion may be parallelized. However, such a parallelization only gains about 14% speed-up, approximately 12% for the IDCT and 2% for the color conversion. Since the entropy decoding cannot be easily parallelized, the parallelization of the remaining image decoding steps may only be used after the data necessary for such decoding have been serially entropy decoded.
A similar situation may also occur for other applications which employ variable length coding, such as compression of any data by a mechanism using entropy coding such as zip, gzip, rar, etc., or other image, video, or audio compression methods utilizing entropy codes. Moreover, the above described problem of identifying the starting points and the end points of the code-words is not specific to the Huffman code exemplified in FIG. 5. Other variable length codes, such as Shannon-Fano coding, adaptive Huffman coding, or universal codes such as Golomb, Elias, or unary codes, also need to be decoded serially.
From the prior art, several ways of synchronizing the decoding of an entropy encoded data stream are known. For instance, T. J. Ferguson and J. H. Rabinowitz: “Self-synchronizing Huffman codes,” IEEE Trans. Inform. Theory, Vol. 30, No. 4, 1984, pp. 687-693 analyzes synchronization by means of synchronizing code-words, that is, code-words after the decoding of which the decoding of a Huffman code is synchronized.
In W. M. Lam and S. R. Kulkarni: “Extended synchronizing codewords for binary prefix codes,” IEEE Trans. Inform. Theory, Vol. 42, No. 3, 1996, pp. 984-987, the synchronizing code-words form part of the entropy coded data and may be used to carry coded information like other code-words and/or as extra symbols inserted between other encoded code-words.
The problem of parallel decoding is handled in S. T. Klein and Y. Wiseman, “Parallel Huffman decoding with applications to JPEG files,” The Computer Journal, British Computer Society, Vol. 46, No. 5, 2003, pp. 487-497. There, the self-synchronization at the synchronizing code-words is utilized to partially parallelize the decoding of a Huffman code.
The efficiency of such parallel decoding depends on the self-synchronizing properties of the applied variable length code and on the content of the coded data. Thus, it does not enable optimizing the load balancing of the parallel decoder.
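The self-synchronization on which these approaches rely can be illustrated with the example code used earlier in this section. The following sketch is not the algorithm of any of the cited references. It only shows that a decoder started at a wrong bit position may fall back onto the true code-word boundaries. The function name `boundaries` is hypothetical and the table `H` is assumed to match coding table 500.

```python
# Assumed contents of coding table 500 and the encoded example sequence.
H = {"A1": "0", "A2": "10", "A3": "110", "A4": "111"}
bits = "010111011000"


def boundaries(bits, code, start=0):
    """Bit positions at which a serial decoder started at `start` ends a code-word."""
    words = set(code.values())
    ends, current = [], ""
    for position in range(start, len(bits)):
        current += bits[position]
        if current in words:
            ends.append(position + 1)
            current = ""
    return ends


true_ends = set(boundaries(bits, H))  # boundaries seen by a correct decoder

# Start decoding inside a code-word and record where the decoder first
# lands on a true boundary; from there on it decodes correctly.
sync_points = {}
for start in range(1, len(bits)):
    if start in true_ends:
        continue  # already a correct starting point
    sync_points[start] = next(
        (end for end in boundaries(bits, H, start) if end in true_ends), None
    )

print(sync_points)  # {2: 3, 4: 7, 5: 7, 8: 10, 9: 10}
```

In this short example every wrong starting position resynchronizes after a single code-word, but how quickly this happens depends on the code and on the data. A decoder assigned to a later part of the stream therefore cannot know in advance how many bits it will decode incorrectly, which is why the load of the parallel decoders cannot be balanced reliably in this way.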