1. Field of the Invention
The present invention relates to a method for compressing data, and more particularly, to a method for constructing and searching an improved Huffman table which provides improved efficiency over an existing Huffman table in a method and apparatus for decoding data using a Huffman code used widely in processing data such as audio, video and the like.
2. Description of the Related Art
Recently, with the rapid development of computers, systems that handle large amounts of data have come into widespread use, and such systems commonly compress the data in order to use a memory unit efficiently and to reduce data transfer time.
There are a variety of known encoding methods for compressing data. Among these methods, an encoding method that is applicable to a variety of data, without limiting the target data to be encoded to character codes, vector information, images, etc., is called universal encoding. Universal encoding is classified into dictionary type encoding, which uses the similarity of character strings, and statistical probability type encoding, which uses the frequency of occurrence of characters.
A representative statistical probability type encoding is Huffman encoding. Huffman encoding uses codes (Huffman codes) whose length becomes shorter as the frequency of occurrence of a character becomes higher. First, a Huffman tree, which is a data structure used in the generation of the Huffman code, will be described with reference to FIG. 1. In FIG. 1, portions indicated by circles and squares are referred to as nodes. A node existing in the highest level is referred to as a ‘root node’ and a segment of a line connecting the nodes is referred to as a ‘branch’. Further, a low level node Y connected to an existing node X by a branch is referred to as a ‘child node’ of node X, whereas node X is referred to as a ‘parent node’ of node Y. A node with no child node is referred to as a ‘terminal node’ and corresponds to a character. Moreover, each of the nodes other than a terminal node is referred to as an ‘internal node’, and a ‘level’ is indicated by the number of branches from the root node to each of the nodes other than the root node.
In a Huffman tree, a path from the root node to a terminal node corresponding to a character to be encoded is output as a code. In other words, at each node on the way from the root node to the target terminal node, “1” is output if the path branches to the left, and “0” is output if it branches to the right. For example, in the Huffman tree shown in FIG. 1, a code “00” is output for a character A corresponding to a terminal node of node number 7 and a code “011” is output for a character B corresponding to a terminal node of node number 8.
In Huffman decoding, which is the reverse process of Huffman encoding, the tree is traversed from the root node by following one branch for the value of each bit of the data to be decoded, and the character corresponding to the terminal node that is reached is output.
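As a purely illustrative aid, the bit-by-bit traversal described above may be sketched in C as follows. The structure `node`, the function `tree_decode` and the representation of the input as a string of '0' and '1' characters are assumptions made for this sketch and do not appear in the figures. The left branch is taken for a bit “1” and the right branch for a bit “0”, following the convention stated above.

```c
#include <stddef.h>

/* Node of a hypothetical Huffman tree (names invented for this sketch).
   A terminal node has no child nodes and carries a character. */
struct node {
    const struct node *left;   /* branch followed for bit '1' */
    const struct node *right;  /* branch followed for bit '0' */
    char symbol;               /* meaningful only in a terminal node */
};

/* Decodes a string of '0'/'1' characters by walking the tree from the root.
   Writes a NUL-terminated result to out and returns the number of characters
   decoded, or -1 if the input is malformed, ends in the middle of a code,
   or does not fit in out. */
int tree_decode(const struct node *root, const char *bits,
                char *out, size_t out_size)
{
    const struct node *cur = root;
    size_t n = 0;

    if (root == NULL || out_size == 0)
        return -1;

    for (; *bits != '\0'; bits++) {
        if (*bits == '1')
            cur = cur->left;
        else if (*bits == '0')
            cur = cur->right;
        else
            return -1;                /* not a bit character */
        if (cur == NULL)
            return -1;                /* no such branch in the tree */

        if (cur->left == NULL && cur->right == NULL) {
            if (n + 1 >= out_size)
                return -1;            /* output buffer too small */
            out[n++] = cur->symbol;   /* terminal node reached */
            cur = root;               /* restart for the next code */
        }
    }
    if (cur != root)
        return -1;                    /* data ended inside a code */
    out[n] = '\0';
    return (int)n;
}
```

For instance, with a hypothetical tree whose terminal nodes have the codes “1” (C), “00” (A), “010” (D) and “011” (B), where only the codes of A and B are taken from the description of FIG. 1, the input “000111” would be decoded as “ABC”.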
The Huffman encoding generates a Huffman tree in the following sequence (called a Huffman algorithm). First, the frequency of occurrence of a character corresponding to each terminal node is recorded. Second, for two nodes having the least frequency of occurrence, one new node is created and the created node and the two nodes are connected by branches, respectively. Further, the sum of occurrence frequencies of the two nodes connected to each other by a branch is recorded in the newly created node. Third, the second procedure is repeated until all nodes at all levels are combined into one tree.
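The three steps above may be sketched in C as follows, under stated assumptions. The function name `huffman_code_lengths`, the limit `MAX_SYMBOLS` and the array-based tree are choices made for this illustration only. The sketch computes only the level of each terminal node (that is, the length of its code), and it finds the two least frequent nodes by a simple linear search rather than with a priority queue.

```c
#include <stddef.h>

#define MAX_SYMBOLS 64   /* arbitrary limit chosen for this sketch */

/* Computes the Huffman code length of each of n characters from its
   frequency of occurrence. Returns 0 on success, -1 if n is out of range.
   A lone character (n == 1) receives length 0. */
int huffman_code_lengths(const unsigned freq[], int n, int length[])
{
    unsigned long weight[2 * MAX_SYMBOLS - 1];
    int parent[2 * MAX_SYMBOLS - 1];
    int active[2 * MAX_SYMBOLS - 1];
    int count, i, merges;

    if (n < 1 || n > MAX_SYMBOLS)
        return -1;

    /* First: record the frequency of each terminal node. */
    for (i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        active[i] = 1;
    }
    count = n;

    /* Second and third: repeatedly join the two least frequent nodes
       under a new node until a single tree remains. */
    for (merges = 0; merges < n - 1; merges++) {
        int lo1 = -1, lo2 = -1;   /* smallest and second smallest */
        for (i = 0; i < count; i++) {
            if (!active[i])
                continue;
            if (lo1 < 0 || weight[i] < weight[lo1]) {
                lo2 = lo1;
                lo1 = i;
            } else if (lo2 < 0 || weight[i] < weight[lo2]) {
                lo2 = i;
            }
        }
        weight[count] = weight[lo1] + weight[lo2];   /* sum of frequencies */
        parent[count] = -1;
        active[count] = 1;
        parent[lo1] = parent[lo2] = count;
        active[lo1] = active[lo2] = 0;
        count++;
    }

    /* The code length of a character is the level of its terminal node. */
    for (i = 0; i < n; i++) {
        int node, depth = 0;
        for (node = i; parent[node] >= 0; node = parent[node])
            depth++;
        length[i] = depth;
    }
    return 0;
}
```

For the hypothetical frequencies 5, 9, 12, 13, 16 and 45, this sketch yields code lengths of 4, 4, 3, 3, 3 and 1, so that the most frequent character receives the shortest code. When several nodes have equal frequencies, the tree obtained depends on how ties are broken, although the total encoded length is the same.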
In the Huffman tree generated in this sequence, each character is allocated a code that is shorter the more frequently the character occurs. For this reason, when characters are encoded using the Huffman tree, the resulting codes are organized as a table (referred to as a Huffman table), and the table is then used for decoding in the Huffman decoding process, data can be compressed effectively.
This Huffman table may be known in advance in a Huffman encoder and a Huffman decoder, or may be transferred from the Huffman encoder to the Huffman decoder using header information when data are transferred. In the former method, several Huffman tables are specified after the frequency of occurrence of each character is statistically obtained, and the Huffman tables are stored in advance in the Huffman encoder and the Huffman decoder. Thereafter, when one of the specified Huffman tables is designated in transferring actual data, the data are decoded using the same table in the Huffman decoder. In the latter method, the frequency of occurrence of each character when data to be transferred are encoded in the Huffman encoder is obtained to create the Huffman table and then the data along with the created Huffman table are transferred to the Huffman decoder. Then, in the Huffman decoder, the data can be decoded using the transferred Huffman table.
FIG. 2 shows an example of a conventional Huffman table. The conventional Huffman table is composed of a ‘CodeLength’ representing the length of each Huffman code, a ‘CodeWord’ representing the decimal value of the Huffman code, and a ‘QuantValue’ representing a quantization value for identifying each Huffman code. Assuming that the Huffman code is created according to the frequency of occurrence of a character as shown in FIG. 1, a result can be obtained as shown in the table of FIG. 3. Here, each QuantValue, indicating the value actually signified by the data, can directly represent a character such as {e, t, a, c, r, s, . . . } or a color such as {red, blue, yellow, . . . } depending on the kind of data (text data, image data and the like). When the table of FIG. 3 is organized as a Huffman table, the result shown in FIG. 2 can be obtained. One Huffman code can be divided into a CodeLength and a CodeWord. For example, ‘011101’ can be represented by a CodeLength of ‘6’ and a CodeWord of ‘29’ in decimal.
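The division of one Huffman code into a CodeLength and a CodeWord may be illustrated by the following minimal C sketch. The type `code_pair` and the function `split_code` are hypothetical names introduced only for this illustration, and the code is assumed to be given as a string of '0' and '1' characters no longer than 31 bits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: one Huffman code as a (CodeLength, CodeWord) pair. */
struct code_pair {
    unsigned code_length;   /* number of bits in the code */
    unsigned code_word;     /* the bits read as an unsigned binary number */
};

/* Converts a string of '0'/'1' characters into its CodeLength and CodeWord.
   Returns 0 on success, -1 if the string is empty, longer than 31 bits,
   or contains any other character. */
int split_code(const char *bits, struct code_pair *out)
{
    size_t n = strlen(bits);
    size_t i;
    unsigned value = 0;

    if (n == 0 || n > 31)
        return -1;
    for (i = 0; i < n; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        value = (value << 1) | (unsigned)(bits[i] - '0');   /* append one bit */
    }
    out->code_length = (unsigned)n;
    out->code_word = value;
    return 0;
}
```

Applied to ‘011101’, the sketch gives a CodeLength of 6 and a CodeWord of 29. The CodeLength has to be stored alongside the CodeWord because leading zeros are lost in the numeric value: for example, the codes ‘0’ and ‘00’ both have a CodeWord of 0 and differ only in their CodeLength.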
Now, a conventional Huffman decoding method using the above Huffman table will be described with reference to FIG. 4. First, Huffman encoded data are received and stored in a memory of a Huffman decoder (S400). Then, one bit is extracted from the memory (S410). The extracted bit is added to the end of the previous bit string (S420). Next, the CodeLength of the current code is read from the Huffman table stored in the memory (S430). The Huffman table may be stored in the memory after being received as part of the received data, or may be stored in advance in the memory of the Huffman decoder separately from the received data. Next, the number of bits of the current bit string is compared with the read CodeLength (S440). If they are not equal to each other, one bit is again extracted from the memory (S410). If they are equal to each other, the CodeWord corresponding to the CodeLength, i.e., the CodeWord of the current code, is read from the Huffman table (S450). Next, the current bit string is compared with the value of the CodeWord (S460). If they are not equal to each other, the CodeLength of the next code after the current code is read from the Huffman table stored in the memory (S470), and then the process returns to step S440 to compare the number of bits of the current bit string with the read CodeLength.
As a result of the comparison in step S460, if they are equal to each other, the QuantValue of the current code is returned (S480). Then, it is determined whether all bits of the data stored in the memory have been extracted (S490). When all bits of the data have been extracted, the process is ended. If any bit remains, the previous bit string is initialized to ‘NULL’ and the process is repeated from step S410.
Now, in order to more fully understand the process of FIG. 4, an example in which actual values are applied will be described. For this description, it is assumed that the data received from the Huffman encoder and stored in the memory of the Huffman decoder are ‘000110111’. First, one bit (‘0’) of the data is extracted (S410). Since this is the first extracted bit and there is therefore no previous bit string, the value of the bit is not changed by step S420. The CodeLength ‘1’ of the first code is read from the Huffman table stored in the memory (S430). Next, since the bit number ‘1’ of the current bit string is identical to the CodeLength ‘1’ (YES in step S440), the value ‘0’ of the current bit string is compared with the CodeWord ‘1’ of the current code in the Huffman table (S460). Since the value ‘0’ is not identical to the CodeWord ‘1’, the CodeLength ‘2’ of the next code is read (S470) and then the bit number ‘1’ of the current bit string is compared with the CodeLength ‘2’ (S440). Since the bit number ‘1’ is not identical to the CodeLength ‘2’ either, another bit ‘0’ is extracted from the memory (S410) and added to the end of the previous bit string ‘0’ (S420), so that the current bit string becomes ‘00’. The CodeLength ‘2’ of the current code is read from the Huffman table stored in the memory (S430). Next, since the bit number ‘2’ of the current bit string is identical to the CodeLength ‘2’ (YES in step S440), the value of the current bit string (‘0’ in decimal) is compared with the CodeWord ‘0’ of the current code in the Huffman table (S460). Since the value ‘0’ is identical to the CodeWord ‘0’, the QuantValue ‘2’ of the current code is returned (S480).
Now, the bits remaining in the memory are ‘0110111’. Since not all bits have yet been extracted from the memory, the previous bit string is initialized (S490) and another bit is extracted from the memory (S410). Thereafter, as bits are added to the bit string one by one, giving ‘0’, ‘01’, ‘011’, ‘0110’, ‘01101’ and ‘011011’, the bit string is compared with the successive codes of the Huffman table. However, since the value of the bit string is not identical to any of these CodeWords (NO in step S460), steps S410 to S460 are performed repeatedly. Even when the bit string becomes ‘0110111’, if the first codes with a CodeLength of 7 have CodeWords of 57 and 56, the value of the bit string is not identical to these CodeWords (NO in step S460), and steps S470, S440 and S450 are performed in order for each of them. Only after these steps are performed does the value ‘55’ of the bit string ‘0110111’ become identical to the CodeWord ‘55’ (YES in step S460), whereupon the QuantValue ‘9’ of the current code is returned (S480). Then, since all bits stored in the memory have been extracted (YES in step S490), the process is ended.
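The walk-through above may be reproduced with the following minimal C sketch of the conventional flow of FIG. 4. It is an illustration under stated assumptions rather than the actual decoder: the names `huff_entry`, `example_table` and `conventional_decode` are hypothetical, the input is taken as a string of '0' and '1' characters, and the table is assumed to be sorted by CodeLength. Since FIG. 2 is not reproduced here, `example_table` contains only the codes mentioned in the walk-through, and the QuantValues 1, 7 and 8 are placeholders invented for the sketch. A counter records how often the length comparison of step S440 is performed.

```c
#include <stddef.h>

/* One row of a Huffman table as described for FIG. 2. */
struct huff_entry {
    unsigned code_length;   /* CodeLength: number of bits in the code */
    unsigned code_word;     /* CodeWord: the code read as a binary number */
    int quant_value;        /* QuantValue: the value the code stands for */
};

/* Hypothetical table, sorted by CodeLength. The CodeLength/CodeWord pairs and
   the QuantValues 2 and 9 follow the walk-through in the text; the QuantValues
   1, 7 and 8 are placeholders invented for this sketch. */
static const struct huff_entry example_table[] = {
    {1, 1, 1}, {2, 0, 2}, {7, 57, 7}, {7, 56, 8}, {7, 55, 9},
};

/* Decodes a string of '0'/'1' characters into QuantValues and counts the
   S440 comparisons. Returns the number of values decoded, or -1 on malformed
   input, a missing code, truncated data or a full output array. */
int conventional_decode(const struct huff_entry *table, size_t table_size,
                        const char *bits, int *out, size_t out_cap,
                        unsigned *length_compares)
{
    unsigned word = 0, word_length = 0;
    size_t index = 0, n_out = 0;

    *length_compares = 0;
    for (; *bits != '\0'; bits++) {
        if (*bits != '0' && *bits != '1')
            return -1;
        word = (word << 1) | (unsigned)(*bits - '0');    /* S410, S420 */
        word_length++;

        while (index < table_size) {
            (*length_compares)++;                        /* S430 or S470, then S440 */
            if (table[index].code_length != word_length)
                break;                                   /* NO in S440: extract another bit */
            if (table[index].code_word == word) {        /* S450, S460 */
                if (n_out == out_cap)
                    return -1;
                out[n_out++] = table[index].quant_value; /* S480 */
                word = 0;                                /* initialize the bit string */
                word_length = 0;
                index = 0;                               /* restart at the first code */
                break;
            }
            index++;                                     /* try the next code */
        }
        if (index >= table_size)
            return -1;      /* ran past the last code without a match */
    }
    return word_length == 0 ? (int)n_out : -1;
}
```

With `example_table` and the input ‘000110111’, the sketch returns the QuantValues 2 and 9, as in the walk-through, and performs 14 length comparisons for 9 extracted bits.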
An example of pseudocode for a system that decodes the conventional Huffman encoded data is shown in the following program 1.
-----------------------------[Start of program 1]------------------------------------
do {
    /* STEP1: obtain the values of Word and WordLength */
    . . .
    a = extract(1bit);      // extract bits from memory one by one
    Word = Word << 1;       // add the extracted bit to the previous value
    Word += a;
    WordLength++;           // the total number of bits read from memory
    . . .
    /* STEP2: read CodeWord and CodeLength from the Huffman table and
       compare them with the values obtained above */
    while (CodeLength /* obtained from the Huffman table */ ==
           WordLength /* the number of bits of the bit string read from the input buffer */)
    {
        if (Word == CodeWord[nIndex])   // find the identical CodeWord in the table
        {
            QuantizationValue = pHuffmanQ->QuantValue[nIndex];
            return 1;   // return the corresponding QuantValue if both
                        // CodeLength and CodeWord are identical
        }
        else
            nIndex++;   // increase the index if either of the two values is not identical
        CodeLength = pHuffmanQ->WordLength[nIndex];   // repeat until the next value is read and found
    }
} while (HuffmanQ_Lookup(nQValue) == NULL);
-----------------------------[End of program 1]-------------------------------------
The conventional Huffman algorithm is inherently a method for reducing the amount of data by quantizing a signal and organizing it into a table in an encoding process, and finding the actual value with reference to the same table when the signal is decoded in a decoder. Accordingly, the more often the table is referred to or a value is searched for, the more system resources are consumed. In other words, since the length of the next code is not known in the Huffman decoding process, this algorithm is inefficient in two respects: the comparison of step S440 is frequently performed unnecessarily until the value and the number of bits of the bit string composed of the extracted bits become identical to a CodeWord and CodeLength of the Huffman table, and an ExtractBit function (a function for extracting bits from the memory), which consumes significant system resources, is frequently called. In particular, the more often the same CodeLength is repeated in the Huffman table, the greater the inefficiency becomes.