In a conventional technique, a phrase tree is generated in order to compress an input character string. Each node in the phrase tree includes a code that is substituted for the character or character string represented by the node, the character to be compressed, data representing the node's hierarchy level, a pointer to the parent node, pointers to child nodes (one for each character that may be used), and a counter for each child node that counts the number of times the character corresponding to that child node appears.

For example, when the input character string “ABABCABCABCCBCBCBCAAACBACBACBBCCBB” is inputted, a phrase tree such as that illustrated in FIG. 1 is generated. Here, the threshold value for the number of appearances required to generate a new node is 2. In this example, a root node is provided as the 0th-level node, and nodes for characters “0x00” to “0xFF” are provided as 1st-level nodes (“0x41”, “0x42” and “0x43” are the character codes of “A”, “B” and “C”). As 2nd-level nodes, nodes for characters “0x42” and “0x41” are provided as child nodes of character “0x41”, a node for character “0x43” is provided as a child node of character “0x42”, and a node for character “0x42” is provided as a child node of character “0x43”. Furthermore, as 3rd-level nodes, a node for character “0x43” is provided as a child node of the 2nd-level node “0x42” under “0x41”, and nodes for characters “0x43”, “0x41” and “0x42” are provided as child nodes of the 2nd-level node “0x42” under “0x43”. For each node, FIG. 1 schematically illustrates a code (A), a character (B), the numbers of appearances (C) of the characters of its child nodes, and pointers (D) to its child nodes.
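The node layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the conventional technique's actual implementation: the class and method names, the code numbering (1st-level nodes receive codes equal to their byte values, new nodes receive codes from 256 upward), and the LZ-style growth rule (after a phrase is emitted, the character that follows it is counted, and a child node is created once that count reaches the threshold of 2) are all hypothetical. Because the growth rule is assumed, the tree it builds is not claimed to reproduce FIG. 1 exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

ALPHABET = 256  # one child slot per possible byte value 0x00..0xFF
THRESHOLD = 2   # appearances needed before a child node is created


@dataclass(eq=False)
class PhraseNode:
    code: int                 # code substituted for the phrase ending at this node
    char: Optional[int]       # byte this node represents (None for the root)
    level: int                # hierarchy level; root = 0, so level = phrase length
    parent: Optional[PhraseNode] = field(default=None, repr=False)
    # Dense per-character arrays: a child pointer and an appearance counter
    # for every possible byte, whether or not that child exists.
    children: list = field(default_factory=lambda: [None] * ALPHABET, repr=False)
    counts: list = field(default_factory=lambda: [0] * ALPHABET, repr=False)


class PhraseTree:
    def __init__(self) -> None:
        self.nodes: list[PhraseNode] = []  # index == code (hypothetical numbering)
        self.root = PhraseNode(code=-1, char=None, level=0)
        for b in range(ALPHABET):  # 1st-level nodes "0x00".."0xFF" get codes 0..255
            self._add_child(self.root, b)

    def _add_child(self, parent: PhraseNode, b: int) -> PhraseNode:
        node = PhraseNode(code=len(self.nodes), char=b,
                          level=parent.level + 1, parent=parent)
        parent.children[b] = node
        self.nodes.append(node)
        return node

    def _longest_match(self, data: bytes, pos: int) -> PhraseNode:
        # Follow child pointers for as long as the input keeps matching.
        node = self.root.children[data[pos]]
        pos += 1
        while pos < len(data) and node.children[data[pos]] is not None:
            node = node.children[data[pos]]
            pos += 1
        return node

    def compress(self, data: bytes) -> list[int]:
        out: list[int] = []
        pos = 0
        while pos < len(data):
            node = self._longest_match(data, pos)
            out.append(node.code)
            pos += node.level  # a level-k node stands for a k-byte phrase
            if pos < len(data):
                nxt = data[pos]
                node.counts[nxt] += 1  # count the character following the phrase
                if node.counts[nxt] >= THRESHOLD and node.children[nxt] is None:
                    self._add_child(node, nxt)  # grow the tree by one level
        return out

    def expand(self, codes: list[int]) -> bytes:
        out = bytearray()
        for code in codes:
            node, phrase = self.nodes[code], []
            while node.char is not None:  # walk parent pointers up to the root
                phrase.append(node.char)
                node = node.parent
            out.extend(reversed(phrase))
        return bytes(out)
```

For instance, compressing `b"ABAB"` on a fresh tree emits the four single-byte codes and, because “B” follows “A” twice, creates a 2nd-level node with code 256; compressing `b"AB"` afterwards emits just `[256]`. Note that the dense `children` and `counts` arrays are allocated for every node, including leaves that never gain a child.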
As illustrated in FIG. 2, each node holds 256 child node pointers and 256 counters for the numbers of appearances of the characters of its child nodes, one of each for every character that may be used, so one node occupies 3,085 bytes of memory. Presuming that 65,536 nodes are provided, which is the maximum number of nodes that can be expressed with a 2-byte code, about 192 Mbytes of memory are used in total.
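The memory figures above can be checked with a short calculation. FIG. 2 is not reproduced here, so the per-field sizes below are an assumed breakdown (2-byte code, 1-byte character, 2-byte level, 8-byte pointers, 4-byte counters) chosen only because it is consistent with the stated 3,085-byte total; the field names are hypothetical.

```python
ALPHABET = 256  # number of characters that may be used

# Hypothetical field sizes in bytes (an assumption, not taken from FIG. 2).
FIELD_SIZES = {
    "code": 2,                       # 2-byte code
    "character": 1,                  # one byte value
    "level": 2,                      # hierarchy level (assumed 2 bytes)
    "parent_pointer": 8,             # 64-bit pointer (assumed)
    "child_pointers": ALPHABET * 8,  # one 64-bit pointer per character
    "counters": ALPHABET * 4,        # one 32-bit counter per character (assumed)
}


def node_size() -> int:
    """Bytes occupied by one node under the assumed layout."""
    return sum(FIELD_SIZES.values())


def tree_size(code_bits: int = 16) -> int:
    """Bytes for a full tree: one node per distinct code."""
    max_nodes = 2 ** code_bits  # 65,536 nodes for a 2-byte code
    return max_nodes * node_size()


print(node_size())                      # 3085
print(tree_size())                      # 202178560
print(round(tree_size() / 2 ** 20, 1))  # 192.8 (Mbytes, taking 1 Mbyte = 2**20 bytes)
```

Under these assumptions the two 256-entry arrays account for 3,072 of the 3,085 bytes, so the per-node size is governed by the number of characters that may be used rather than by the phrase the node represents.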
The phrase tree can compress a wider variety of data as the number of code types (that is, the number of nodes) used to replace character strings increases. However, as the number of code types grows, the number of nodes grows with it, and the overall size of the phrase tree becomes large.