A technique that uses a finite automaton to search English-language text while the text remains in a compressed format has been disclosed (see, for example, Fukamachi, Shuichi; Shinohara, Takeshi; and Takeda, Masayuki, “Character String Pattern Comparison for Variable-Length Code Compressed Data: High-Speed Search Technique of Genom Information”, 1992, Information Science Symposium Proceedings for Presented Papers, Jan. 8, 1992, pp. 95-103). A technique of applying a tree structure to data encoding and a path control table has also been disclosed (see, for example, Japanese Laid-Open Patent Publication Nos. H10-271012 and 2000-188608).
However, the technique proposed by Fukamachi et al. has not been practically applied to the Japanese language because the number of state transition tables that the automaton needs to recognize characters increases. In addition, 16-bit character codes have about 64K (65,536) types, compared with 256 types for eight-bit character codes, so both the time needed to generate the state transition tables and the size of those tables increase together. Applying the technique to the Japanese language is therefore difficult in practice.
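The growth in table size can be illustrated with a minimal sketch. It assumes a dense state transition table with one entry per state and per character code; the function name, the figure of 1,000 states, and the four-byte entry size are hypothetical values chosen for illustration and do not come from the cited reference.

```python
def dense_table_bytes(num_states, alphabet_size, bytes_per_entry=4):
    """Size of a dense transition table: one entry per (state, code) pair."""
    return num_states * alphabet_size * bytes_per_entry


# Hypothetical automaton with 1,000 states (assumed figure).
NUM_STATES = 1000

size_8bit = dense_table_bytes(NUM_STATES, 2 ** 8)    # 256 code types
size_16bit = dense_table_bytes(NUM_STATES, 2 ** 16)  # 65,536 code types

print(f"8-bit codes : {size_8bit:,} bytes")
print(f"16-bit codes: {size_16bit:,} bytes")
print(f"growth      : {size_16bit // size_8bit}x")
```

Under these assumptions, each state's row widens from 256 entries to 65,536 entries, so the table grows by the ratio of the two alphabet sizes regardless of how many states the automaton has.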
According to Japanese Laid-Open Patent Publication Nos. H10-271012 and 2000-188608, when Huffman compression is applied to 16-bit codes, the compressed code of each character whose appearance frequency is low needs 20 bits or more. A problem therefore arises in that the quantity of tables for node-less linear searching grows by a factor of two to 20.
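Why low-frequency characters receive codes longer than 16 bits can be seen in a minimal sketch that computes Huffman code lengths for 65,536 symbols. The Zipf-like weights (1/rank) are an assumed stand-in for real character frequencies, and the function name is hypothetical; neither is taken from the cited publications.

```python
import heapq


def huffman_code_lengths(weights):
    """Return the Huffman code length in bits for each symbol weight."""
    n = len(weights)
    if n == 1:
        return [1]
    # Nodes 0..n-1 are leaves; nodes n..2n-2 are internal, created in order.
    parent = [-1] * (2 * n - 1)
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    next_id = n
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)  # two lightest subtrees
        w2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (w1 + w2, next_id))
        next_id += 1
    # A parent always has a larger id than its children, so walking the
    # ids downward from the root fills in every depth in one pass.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]


# Assumed Zipf-like frequencies over all 65,536 16-bit codes.
weights = [1.0 / rank for rank in range(1, 2 ** 16 + 1)]
lengths = huffman_code_lengths(weights)

print("most frequent symbol :", lengths[0], "bits")
print("longest code         :", max(lengths), "bits")
print("codes over 16 bits   :", sum(1 for l in lengths if l > 16))
```

Because a Huffman code fills its code space exactly, every code shorter than 16 bits given to a frequent character must be paid for by codes longer than 16 bits elsewhere; the exact lengths depend on the frequency distribution assumed.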