The present invention relates to a method of encoding a data sequence by converting the data sequence into a grammar transform and then losslessly encoding the data sequence based on the grammar transform.
Universal data compression methods can be divided into two subdivisions: universal loss less data compression and universal lossy data compression. Conventional universal lossless data compression methods typically employ arithmetic coding algorithms, Lempel-Ziv algorithms, and their variants. Arithmetic coding algorithms and their variants are statistical model-based algorithms. To use an arithmetic coding algorithm to encode a data sequence, a statistical model is either built dynamically during the encoding process, or assumed to exist in advance. Several approaches have been proposed to dynamically build a statistical model. These include the prediction by partial match algorithm, dynamic Markov modeling, context gathering algorithm, and context-tree weighting method. In all of these methods, the next symbol in the data sequence is typically predicted by a proper context and coded by the corresponding estimated conditional probability. Good compression can be achieved if a good trade-off is maintained between the number of contexts and the conditional entropy of the next symbols given contexts during the encoding process. Arithmetic coding algorithms and their variants are universal only with respect to the class of Markov sources with Markov order less than some designed parameter value. Note that in arithmetic coding, the original data sequence is encoded letter by letter. In contrast, no statistical model is used in Lempel-Ziv algorithms and their variants. During the encoding process, the original data sequence is parsed into non-overlapping, variable-length phrases according to a string matching mechanism, and then encoded phrase by phrase. Each parsed phrase is either distinct or replicated with the number of repetitions less than or equal to the size of the source alphabet. Phrases are encoded in terms of their positions in a dictionary or database. Lempel-Ziv algorithms are universal with respect to a class of sources which is broader than the class of Markov sources of bounded order; the incremental parsing Lempel-Ziv algorithm is universal for the class of stationary, ergodic sources. Other conventional universal compression methods include the dynamic Huffman algorithm, the move-to-front coding scheme, and some two stage compression algorithms with codebook transmission. These conventional methods are either inferior to arithmetic coding and Lempel-Ziv algorithms, or too complicated to implement. Recently, J. C. Kieffer and E. H. Yang proposed a class of lossless data compression algorithms based on substitution tables in xe2x80x9cLossless Data Compression Algorithms Based on Substitution Tablesxe2x80x9d, Proc. of the 1998 Canadian Conference on Electrical and Computer Engineering(Waterloo, Ontario), Vol. 2, pp. 629-632, May 24-28, 1998. In this paper, a new coding framework is presented, but no explicit data compression algorithm is proposed. The greedy sequential transformation discussed in the paper is difficult to implement and does not facilitate subsequent efficient coding, since the symbol s0 is involved in the parsing process.
In accordance with the present invention, a universal lossless data compression method is provided to overcome the above-described disadvantages of existing compression methods. The compression method of the present invention utilizes a grammar transform to sequentially construct a sequence of irreducible context-free grammars from which an original data sequence can be recovered incrementally. Based on the grammar transform, the data sequence is encoded using sequential or hierarchical encoding methods.