1. Field of the Invention
The invention relates to the field of compression and decompression of text.
2. Description of the Prior Art
Compression algorithms such as Huffman coding, LZ78, LZW, and their many variants usually exploit the statistical redundancy of English letters and achieve a limited compression rate, a limit that was formulated by Claude E. Shannon. According to Shannon's theory of data compression, there is a fundamental limit (the entropy rate) to lossless data compression. Using a prediction method, Shannon estimated for the general model that the entropy rate of English text can in theory reach 2.3 bits/character. None of the above compression algorithms achieves the results theorized in Shannon's third-order statistical distribution of the English text entropy rate, for which the obtainable limit reaches 2.77 bits/character. Many techniques have been proposed to increase the compression rate (an absolute number) or the compression ratio (a relative number), e.g. word-based Huffman coding, in which the table of symbols in the compression coder becomes the text vocabulary; Efficient Optimal Recompression; Semi-lossless Text Compression; programmed selection of common characters and pairs; programmed selection of prefixes and suffixes; or the text compression method proposed by U. Khurana, “Text compression and Superfast Searching” [[5]], which is based on sequentially converting the words of the source text into 16-bit indexes. With the method proposed by U. Khurana, the compression ratio cannot be increased, because the permanent reference vocabulary is limited to 65,536 words, which in turn limits the ability to build vocabulary symbols as phrases, punctuation marks, and combinations of words and marks.
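The 16-bit word-index approach discussed above can be illustrated with a minimal sketch. It is not taken from the cited reference: the function names, the whitespace tokenization, and the tiny sample vocabulary are assumptions made only for illustration. It shows why a 16-bit index cannot address more than 65,536 vocabulary entries and how a compression rate in bits/character might be computed.

```python
import struct

MAX_ENTRIES = 1 << 16  # a 16-bit index can address at most 65,536 entries


def build_vocabulary(words):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for word in words:
        if word not in vocab:
            if len(vocab) >= MAX_ENTRIES:
                raise OverflowError("vocabulary exceeds 65,536 entries")
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Pack each whitespace-separated word as one big-endian 16-bit index."""
    return b"".join(struct.pack(">H", vocab[word]) for word in text.split())


def decode(data, vocab):
    """Unpack 16-bit indexes and map them back to words (single spaces assumed)."""
    inverse = {index: word for word, index in vocab.items()}
    indexes = struct.unpack(">%dH" % (len(data) // 2), data)
    return " ".join(inverse[index] for index in indexes)


text = "the cat saw the dog and the dog saw the cat"
vocab = build_vocabulary(text.split())
packed = encode(text, vocab)

# Compression rate (absolute): bits spent per source character,
# not counting the cost of storing the vocabulary itself.
bits_per_char = 8 * len(packed) / len(text)
# Compression ratio (relative): source size over compressed size.
ratio = len(text) / len(packed)
```

Every word costs exactly two bytes regardless of its length, so under these assumptions the gain depends entirely on the average word length, and any word outside the fixed vocabulary cannot be represented at all.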
Compression techniques involve trade-offs between various factors, such as the complexity of the design of the data compression/decompression scheme, the ability to search compressed text without decompressing it, the speed of the operating system, the consumption of expensive resources (i.e. storage and transmission bandwidth), the compression capability, the time it takes to compress information, the computing power available to the user, and the cost of text compression arising from coding and decoding the text, as well as other factors. None of these methods satisfies the requirement of efficient compression and decompression of text. Furthermore, each method has both advantages and disadvantages when implemented in different kinds of applications, e.g. with respect to the requirements of reducing text decompression time and reducing the operating frequency of the microprocessor of an electronic reader.
The present invention seeks to overcome some of the restrictions of the systems and apparatuses involved in coding/decoding, storing, and transmitting text. Furthermore, the present method of converting any symbols into indexes makes it possible to increase the compression ratio of stored text, to increase the compression rate of transmitted text, and to reduce the cost of receivers.
In the present invention, “symbol” means a letter, word, phrase, number, sentence, punctuation mark, prefix, suffix, or a permanently or temporarily made combination of words; “index” means the address of a symbol located in the permanent or temporary vocabulary.
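The relationship between symbols, indexes, and the two vocabularies can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed method: the class name `Vocabularies`, its methods, and the choice to number temporary entries directly after the permanent ones are all hypothetical.

```python
class Vocabularies:
    """Maps symbols to indexes across a permanent and a temporary vocabulary."""

    def __init__(self, permanent_symbols):
        # Permanent entries are fixed in advance; duplicates are dropped.
        self._by_index = list(dict.fromkeys(permanent_symbols))
        self._permanent_size = len(self._by_index)
        self._index = {s: i for i, s in enumerate(self._by_index)}

    def index_of(self, symbol):
        """Return the symbol's index, adding it to the temporary vocabulary if new."""
        if symbol not in self._index:
            # Assumption: temporary indexes continue after the permanent range.
            self._index[symbol] = len(self._by_index)
            self._by_index.append(symbol)
        return self._index[symbol]

    def symbol_at(self, index):
        """Return the symbol stored at the given index (its 'address')."""
        return self._by_index[index]

    def is_temporary(self, index):
        """True if the index lies in the temporary vocabulary."""
        return index >= self._permanent_size


# Symbols may be words, phrases, punctuation marks, prefixes, or suffixes.
vocab = Vocabularies(["the", "of", ",", "in the", "un", "ing"])
indexes = [vocab.index_of(s) for s in ["in the", "Khurana", ",", "Khurana"]]
```

In this sketch the phrase "in the" and the comma resolve to permanent indexes, while the unseen word "Khurana" is assigned a temporary index on first use and reuses that index on its second occurrence.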