1. Field of the Invention
The invention relates to the field of compression and decompression of text.
2. Description of the Prior Art
Compression algorithms such as Huffman coding, LZ78, LZW, and hundreds of other variants of these techniques usually exploit the statistical redundancy of English letters and give a limited compression rate, a limit that was formulated by Claude E. Shannon. According to Shannon's theory of data compression, there is a fundamental limit (the entropy rate) to lossless data compression. Using a prediction method, Shannon estimated for the general model that the entropy rate of English text can in theory reach 2.3 bits/character. None of the compression algorithms mentioned above achieves the results theorized in Shannon's third-order statistical distribution of the English text entropy rate, with the obtainable limit reaching 2.77 bits/character. Many techniques have been proposed to increase the compression rate (an absolute number) or the compression ratio (a relative number), e.g., word-based Huffman coding, where the table of symbols in the compression coder becomes the text vocabulary; Efficient Optimal Recompression; Semi-lossless Text Compression; programmed selection of common characters and pairs; programmed selection of prefixes and suffixes; or the text compression method proposed by U. Khurana, “Text compression and Superfast Searching” [5], which is based on sequentially converting the words of the source text into 16-bit high bit length indexes. The compression ratio of the method proposed by U. Khurana cannot be increased, because the permanent reference vocabulary is limited to 65,536 words, which restricts the ability to build vocabulary symbols such as phrases, punctuation marks, and word and mark combinations.
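The 65,536-word limit mentioned above follows directly from the index width: a 16-bit index can address at most 2^16 entries. The following Python sketch illustrates fixed-width word indexing in general. It is not the method of reference [5]; the function names, the byte order, and the sample sentence are assumptions made only for illustration.

```python
import struct

MAX_WORDS = 1 << 16  # a 16-bit index can address at most 65,536 entries


def build_vocabulary(words):
    """Assign each distinct word the next free index (hypothetical layout)."""
    vocab = {}
    for word in words:
        if word not in vocab:
            if len(vocab) >= MAX_WORDS:
                raise OverflowError("vocabulary exceeds 65,536 entries")
            vocab[word] = len(vocab)
    return vocab


def encode(words, vocab):
    """Pack every word as one big-endian unsigned 16-bit index."""
    return b"".join(struct.pack(">H", vocab[word]) for word in words)


def decode(data, vocab):
    """Recover the word sequence from the packed 16-bit indexes."""
    reverse = {index: word for word, index in vocab.items()}
    return [reverse[index] for (index,) in struct.iter_unpack(">H", data)]


# Assumed sample input: 11 words, 5 of them distinct.
text = "the cat saw the dog and the dog saw the cat".split()
vocab = build_vocabulary(text)
packed = encode(text, vocab)
restored = decode(packed, vocab)
```

Every word costs two bytes regardless of how frequent it is, and no entry beyond the 65,536th can be added without widening the index.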
Compression techniques involve trade-offs between various factors, such as the complexity of the design of data compression/decompression schemes, the ability to search compressed text without decompressing it, the operating speed of the system, the consumption of expensive resources (i.e., storage and transmission bandwidth), the compression capability, the time it takes to compress information, the computing power of the user's device, the cost of text compression due to text coding and decoding, and other factors. None of the existing methods satisfies the requirement of efficient compression and decompression of text. Furthermore, each method has advantages and disadvantages when implemented in different kinds of applications, e.g., where the text decompression time must be reduced and the working frequency of the microprocessor of an electronic reader must be lowered. The present invention seeks to overcome some of the restrictions of the systems and apparatuses involved in coding/decoding, storing, and transmitting text. Furthermore, the present method of converting any symbols into indexes makes it possible to increase the compression ratio of stored text, to increase the compression rate of transmitted text, and to reduce the cost of receivers.
Parent U.S. Pat. No. 8,332,209 ('209), “Method and system for text compression and decompression,” discloses compressing text by creating the “permanent reference vocabulary,” wherein the “permanent vocabulary is a redundant vocabulary including words, word combinations, and word and punctuation combinations”; splitting the permanent vocabulary into various functional sections, such as section 1 with the most common usable words, section 2 with nouns, section 3 with verbs, section 4 with adjectives, and so on; creating the temporary vocabulary, wherein “the functionality of the temporary vocabulary is to convert high bit length indexes belonging to the permanent vocabulary into low bit length indexes presented in the temporary vocabulary, which are then used to create pseudo-code”; and “splitting a temporary vocabulary into two sections, which include a root of tree section and a main section.” The '209 patent also discloses techniques for implementing text compression and decompression, such as “creating pseudo-codes; arranging the pseudo-codes for storage and transmission.”
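The idea of converting high bit length indexes into low bit length indexes can be illustrated with a short Python sketch. This is only a simplified illustration under stated assumptions, not the procedure of the '209 patent: it omits the root of tree and main sections and the formation of pseudo-codes, and the function names and index values are hypothetical.

```python
def build_temporary_vocabulary(permanent_indexes):
    """Give each distinct permanent index a small local index, in order of first use."""
    temporary = {}
    for index in permanent_indexes:
        temporary.setdefault(index, len(temporary))
    return temporary


def bits_needed(entry_count):
    """Smallest fixed width, in bits, that can address entry_count entries (at least 1)."""
    return max(1, (entry_count - 1).bit_length())


# Hypothetical high bit length indexes taken from a permanent vocabulary.
permanent = [40213, 917, 52001, 40213, 917, 40213]

temporary = build_temporary_vocabulary(permanent)
low = [temporary[index] for index in permanent]  # low bit length indexes
width = bits_needed(len(temporary))              # bits per low index

# A decoder with the same table can restore the permanent indexes.
reverse = {local: index for index, local in temporary.items()}
restored = [reverse[local] for local in low]
```

In this sketch the three distinct permanent indexes are addressed with 2 bits each instead of 16. The table itself must also be available to the decoder, so any saving depends on how often indexes repeat within the text.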
The features and advantages of the method of text compression and decompression are described below.