The present invention concerns a method for data compression using the decompression instructions of a decompression language. Three bits identify each instruction and thereby its syntax. There are 9 instructions, each with a specific syntax, and an instruction may specify a character, a number of bytes, a position or distance, a number of additional positions, a repetition interval, etc. Two of the decompression instructions share the same 3-bit code and must be differentiated by a fourth bit. Decompression is accomplished by executing the decompression instructions until compression level zero is reached, reproducing the original data without any data loss. These methods may also be used for encryption.
Many data compression methods have been developed over the years to provide efficient data storage and transmission. The data are compressed by identifying redundant data, and providing means to replace them. One such popular method is described in U.S. Pat. No. 5,051,745 where redundant string segments are replaced by codes designating the location and length of matching string segments. U.S. Pat. No. 4,597,057 describes a system for compressing 8 bit ASCII coded English language text using coded strings of 4 bit nibbles. Matching string segments are replaced by 4 bit nibbles that specify the type and location of specific characters in tables.
U.S. Pat. No. 4,903,018 describes a method of compressing data by considering it as rows and columns, forming a two dimensional structure. The lines may have varying line lengths, and must first be sorted by index, in either increasing or decreasing line length. The lines are compressed where possible and the columns are compressed where possible. A line segment of identical characters produces a code that specifies the position, number, and character. A fill character replaces these characters to maintain the structural integrity of the lines and columns. These fill characters are not considered during the column by column compression. A column segment of identical characters also produces a code that specifies the position, number, and character. A fill character likewise replaces these characters to maintain structural integrity.
U.S. Pat. No. 5,502,439 describes a method for compressing data that is based on LZSS compression methods. It creates a flag bit buffer for the temporary storage of flag bits produced during normal LZSS compression. A "1" bit signifies that the next byte is original data; a "0" bit signifies that the next data are codes for the number of characters and the starting position from which to obtain the matching original data. The flag bit buffer is later appended to the end of the compressed output. U.S. Pat. No. 5,729,223 describes a method of compression that first counts the number of occurrences for each possible character, 00X to FFX in hexadecimal notation or 0 to 255 in decimal notation, and uses the unused and least used character codes as escape characters to identify escape sequences. Two matching blocks of data that are offset from each other by a multiple of N are replaced by escape sequences that specify the size and offset multiple factor of the matching block of data. The number N is equal to 4 on a 32 bit computer, i.e., it signifies 4 bytes per 32 bit word. The definitions of the escape characters and escape sequences must accompany the compressed data for proper expansion.
U.S. Pat. No. 5,734,340 describes a method for compressing File Allocation Tables (FAT) and similar structures that have runs of consecutive numbers and runs of intervening codes. The method generates a plurality of variable length code sequences where each code sequence specifies a particular consecutive run. Each code sequence contains a header that specifies several properties and may also contain an intervening run length, consecutive number run length, and jump value pointer, each varying from 0 to 4 bytes.
U.S. Pat. No. 6,021,198 describes a method for transmitting a file where the file may be variably compressed, by varying the compression level, i.e., the degree of compression, to improve throughput of each 32 kbyte data block. These and other methods sequentially search the input data for redundant data that have been observed previously and sequentially output data or replacement codes, either to an output file or for transmission. This sequential progression through the data, from one position, specifying data or replacement codes, then to the next position, is very restrictive in the amount of redundant data that can be identified.
The limitations and disadvantages associated with the sequential progression through the data, from position to next position and specifying the data or replacement codes, are eliminated by this invention. Instead of sequentially progressing from position to position and specifying data or replacement codes, each unique 8-bit character that is in the data is chosen one at a time, and all its positions are specified using decompression instructions. Each character is eight bits, so that there are 256 possible characters to investigate. Whether the data are in bits, ASCII, EBCDIC or any other code is irrelevant. The decompression instructions may themselves be compressed by using this method for further compression, producing a new set of decompression instructions at a higher level of compression. All of the 8-bit characters are initially identified and counted. The character with the greatest count is chosen as the background character to reproduce the data. A specific decompression instruction is produced that, when the compressed data are decompressed, will produce a string whose length is equal to that of the input data and that is filled with the background character, the one with the greatest count. This background character's count is then set to zero. Each unique character with a count greater than zero is then investigated individually; its positions are identified, and decompression instruction codes are produced that will place the character at the proper positions when decompressing. Therefore, instead of going to a position and specifying a character or group of characters, this invention specifies a character and identifies all of its positions using decompression instructions. This invention provides a set of decompression instructions and methods of producing them.
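The counting and background selection described above can be illustrated with a minimal Python sketch. The function name `plan_background` and its return layout are hypothetical and chosen only for illustration; the sketch does not produce the 3-bit decompression instructions themselves, only the information from which they would be derived.

```python
from collections import Counter


def plan_background(data: bytes):
    """Count all 8-bit characters and pick the most frequent as background.

    Hypothetical helper: returns (background, length, positions), where
    positions maps each non-background character to its sorted positions.
    """
    if not data:
        return None, 0, {}
    counts = Counter(data)
    # The character with the greatest count becomes the background.
    background, _ = counts.most_common(1)[0]
    positions = {}
    for index, value in enumerate(data):
        # Background positions need no instruction of their own; the
        # background fill already reproduces them.
        if value != background:
            positions.setdefault(value, []).append(index)
    return background, len(data), positions


background, length, positions = plan_background(b"aaabaacaab")
print(chr(background), length, positions)
```

In this example the background is "a", the length is 10, and only the positions of "b" and "c" remain to be described by further instructions.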
This invention also describes a character searcher for finding multiple instances of a particular character within a string and producing an optimum decompression instruction. This invention does not require the data to be sorted, or for the data to be considered as rows and columns as in U.S. Pat. No. 4,903,018. This invention works linearly with the data. It does not require escape sequence definitions to be included with the compressed data as in U.S. Pat. No. 5,729,223. It does not append flag bits to the end of the compressed output as in U.S. Pat. No. 5,502,439. This invention does not require the segment of the data that is to be compressed to be contiguous in rows or columns as in U.S. Pat. No. 4,903,018. This invention looks for instances of a character that are linearly equidistant from one another, at any interval from 1 character upward, and then produces a particular decompression instruction that specifies the position of the first character, the number of additional identical characters, and the interval between each identical character. For example, it may find 5 "B" characters that are each 7 characters apart, and produce a decompression instruction to position these 5 "B" characters. This invention works with a particular character until all of its positions have been accounted for and decompression instructions have been produced. It then progresses to the next character that is contained within the data.
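The search for equidistant instances of one character can be sketched as follows. This is a simple greedy illustration under stated assumptions, not the character searcher of the invention: the function name `find_runs` and the tuple layout `(first_position, additional_count, interval)` are hypothetical, and the greedy choice is not guaranteed to yield an optimum set of instructions.

```python
def find_runs(positions):
    """Greedily cover a character's positions with equidistant runs.

    Each run is (first_position, additional_count, interval); a lone
    position is reported as (position, 0, 0).
    """
    remaining = set(positions)
    runs = []
    for start in sorted(remaining):
        if start not in remaining:
            continue  # already covered by an earlier run
        best_extra, best_interval = 0, 0
        # Try intervals from the smallest upward; keep the longest run.
        for other in sorted(p for p in remaining if p > start):
            interval = other - start
            extra = 0
            while start + (extra + 1) * interval in remaining:
                extra += 1
            if extra > best_extra:
                best_extra, best_interval = extra, interval
        for k in range(best_extra + 1):
            remaining.discard(start + k * best_interval)
        runs.append((start, best_extra, best_interval))
    return runs


# Five "B" characters, each 7 characters apart, starting at position 3.
print(find_runs([3, 10, 17, 24, 31]))
```

For the five "B" characters of the example, the sketch reports a single run: first position 3, 4 additional characters, interval 7.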
Decompression is a simple matter of following the decompression instructions to reach compression level zero, reproducing the original data without any loss. This invention is most applicable to large sets of data, where a character's multiple positions may be advantageously identified and optimum decompression instructions produced.
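A single level of decompression can be sketched in Python as a background fill followed by character placements. The function name `decompress` and the tuple layout `(character, first_position, additional_count, interval)` are assumptions made for illustration; the actual instructions are bit-coded, and repeated levels of compression are not modeled here.

```python
def decompress(length, background, placements):
    """Rebuild data from a background fill plus placement instructions.

    placements is a list of hypothetical tuples:
    (character, first_position, additional_count, interval).
    """
    # Start with a string of the original length, filled with the background.
    out = bytearray([background]) * length
    for value, first, extra, interval in placements:
        # Place the character at first, first + interval, and so on.
        for k in range(extra + 1):
            out[first + k * interval] = value
    return bytes(out)


restored = decompress(
    10,
    ord("a"),
    [(ord("b"), 3, 1, 6), (ord("c"), 6, 0, 0)],
)
print(restored)
```

Here "b" is placed at positions 3 and 9 and "c" at position 6 over a background of "a", which reproduces the 10-byte input exactly.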