Existing methods and systems for manipulating the size of a data file include encoding schemes that represent the information of the data file using fewer information-bearing units. During encoding, the information in the data file is treated as generic data in the form of bits and bytes. A compressor used for encoding does not distinguish between various characteristics of the data. Compression can be more effective when it takes the inherent characteristics of the data into account.
Many existing compressors provide either lossless compression or lossy compression. With lossless compression, the original data can be recovered completely through a decompression technique. Lossy compression discards some of the data to achieve higher compression. As a result, the original data cannot be fully recovered through any decompression technique.
Another existing method for encoding is GDSII compression. GDSII compression is based on generic off-the-shelf compression techniques that are used in generic compression tools. Generic compression tools that provide lossless compression include, but are not limited to, gzip and bzip2. Both tools exploit repeated patterns to compress input data.
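As an illustration only, the following Python sketch uses the standard-library gzip and bz2 modules to show that both tools are lossless and that both shrink an input containing repeated patterns. The input record is a made-up placeholder and is not actual GDSII data.

```python
import bz2
import gzip

# Hypothetical input: one short record repeated many times, standing in
# for any file that contains recurring patterns (not real GDSII data).
data = b"RECORD-0001;X=100;Y=200;\n" * 1000

gz = gzip.compress(data)
bz = bz2.compress(data)

# Lossless: decompression returns exactly the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data

# Sizes in bytes: original, gzip output, bzip2 output.
print(len(data), len(gz), len(bz))
```

Both compressed outputs are much smaller than the input because the same record recurs throughout. Neither tool uses any knowledge of what the record means.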
The gzip compression tool is based on the deflate algorithm. Duplicated strings are located in the input data, and each duplicate string is replaced by a pointer to a previous occurrence of the string. The pointer is in the form of a (distance, length) pair, in which the distance is restricted to 32K bytes and the length is limited to 258 bytes. If no duplicate of a string appears within the restricted distance, the string is emitted as a sequence of literal bytes. Also, additional methods for compressing the input data compress a generic sequence of bytes based on a dictionary approach.
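The duplicate-string replacement described above can be sketched as follows. This is a simplified, greedy illustration and not the actual gzip implementation: it omits the Huffman coding stage of deflate and uses a slow brute-force search. The function names lz77_tokens and lz77_expand are hypothetical.

```python
WINDOW = 32 * 1024  # maximum back-reference distance in bytes
MAX_LEN = 258       # maximum match length in bytes
MIN_LEN = 3         # shortest match worth replacing (deflate's minimum)

def lz77_tokens(data: bytes) -> list:
    """Return a list of literal byte values and (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - WINDOW), i):
            length = 0
            while (length < MAX_LEN and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_LEN:
            tokens.append((best_dist, best_len))  # pointer to earlier string
            i += best_len
        else:
            tokens.append(data[i])                # literal byte
            i += 1
    return tokens

def lz77_expand(tokens: list) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy allows overlap
        else:
            out.append(token)
    return bytes(out)

tokens = lz77_tokens(b"abcabcabcabc")
print(tokens)  # [97, 98, 99, (3, 9)]
```

The first three bytes are emitted as literals because nothing precedes them. The remaining nine bytes collapse into a single pair that points three bytes back.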
The bzip2 tool is based on the Burrows-Wheeler Transform algorithm and Huffman coding. The bzip2 tool applies a reversible transformation to the input data. The reversible transformation makes the input data easier to compress using a second algorithm. The second algorithm can include Huffman or arithmetic coding. The transformation is applied to blocks of the input data rather than to the input data as one continuous sequence. A transformed block contains the same characters as the original block, but in a different order in which similar characters are grouped together. This grouping makes the input data simpler to compress using the second algorithm. Neither gzip nor bzip2 makes use of known binary formats with specific grammar rules and structures for compressing the input data.
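The reversible block transformation can be illustrated with a naive Burrows-Wheeler Transform. This sketch assumes a non-empty block and sorts all rotations directly, which is far slower than a production implementation. It shows only the transform itself, not the coding stage that follows it. The function names bwt and inverse_bwt are hypothetical.

```python
def bwt(block: bytes):
    """Return (last column of the sorted rotations, row of the original block)."""
    n = len(block)
    # Sort rotation start positions by the rotation each one produces.
    rotations = sorted(range(n), key=lambda k: block[k:] + block[:k])
    last = bytes(block[(k - 1) % n] for k in rotations)
    return last, rotations.index(0)

def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Recover the original block from the last column and the primary row."""
    n = len(last)
    # A stable sort of positions by byte value links each row of the sorted
    # matrix to the row that starts one character later in the text.
    order = sorted(range(n), key=lambda k: last[k])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)

transformed, primary = bwt(b"banana")
print(transformed, primary)  # b'nnbaaa' 3
```

The output contains the same six characters as the input, with the repeated letters now adjacent, and inverse_bwt restores the original block from the transformed block and the row index.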
Therefore, there is a need for a method and system for compressing the input data using repetition patterns along with specific grammar rules and structures.