Data structures used by computer programs often include various record or object types. Each of these record types includes a fixed number of fixed-size fields. Each field may contain another record type or a built-in type, such as an integer or a pointer to another object. Often, there are multiple instances of each record type, where every instance has the same number of fields but the data held in those fields differs from instance to instance. Computer programs are often called upon to write these data structures onto a persistent storage device.
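As a minimal illustration of such a record type, the following Python sketch assumes a hypothetical record with three fixed-size fields: two 4-byte integers and one 8-byte integer standing in for a pointer to another object. The field names, the layout, and the helper functions are assumptions made for illustration only and are not taken from any particular program.

```python
import struct

# Hypothetical record layout (illustrative only):
#   "<" = little-endian with no padding
#   "i" = 4-byte signed integer (record_id), "i" = 4-byte signed integer (value)
#   "q" = 8-byte signed integer standing in for a pointer to another record
RECORD_FORMAT = "<iiq"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 4 + 8 = 16 bytes


def pack_record(record_id, value, next_ref):
    """Return the fixed-size binary image of one record instance."""
    return struct.pack(RECORD_FORMAT, record_id, value, next_ref)


def unpack_record(data):
    """Recover the three fields from one record's binary image."""
    return struct.unpack(RECORD_FORMAT, data)


# Several instances of the same record type: same fields, different data.
instances = [(1, 100, 2), (2, 250, 3), (3, 999, 0)]

# Concatenating the images gives one byte on storage per byte of record.
image = b"".join(pack_record(*fields) for fields in instances)
```

Because every instance occupies exactly `RECORD_SIZE` bytes, the n-th record can be read back with `unpack_record(image[n * RECORD_SIZE:(n + 1) * RECORD_SIZE])`.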
One approach used in the past to write data structures onto a persistent storage device was to write the image of each record directly to disk, either as a binary image or encoded in a human-readable character set. This required that every byte of a record correspond to a byte on the disk or to multiple characters on the disk. This resulted in very large data structures. In order to reduce the size of the data structure, dictionary-based algorithms were used to compress the data in the data structures. The dictionary-based algorithms compressed data by finding strings of bytes that are repeated in a file, and replacing subsequent instances of the same string with a shorter identifier that refers back to a previous instance of the string. These same algorithms were occasionally augmented with the ability to encode multiple consecutive occurrences of either a single character or a multi-character string by using a repeat count. The longer the repeated strings encountered, the larger the compression ratio would be.
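The back-reference idea described above can be sketched with a toy encoder in the style of LZ77. This is an illustrative assumption about how such a dictionary-based algorithm might look, not the specific algorithm used by any prior system; the function names, the token format, and the `window` and `min_match` parameters are hypothetical.

```python
def compress(data: bytes, window: int = 255, min_match: int = 3):
    """Toy dictionary-style encoder.

    Emits ("lit", byte) for a literal byte, or ("ref", offset, length)
    to refer back to a string that already occurred earlier in the data.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the recent history for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i; this overlap lets a single
            # reference act like a repeat count for consecutive repeats.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def decompress(tokens) -> bytes:
    """Rebuild the original bytes from the token stream."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, offset, length = tok
            # Copy byte by byte so overlapping references work.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)
```

With this sketch, `compress(b"abcabcabcabc")` yields three literals followed by a single reference covering the remaining nine bytes, whereas `compress(b"abcdefgh")` yields eight literals and no references, since the input contains no repeated string for a reference to point at.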
The effectiveness of dictionary-based compression algorithms was hindered, however, by the short, non-repeated strings that naturally occurred in object-oriented databases. Thus, what is needed is an effective method for compressing data stored in object-oriented databases, as well as other data having short, non-repeating strings with patterns.