Field of the Invention
The present invention relates generally to data storage and communication, and particularly to systems for compressing data to facilitate denser data storage and faster data communication.
The present invention consists of a number of specialized data compression subsystems--each designed to compress a particular type of data--that are utilized to implement an adaptive, data-indifferent compression system that has achieved an average compression ratio in excess of 3:1 across all file and data types. The compression system made possible through the present invention adapts to any type of data by combining these subsystems to achieve optimum overall compression. Tested over a range of files including bitmap, database, spreadsheet, ASCII, EBCDIC, and text/word processor files, this data compression system has achieved compression ratios ranging from 2.7:1 to nearly 10:1. The data compression systems of the present invention can be used to compress data in any language, in text or binary format, regardless of its machine language encoding.
It is thus an object of the present invention to facilitate denser data storage.
It is another object of the present invention to facilitate faster data communication.
It is another object of the present invention to provide a compression system capable of adapting to different data types.
The data compression system of the present invention is implemented in three main functional groups: the scanners, the decision engine, and the compression modules.
The scanners are implemented in two stages, called Copy Scan and Main Scan. The scanners handle input from a data file. The file can be either a data storage file or a data communication file. The file consists of a stream of bytes or characters, a byte typically containing eight (8) bits of data. Before scanning, the file is divided into records (in the preferred embodiment, a record is 2048 bytes long) by the user or application, and the scanner then processes the file one record at a time. The scanners examine the data and produce a data profile report.
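By way of illustration only, the division of a file into fixed-length records might be sketched as follows. The function name split_records is hypothetical, and the handling of a short final record is an assumption; the description above states only that a record is 2048 bytes long in the preferred embodiment.

```python
RECORD_SIZE = 2048  # record length stated for the preferred embodiment


def split_records(data: bytes, record_size: int = RECORD_SIZE):
    """Yield successive records of at most record_size bytes.

    Assumption: a final record shorter than record_size is passed
    through as-is rather than padded.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    for offset in range(0, len(data), record_size):
        yield data[offset:offset + record_size]
```

Each record yielded in this way would then be handed to the scanners one at a time.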
The data profile report is a detailed breakdown of the different types of data characters and groups in the record. The report indicates the identity of each byte of data and classifies each byte into one of eight categories. The report gives the frequency with which specific characters and groups of characters occur in the record and the relative position of those characters and groups. The report also indicates the identity and location of characters, words and phrases that are repeated within the record. Once the report is compiled, it is passed on to the decision engine.
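A minimal sketch of such a report is given below for illustration. The eight category names, the function names classify, profile, and repeated_groups, and the fixed group length are all assumptions made for this example; the description above does not enumerate the categories or the method used to detect repeats.

```python
from collections import Counter


def classify(b: int) -> str:
    """Assign one byte to one of eight hypothetical categories."""
    if 0x41 <= b <= 0x5A:
        return "upper"
    if 0x61 <= b <= 0x7A:
        return "lower"
    if 0x30 <= b <= 0x39:
        return "digit"
    if b == 0x20:
        return "space"
    if b in (0x0A, 0x0D):
        return "line_break"
    if b >= 0x80:
        return "extended"
    if b < 0x20 or b == 0x7F:
        return "control"
    return "punct"  # remaining printable ASCII


def repeated_groups(record: bytes, length: int = 4) -> dict:
    """Map each group of `length` bytes seen more than once to its offsets."""
    seen = {}
    for i in range(len(record) - length + 1):
        seen.setdefault(record[i:i + length], []).append(i)
    return {group: offsets for group, offsets in seen.items() if len(offsets) > 1}


def profile(record: bytes) -> dict:
    """Build a simple data profile report for one record."""
    categories = [classify(b) for b in record]
    positions = {}
    for i, b in enumerate(record):
        positions.setdefault(b, []).append(i)
    return {
        "categories": categories,                  # category of each byte
        "category_counts": Counter(categories),    # frequency per category
        "byte_counts": Counter(record),            # frequency per byte value
        "positions": positions,                    # offsets of each byte value
        "repeats": repeated_groups(record),        # repeated groups and offsets
    }
```

The dictionary returned by profile stands in for the report that is passed on to the decision engine.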
The decision engine processes the data profile report and classifies the entire record as fitting into one of eight data composition profiles, or modes. The decision engine acts according to a set of statistical rules. Based on the report's indication of the content of the data record, the decision engine constructs a program for optimum compression of the record. The program made by the decision engine consists of detailed instructions regarding application of the compression modules. Each compression module is a highly specialized compression subsystem designed to compress a specific type of data (e.g., ASCII capitals, hexadecimal numerics, etc.). The decision engine's program specifies which compression modules are to be applied and the order in which they are to be applied, as well as the conditions that will mandate branching from one module to another for better compression.
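For illustration, a decision step of this general kind might be sketched as below. The rule used here (take the most frequent category as the mode, and schedule a module for every category above a share threshold) is an assumption chosen for simplicity, as are the name build_program and the threshold value; the actual statistical rules and the eight modes are not specified in this summary.

```python
def build_program(category_counts: dict, threshold: float = 0.1) -> dict:
    """Derive a hypothetical compression program from category frequencies.

    category_counts maps a category name to the number of bytes in the
    record that fall into it. The returned program names a mode and an
    ordered list of modules, most frequent category first.
    """
    total = sum(category_counts.values())
    if total == 0:
        return {"mode": None, "modules": []}
    # Rank by descending count; break ties by name for a stable order.
    ranked = sorted(category_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    mode = ranked[0][0]
    modules = [name for name, count in ranked if count / total >= threshold]
    return {"mode": mode, "modules": modules}
```

In this sketch the order of the modules list plays the role of the order of application; branching conditions are left out for brevity.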
The modules then compress the data, branching among themselves according to the instructions provided by the decision engine. Thus, execution of the compression modules is dynamically adaptive (following the "road map" provided by the decision engine) for optimum compression.
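The branching behavior can be illustrated, under stated assumptions, by the following sketch. It shows only the control flow: each hypothetical module is represented by a predicate that reports whether it can handle a byte, and the record is split into runs tagged with the module that would process them. No actual compression is performed, and the names run_modules, EXAMPLE_HANDLERS, and the "raw" fallback are illustrative only.

```python
# Hypothetical modules, each represented by a predicate on a single byte.
EXAMPLE_HANDLERS = {
    "upper": lambda b: 0x41 <= b <= 0x5A,   # ASCII capitals
    "digit": lambda b: 0x30 <= b <= 0x39,   # decimal numerics
}


def run_modules(record: bytes, program: list, handlers: dict, fallback: str = "raw"):
    """Split a record into runs, branching between modules as the data changes.

    `program` is the ordered list of module names to try. A byte that no
    listed module accepts is assigned to the `fallback` module.
    """
    segments = []
    for b in record:
        # First module in program order that accepts this byte.
        name = next((m for m in program if handlers[m](b)), fallback)
        if segments and segments[-1][0] == name:
            segments[-1][1].append(b)                 # stay in current module
        else:
            segments.append((name, bytearray([b])))   # branch to another module
    return [(name, bytes(run)) for name, run in segments]
```

For example, run_modules(b"ABC123x", ["upper", "digit"], EXAMPLE_HANDLERS) yields three runs, assigned in turn to the "upper", "digit", and "raw" modules.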
The invention is described in greater detail in the detailed description of the preferred embodiment, and the drawings and claims.