With the wide adoption of communication networks, such as the Internet, efficient transmission and reception of large data files is increasingly important. To reduce the time and bandwidth needed to communicate such files, compression and decompression are often used. Compression utilities, such as WinZip™ or GZip™, transform data files into compressed files, Zip being one popular compressed data format. These utilities rely on a number of well-known compression algorithms, such as the Lempel-Ziv algorithm, which remove repeated data from the file being compressed.
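As a brief illustration of how removing repeated data reduces file size, the following minimal sketch uses Python's standard `zlib` module, whose DEFLATE method combines an LZ77-style Lempel-Ziv stage with Huffman coding. The input data and the helper name `compressed_size` are assumptions made for this example only and are not taken from any particular compression utility.

```python
import os
import zlib

# Hypothetical inputs chosen for illustration only.
repetitive = b"name=value;" * 1000          # 11,000 bytes of repeated data
random_like = os.urandom(len(repetitive))   # same length, no repeated structure


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after DEFLATE compression."""
    return len(zlib.compress(data, 9))


repetitive_size = compressed_size(repetitive)
random_size = compressed_size(random_like)

# Round trip: decompression restores the original bytes exactly.
assert zlib.decompress(zlib.compress(repetitive, 9)) == repetitive

print(f"repetitive: {len(repetitive)} -> {repetitive_size} bytes")
print(f"random:     {len(random_like)} -> {random_size} bytes")
```

The repetitive input should shrink to a small fraction of its original size, while the random input, which contains no repeated data to remove, should stay at roughly its original size.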
To achieve better compression results, compression tools often utilize knowledge of the structure of the files to be compressed and of the types of data those files contain. This knowledge is often provided in an associated file called a schema. Many files to be compressed, however, are not associated with any sort of schema. These schema-less files are often quite complex and include numerous types of data, such as tables, text, and images. Without access to a schema, compression tools attempting to compress such complex files achieve poorer compression results.