The use of data compression or encoding schemes to reduce the size of files for backup or file transmission is well known in the art. Many types of data and files are compressed, including images in the well-known GIF and JPEG formats (among many others), video in the MPEG format, sound in MP3 and other formats, and standard archive formats such as SIT, ZIP, and GZIP. Furthermore, many types of files have compressed images embedded inside, e.g., PDF files, Word documents, and the like.
Files and data streams that have been compressed using sub-optimal techniques now comprise a major part of existing data, both for storage and transmission. (As used herein, “file” and “data stream” are used interchangeably to denominate an identified set of data elements which may exist in several forms, including in a discrete form, or combined or stored with other files, or embedded in another file as a file or as an object, or as a file containing other files, or as a data stream of determinate or indeterminate size, including information transmitted over a data channel.) Compressed files are frequently large, and despite the considerable advances made in mass-storage density, computer processor speeds, and telecommunication system performance, compression techniques do not yet satisfactorily solve the space and transmission bandwidth problems. Developers of compression technology are now hard-pressed to keep pace with the rapid growth of multimedia web-based applications which utilize enormous amounts of data. It would be advantageous, therefore, to compress already compressed files even further. Moreover, it is desirable that such further compression be lossless.
It is generally considered “impossible” to meaningfully compress already compressed data. More accurately, perhaps, it should be said that it is considered impractical to compress already compressed data, though it is true that most attempts at compression of already compressed files fail altogether and actually result in an increase in file size. Attempts have been made to compress JPEG files, but current compression algorithms, when applied to JPEG files, generally achieve only a 1-2% improvement.
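The effect described above can be observed with a general-purpose compressor. The following is a minimal sketch, assuming Python's standard `zlib` module as a stand-in for "existing generic compression algorithms"; the helper names `sample_text` and `compress_twice` and the small vocabulary are hypothetical and chosen only for illustration. It compresses redundant text once, then feeds the compressed output through the same compressor a second time so the three sizes can be compared.

```python
import random
import zlib


def sample_text(n_words=50_000, seed=0):
    """Build deterministic, highly redundant ASCII text (illustrative input only)."""
    rng = random.Random(seed)
    vocab = ["data", "stream", "image", "file", "backup",
             "archive", "channel", "storage", "format", "transmit"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def compress_twice(data, level=9):
    """Return (original, once-compressed, twice-compressed) sizes in bytes."""
    once = zlib.compress(data, level)    # first pass removes most redundancy
    twice = zlib.compress(once, level)   # second pass sees near-random bytes
    return len(data), len(once), len(twice)


if __name__ == "__main__":
    original, once, twice = compress_twice(sample_text())
    print(f"original:         {original} bytes")
    print(f"compressed once:  {once} bytes")
    print(f"compressed twice: {twice} bytes")
```

The first pass is expected to shrink the text substantially, while the second pass typically yields no meaningful saving and may add a few bytes of container overhead. The exact figures depend on the input and on the compressor.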
JPEG files are useful candidates for further compression for three reasons: first, the JPEG standard is universally accepted; second, typical JPEG files are large; and third, existing generic compression algorithms cannot meaningfully compress JPEG files, because JPEG data streams are essentially random series of bytes. Moreover, JPEG files enjoy increasing popularity due to the advent and worldwide adoption of digital cameras and camera phones. Presently, JPEG shows every sign of continued, essentially unfettered growth.
JPEG is a lossy compression technique originally developed for continuous tone still images. The great majority of digitized still images are now stored in the JPEG format. JPEG technology exploits the limited sensitivity of the human visual system to chrominance, and discards a significant amount of chrominance information without perceptibly compromising the quality of the image. Although the JPEG standard includes numerous compression options, one involves the elimination of three fourths (¾) of the chrominance information before applying several other compression techniques. This is a very simple kind of irrelevancy reduction. Because the luminance channel is kept whole while each of the two chrominance channels is reduced to one quarter, this step alone reduces the amount of data to be compressed by half (from 3 samples per pixel to 1 + ¼ + ¼ = 1.5), and it is scarcely noticed, if at all, by the human visual system; that is, the degradation due to the loss of information is acceptable to, and not perceived by, most viewers. Another lossy compression method under the JPEG standard entails, in order, color space transformation, downsampling, discrete cosine transform, quantization, and entropy coding. The numerous JPEG options utilize a number of different redundancy reduction compression techniques having different rates of compression, with each successive rate producing a smaller but increasingly degraded file. However, even after JPEG compression, file volume can still tax data transmission systems and computer processors.
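The chrominance reduction described above can be illustrated with a short sketch of the first two steps of that pipeline (color space transformation and downsampling). It assumes the full-range RGB to YCbCr conversion commonly used with JFIF files and a simple 2x2 averaging filter; actual encoders may use other filters and subsampling ratios, and the function names here are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF-style full-range constants)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(plane):
    """Average each 2x2 block of a plane (list of rows); assumes even dimensions.

    Keeping one value per four input samples discards three fourths of the plane.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


def sample_counts(width, height):
    """Return (samples before, samples after) chroma downsampling."""
    full = 3 * width * height                       # Y, Cb, Cr at full resolution
    subsampled = (width * height                    # Y kept whole
                  + 2 * (width // 2) * (height // 2))  # Cb and Cr at one quarter
    return full, subsampled


if __name__ == "__main__":
    before, after = sample_counts(640, 480)
    print(before, after, after / before)
```

For a 640 x 480 image this gives 921,600 samples before downsampling and 460,800 after, that is, one half, before any transform, quantization, or entropy coding is applied.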
Accordingly, the present invention provides a data compression system and method that losslessly compresses and decompresses JPEG files, among other file types. It must be emphasized that while JPEG itself is a lossy compression technology, the present inventive method of compressing JPEG files is lossless. Employing the present invention introduces no loss beyond that already introduced by JPEG.
Further, the present invention provides means to compress a wide range of already compressed files by breaking a file down into its core data types, identifying and organizing the core data types, selecting an optimal compressor for the particular types, and then compressing the types in separate data streams.
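The general idea of compressing separated data types in separate streams with a per-stream choice of compressor can be sketched as follows. This is an illustration only and not the method of the invention: it assumes the file has already been split into named streams by some earlier parsing step, it treats "optimal" simply as "smallest output among a few standard-library codecs", and the names `CODECS`, `pack_streams`, and `unpack_streams` are hypothetical.

```python
import bz2
import lzma
import zlib

# Candidate codecs: name -> (compress, decompress). "stored" keeps the bytes
# unchanged so that a stream never grows when nothing else helps.
CODECS = {
    "stored": (lambda b: b, lambda b: b),
    "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def pack_streams(streams):
    """Compress each named stream with whichever candidate codec is smallest.

    `streams` maps a stream name to its raw bytes. Returns a mapping of
    name -> (codec name, compressed payload).
    """
    packed = {}
    for name, data in streams.items():
        # Try every codec and keep the smallest result for this stream.
        candidates = {codec: fns[0](data) for codec, fns in CODECS.items()}
        best = min(candidates, key=lambda codec: len(candidates[codec]))
        packed[name] = (best, candidates[best])
    return packed


def unpack_streams(packed):
    """Losslessly restore every stream using the codec recorded for it."""
    return {
        name: CODECS[codec][1](payload)
        for name, (codec, payload) in packed.items()
    }


if __name__ == "__main__":
    streams = {
        "header": b"\x00" * 64 + b"JFIF",
        "text": b"lossless " * 500,
    }
    packed = pack_streams(streams)
    for name, (codec, payload) in packed.items():
        print(name, codec, len(streams[name]), "->", len(payload))
    assert unpack_streams(packed) == streams
```

Recording the chosen codec alongside each payload is what allows the streams to be restored exactly; a practical format would also need to store how the streams are reassembled into the original file.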