1. Field of the Invention
The present invention relates generally to data compression and decompression, file archiving, and streaming compression. Specifically, the present invention relates to a system and method for the detection and subsequent compression and decompression of files and data that include embedded data and files that have previously been compressed, and in particular, to the detection and compression of previously compressed data that resides in data or files of an unknown data or file format.
2. Discussion of Related Art Including Information Disclosed Under 37 CFR § 1.97, 1.98
The use of data compression or encoding schemes to reduce the size of files for backup or file transmission is well known in the art. Many types of data and files are compressed, including images in the well-known GIF, TIFF, PNG, and JPEG formats (among many others), video in the MPEG format, sound in MP3 and other formats, as well as standard archive formats, such as SIT, ZIP, GZIP, and so forth. Furthermore, many types of files have compressed images embedded inside, e.g., PDF files, WORD documents, and the like. In addition, many of these compressed data types are included inside proprietary or unknown file formats.
Files and data streams that have been compressed using sub-optimal techniques, or that include data so compressed, now comprise a major part of existing data, both for storage and transmission. (As used herein, “file” and “data stream” are used interchangeably to denominate an identified set of data elements which may exist in several forms, including in a discrete form, or combined or stored with other files, or embedded in another file as a file or as an object, or as a file containing other files, or as a data stream of determinate or indeterminate size, including information transmitted over a data channel.) Compressed data is frequently large, and despite considerable advances in mass-storage density, computer processor speeds, and telecommunication system performance, compression techniques do not yet satisfactorily solve the space and transmission bandwidth problems. Developers of compression technology are now hard pressed to keep pace with the rapid growth of multimedia web-based applications, which utilize enormous amounts of data. It would be advantageous, therefore, to compress files that include already compressed files even further. Moreover, it is desirable that such further compression be lossless, meaning that pixels, sound data, and the like remain the same, with no loss in quality relative to the original.
It is generally considered “impossible” to meaningfully compress already compressed data. More accurately, perhaps, it should be said that it is considered impractical to compress already compressed data, though it is true that most attempts at compression of already compressed files using currently known methods fail altogether and actually result in an increase in file size. Attempts have been made to compress JPEG, GIF, and PNG files, and the like, but current compression algorithms, when applied to these types of files, generally achieve only a 1-2% improvement.
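By way of illustration only, the difficulty can be observed with a general-purpose compressor. The following minimal sketch assumes Python's standard zlib module as a stand-in for "currently known methods"; the vocabulary, seed, and sizes are arbitrary choices made for the demonstration and are not taken from the present disclosure.

```python
import random
import zlib

# Build a reproducible, text-like sample (vocabulary and seed are arbitrary).
rng = random.Random(0)
vocab = ["data", "stream", "file", "image", "archive", "pixel",
         "sound", "format", "header", "block", "object", "channel"]
raw = " ".join(rng.choice(vocab) for _ in range(50000)).encode("ascii")

# First pass: ordinary compression of uncompressed data.
once = zlib.compress(raw, 9)

# Second pass: the same compressor applied to its own output.
twice = zlib.compress(once, 9)

print("raw bytes:          ", len(raw))
print("compressed once:    ", len(once))
print("compressed twice:   ", len(twice))
print("second-pass ratio:  ", round(len(twice) / len(once), 4))
```

On input of this kind, the first pass reduces the size substantially, whereas the second pass typically yields a result of about the same size as its input or slightly larger, since the output of the first pass has little remaining redundancy for the same algorithm to exploit.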
Further, JPEG, GIF, PNG, and other types of images, as well as other types of compressed data, are commonly incorporated as portions of many other types of files and data. Many of the files and data streams that incorporate compressed data types are in proprietary or otherwise unknown formats.
Accordingly, the present invention provides a data compression system and method that detects and categorizes embedded data types within unknown or partially known data streams. Each of these categorized data streams (such as embedded GIF, TXT, HTML, PNG, and JPEG, among others) can then be processed separately with an appropriate compression method, such as those described in U.S. patent application Ser. No. 11/029,437, which relates to the compression of image data and which is incorporated in its entirety by reference herein, and in U.S. patent application Ser. No. 11/029,438, which relates to further compression of already compressed data, and which is also incorporated in its entirety by reference herein. It must be emphasized that while PNG, GIF, and others are themselves lossless compression technologies, and JPEG is a lossy one, the present inventive method of detecting and processing these files is lossless in that the original pixel or other underlying information is not compromised or reduced in any way.
In the case of compressed data that was originally lossy (where data loss was incurred in the original compression, as with JPEG, MP3, and the like) and that is embedded inside other data or file formats, employing the present invention creates no data loss beyond that originally incurred by the JPEG, MP3, or other compression protocol.
Further, the present invention provides a means to compress a wide range of unknown/unspecified data types/files in which already compressed files reside. This works by detecting and breaking a file down into its core data types, identifying and organizing the core data types, selecting an optimal compressor for the particular types, and then compressing the types in separate data streams.
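The detect, organize, select, and compress sequence described above may be sketched as follows. This is a minimal illustration under stated assumptions and is not the implementation of the present invention: detection here relies only on well-known leading signature bytes of PNG, GIF, and JPEG data; each segment is assumed to run until the next signature, whereas a practical detector would parse each format to locate its true end; and zlib and bz2 are placeholder compressors standing in for type-specific methods. All names (SIGNATURES, CODECS, find_markers, segment, pack, unpack) are hypothetical.

```python
import bz2
import zlib
from typing import Dict, List, Tuple

# Hypothetical signature table: leading magic bytes -> data type label.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}

# Placeholder compressor selection per type (assumption: stand-ins only).
CODECS = {"unknown": (zlib.compress, zlib.decompress)}
DEFAULT_CODEC = (bz2.compress, bz2.decompress)


def find_markers(data: bytes) -> List[Tuple[int, str]]:
    """Return sorted (offset, label) pairs for every signature occurrence."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, label))
            start = data.find(magic, start + 1)
    return sorted(hits)


def segment(data: bytes) -> List[Tuple[str, bytes]]:
    """Break the input into labeled segments, cutting at each signature."""
    segments = []
    cursor, label = 0, "unknown"
    for offset, next_label in find_markers(data):
        if offset > cursor:
            segments.append((label, data[cursor:offset]))
        cursor, label = offset, next_label
    if cursor < len(data):
        segments.append((label, data[cursor:]))
    return segments


def pack(data: bytes) -> Tuple[List[Tuple[str, int]], Dict[str, bytes]]:
    """Group segments by type and compress each type as a separate stream."""
    layout: List[Tuple[str, int]] = []   # segment order, needed to rebuild
    streams: Dict[str, bytes] = {}       # label -> concatenated raw bytes
    for label, chunk in segment(data):
        layout.append((label, len(chunk)))
        streams[label] = streams.get(label, b"") + chunk
    packed = {label: CODECS.get(label, DEFAULT_CODEC)[0](blob)
              for label, blob in streams.items()}
    return layout, packed


def unpack(layout: List[Tuple[str, int]], packed: Dict[str, bytes]) -> bytes:
    """Decompress each stream and re-interleave segments in original order."""
    streams = {label: CODECS.get(label, DEFAULT_CODEC)[1](blob)
               for label, blob in packed.items()}
    positions = dict.fromkeys(streams, 0)
    out = bytearray()
    for label, length in layout:
        pos = positions[label]
        out += streams[label][pos:pos + length]
        positions[label] = pos + length
    return bytes(out)
```

In this sketch, the recorded layout of segment labels and lengths is what allows the original byte sequence to be restored exactly, so that the round trip through pack and unpack is lossless regardless of which compressor is selected for each type.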
The foregoing discussion reflects the current state of the art of which the present inventors are aware. Reference to, and discussion of, this art is intended to aid in discharging Applicants' acknowledged duties of candor in disclosing information that may be relevant to the examination of claims to the present invention.