1. The Field of the Invention
The present invention relates generally to the field of data compression. More specifically, embodiments of the present invention relate to systems and methods for selectively providing data compression on a data stream.
2. The Relevant Technology
Information theory is a branch of mathematics that was largely developed in the late 1940s. In general, information theory pertains to the identification and measurement of statistics and characteristics of information. For example, information theory techniques are often used to optimize the efficiency of computer communications. One such area is data compression, where data can be represented with a decreased number of bits.
Data compression refers generally to the process of transforming data into a smaller or “compressed” version of itself from which the original data, or a close approximation thereof, can be reconstructed at a later time. Compressed data advantageously preserves valuable data storage space and reduces the amount of bandwidth needed on a communications link, thereby allowing faster data transmission rates. As is well known, in computer data communications, the ability to provide faster transmission rates is extremely important, especially when communicating in channels having bandwidth constraints.
Two primary data compression techniques exist. One technique is commonly referred to as “lossy” data compression and the other is referred to as “lossless” data compression. Lossy data compression is a compression technique that allows the reconstructed data to vary from the original data upon the condition that the “essence” of the original data is preserved. Although this technique concedes a certain amount of accuracy during reconstruction, lossy data compression typically allows for relatively large compression ratios. Often, a fidelity criterion is introduced into lossy compression so that some measure of consistency between the original data and the reconstructed data can be expected by its users.
Until recently, lossy compression has been predominantly implemented by dedicated hardware devices. Now many powerful software programs for lossy compression have been introduced. Typical software algorithms using lossy techniques include JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group). These and other algorithms have proved extremely successful for lossy compression of sound files, such as digitized voice, and graphic images. This is because sound and picture formats are frequently associated with other industries, such as music and video, that customarily introduce inaccuracies into recorded or reconstructed format versions.
On the other hand, lossless data compression consists of numerous techniques guaranteeing an exact duplication between the original and reconstructed data. Of the many lossless data compression techniques, statistical and dictionary techniques are predominant.
Statistical data compression techniques generally encode a single symbol at a time by using the probability of a character based upon its appearance. The simplest of statistical compression techniques uses a static table of probabilities. An example of this is an order-0 table that creates a probability of occurrence for a character without considering the previous character. Thus, the letter “u” might be assigned a 1% probability of occurrence. Another example is an order-1 table which, in contrast, creates a probability of occurrence for a character as a function of the previous character. Thus, the letter “u” might have a probability of occurrence of 98% if the previous letter is a “q.” However, static tables experience difficulty and are not always desirable. For example, to function correctly, the table (or the statistics used to build the table) must be passed to the decompressor in order to reconstruct the original data.
Although this passage, or “overhead,” may only take about 256 bytes with an order-0 static table, an order-1 table, in contrast, might require as many as 65,536 bytes, or more. Thus, if an order-1 table or greater is used, the overhead of passing the table will most likely eradicate any gains potentially achievable by the table.
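The difference between the two table orders, and the resulting overhead, can be illustrated with a minimal sketch. It assumes an alphabet of 256 byte values and one table entry per symbol (or per symbol pair); the function names and the sample string are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict

def order0_table(data: bytes) -> dict:
    """Probability of each symbol, ignoring the preceding symbol."""
    total = len(data)
    return {sym: n / total for sym, n in Counter(data).items()}

def order1_table(data: bytes) -> dict:
    """Probability of each symbol as a function of the preceding symbol."""
    contexts = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        contexts[prev][cur] += 1
    return {
        prev: {cur: n / sum(counts.values()) for cur, n in counts.items()}
        for prev, counts in contexts.items()
    }

# Assumed alphabet: all 256 byte values, one entry per table slot.
ALPHABET_SIZE = 256
ORDER0_ENTRIES = ALPHABET_SIZE        # 256 entries
ORDER1_ENTRIES = ALPHABET_SIZE ** 2   # 65,536 entries

sample = b"quick quiet queen"         # illustrative input only
p_u = order0_table(sample)[ord("u")]
p_u_after_q = order1_table(sample)[ord("q")][ord("u")]
```

In this sample, the letter "u" is far more probable in the context of a preceding "q" than it is overall, which is the gain an order-1 table offers in exchange for a table 256 times larger.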
For this reason, many statistical compression techniques are “adaptive,” which provides several advantages. For example, with an adaptive technique, data does not have to be scanned before coding in order to generate statistics. Instead, the statistics are continually modified as new characters are read in and coded. However, this gives rise to a problem with the technique. When the compression starts, nothing is known about the data and the compression must “warm up.” Although compression ratios are greatly improved after only a few thousand bytes, the initial compression is ineffective. This warm-up phenomenon is commonly known as “acceleration.”
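The warm-up effect can be illustrated with a minimal sketch of an adaptive order-0 model. It assumes every byte value starts with a count of one and that an ideal coder spends -log2(p) bits per symbol; the function names and the test input are hypothetical.

```python
import math

def adaptive_costs(data: bytes) -> list:
    """Ideal coding cost in bits of each symbol under an adaptive order-0 model."""
    counts = [1] * 256          # assumption: every byte value starts equally likely
    total = 256
    costs = []
    for sym in data:
        costs.append(-math.log2(counts[sym] / total))
        counts[sym] += 1        # statistics are updated as each symbol is coded
        total += 1
    return costs

def mean(values) -> float:
    return sum(values) / len(values)

data = b"abracadabra " * 500    # illustrative, highly redundant input
costs = adaptive_costs(data)
early = mean(costs[:100])       # average bits per symbol while warming up
late = mean(costs[-100:])       # average bits per symbol once statistics settle
```

The very first symbol costs a full 8 bits because nothing is yet known about the data, and the average cost over the first hundred symbols remains higher than the average over the last hundred.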
Dictionary data compression uses a single code to replace variable length strings of symbols. In general, a dictionary technique reads in data and looks for groups of symbols that appear in the dictionary. If a match is found, a pointer or index into the dictionary can be output instead of the code for the symbol. The longer the match, the better the compression. In general, dictionaries are either static or adaptive. A static dictionary is used like a list of references in a published paper where reference to other authorities is marked by a single number. Static dictionaries have the advantage of being able to “tune” their dictionaries to fit the data that is being compressed. Static dictionaries, like static tables of probabilities, however, are problematic because of the excessive overhead required to transmit the dictionary from the encoder to the decoder. Thus, adaptive dictionaries are used to overcome this problem.
In general, adaptive dictionaries are continually modified as new characters are read in and coded. Again, adaptive dictionaries, like adaptive statistical tables, have poor initial compression characteristics during their acceleration period.
Two very well known examples of dictionary algorithms are LZ77 and LZ78. Progeny of these algorithms are numerous and have been used for both dictionary and statistical lossless data compression. They have even been used as hybrid statistical-dictionary techniques. Some of the better known progeny include commercial products, programs and algorithms such as LZW, QIC-122, ARC, PKARC, PKZIP, LHarc, V.42bis, MNP-5, DCLZ, ARJ, PNG and GIF.
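An adaptive dictionary of the kind described above can be sketched in the style of LZ78. This is a simplified illustration, not a reproduction of any of the named products; it operates on Python strings, emits (dictionary index, next character) pairs, and uses hypothetical function names.

```python
def lz78_encode(text: str) -> list:
    """Encode text as (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}            # index 0 is the empty phrase
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch            # keep extending the match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)   # dictionary adapts
            phrase = ""
    if phrase:
        # Flush a trailing match as its known prefix plus its last character.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs

def lz78_decode(pairs: list) -> str:
    """Rebuild the text; the decoder grows the same dictionary on its own."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)

text = "abababababa"                # illustrative input only
pairs = lz78_encode(text)
restored = lz78_decode(pairs)
```

Because the decoder reconstructs the dictionary from the pairs themselves, no dictionary has to be transmitted. The early pairs each cover a single character, which reflects the poor initial compression noted above; later pairs cover progressively longer phrases.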
No matter which data compression technique is used, the traditional architecture used to compress and transmit data (or receive and decompress data) is usually configured as a singular compression channel. This compression channel typically includes a singular processing element, such as a digital signal processor (DSP), a singular data processing element, such as a microprocessor and a singular interface such as a processor bus or a data communication equipment (DCE) device. Although productive, such traditional architectures are plagued by shortcomings.
For example, consider the general situation when a user at a remote location desires to retrieve or access data files from a network or group of networks. In such a situation the user often uses a computer and modem (or similar device) to access a remote access server across a communications channel. This remote access server then acts as a gateway or passage mechanism by which the user gains access to the network(s).
Often, each individual network accessed by the user will have its own communication protocol. Yet, certain types of communication protocols have multiple logical channels therein which can allow the simultaneous processing of multiple data streams. As a result, a singular communications protocol can have numerous data streams therein. An example of this is a TCP/IP communications protocol having an HTML, E-Mail, FTP, source code, such as C and JAVA, text, and WAVE data stream simultaneously flowing therein.
One of the main problems with singular compression channels is manifest when the communication protocol appends a header to the data stream. In general, headers are used to facilitate and track the administrative and procedural tasks required to send data from one computing system configuration to another. Although many headers are compressed and have generally been pre-optimized to minimize the number of bits that must be used to convey data, putting a compressed header through a data compressor that adapts its dictionary to the statistics of the data stream will often result in degraded compression performance. This is because the dictionary will try to adapt to the statistics of a header that cannot be compressed any further. The result is a dictionary that never reaches a level where efficient coding of the redundant data following the compressed header can occur.
Compounding this problem further, successive headers are often appended together. For example, the transmission control protocol (TCP) attaches a header to each data stream before handing it off to the Internet Protocol (IP). In such a situation, the data stream looks like: IPheader+TCPheader+data stream. Thereafter, if this data stream is handed off to a network, such as a Packet Data Network (PDN) where an X.25 ITU communications standard is used, the X.25 protocol breaks the data stream into 128 byte packets, each with its own X.25 header. Thus, the data stream expands from the data stream and TCP/IP headers into: X.25header+IPheader+TCPheader+data stream. If the data stream itself is character-based, such as with FTP, XTERM, RLOGIN or TELNET, the TCP/IP headers alone can be 40 bytes long, or more, for each byte of data transferred. Consequently, application of a compression technique to this type of data stream, which is already largely compressed, would be highly inefficient, thereby eliminating much of the efficiency being sought via compression.
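The header overhead in the character-based case can be worked through with a short calculation. The sketch assumes the minimum 20-byte IP header and 20-byte TCP header (no options), which together give the 40 bytes cited above; the function name is hypothetical, and any further header, such as an X.25 header, can be passed as an additional argument.

```python
def payload_fraction(payload_bytes: int, *header_bytes: int) -> float:
    """Fraction of the transmitted bytes that is actual payload."""
    return payload_bytes / (payload_bytes + sum(header_bytes))

IP_HEADER = 20    # assumed minimum IP header, no options
TCP_HEADER = 20   # assumed minimum TCP header, no options

# One typed character carried in its own TCP/IP packet.
single_char = payload_fraction(1, IP_HEADER, TCP_HEADER)

# A full 128-byte payload carried with the same headers, for comparison.
full_packet = payload_fraction(128, IP_HEADER, TCP_HEADER)
```

With a one-byte payload, only 1 byte in 41 is data, so the stream presented to the compressor consists almost entirely of header material.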
Thus, it would be highly desirable to provide a system and method capable of first identifying the state of compression of a particular data stream before further data compression is applied. In this way, if a data stream has already been compressed, such as in the circumstances described above, no further data compression will be attempted, thereby increasing the overall efficiency of the system. Such an approach would address many of the foregoing problems of utilizing a blind singular compression channel.
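One conceivable way to test the state of compression before compressing is sketched below. It is an illustrative heuristic only and is not asserted to be the approach of the present invention: it assumes that a byte-level Shannon entropy near 8 bits per byte indicates data that is already compressed. The threshold, the sample size, the function names, and the use of Python's zlib module as the compressor are all assumptions.

```python
import math
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def looks_compressed(data: bytes, threshold: float = 7.5,
                     sample_size: int = 4096) -> bool:
    """Hypothetical test: high entropy in a leading sample suggests prior compression."""
    return byte_entropy(data[:sample_size]) >= threshold

def compress_if_useful(data: bytes) -> tuple:
    """Return (was_compressed, payload), skipping data that looks compressed."""
    if looks_compressed(data):
        return False, data          # pass through untouched
    return True, zlib.compress(data)
```

A stream of redundant text would be compressed, whereas a stream whose bytes are nearly uniformly distributed would be passed through unchanged, avoiding the wasted effort described above.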