Data compression systems convert a stream of symbols forming an input message into an encoded stream of symbols forming an output message, from which the input message can be reconstructed. Successful compression results in an encoded output stream that is shorter than the original input stream.
One type of data compression technique is known as statistical modeling, which generally encodes individual symbols of an alphabet based on their statistical probabilities of appearance in a message. In general, statistical modeling methods represent symbols having higher probabilities of appearance with fewer bits than those symbols having lower probabilities of appearance. FIG. 1A is a block diagram illustrating a typical statistical modeling method: a statistical model 120 is used to determine probabilities for the symbols of the original stream 110 of a message. A model encoder 130 uses the determined probabilities from the model 120 in conjunction with the symbols from the original message stream 110 to determine codes for the symbols which are used to produce an encoded message stream 140.
FIG. 1B is a block diagram illustrating the decoding of the encoded message stream 140. As shown, the model decoder 160 decodes the encoded message stream 140 using a statistical model 150 corresponding to the encoding statistical model 120, producing symbols used to create a reconstructed message stream 170.
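The static arrangement of FIGS. 1A and 1B can be illustrated with a minimal sketch. It assumes a Huffman code as the coding scheme, which the description above does not specify. The function names (`build_huffman_code`, `encode`, `decode`) and the probability table passed to them are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


def build_huffman_code(probabilities):
    """Return {symbol: bitstring} for a fixed (static) probability table."""
    if len(probabilities) == 1:
        # A one-symbol alphabet still needs a non-empty code.
        return {symbol: "0" for symbol in probabilities}
    tiebreak = count()  # unique counter so equal probabilities never compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # least probable subtree
        p1, _, codes1 = heapq.heappop(heap)  # next least probable subtree
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def encode(message, code):
    """Model encoder: replace each symbol with its code."""
    return "".join(code[symbol] for symbol in message)


def decode(bits, code):
    """Model decoder: uses the same fixed table as the encoder."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes make this unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

With a table such as `{"e": 0.5, "t": 0.25, "a": 0.125, "q": 0.125}`, the more probable symbol `e` receives a shorter code than `q`, and decoding with the same table reproduces the original message.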
Statistical modeling methods may be static or adaptive. Static methods, such as those illustrated by FIGS. 1A and 1B, use fixed statistical models 120 and 150 to provide probabilities for the symbols, whereas adaptive methods dynamically update their statistical models to reflect changes in the probabilities of the symbols as more messages are processed. FIG. 2A illustrates an example of an adaptive statistical modeling method. Specifically, after symbols are encoded, the statistical model 220 is updated to reflect a recalculated probability of occurrence for the symbols. FIG. 2B illustrates a corresponding decoding system. Assuming that the encoding and decoding systems start with an identical initial model and that symbols (or their corresponding codes) are processed in the same order by the encoding and decoding systems, models 220 and 250 can be updated in synchrony, so that no additional data need be transmitted to ensure consistency between the models 220 and 250 used by the encoding and decoding systems.
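The synchronization property described above can be sketched as follows. This is a simplified illustration, not the method of FIGS. 2A and 2B: it assumes each symbol is coded as its rank in a frequency-ordered list, standing in for a real entropy coder. The names `AdaptiveModel`, `adaptive_encode`, and `adaptive_decode` are hypothetical.

```python
class AdaptiveModel:
    """Frequency counts that encoder and decoder update identically."""

    def __init__(self, alphabet):
        # Both sides must start from this same initial state.
        self.counts = {symbol: 1 for symbol in alphabet}

    def ranking(self):
        # Most frequent symbol first; ties broken by symbol for determinism.
        return sorted(self.counts, key=lambda s: (-self.counts[s], s))

    def update(self, symbol):
        self.counts[symbol] += 1


def adaptive_encode(message, alphabet):
    model = AdaptiveModel(alphabet)
    ranks = []
    for symbol in message:
        ranks.append(model.ranking().index(symbol))  # code under current model
        model.update(symbol)                         # then adapt the model
    return ranks


def adaptive_decode(ranks, alphabet):
    model = AdaptiveModel(alphabet)
    out = []
    for rank in ranks:
        symbol = model.ranking()[rank]  # same model state as the encoder had
        out.append(symbol)
        model.update(symbol)            # same update, in the same order
    return "".join(out)
```

Because the decoder recovers each symbol before updating, its model passes through the same sequence of states as the encoder's, and only the ranks need to be transmitted. Frequently occurring symbols drift toward rank 0, which a subsequent coder could represent with fewer bits.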
Alternatively, as illustrated by FIGS. 2C and 2D, adaptive systems can be implemented to reduce the processing and memory resources required for decoding encoded message streams. The encoder 230A can be implemented such that, the first time it encodes a symbol using updated data from the model 220, it transmits the model update information along with the encoded symbol in the encoded message stream 240A. When received by the decoder 260A, the model update information is used to update the statistical model 250 so that the updated model can be used to decode the encoded message stream 240A and properly determine the symbols forming the reconstructed message stream 270.
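A minimal sketch of this alternative follows, under the same simplifying assumption that symbols are coded as ranks in a frequency-ordered list. Here only the encoder keeps counts; the decoder merely stores the most recent ranking it has received. The stream format (tagged `"model"` and `"code"` tuples) and the function names are hypothetical.

```python
def encode_with_updates(message, alphabet):
    counts = {symbol: 1 for symbol in alphabet}
    sent_ranking = sorted(alphabet)  # ranking the decoder is assumed to start with
    stream = []
    for symbol in message:
        ranking = sorted(counts, key=lambda s: (-counts[s], s))
        if ranking != sent_ranking:
            # First use of an updated model: send the update with the code.
            stream.append(("model", tuple(ranking)))
            sent_ranking = ranking
        stream.append(("code", ranking.index(symbol)))
        counts[symbol] += 1
    return stream


def decode_with_updates(stream, alphabet):
    ranking = sorted(alphabet)  # decoder keeps no counts, only a ranking
    out = []
    for kind, payload in stream:
        if kind == "model":
            ranking = list(payload)  # apply the transmitted model update
        else:
            out.append(ranking[payload])
    return "".join(out)
```

The trade-off is visible in the stream: the decoder performs no counting or sorting, but the encoded message grows by one `"model"` item each time the encoder's ranking changes.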
Another type of data compression technique is dictionary-based compression. In contrast to the statistical model-based methods which use codes for individual symbols of an original message stream, dictionary-based methods use codes for strings of symbols in the original message stream. Typically, a dictionary-based compression method maintains a table of recognized strings. Strings in the original message stream that match string entries stored in the dictionary are encoded using a code representing the corresponding dictionary entries.
Examples of dictionary-based compression methods include the Ziv-Lempel ("LZ") family of data compression techniques first developed by Jacob Ziv and Abraham Lempel. See, e.g., Jacob Ziv & Abraham Lempel, A Universal Algorithm for Sequential Data Compression, IEEE TRANSACTIONS ON INFORMATION THEORY, Volume 23, No. 3, at 337-43 (1977). Referring to FIG. 3A, one variation of the LZ-type compression methods provides a dictionary encoder 330 that processes strings of an original message stream 310 by determining whether a current string matches a string previously stored in the dictionary 320. The current string is initially the first symbol of the original message stream 310 not already encoded, and the dictionary 320 is initialized to include an entry for each possible symbol to ensure that an initial string will match at least one dictionary entry. As long as the current string matches a dictionary entry, the dictionary encoder 330 appends the next symbol from the original message stream 310 to the current string. If the current string does not match a dictionary entry, the dictionary encoder 330 creates a dictionary entry for the current string and encodes the current string using the index of the newly created dictionary entry. In this case, the information added to the encoded message stream includes not only the encoded string but also update information for the newly created dictionary entry. The process continues until the original message stream 310 is completely processed. The encoded message stream 340 thus consists of a series of codes corresponding to the dictionary entries representing strings of the original message stream, along with any necessary dictionary update information.
FIG. 3B illustrates a corresponding LZ-type dictionary-based decoding system. The dictionary 350 is updated as new information is received in the encoded message stream 340. Using the dynamically updated dictionary 350, the dictionary decoder 360 processes the dictionary codes in the encoded message stream 340 by retrieving strings corresponding to the dictionary codes from the dictionary 350, thus producing a reconstructed message stream 370.
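The encoding and decoding loop can be sketched in the style of LZ78, in which each emitted pair serves both as the code for a string and as the update information for a new dictionary entry. This is an illustrative variant under stated assumptions, not necessarily the exact scheme of FIGS. 3A and 3B: the pair format `(prefix_index, next_symbol)`, the end-of-message marker `None`, and the function names are all hypothetical, and every symbol of the message is assumed to appear in `alphabet`.

```python
def lz_encode(message, alphabet):
    # Dictionary starts with one entry per possible symbol.
    dictionary = {s: i for i, s in enumerate(sorted(set(alphabet)))}
    stream = []
    current = ""
    for symbol in message:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # still matching: keep extending the string
        else:
            # Emit the longest match plus the symbol that broke the match;
            # the pair also tells the decoder how to build the new entry.
            stream.append((dictionary[current], symbol))
            dictionary[candidate] = len(dictionary)
            current = ""
    if current:
        stream.append((dictionary[current], None))  # trailing match, no new entry
    return stream


def lz_decode(stream, alphabet):
    entries = sorted(set(alphabet))  # same initial dictionary as the encoder
    out = []
    for index, symbol in stream:
        if symbol is None:
            out.append(entries[index])
        else:
            new_entry = entries[index] + symbol
            entries.append(new_entry)  # dictionary grows as codes arrive
            out.append(new_entry)
    return "".join(out)
```

For the message `"abababab"` over the alphabet `"ab"`, the encoder emits four pairs for eight symbols, and later pairs cover longer strings than earlier ones because they refer to entries created during encoding.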
This example of a dictionary-based compression method is inherently adaptive, as a dictionary is customized for each encoded message. In general, the compression rate achieved by dictionary-based compression methods increases with the number of matches found between strings of the message stream and dictionary entries. Because the number of matches tends to increase as more of a new message is processed and more dictionary entries are created, the compression rate achieved tends to be low when processing begins and to increase as more of the message is processed. Accordingly, better compression rates tend to be achieved on larger messages.
Data compression methods, such as those described above, enhance the efficiency of systems that manipulate data by, for example, storing or transmitting data. However, these methods can require significant processing and memory resources, which makes them unsuitable for use in systems such as wireless paging or other messaging systems in which the receiving devices have limited processing and memory capability and messages tend to be small.