Data compression is important in the area of data transmission and storage. One family of algorithms for data compression is the run-length encoding algorithms. Run-length encoding algorithms are frequently the method of choice for data compression when data predominantly consists of sequences of equal-valued bits, such as visual images, because of the high compression ratios they can attain.
In this family of data compression algorithms, often, an alphabet of symbols is defined, where such a symbol can consist of one or more bits. Then the data compression is achieved by representing the data as pairs of a symbol and its repeat factor, i.e., its successive occurrence. In the case where the alphabet consists only of two bit values, such as 0 and 1, an alphabet need not be defined because successive repeat factors, commonly also referred to as, run-lengths, can designate the length of sequences of alternating bit values: one repeat factor is the number of 0-valued bits and the next repeat factor is the number of 1 -values bits. If one bit value predominantly occurs with a repeat factor of one, then only the repeat factor of the opposite bit value which occurs with varying repeat factors may be written. This implies that between any two such repeat factors, there is a single bit of the opposite bit value.
A problem that limits the efficiency of run-length encoding algorithms is that the repeat factors themselves occupy a certain number of bits, and it can be difficult or impossible to determine the optimal number of bits to represent these repeat factors, especially without knowing the actual data to compress. To aggravate the problem, even if the distribution of repeat factors for the entire data set or bit stream were known before the encoding process is to begin, the best choice of the number of bits to encode the repeat factor need not be constant throughout any one data set or bit stream. In these cases, a sub-optimal data compression ratio most likely results from any fixed choice of the number of bits to encode the repeat factor. Using too many bits to represent the repeat factors is wasteful and contrary to the objective of data compression. Using not enough bits to represent the repeat factors would force the algorithm to introduce zero-length sequences of the opposite bit-value, or other measures for the same purpose, and to then encode the remainder of the sequence that was too long to represent, in the same fashion.
One example of the problem above occurs in a multicast satellite system. FIG. 1 illustrates such a conventional satellite system 100 which may use data compression algorithms. The system includes a central site 102, at which is a transponder 110 and a server 112. The central site 102 communicates with remote clients 104 via a satellite 108. Receivers 106 at the client sites facilitate this communication. To ensure the correct delivery of data over the satellite 108 to remote clients 104, the multicast application server 112 must be able to determine if the data was received correctly by all addressed clients, and the server 112 must resend the incorrectly received data. Therefore, the clients 104 must respond to the server 112, indicating which of the data need to be resent, either because they were missed altogether, or because they contain more transmission errors than what can be reconstructed. The response from each client can be different from that of any other client, and the amount of back traffic from all clients 104 to the server 112 will scale approximately linearly with the number of receivers. This client response data traffic, or back traffic, is a serious burden not only for the central site server 112, but also for the server's local network which may not have sufficient bandwidth and/or responsiveness to handle this client response traffic without impacting other duties of that network. In the case where the server 112 sends the payload data by satellite 108 while the clients 104 respond over a much slower terrestrial network, this client response traffic may be the single most serious limiting factor to increasing the number of clients. Thus, the flooding of a multicast sender/server 112 by the responses from its clients 104, called client response implosion, is a common and significant impediment to the runtime performance and to the scalability in the number of receivers and of multicast and broadcast applications. The possibility of client response implosion is aggravated when transmission errors increase due to adverse weather conditions, obscuring of the line of transmission, or sun spot activity. The efficiency of the run-length encoding algorithm used in data compression by the network is therefore a critical factor.
Accordingly, there exists a need for a method for providing an improved run-length encoding algorithm for data compression. The present invention addresses such a need.