Until recently, television viewing has been substantially passive. A viewer is presented with a number of programs on different channels, and perhaps one channel providing television programming information. Now, viewers increasingly demand more robust systems for providing them with real-time programming information. Such information, for example, may be provided in the form of an electronic programming guide ("EPG"), such as TV Guide On-Screen.
Typically, a central service provider, such as a cable television company, encodes program broadcast data and EPG data onto a signal, which is distributed to a large viewer audience. The signal is received and decoded by individual viewers through the use of set-top boxes. Optionally, a viewer may be able to communicate data back to the service provider or to other viewers.
Compressing EPG data is desirable for a number of reasons. A major goal of EPG design is to store as much programming information as possible and, by compressing existing data, significantly more data may be provided to viewers in an equivalent data space. This factor is particularly important in designing an efficient set-top box, because set-top boxes are intended for widespread distribution and even a marginal increase in the cost of each unit may have a significant impact. Also, compressing EPG data makes more effective use of a transmission channel's bandwidth, since the same information occupies fewer transmitted bits. This gain in effective bandwidth is pertinent to the transmission of data both upstream from and downstream to an individual viewer, and significant compression ratios may enable applications that were previously impractical.
The suitability of a particular compression technique depends on the constraints presented by a specific application. In the context of compressing and decompressing data comprising an EPG, four constraints are pertinent.
First, EPG data comprises short messages, which are typically 10 to 250 characters and rarely more than 1000 characters in length. Most compression techniques assume that text strings will be very long, and are directed to operations on larger data objects, such as entire computer files. The set of data comprising an EPG illustratively comprises approximately 88 kilobytes of uncompressed text divided into 1000 short messages, each of which must be compressible and recoverable independently of the others. Existing compression techniques, being designed for long inputs, are poorly suited to such short, independently recoverable messages.
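The effect of message length can be illustrated with a minimal Python sketch. It uses the standard zlib module as a stand-in for a general-purpose compressor; the sample messages are invented for illustration and are not actual EPG data. Compressing each message independently pays the format's fixed overhead every time and cannot exploit redundancy shared between messages, so the independent total exceeds the size obtained by compressing the same text as one block.

```python
import zlib

# Hypothetical EPG-style messages, invented for this illustration.
MESSAGES = [
    "Local news, weather and sports.",
    "Evening news with weather and sports highlights.",
    "Late news: weather, sports and traffic.",
    "Morning news, traffic and weather.",
]


def compressed_size(text):
    """Size in bytes of the zlib-compressed ASCII text (maximum effort)."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Each message compressed on its own, as independent recovery requires.
independent_total = sum(compressed_size(m) for m in MESSAGES)

# The same text compressed as a single block, which forfeits independent recovery.
joined_size = compressed_size("\n".join(MESSAGES))

raw_total = sum(len(m) for m in MESSAGES)

print("raw bytes:", raw_total)
print("compressed independently:", independent_total)
print("compressed as one block:", joined_size)
```

The single-block figure is shown only for comparison; it is not an option in the EPG context, because recovering one message would require decoding the text that precedes it.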
Second, a viewer's set-top box has limited storage. One of the motivations for compressing EPG data is to save memory space, so the memory consumed by the decompression program and its data structures must be subtracted from the memory saved by compression. Most compression techniques assume that significant storage space is available for running a program to decode data and to store associated data structures. In the context of an EPG, decompression must be performed in very limited code and data space.
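The accounting described above can be sketched as a short calculation. The 88,000-byte figure follows the illustrative EPG size given earlier; the 40% size reduction and both decoder footprints are assumptions chosen only to show how a large decoder can erase the benefit.

```python
def net_savings(uncompressed_bytes, reduction_pct, decoder_code_bytes, decoder_table_bytes):
    """Memory saved by compression minus the memory the decoder itself occupies."""
    saved = uncompressed_bytes * reduction_pct // 100
    return saved - (decoder_code_bytes + decoder_table_bytes)


EPG_TEXT_BYTES = 88_000  # approximately 88 kilobytes of uncompressed text

# Hypothetical small decoder: 4,000 bytes of code plus 2,000 bytes of tables.
small_decoder = net_savings(EPG_TEXT_BYTES, 40, 4_000, 2_000)

# Hypothetical large decoder: 24,000 bytes of code plus 16,000 bytes of tables.
large_decoder = net_savings(EPG_TEXT_BYTES, 40, 24_000, 16_000)

print(small_decoder)  # 29200: a net gain
print(large_decoder)  # -4800: the decoder costs more than compression saves
```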
Third, to avoid the need to update data used in decompression, the decompression operation should not rely on the use of repeated words or phrases. EPG data contains large proportions of unique or rare proper nouns. Also, the popularity of words and phrases is cyclical (e.g., "basketball" is popular in the spring while "football" is popular in the fall) and shifts over time as names, current events, and popular shows change. Because word selection in an EPG is broad and transient, a dictionary keyed to individual words or phrases is inefficient and should be avoided.
Fourth, a viewer's set-top box has limited processing resources, as a set-top box should be inexpensive for widespread distribution. Even without a decompression feature, performance of a set-top box is a major concern. The decompression operation should be performed without heavy processing, and a technique using extensive searching is not acceptable.
Although a number of compression techniques are in use today for digital data, existing compression techniques, standing alone, do not adequately address the four constraints presented in the EPG context. Run-length coding, which takes advantage of long runs of a repeated character, is not useful for EPG data, which generally includes few such runs. LZ coding, in which repeated strings of characters are replaced with references to earlier occurrences of the same strings, requires too much processing power and memory, and only provides about 40% compression for EPG data. Huffman coding, although requiring little processing power and memory, only provides about 40% compression for EPG data. Character substitution coding, in which frequently occurring pairs of characters are replaced with unused character codes, is similar in demands and results to Huffman coding.
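Two of the techniques named above can be sketched in a few lines of Python to show how they behave on a short message. The sample message is invented, the two-bytes-per-run cost model for run-length coding is an assumption, and the Huffman figure counts only the coded bits, omitting the code table that a decoder would also need. The sketch is not intended to reproduce the percentages quoted above.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle_encode(text):
    """Run-length encode text as a list of (count, char) pairs."""
    return [(len(list(run)), ch) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(ch * count for count, ch in pairs)


def huffman_code_lengths(text):
    """Return {char: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {char: depth}); the unique
    # tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical EPG-style message, invented for this illustration.
SAMPLE = "Bart joins the football team; Lisa exposes a scandal at school."

rle_pairs = rle_encode(SAMPLE)
rle_bytes = 2 * len(rle_pairs)  # assumed cost: one count byte plus one character byte per run

lengths = huffman_code_lengths(SAMPLE)
huffman_bits = sum(lengths[ch] for ch in SAMPLE)

print("original bytes:", len(SAMPLE))
print("run-length bytes:", rle_bytes)
print("Huffman bytes (table excluded):", (huffman_bits + 7) // 8)
```

Because almost every run in ordinary text has length one, the run-length form is larger than the original under this cost model, while Huffman coding shortens the message by giving frequent characters shorter codes.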