1. Technical Field of the Invention
The present invention relates to the compression of messages in communications using data protocols, e.g. Internet protocols.
2. Background and Objects of the Present Invention
Two communication technologies that have become widely used by the general public in recent years are cellular telephony and the Internet. Among the benefits provided by cellular telephony are freedom of mobility and accessibility with reasonable service quality regardless of the user's location. Until recently, the main service provided by cellular telephony has been speech. In contrast, the Internet, while offering flexibility for different types of usage, has been mainly focused on fixed connections and large terminals. However, the perceived quality of some Internet services, such as Internet telephony, has generally been regarded as quite low.
A number of Internet Protocols (IPs) have been developed to provide for communication across the Internet and other networks. An example of such an Internet protocol is the Session Initiation Protocol (SIP), which is an application layer protocol for establishing, modifying, and terminating multimedia sessions or calls. These sessions may include Internet multimedia conferences, Internet telephony, and similar applications. As is understood in this art, SIP can be used over either the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP).
Another example of an Internet Protocol is the Real Time Streaming Protocol (RTSP), which is an application level protocol for control of the delivery of data with real-time properties, such as audio and video data. RTSP may also be used with UDP, TCP, or other protocols as a transport protocol. Still another example of an Internet Protocol is the Session Description Protocol (SDP), which is used to advertise multimedia conferences and communicate conference addresses and conference tool-specific information. SDP is also used for general real-time multimedia session description purposes. SDP is carried in the message body of SIP and RTSP messages. SIP, RTSP, and SDP are all ASCII text based using the ISO 10646 character set in UTF-8 encoding.
Due to new technological developments, Internet and cellular telephony technologies are beginning to merge. Future cellular devices will contain an Internet Protocol (IP) stack and support voice over IP, as well as web-browsing, e-mail, and other desirable services. In an “all-IP” or “IP all the way” implementation, Internet Protocols are used end-to-end in the communication system. In a cellular system this may include IP over cellular links and radio hops. Internet Protocols may be used for all types of traffic including user data, such as voice or streaming data, and control data, such as SIP or RTSP data. Such a merging of technologies provides for the flexibility advantages of IP along with the mobility advantages of cellular technology.
As is understood in the art, the SIP, RTSP, and SDP protocols share similar characteristics that have implications for their use with cellular radio access. One of these similarities is the general request-and-reply nature of the protocols. Typically, when a sender sends a request, the sender remains idle until it receives a response. Another similarity, as previously described, is that SIP, RTSP, and SDP are all ASCII text based, using the ISO 10646 character set with UTF-8 encoding. As a result, information is usually represented using more bits than a binary representation of the same information would require. Still another shared characteristic is that the protocol messages are generally large, since they must carry all the information that session participants need.
A disadvantage of IP is the relatively large overhead that the IP protocol suite introduces through large headers and text-based signaling protocols. In cellular systems it is very important to use the scarce radio resources efficiently. A cellular system must support a sufficient number of users per cell; otherwise implementation and operation costs will be prohibitive. Frequency spectrum, and thus bandwidth, is a costly resource on cellular links and should be used efficiently to make the best use of system resources.
In the UMTS and EDGE mobile communication systems, and in future releases of second generation systems such as GSM and IS-95, much of the signaling traffic will be carried using Internet protocols. However, as discussed above, most Internet protocols were developed for fixed, relatively broadband connections. When access occurs over narrowband cellular links, the protocol messages must be compressed to meet quality of service requirements such as set-up time and delay. Compression is typically not needed over the entire communication path; compression of traffic over the radio link, such as from a wireless user terminal to a core network, is nevertheless highly desirable.
Standard binary compression methods, such as Lempel-Ziv and Huffman coding, are very general in the sense that they do not use any explicit knowledge of the structure of the data to be compressed. Applying such methods to Internet data protocols such as SIP and RTSP makes efficient compression of communication messages difficult. Standard binary compression methods available today are typically designed for large data files. As a consequence, using them to compress small messages, or messages with few repeated strings, generally yields very poor compression performance. In fact, if the message to be compressed is small and/or contains few repeated strings, some standard compression methods may produce a compressed packet that is actually larger than the original uncompressed packet, which is counterproductive.
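The size penalty for very short inputs can be observed with Python's standard `zlib` module, which implements DEFLATE (LZ77-style string matching combined with Huffman coding). This is a minimal sketch: `zlib` stands in for "standard binary compression methods" in general, and both input strings are hypothetical rather than real captured messages.

```python
import zlib

def compressed_size(message: bytes) -> int:
    """Size in bytes of the zlib (DEFLATE) output for a single message."""
    return len(zlib.compress(message, 9))

# Hypothetical inputs: a tiny SIP-like token and a large, highly repetitive block.
tiny = b"ACK"
large = b"Via: SIP/2.0/UDP host.example.com\r\n" * 100

for label, msg in (("tiny", tiny), ("large", large)):
    print(f"{label}: {len(msg)} bytes -> {compressed_size(msg)} bytes")
```

The zlib format wraps the DEFLATE data in a 2-byte header and a 4-byte Adler-32 checksum, so the 3-byte input necessarily grows, while the repetitive block shrinks substantially. Real protocol messages of a few hundred bytes fall between these extremes.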
One method for implementing a binary compression scheme is the use of a binary code tree. In a binary code tree, the symbols or strings to be compressed are represented in a tree structure by a variable number of bits, such that each symbol is uniquely decodable. Typically, symbols with higher probabilities of occurrence in the input data are represented by fewer bits than those with lower probabilities of occurrence. In the construction of the binary code tree, individual symbols are placed at the leaf nodes of a binary tree. Symbols with higher probabilities of occurrence sit on shorter branches of the tree, so fewer bits are required to represent them. Conversely, symbols with lower probabilities of occurrence sit on longer branches and require more bits. When a string of input data matches a symbol in the compressor's binary code tree, the code of the symbol is transmitted instead of the symbol itself, resulting in data compression. A decompressor receiving the code reconstructs the original symbol or string using an identical binary code tree.
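A binary code tree can be represented as nested nodes in which each internal node has a "0" child and a "1" child and each leaf holds a symbol. The sketch below uses a small hand-chosen prefix code; the symbols, codes, and function names are hypothetical and serve only to show how a decompressor walks the tree bit by bit.

```python
# Hypothetical prefix code: more probable symbols get shorter codes.
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}

def build_tree(codes: dict[str, str]) -> dict:
    """Build a binary code tree as nested dicts keyed by '0'/'1'; leaves are symbols."""
    root: dict = {}
    for symbol, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes as needed
        node[code[-1]] = symbol              # last bit leads to the leaf
    return root

def encode(text: str, codes: dict[str, str]) -> str:
    """Replace each symbol by its code."""
    return "".join(codes[ch] for ch in text)

def decode(bits: str, tree: dict) -> str:
    """Follow branches from the root, emit a symbol at each leaf, then restart."""
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = tree
    return "".join(out)

tree = build_tree(CODES)
bits = encode("ABACD", CODES)
print(bits, decode(bits, tree))  # prints: 0100110111 ABACD
```

Because no code is a prefix of another, the decoder never needs separators between codes: the five symbols take 10 bits instead of the 40 bits of 8-bit ASCII.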
One example of a binary code tree compression scheme is Huffman coding. Huffman compression is a general compression method intended primarily for the compression of ASCII files. Characters occurring frequently in a file are replaced by shorter codes, i.e. codes with fewer than the 8 bits used by the ASCII code. Huffman compression can be effective when relatively few distinct characters are used and the file to be compressed is relatively large.
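Huffman's algorithm constructs the code tree bottom-up by repeatedly merging the two least frequent subtrees. The following is a minimal sketch using Python's `heapq`; the function name and input string are hypothetical, and ties are broken by insertion order, so the exact bit patterns (but not the optimal code lengths) depend on that choice.

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict[str, str]:
    """Return a Huffman code (symbol -> bit string) built from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (subtree frequency, unique tie-breaker, partial codes of subtree)
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # least frequent subtree
        c2, _, codes2 = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
encoded = "".join(codes[ch] for ch in "aaaabbc")
print(codes, len(encoded))
```

For this input, the frequent "a" receives a 1-bit code and "b" and "c" receive 2-bit codes, so the seven characters take 10 bits instead of 56. A receiver must also know the code table, and transmitting that table is a fixed cost that a short message cannot amortize.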
Another method for the compression of data is the use of dictionary-based compression techniques. In general, a dictionary compression scheme uses a data structure known as a dictionary to store strings of symbols which are found in the input data. The scheme reads in input data and looks for strings of symbols which match those in the dictionary. If a string match is found, a pointer or index to the location of that string in the dictionary is output and transmitted instead of the string itself. If the index is smaller than the string it replaces, compression will occur. A decompressor contains a representation of the compressor dictionary, so that the original string may be reproduced from the received index. An example of a dictionary compression method is the Lempel-Ziv (LZ77) algorithm. This algorithm operates by replacing character strings which have previously occurred in the file by references to the previous occurrence. This method is, of course, particularly successful in files where repeated strings are common.
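The LZ77 idea of replacing a repeated string by a reference to its previous occurrence can be sketched as a sequence of (offset, length, next character) triples. This is a simplified illustration with a brute-force match search; the window size, maximum match length, and function names are hypothetical, and practical LZ77 variants differ in token format and search strategy.

```python
def lz77_compress(data: str, window: int = 255, max_len: int = 15) -> list[tuple[int, int, str]]:
    """Emit (offset back, match length, next literal) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may overlap the current position; stop one short of the
            # end so that a next literal always exists.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(tokens: list[tuple[int, int, str]]) -> str:
    """Rebuild the text by copying earlier output one character at a time."""
    buf: list[str] = []
    for off, length, ch in tokens:
        for _ in range(length):
            buf.append(buf[-off])  # char-by-char copy handles overlapping matches
        buf.append(ch)
    return "".join(buf)

s = "SIP/2.0 SIP/2.0 SIP/2.0"
tokens = lz77_compress(s)
print(len(s), len(tokens))
```

The first occurrence of "SIP/2.0 " costs one triple per character; everything after it is covered by a single back-reference. A message with no repeated strings would gain nothing, which is the limitation the text describes.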
Dictionary compression schemes may be generally categorized as either static or dynamic. A static dictionary is a predefined dictionary which is constructed before compression occurs and which does not change during the compression process. Static dictionaries are typically either stored in the compressor and decompressor prior to use, or transmitted and stored in memory prior to the start of compression operations.
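A static dictionary can be as simple as a fixed list of strings agreed on in advance by both ends. The sketch below performs greedy longest-match substitution against such a list; the dictionary entries and function names are hypothetical examples of SIP-like strings, not a dictionary defined by any standard.

```python
# Hypothetical static dictionary, shared by compressor and decompressor in advance.
STATIC_DICT = ["SIP/2.0", "INVITE ", "Via: ", "From: ", "To: ", "Call-ID: ", "CSeq: ", "\r\n"]

def static_compress(msg: str) -> list:
    """Greedy longest match: emit an int index for a dictionary hit, else a 1-char literal."""
    out: list = []
    i = 0
    while i < len(msg):
        best = -1
        for idx, entry in enumerate(STATIC_DICT):
            if msg.startswith(entry, i) and (best < 0 or len(entry) > len(STATIC_DICT[best])):
                best = idx
        if best >= 0:
            out.append(best)
            i += len(STATIC_DICT[best])
        else:
            out.append(msg[i])
            i += 1
    return out

def static_decompress(tokens: list) -> str:
    """Indices are looked up in the same fixed dictionary; literals pass through."""
    return "".join(STATIC_DICT[t] if isinstance(t, int) else t for t in tokens)

msg = "INVITE sip:bob@example.com SIP/2.0\r\n"
tokens = static_compress(msg)
print(tokens)
```

Since the dictionary never changes, no dictionary content is transmitted, but only strings anticipated when the dictionary was built can be compressed.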
A dynamic or adaptive dictionary scheme, on the other hand, allows the contents of the dictionary to change as compression occurs. In general, a dynamic dictionary scheme starts out with either no dictionary or a default, predefined dictionary and adds new strings to the dictionary during the compression process. If a string of input data is not found in the dictionary, the string is added to the dictionary in a new position and assigned a new index value. The new string is transmitted to the decompressor so that it can be added to the decompressor's dictionary. The position of the new string does not have to be transmitted, as the decompressor will recognize that a new string has been received and will add it to the decompressor dictionary in the same position in which it was added to the compressor dictionary. In this way, a future occurrence of the string in the input data can be compressed using the updated dictionary. As a result, the dictionaries at the compressor and decompressor are constructed and updated dynamically as compression occurs.
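The mirrored update described above can be sketched at the granularity of whole header lines, which is a hypothetical choice since the text does not fix a unit. New strings are sent once as literals and appended at the next free index on both sides, so no position is ever transmitted. The class, function names, and sample lines are illustrative assumptions.

```python
class AdaptiveDictionary:
    """Hypothetical adaptive dictionary: entries are appended at the next free index."""

    def __init__(self, initial=None):
        self.entries: list[str] = list(initial or [])
        self.index = {s: i for i, s in enumerate(self.entries)}

    def lookup(self, s: str):
        return self.index.get(s)

    def add(self, s: str) -> None:
        self.index[s] = len(self.entries)
        self.entries.append(s)

def compress(lines: list[str], d: AdaptiveDictionary) -> list[tuple]:
    out = []
    for line in lines:
        idx = d.lookup(line)
        if idx is None:
            out.append(("new", line))  # literal sent once; its position is implied
            d.add(line)
        else:
            out.append(("ref", idx))
    return out

def decompress(codes: list[tuple], d: AdaptiveDictionary) -> list[str]:
    lines = []
    for kind, value in codes:
        if kind == "new":
            d.add(value)  # same position as in the compressor's dictionary
            lines.append(value)
        else:
            lines.append(d.entries[value])
    return lines

# Both ends start from the same default dictionary.
comp, decomp = AdaptiveDictionary(["SIP/2.0"]), AdaptiveDictionary(["SIP/2.0"])
m1 = ["INVITE sip:bob@example.com", "SIP/2.0", "Call-ID: a84b4c76e66710"]
m2 = ["INVITE sip:bob@example.com", "SIP/2.0"]
c1, c2 = compress(m1, comp), compress(m2, comp)
print(c1, c2)
```

The second message consists only of references. This works only while the decompressor processes every message in order; a lost message carrying a "new" entry leaves the two dictionaries out of step.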
A general criterion for successful compression using the aforementioned binary compression algorithms is that the file to be compressed is reasonably large. For Huffman compression, the code table must be small compared to the file being compressed. For standard Lempel-Ziv compression, the file to be compressed must be large enough to contain many repeated strings for compression to be efficient. The messages produced by the aforementioned protocols are mostly only a few hundred bytes long, which is not large enough for these algorithms to compress them efficiently on a message-by-message basis.
Thus, a need exists in the art for a system, methodology and apparatus to increase the efficiency of dictionary compression methods so that they may be used to compress messages which are transmitted between communication entities over bandwidth-limited communication links using communication protocols. The updating of the compression and decompression dictionaries should be performed as quickly as possible since the size of the dictionary has a large effect on the compression efficiency. In addition, the methodology should be robust so that lost packets do not make compression and decompression of the subsequent messages impossible.