The present invention relates to a data transmission system wherein data are transmitted from a transmit device to a receive device in a compressed form and more particularly to a method of updating the transmit and receive dictionaries when they are saturated without reducing the compression ratio.
In a data transmission system, modems communicate with each other by using a modulation method which translates digital data into analog signals and vice versa. In the handshake procedure, the modems agree on a modulation technique and then they may try to negotiate an error-detection and correction method. If they agree on an error-detection and correction method, they may incorporate a data compression method to increase the effective throughput of data beyond the actual connection speed. Compression is possible only if error correction is also being done and the interface speed between the data terminal equipment (DTE) and the modem is higher than the connection speed between the two modems.
When using error correction or compression, it is essential to enable an effective form of flow control between each modem and the DTE to which it is directly connected. Without effective flow control, data will be lost when one device sends data faster than the other one can receive it. Flow control between the two modems themselves can be handled by the error correction protocol V42.
Using the V42 protocol between two modems, and therefore between two DTEs, results in an error-free data transmission. When the transmission between two DTEs is error free, it is possible to use data compression, which does not tolerate errors. The V42bis protocol is used to compress the data flow before passing it to the error control function and to decompress the data in the reverse direction.
The V42bis data compression method is based on the Ziv-Lempel algorithm disclosed in an article entitled "Compression of individual sequences via variable rate coding" by Ziv and Lempel published in the IEEE Transactions on Information Theory, IT-24, pp. 530-536. In this algorithm, the encoding mechanism is based on the use of a codeword having limited length for each string of characters. Each character which is received from the DTE through the interchange circuit is associated with a string of characters represented by a characterizing codeword. This process maintains a transmit dictionary in which strings of characters are stored with their corresponding codeword. The transmit dictionary is dynamically updated in the course of the encoding mechanism. The codewords which are received from the modem through the error control functions are then decoded by the decoding mechanism in order to regenerate the original string of characters. To achieve this, a receive dictionary associated with the decoding mechanism is also updated so that the two dictionaries on each side of the interchange circuit remain identical.
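By way of illustration only, the dictionary mechanism described above may be sketched as follows. The sketch is a simplified textbook LZW coder, not the V42bis procedure itself; the function names `compress` and `decompress` are assumptions made for the example. It shows how the decoder can rebuild the same dictionary as the encoder from the codewords alone.

```python
from typing import Dict, List


def compress(data: bytes) -> List[int]:
    """Encode bytes as a list of codewords (simplified LZW, not V42bis)."""
    # Transmit dictionary: every single byte is present from the start.
    dictionary: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    current = b""
    codes: List[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate            # keep extending the matched string
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes while updating a matching receive dictionary."""
    dictionary: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    if not codes:
        return b""
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The encoder used a string it had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError("codeword not in receive dictionary")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

For example, `decompress(compress(b"abababababab"))` returns the original twelve bytes, and the intermediate list contains fewer codewords than the input has characters.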
A method for data compression of strings of characters is described in the European Patent application 94 480176.0. According to this method, each codeword stored in the memory corresponds to four distinctive fields: a first field defining the index or the codeword of the last character of the current string being addressed in the memory, a second field defining the index or the codeword of the string (SON) that comprises the current string plus an additional character and which is the first string whose creation chronologically follows that of the current string being accessed in the memory, a third field defining the index or the codeword of the string (BROTHER) which appears within the dictionary after the creation of the current string being accessed in the memory and which has the same common characters as the current string except for the last, and a fourth field defining the index of the string (PARENT) that comprises all the characters of the current string except the last.
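The four-field organization described above may be pictured with the following minimal sketch. The names `Node`, `find_child`, `add_child` and `spell` are hypothetical and are not taken from the cited application; the list index of a node stands in for its codeword.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    char: int                      # first field: last character of the string
    son: Optional[int] = None      # second field: first-created extension of the string
    brother: Optional[int] = None  # third field: next string sharing the same parent
    parent: Optional[int] = None   # fourth field: the string minus its last character


def find_child(nodes: List[Node], parent: int, char: int) -> Optional[int]:
    """Follow SON, then the BROTHER chain, looking for `parent + char`."""
    idx = nodes[parent].son
    while idx is not None:
        if nodes[idx].char == char:
            return idx
        idx = nodes[idx].brother
    return None


def add_child(nodes: List[Node], parent: int, char: int) -> int:
    """Create `parent + char` and link it at the end of the BROTHER chain."""
    nodes.append(Node(char=char, parent=parent))
    new = len(nodes) - 1
    idx = nodes[parent].son
    if idx is None:
        nodes[parent].son = new
    else:
        while nodes[idx].brother is not None:
            idx = nodes[idx].brother
        nodes[idx].brother = new
    return new


def spell(nodes: List[Node], idx: Optional[int]) -> bytes:
    """Recover the full string of a node by walking the PARENT links."""
    chars = []
    while idx is not None:
        chars.append(nodes[idx].char)
        idx = nodes[idx].parent
    return bytes(reversed(chars))
```

In this sketch, the SON and BROTHER links serve the encoder, which searches for the longest known string, while the PARENT link serves the decoder, which must regenerate the characters from a codeword.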
When each of the two dictionaries storing the codewords is initialized, it is empty. As a codeword generally contains more than 8 bits while initially representing a single character, the compression ratio is at first less than 1, as illustrated in FIG. 1. Then, as the codewords come to represent a plurality of data bytes associated with a string of characters, the ratio improves and becomes greater than 1. When the dictionary is full, that is, when the compression ratio is above the level of dictionary saturation, it becomes more and more difficult to improve the ratio, as the process is more complex and the codewords need to be replaced by new ones (delete and update area).
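The behavior of the ratio may be checked with a short calculation. The helper `compression_ratio` below is an assumption introduced for the example; it simply divides the number of input bits by the number of transmitted bits.

```python
def compression_ratio(input_bytes: int, codewords: int, codeword_bits: int) -> float:
    """Input bits divided by transmitted bits; above 1 means the data shrank."""
    return (8 * input_bytes) / (codewords * codeword_bits)


# One 10-bit codeword standing for a single 8-bit character: expansion.
early = compression_ratio(input_bytes=1, codewords=1, codeword_bits=10)

# One 10-bit codeword standing for a three-character string: compression.
later = compression_ratio(input_bytes=3, codewords=1, codeword_bits=10)
```

Here `early` is below 1 and `later` is above 1, which corresponds to the rising portion of the curve described above.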
When the memory space reserved to store the dictionary is full, codewords corresponding to new strings of characters to be transmitted can be stored only if a larger memory space is reserved for the dictionary. Such a larger memory space will result in larger address fields and consequently in larger codewords. For example, using a dictionary of 1K locations means using codewords of 10 bits. When such a dictionary becomes saturated, the next memory space should include between 1K and 2K locations, which would result in codewords including 11 bits. It is clear that using codewords having an additional bit for some strings of characters which are longer and longer and therefore not frequently used would result in a significant decrease in the compression ratio.
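The relation between dictionary size and codeword length used in this example can be verified as follows. The function name `codeword_bits` is an assumption made for the sketch.

```python
def codeword_bits(locations: int) -> int:
    """Smallest number of bits able to address `locations` dictionary entries."""
    if locations < 1:
        raise ValueError("a dictionary needs at least one location")
    # The highest index is locations - 1; its bit length is the codeword width.
    return max(1, (locations - 1).bit_length())


bits_1k = codeword_bits(1024)   # dictionary of 1K locations
bits_2k = codeword_bits(2048)   # dictionary of 2K locations
extra_cost = bits_2k / bits_1k  # relative growth of every transmitted codeword
```

Under these assumptions, going from 1K to any size up to 2K locations adds one bit to every codeword sent, including the codewords of the short, frequently used strings.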
The object of the invention is therefore to provide a method of updating the transmit and receive dictionaries, especially when they are saturated, without significantly decreasing the compression ratio.
Accordingly, the invention relates to a method of updating dictionaries in a data transmission system using data compression comprising a transmit device and a receive device in which strings of characters have to be transmitted in a compressed form from the transmit device to the receive device; the transmit device having a transmit dictionary storing codewords associated with the strings of characters which are transmitted instead of the strings of characters from the transmit device to the receive device; the receive device having a receive dictionary storing codewords associated with the strings of characters; both transmit and receive dictionaries being updated each time a new string of characters has to be transmitted so that the contents of the dictionaries remain identical. This method comprises storing, for each string of characters to be transmitted, a value into a specific field of the dictionary location in which is stored the codeword associated with each string of characters, this value corresponding to at least one parameter dependent on each string of characters; accessing, each time a new string of characters is to be transmitted, a plurality of the dictionary locations to determine which location among them has its specific field containing a value which is closest to a target value determined by a criterion met by the parameter; and deleting the contents of the dictionary location containing the closest value and using this dictionary location for the new string of characters.
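The three steps of the method may be sketched as follows, purely as an illustration. The text above does not fix the parameter or the target value; the sketch assumes, hypothetically, that the specific field holds a usage count and that the target value is 0, so that the least used string is replaced. The names `Location`, `pick_location` and `store_new_string` are likewise assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Location:
    string: bytes  # string of characters represented by this location
    value: int     # specific field; assumed here to be a usage count


def pick_location(dictionary: List[Location],
                  candidates: Sequence[int],
                  target: int) -> int:
    """Return the candidate index whose field value is closest to `target`.

    Ties go to the first candidate examined, so two devices scanning the
    same candidates in the same order make the same choice.
    """
    return min(candidates, key=lambda i: abs(dictionary[i].value - target))


def store_new_string(dictionary: List[Location],
                     candidates: Sequence[int],
                     target: int,
                     string: bytes,
                     value: int) -> int:
    """Delete the closest-valued location and reuse it for the new string."""
    victim = pick_location(dictionary, candidates, target)
    dictionary[victim] = Location(string, value)
    return victim
```

Because the choice depends only on values held in both dictionaries, running the same procedure in the transmit device and in the receive device selects the same location, and the dictionary size, and therefore the codeword length, does not grow.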