This invention claims priority to German Patent Application No. DE 101 40 993.1, which is hereby incorporated by reference herein.
The present invention relates to a method for compressing data, in which, in a data stream composed of characters, character strings are checked for correlation with other character strings that are present at a given distance in the data stream, and in which, in each case, the number of correlating characters and the position of the correlating characters within the respective other character string constitute the compressed data.
To be able to transmit or store data efficiently, use is made of methods for compressing the data. In connection with these methods, a distinction is made between lossless and lossy compression methods. The lossless methods have the feature that the original data can be completely reconstructed from the compressed data. In the case of lossy methods, however, complete reconstruction of the original data is not guaranteed.
Compression methods having the objective of reducing the respective data volume are used in many ways in information and communication technology, for example, in digital television or in electronic communication.
Compression methods are also used in connection with data encryption, the source text being compressed prior to encryption, thus making cryptanalysis more difficult because the compressed text has low redundancy.
Methods for compressing data, in which, in a data stream composed of characters, character strings are checked for correlation with other character strings that are present at a given distance in the data stream, and in which, in each case, the number of correlating characters and the position of the correlating characters within the respective other character string constitute the compressed data, are referred to as Lempel-Ziv methods. One of these methods is described in Ziv J., Lempel A., "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, Vol. 23, No. 3, May 1977, pp. 337-343, which is hereby incorporated by reference herein.
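By way of illustration only, the exact correlation check of such a Lempel-Ziv method might be sketched as follows. The function name longest_match, the window size and the (distance, length) return format are assumptions made for this sketch and are not taken from the cited reference.

```python
def longest_match(data, pos, window=32):
    """Return (distance, length) of the longest exact match for data[pos:]
    that starts within the previous `window` characters, or (0, 0) if none.

    `window` is an assumed search distance, not a value from the source.
    """
    best_dist, best_len = 0, 0
    for start in range(max(0, pos - window), pos):
        length = 0
        # The match is allowed to run on into the not-yet-coded region.
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - start, length
    return best_dist, best_len
```

In this sketch, the pair (distance, length) plays the role of the position and the number of correlating characters that constitute the compressed data.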
An object of the present invention is to provide a method for compressing data which has a relatively high compression rate.
The present invention provides a method for compressing data, in which, in a data stream composed of characters, character strings are checked for correlation with other character strings that are present at a given distance in the data stream, and in which, in each case, the number of correlating characters and the position of the correlating characters within the respective other character string constitute the compressed data, wherein at least one character is allowed to differ in the correlation check; and in addition, data for correcting the at least one differing character is inserted into the compressed data. Preferably, an item of information on the position of the at least one differing character is inserted.
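A minimal sketch of such a tolerant correlation check, together with a matching decompression step, could look as follows. The names find_match, compress and decompress, the token layout, the window size, the minimum match length and the limit max_errors are all assumptions chosen for illustration; the sketch is not intended as a definitive implementation of the method.

```python
def find_match(data, pos, window=64, max_errors=1):
    """Longest match for data[pos:] in the preceding window, tolerating up to
    `max_errors` differing characters (an assumed, tunable limit).

    Returns (distance, length, corrections), where corrections is a list of
    (offset within the match, true character).
    """
    best = (0, 0, [])
    for start in range(max(0, pos - window), pos):
        length, corrections = 0, []
        while pos + length < len(data):
            if data[start + length] != data[pos + length]:
                if len(corrections) == max_errors:
                    break
                corrections.append((length, data[pos + length]))
            length += 1
        # A differing character at the very end of a match gains nothing.
        while corrections and corrections[-1][0] == length - 1:
            corrections.pop()
            length -= 1
        if length > best[1]:
            best = (pos - start, length, corrections)
    return best


def compress(data, min_len=3, **kwargs):
    """Emit literal tokens and match tokens that carry correction data."""
    tokens, pos = [], 0
    while pos < len(data):
        dist, length, corrections = find_match(data, pos, **kwargs)
        if length >= min_len:
            tokens.append(("match", dist, length, corrections))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def decompress(tokens):
    """Rebuild the data, applying each correction while copying."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length, corrections = tok
            fixes = dict(corrections)
            for i in range(length):
                out.append(fixes.get(i, out[-dist]))
    return "".join(out)
```

For the text "the cat sat. the cat sat. The cat sat.", the check at position 13 in this sketch yields a single match of 25 characters with one correction (the upper case T), whereas an exact check would stop after 13 characters.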
By allowing "errors" in the correlation check, the number of correlating characters is on average higher than in the case of an exact check and thus, on average, longer strings of characters can be coded using the information on the number and position. The number of permitted differing characters can be selected depending on the property of the data to be compressed.
The characters forming the data stream can be of different types in the method according to the present invention. For example, the characters may be characters that can assume many values, or binary characters.
In certain embodiments of the present invention, when working with characters which can assume more than two values, either the true value of the at least one differing character is inserted, or a procedure for determining the true value from the value of the differing character is inserted.
For example, when compressing text data, this procedure can consist in regarding a word or a part of a word as correlating with a word or a part of a word, which, as such, is identical but in which an upper case letter occurs in place of a lower case letter, for example, at the beginning of a sentence. Then, instead of the true value, for example, an upper case D, it is only required to insert into the compressed data a procedure for changing the lower case d during decompression; in the example: replace the lower case letter with the corresponding upper case letter.
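The upper case and lower case example above might be sketched as follows. The names encode_fix and apply_fix and the procedure labels "upper", "lower" and "literal" are hypothetical and serve only to illustrate the idea of inserting a procedure instead of the true value.

```python
def encode_fix(ref_char, true_char):
    """Describe how to obtain true_char from the differing reference character.

    A case change is expressed as a procedure; otherwise the true value
    itself is stored. The labels are assumptions made for this sketch.
    """
    if ref_char != true_char and ref_char.upper() == true_char:
        return ("upper",)
    if ref_char != true_char and ref_char.lower() == true_char:
        return ("lower",)
    return ("literal", true_char)


def apply_fix(ref_char, fix):
    """Apply the stored procedure during decompression."""
    if fix[0] == "upper":
        return ref_char.upper()
    if fix[0] == "lower":
        return ref_char.lower()
    return fix[1]
```

With this sketch, encode_fix("d", "D") yields the procedure ("upper",), which apply_fix turns back into the upper case D during decompression.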
In another embodiment of the present invention, when working with binary characters, the differing characters are marked by inserting only their position.
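With binary characters the position alone is sufficient, because a differing character can only be the complement of the reference character. A minimal sketch, in which the function names mismatch_positions and restore are assumptions made for illustration:

```python
def mismatch_positions(ref_bits, actual_bits):
    """Positions at which the actual bits differ from the reference bits."""
    return [i for i, (r, a) in enumerate(zip(ref_bits, actual_bits)) if r != a]


def restore(ref_bits, positions):
    """Recover the actual bits by inverting the reference at each position."""
    out = list(ref_bits)
    for i in positions:
        out[i] ^= 1
    return out
```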
In order for the compression gain achieved by the method according to the present invention to be diminished as little as possible by the additional information, in an embodiment of the method a compressing code is used for coding the positions of the differing characters. Preferably, binary vectors having the length n and the weight e are used for coding e positions of differing characters over a length of n, all binary vectors of a particular weight being numbered.
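One possible way of numbering all binary vectors of a particular weight is the combinatorial number system, sketched below. The choice of this particular numbering and the names rank_positions, unrank_positions and index_bits are assumptions made for illustration; any other fixed numbering of the vectors would serve equally.

```python
from math import comb


def rank_positions(positions):
    """Number of the weight-e vector whose ones are at `positions`.

    Uses the combinatorial number system (an assumed choice of numbering).
    """
    return sum(comb(p, i + 1) for i, p in enumerate(sorted(positions)))


def unrank_positions(rank, n, e):
    """Inverse of rank_positions for vectors of length n and weight e."""
    positions = []
    p = n - 1
    for i in range(e, 0, -1):
        # Largest p with comb(p, i) <= rank is the next (highest) position.
        while comb(p, i) > rank:
            p -= 1
        positions.append(p)
        rank -= comb(p, i)
        p -= 1
    return positions[::-1]


def index_bits(n, e):
    """Bits needed to store the number of one of the comb(n, e) vectors."""
    return (comb(n, e) - 1).bit_length()
```

For example, with n = 32 and e = 2 there are 496 vectors, so the number of a vector fits into 9 bits, whereas storing the two positions separately would take 2 x 5 = 10 bits.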
In order to protect the data compressed using the method according to the present invention against errors that occur randomly during transmission or storage, the compressed data is coded in an error-correcting manner, adding redundancy. In this context, the error-correcting code may be a block code or a convolutional code. In this connection, suitable block codes include Reed-Solomon codes and Hamming codes.
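As an illustration of a block code of this kind, a Hamming code with 4 data bits and 3 parity bits might be sketched as follows. The code parameters (7, 4), the bit ordering and the function names hamming74_encode and hamming74_decode are assumptions made for this sketch.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

In this sketch, every group of 4 compressed bits is expanded to 7 bits, and any single bit error within a codeword is corrected during decoding.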
According to the present invention, the length of the compressed data is preferably a multiple of 8 bits. This allows simple adaptation to other data processing methods and to suitable devices.
The method according to the present invention can be performed using programmable devices (microprocessors, microcontrollers) and suitable programs as well as with hardware adapted to the method according to the present invention.