The present invention relates to a method for encoding information, and a method for decoding information. The invention also relates to a device for encoding information and to a device for decoding information.
In information processing it is sometimes desirable to transform a message, carrying the information, such that the symbols in the message are adapted to suit a particular purpose. The concept of transforming a message is often referred to as encoding or decoding. Electronic devices for handling information commonly comprise memory units for storing information and display units for displaying said information after retrieval from the memory unit. For the purpose of maximizing the amount of storable information in the memory unit, and/or for the purpose of reducing the size of the memory unit, the information can be stored in a compressed state in the memory unit.
U.S. Pat. No. 5,062,152 relates to a method of processing an analog signal having an amplitude range with a non-uniform probability density. The method includes quantizing the analog signal as falling within one of plural signal levels, and assigning a binary code word to the quantization levels in accordance with the occurrence probability of the quantization levels. According to the method described in U.S. Pat. No. 5,062,152 each code word is predetermined to include eight binary-valued digits.
U.S. Pat. No. 5,488,616 relates to an encoding method. According to U.S. Pat. No. 5,488,616 symbols are provided, each symbol having an occurrence probability. The first method step is to assign a variable-length-code-word to each symbol according to occurrence probability of each symbol. This step uses Huffman coding. Thereafter the variable-length-code-word is coded in two different fashions to provide a first code C32 and a second code C34. In a final step one or both of the codes C32, C34 are selected to provide a reversible variable length code.
One problem which the invention addresses is to achieve a reduction in the amount of memory space required for storing a certain amount of information. More specifically an embodiment of the invention relates to the problem of achieving a compression of information in a manner which allows retrieval of the information by means of an information retrieval program which, in itself, requires a minimum of memory space.
The above-mentioned problem is addressed by a method for reducing the number of binary digits in a message, the method comprising the steps of receiving at least one message comprising a plurality of characters; and encoding the message in accordance with a predefined Huffman coding method such that a compressed message is generated, the compressed message having a reduced number of binary digits. For the purpose of improving the compression effect, the Huffman coding is preceded by the steps of calculating, for each character, a value indicating a relative frequency of the character in the at least one message; and assigning a first binary coded code symbol to the character having the highest relative frequency, the code symbol comprising a plurality of binary digits. If, for example, the received characters were coded as words of eight binary digits, and there are no more than 64 different characters in the received message, it will be sufficient to use a word length of six binary digits in the symbols, since six binary digits provide 64 distinct combinations.
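The word-length example above can be checked with a short calculation. The following Python sketch is illustrative only; the function name `symbol_word_length` is an assumption introduced here and does not appear in the specification.

```python
def symbol_word_length(message: str) -> int:
    """Smallest fixed word length, in binary digits, that gives every
    distinct character in the message its own symbol."""
    distinct = len(set(message))
    if distinct == 0:
        raise ValueError("message must contain at least one character")
    # (distinct - 1).bit_length() equals ceil(log2(distinct)) for distinct >= 1;
    # at least one digit is used even when only one character occurs.
    return max(1, (distinct - 1).bit_length())


# A message with 64 different characters needs six binary digits per symbol.
sixty_four_chars = "".join(chr(i) for i in range(64))
print(symbol_word_length(sixty_four_chars))
```

With a 65th distinct character the same calculation yields seven binary digits, which is why the bound of 64 characters matters in the example.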
All or substantially all the binary digits in the first binary coded code symbol are set to a first binary value; for example, all the binary digits may be set to zero.
A unique symbol is assigned to each unique remaining character so as to generate a first encoded message. The binary digits in each symbol are selected such that the number of digits having the first binary value is maximized in the first encoded message.
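One way to picture the symbol assignment described above is to rank the characters by frequency and to hand out fixed-length code words in order of increasing number of ones, so that the most frequent character receives the all-zero word. The Python sketch below is a minimal illustration under that assumption; the names `assign_symbols` and `encode`, and the tie-breaking rules, are hypothetical and are not taken from the specification.

```python
from collections import Counter


def assign_symbols(message: str) -> dict:
    """Map each character to a fixed-length binary symbol so that frequent
    characters get symbols containing few ones (the first binary value is
    assumed here to be zero)."""
    counts = Counter(message)
    width = max(1, (len(counts) - 1).bit_length())
    # All code words of this width, ordered by number of ones, then by value.
    codes = sorted(range(2 ** width), key=lambda c: (bin(c).count("1"), c))
    # Characters ordered by descending frequency; ties broken alphabetically.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: format(code, "0{}b".format(width))
            for ch, code in zip(ranked, codes)}


def encode(message: str, table: dict) -> str:
    """Concatenate the symbols of all characters into one string of digits."""
    return "".join(table[ch] for ch in message)


table = assign_symbols("abracadabra")
first_encoded = encode("abracadabra", table)
print(table["a"], first_encoded.count("0"), first_encoded.count("1"))
```

Pairing the most frequent characters with the symbols that contain the fewest ones is what maximizes the number of zeros in the encoded message for a given word length.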
The first encoded message is further processed such that a second set of binary digits is generated, the further processing including the step of selecting the digits in the second set of digits such that the number of digits having the first binary value in the second set is higher than the number of digits having the first binary value in the first set.
According to a preferred embodiment, the further processing is adapted to generate the second set of digits sequentially such that the sequence of binary digits in the second set of digits resembles the output of a memoryless Bernoulli source. According to a further embodiment the further processing is adapted to generate the second set of digits such that the entropy of the sequence of binary digits in the second set of digits is near the entropy of a memoryless Bernoulli source; and such that the distribution of the sequence of binary digits in the second set of digits is substantially independent of the distribution of binary digits in the first encoded message.
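For orientation, the entropy of a memoryless Bernoulli source that emits a one with probability p is the binary entropy function H(p) = -p log2(p) - (1 - p) log2(1 - p). The Python sketch below evaluates this function and estimates p from a sequence of digits; the function names are assumptions made for illustration, and the sketch is a measuring aid only, not part of the claimed processing.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits per digit, of a memoryless Bernoulli source that
    emits a one with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def empirical_entropy(bits) -> float:
    """Binary entropy evaluated at the observed proportion of ones."""
    bits = list(bits)
    return binary_entropy(sum(bits) / len(bits))


# A sequence with few ones has a low per-digit entropy under this model.
print(empirical_entropy([0] * 95 + [1] * 5))
```

The lower the proportion of ones, the further H(p) falls below one bit per digit, which indicates how much a subsequent entropy coder could save if the digits really were independent.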
A problem to which the present invention is directed is to provide a method for transforming a message in such a manner that the transformed message is suited for efficient compression.
The invention is also directed to the problem of encoding a message such that a minimum of bandwidth, or a minimum of power, is required when transmitting the message, e.g. via a radio link.
An embodiment of the invention is directed to the problem of retrieving information, such as ASCII-coded texts, from an encoded message.
An embodiment of the invention is directed to the problem of encoding an analog signal such that a minimum of bandwidth, or a minimum of power, is required for transmitting it, e.g. via a radio link.
An embodiment of the invention is directed to the problem of providing a device which is capable of displaying text messages in a plurality of languages while keeping the memory space requirements to a minimum.
The invention is also directed to the problem of providing a device for transforming information such that the transformed information is suited for efficient compression.
The invention is also directed to the problem of providing an information retrieval program which, in itself, requires a minimum of memory space. The purpose of the information retrieval program is to retrieve information, such as ASCII-coded texts, from a compressed message.
A preferred embodiment of the invention relates to coding a message such that the information content is represented by a minimum number of binary digits (bits). The preferred coding is achieved by three main steps:
In a first step each character of a received message is translated into Hamming symbols. This step results in a reduction of the number of binary digits with value one (“1”) in the message. Additionally the number of binary digits required for representing the information is reduced.
In a second step the Hamming symbols are interpreted as a first bitstream Y, and the bitstream is subjected to an estimation process whereby a second bitstream E is generated. The estimation process results in a higher proportion of binary digits having value zero in the second bitstream E than in the first bitstream Y. Additionally the sequence of binary digits in the second bitstream E resembles the output of a memoryless Bernoulli source. Since the number of binary digits with value one (“1”) is very low, and since the sequence of binary digits in the second bitstream E resembles the output of a memoryless Bernoulli source, the conditions for successful Huffman coding are optimized in the second bitstream E.
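The estimation process is not detailed in this passage. As a purely hypothetical illustration of how a bitstream E with fewer ones might be derived from a bitstream Y, the Python sketch below predicts each digit from the preceding digits using adaptive counts and outputs the exclusive-OR of the prediction and the actual digit. The predictor, its `order` parameter and the function names are assumptions introduced here and are not the estimator of the invention.

```python
def _predict(counts, context):
    """Predict one when ones have been seen more often than zeros in this
    context, otherwise predict zero."""
    zeros, ones = counts.get(context, (0, 0))
    return 1 if ones > zeros else 0


def _update(counts, context, bit):
    """Record the true digit for this context and slide the context window."""
    zeros, ones = counts.get(context, (0, 0))
    counts[context] = (zeros + (bit == 0), ones + (bit == 1))
    return (context + (bit,))[1:]


def estimate(y_bits, order=3):
    """Return E, where each digit is Y XOR prediction; a correct prediction
    yields a zero."""
    counts, context, e_bits = {}, (0,) * order, []
    for y in y_bits:
        e_bits.append(y ^ _predict(counts, context))
        context = _update(counts, context, y)
    return e_bits


def reconstruct(e_bits, order=3):
    """Invert estimate(): the decoder rebuilds the same predictions from the
    digits it has already recovered."""
    counts, context, y_bits = {}, (0,) * order, []
    for e in e_bits:
        y = e ^ _predict(counts, context)
        y_bits.append(y)
        context = _update(counts, context, y)
    return y_bits


y = [0, 1] * 50
e = estimate(y)
print(sum(y), sum(e), reconstruct(e) == y)
```

Because the prediction depends only on digits that the decoder has already reconstructed, the transformation is reversible without side information, and any regularity in Y that the predictor captures shows up as additional zeros in E.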
In a third step the second bitstream E is compressed by means of Huffman coding such that a compressed set C of binary digits is generated. This step results in a distinct reduction of the number of binary digits required for representing the information since the second bitstream E provides optimum conditions for Huffman coding.
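The third step can be sketched as follows. Huffman coding applied to single binary digits cannot shorten a bitstream, so this illustration groups the digits of E into fixed-size blocks and builds a Huffman code over the blocks. The block size of four digits, the function names and the handling of a shorter final block are assumptions made for the sketch and are not taken from the specification.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free code (symbol -> string of digits) from the
    frequencies of the given symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def compress(e_bits: str, block: int = 4):
    """Split E into blocks and replace each block by its Huffman code word.
    A shorter final block is simply treated as a symbol of its own."""
    blocks = [e_bits[i:i + block] for i in range(0, len(e_bits), block)]
    table = huffman_code(blocks)
    return "".join(table[b] for b in blocks), table


e_stream = "0000" * 30 + "0001" * 2 + "1000"
c_stream, table = compress(e_stream)
print(len(e_stream), len(c_stream))
```

When E consists mostly of zeros, the all-zero block dominates and receives the shortest code word, which is where the reduction in the number of binary digits comes from. A decoder would additionally need the code table, which this sketch returns alongside the compressed digits.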