In recent years, when character string data, acoustic signal data or image information data is transmitted via a communication path or recorded on an information recording medium, the amount of information in the data is often reduced using information compression-coding.
In some encoding methods, information is compressed by handling a unit of a predetermined number of bits as a character and using repetitive character strings as a model (for example, see the non-patent literature 1). These encoding methods include LZ77, LZ78 and LZW (for example, see the non-patent literature 1 and the non-patent literature 2).
The information compression-coding method that handles repetitive character strings as a model is used not only for compression-coding of text documents and the like, but also for encoding of modem signals and image signals.
FIG. 1A shows an example of an information compression-coding device for character strings based on the LZ77 method, and FIG. 1B shows an example of a corresponding decoding device. In the LZ77 encoding device, a dictionary registering part 12 retains a certain amount of the input character string from an input terminal 11 in a dictionary part 13. A match searching part 14 compares a newly input character string with the past character strings in the dictionary part 13. A code generating part 15 outputs, as a code to an output terminal 16, the position (index) in the dictionary part (buffer) 13 of the character string having the largest number of matching characters and containing more than a predetermined number of characters (the longest matching character string), the length of the matching portion, and the character that follows the longest matching character string in the input character string (the mismatching character). If the length of the longest matching character string is less than the predetermined number of characters, the code generating part 15 outputs a code indicating the mismatch, for example a code with position (index) 0 in the buffer and matching length 0, together with the first character of the input character string whose matching length is less than the predetermined number of characters.
The end of the dictionary part 13 connects to a read-ahead buffer 17, and together they form a slide buffer, as shown in FIG. 1C for example. Input character strings are stored serially from the end (the right end in the drawing) of the read-ahead buffer 17. When the read-ahead buffer 17 is full, match searching on a character string starts from the left end of the read-ahead buffer 17. When a code is output, as many characters as the length of the longest matching character string, or a single character, are discarded from the left end of the dictionary part 13, and the character string or character just encoded is stored at the right end of the dictionary part 13. The next input character string is then stored at the right end of the read-ahead buffer 17.
In the decoding device according to the LZ77 method, a code parsing part 22 determines whether or not an input code from an input terminal 21 indicates a mismatch. If the input code does not indicate a mismatch, the code parsing part 22 separates the input code into position information (an index), the length of the sequentially matching part, and a character that mismatches (a mismatching character). A character string retrieving part 23 acquires the corresponding character string from a dictionary part 24 based on the position information and the length of the sequentially matching part. A character string combining part 25 combines the character string acquired from the dictionary part 24 with the mismatching character and outputs the result as a decoded character string to an output terminal 26. A dictionary registering part 27 stores the decoded character string in the dictionary part 24. The dictionary part 24 is a slide buffer in which character strings are retained in the same state as in the dictionary part 13 in FIG. 1A.
If a code indicating a mismatch is input, the code parsing part 22 outputs the following mismatching character as it is and stores it in the dictionary part 24.
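The LZ77 encoding and decoding steps described above can be illustrated with a minimal Python sketch. It is not the device of FIG. 1A and FIG. 1B: the function names, the `window` and `min_match` parameters, and the use of a backward distance as the position value are assumptions made for illustration, and a match shorter than `min_match` is emitted as the mismatch code (0, 0, character).

```python
def lz77_encode(data, window=32, min_match=2):
    """Return a list of (offset, length, next_char) codes (illustrative only)."""
    codes = []
    pos = 0
    while pos < len(data):
        best_off, best_len = 0, 0
        # Search only the most recent `window` characters (the dictionary part).
        for cand in range(max(0, pos - window), pos):
            length = 0
            # Keep one character in reserve to serve as the mismatching character,
            # and do not let the match run past the dictionary part.
            while (pos + length < len(data) - 1
                   and cand + length < pos
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = pos - cand, length
        if best_len < min_match:
            # Mismatch code: position 0, length 0, then the character itself.
            codes.append((0, 0, data[pos]))
            pos += 1
        else:
            codes.append((best_off, best_len, data[pos + best_len]))
            pos += best_len + 1
    return codes


def lz77_decode(codes):
    """Rebuild the text from (offset, length, next_char) codes."""
    out = []
    for off, length, ch in codes:
        if length > 0:
            start = len(out) - off
            out.extend(out[start:start + length])  # copy the matching string
        out.append(ch)                             # then the mismatching character
    return "".join(out)


codes = lz77_encode("abcabcabcd")
print(codes)
print(lz77_decode(codes))
```

In this sketch the decoder keeps its whole output as the dictionary, which is sufficient because every offset is relative to the current end of the decoded text.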
The LZ77 encoding method has been improved into LZSS, ZIP and the like. Methods such as ZIP further compress the code output obtained by LZ77 with Huffman coding, a kind of entropy encoding.
FIG. 2A shows an example of an information compression-coding device for character strings using a typical LZ78 method, and FIG. 2B shows an example of a corresponding decoding device. In LZ78 encoding, processing starts with a dictionary part 31 being empty. A match searching part 32 compares a newly input character string from the input terminal 11 with the character strings registered in the dictionary part 31. A code generating part 33 pairs the index in the dictionary part 31 of the longest matching character string, which contains the largest number of matching characters, with an index corresponding to the mismatching input character that follows the longest matching character string, and outputs the pair as a code to the output terminal 16. Each time a code is output, a dictionary registering part 34 newly registers in the dictionary part 31 a character string consisting of said longest matching character string with the next character attached. The dictionary part 31 is not a slide buffer but a fixed one. If an input character is not found in the dictionary part 31, the encoding device in FIG. 2A outputs, as a code, a special index indicating the mismatch together with the character as it is.
In the LZ78 decoding device in FIG. 2B, the initial state of a dictionary part 941 is the same as that of the dictionary part 31 used for the encoding. That is, the dictionary part 941 starts its processing in the empty state. A code parsing part 942 separates the input code into an index of the longest matching character string and an index of the following mismatching character. A character string retrieving part 943 acquires the matching character string and the mismatching character from the dictionary part 941 using the indices. The character string combining part 25 decodes the character string by combining the acquired matching character string and the mismatching character, and outputs it to the output terminal 26. During this processing, a dictionary registering part 944 newly registers in the dictionary part 941 a character string consisting of the longest matching character string with the mismatching character attached, by the same method as that used in the encoding device. In this manner, the dictionary part 941 in the decoding device and the dictionary part 31 in the encoding device are kept in the same internal state.
In LZ78 encoding, the dictionary parts 31 and 941 are configured in a tree structure. A method is also known for speeding up the search, for example by compressing a character string with a hash function when checking whether or not an element equal to an input character string has been registered in the dictionary.
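The LZ78 scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the device of FIG. 2A and FIG. 2B: the function names are hypothetical, index 0 is chosen as the special "no match" index, the mismatching character is emitted directly rather than as an index, and a Python dict (a hash table) stands in for the tree-structured dictionary.

```python
def lz78_encode(data):
    """Return a list of (index, char) codes; index 0 means 'no match'."""
    dictionary = {}   # phrase -> index, starts empty; indices begin at 1
    codes = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                     # keep extending the match
        else:
            codes.append((dictionary.get(phrase, 0), ch))
            # Register the longest match with the next character attached.
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a match: emit it with an empty character.
        codes.append((dictionary[phrase], ""))
    return codes


def lz78_decode(codes):
    """Rebuild the text while growing the same dictionary as the encoder."""
    phrases = [""]    # index 0 is the empty phrase ('no match')
    out = []
    for index, ch in codes:
        entry = phrases[index] + ch
        out.append(entry)
        phrases.append(entry)
    return "".join(out)


codes = lz78_encode("abababa")
print(codes)
print(lz78_decode(codes))
```

Because the decoder registers each decoded string in the same order as the encoder, both dictionaries stay in the same internal state without the dictionary ever being transmitted.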
FIG. 3A shows an LZW encoding device and FIG. 3B shows a corresponding decoding device. In a dictionary part 35 in the LZW encoding device, a limited number of initial characters (data) have been previously registered. A match searching part 36 compares an inputted character string with character strings registered in the dictionary part 35. Then, the match searching part 36 outputs the longest matching character string, which contains the largest number of matching characters, and an index in the dictionary part 35 corresponding to the character string to a code generating part 37. The code generating part 37 outputs the index in the dictionary part 35 as a code from the output terminal 16. The code generating part 37 further outputs the longest matching character string to a dictionary registering part 38. The first character CF of the character string is attached to the end of the previous longest matching character string retained temporarily in a previous buffer 38a and registered in the dictionary part 35. That is, the previous encoded character string and the first character CF of the current encoded character string are registered in the dictionary part 35 as a sequential character string.
Next, the decoding device in FIG. 3B will be described. A code parsing part 946 serially acquires the index of the longest matching character string from an input code from a terminal 21. A character string retrieving part 947 reads out the character string corresponding to the acquired index from a dictionary part 945. A character string combining part 948 outputs the character string read out from the dictionary part 945 as a decoded character string to the output terminal 26. At the same time, a dictionary registering part 949 attaches the first character CF of the current decoded character string to the end of the previous decoded character string held in a previous buffer 949a and registers the result in the dictionary part 945.
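A minimal Python sketch of the LZW encoding and decoding described above follows. The function names and the `alphabet` parameter (the initial characters registered in advance) are assumptions for illustration, and every input character is assumed to belong to `alphabet`. The decoder also handles the known corner case in which a received index refers to the entry that is still being registered, a detail not covered in the description above.

```python
def lzw_encode(data, alphabet):
    """Return a list of dictionary indices (illustrative only)."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}  # initial characters
    codes = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch
        else:
            codes.append(dictionary[phrase])
            # Previous encoded string plus the first character of the next one.
            dictionary[phrase + ch] = len(dictionary)
            phrase = ch
    if phrase:
        codes.append(dictionary[phrase])
    return codes


def lzw_decode(codes, alphabet):
    """Rebuild the text, registering entries the same way as the encoder."""
    if not codes:
        return ""
    phrases = list(alphabet)
    previous = phrases[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(phrases):
            current = phrases[code]
        else:
            # Index of the entry still being registered: it must be the
            # previous string followed by its own first character.
            current = previous + previous[0]
        phrases.append(previous + current[0])  # previous string + first char CF
        out.append(current)
        previous = current
    return "".join(out)


codes = lzw_encode("abababa", "ab")
print(codes)
print(lzw_decode(codes, "ab"))
```

Unlike the LZ78 sketch, each code here is a single index with no explicit mismatching character, which is possible because the dictionary starts with every single character already registered.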
On the other hand, compression-coding methods involving distortion for acoustic signal data (a digital sample value string) include MP3, AAC, TwinVQ and the like. Compression-coding methods for image information data (sample values) include JPEG and the like. Techniques of reversible encoding (lossless encoding) involving no distortion are also known (for example, see the non-patent literature 3). Moreover, lossless compression of data in a floating-point format, which is easy to edit and process, is also important. In particular, in some techniques, an audio signal in a floating-point format is partitioned into an integer string and a remaining (difference) portion consisting only of the mantissa bits that can be non-zero, and each of them is encoded, thereby improving compression efficiency (for example, see the non-patent literature 4).
Multiplying each sample s0(i) in an original sample string by a common real number G as a gain results in a sample s1(i)=s0(i)×G. It is often the case that the sample s1(i) obtained by multiplying by the real number G is represented in binary notation or in binary floating-point notation according to the IEEE-754 format, and the resulting digital sample strings are encoded. The floating-point format standardized in IEEE-754 is a 32-bit format, as shown in FIG. 4, which is composed of, from the highest-order bit, a sign of 1 bit, an exponent of 8 bits and a mantissa of 23 bits. Denoting the sign by S, the value represented by the 8-bit exponent as a decimal number by E, and the binary number of the mantissa by M, the numerical value in the floating-point format is represented in sign-and-magnitude binary notation as in formula (1):
[Formula 1]
$$(-1)^{S} \times 1.M \times 2^{E-E_0} \qquad (1)$$
According to IEEE-754, it is defined that $E_0 = 2^{7} - 1 = 127$, so $E - E_0$ in formula (1) can take any value in the following range:
$$-127 \le E - E_0 \le 128$$
However, $E - E_0 = -127$ is defined to be all "0"s and $E - E_0 = 128$ is defined to be all "1"s. $E - E_0 = n$ represents the value obtained by subtracting one (1) from the number of digits (the number of bits) in the integer portion of the value represented by formula (1), i.e., the number of lower significant bits below the most significant "1".
Non-patent literature 1: Nelson & Gailly (translated into Japanese by Ogiwara & Yamaguchi), "The Data Compression Book, Second Edition", Chapters 7-9
Non-patent literature 2: D. Salomon, "Data Compression", pp. 101-162 (Chapter 3)
Non-patent literature 3: Hans, M. and Schafer, R. W., "Lossless Compression of Digital Audio", IEEE Signal Processing Magazine, Vol. 18, No. 4, pp. 21-32 (2001)
Non-patent literature 4: Dai Yang and Takehiro Moriya, "Lossless Compression for Audio Data in the IEEE Floating-Point Format", AES Convention Paper 5987, AES 115th Convention, New York, N.Y., USA, Oct. 10-13, 2003