The invention generally relates to data encoding/decoding systems for use with dictionary-based encoding/decoding systems.
Dictionary-based encoding systems typically use an encoding algorithm that reduces duplication in an input stream by assigning codes to strings that repeat within it, producing an encoded stream. The decoding system typically applies the same algorithm in reverse to decode the encoded stream into a decoded output stream. Conventional dictionary-based decoding systems need to know the specific algorithm by which the encoding system encoded the input stream.
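To make the general idea concrete, the following is a minimal LZW-style encoder. It is an illustrative sketch, not the algorithm of any patent cited here. It assumes the input contains only characters with code points below 256, which serve as the literal codes, and it assigns new codes from 256 upward to strings that repeat. The function name `lzw_encode` is hypothetical.

```python
def lzw_encode(data: str) -> list[int]:
    """Illustrative LZW-style dictionary encoder (hypothetical sketch).

    Assumes every character in `data` has a code point below 256;
    those values act as the literal codes.
    """
    dictionary = {chr(i): i for i in range(256)}  # literal codes 0-255
    next_code = 256                               # first code for a repeated string
    current = ""
    output: list[int] = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the match while the string is already known.
            current = candidate
        else:
            # Emit the code for the longest known string...
            output.append(dictionary[current])
            # ...and give the one-character-longer string a new code.
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


codes = lzw_encode("ABABABA")
```

For the input "ABABABA" this sketch produces `[65, 66, 256, 258]`, where codes 256 and 258 stand for the repeated strings "AB" and "ABA".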
For example, U.S. Pat. No. 4,054,951 discloses a data encoding/decoding algorithm that assigns a five-part tag (flag, start, address, length, repetition) to each section of encoded data. U.S. Pat. Nos. 5,414,425; 5,463,390; and 5,506,580 disclose encoding/decoding algorithms that use a three-part tag (flag, pointer, length). These systems, however, generally require a history buffer to serve as the dictionary.
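A three-part tag scheme of this kind can be illustrated with a small decoder in which the already-decoded output serves as the history buffer. The token layout used here (`("L", char)` for a literal, `("R", offset, length)` for a back-reference counted from the end of the history) is a hypothetical simplification for illustration only, not the format of any of the cited patents.

```python
def decode_tags(tokens: list[tuple]) -> str:
    """Decode hypothetical (flag, pointer, length)-style tags against a history buffer."""
    history: list[str] = []  # decoded output doubles as the history buffer
    for tag in tokens:
        if tag[0] == "L":
            # Literal tag: append the character as-is.
            history.append(tag[1])
        elif tag[0] == "R":
            _, offset, length = tag
            start = len(history) - offset
            if offset <= 0 or start < 0:
                raise ValueError("reference points outside the history buffer")
            # Copy one symbol at a time so a match may overlap its own output.
            for i in range(length):
                history.append(history[start + i])
        else:
            raise ValueError(f"unknown flag {tag[0]!r}")
    return "".join(history)


text = decode_tags([("L", "A"), ("L", "B"), ("R", 2, 5)])
```

Copying one symbol at a time lets a reference overlap the data it is producing, so `[("L", "A"), ("L", "B"), ("R", 2, 5)]` decodes to "ABABABA" even though only two symbols exist when the reference starts.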
U.S. Pat. No. 4,558,302 discloses an encoding algorithm that requires maintaining a two-part tag (prefix, extension character). The complementary decoding system traverses the linked list of prefixes to output the decoded stream. U.S. Pat. No. 4,464,650 discloses a system that employs a similar encoding/decoding algorithm and further discloses the use of a tree during decoding. Each of these systems, however, requires that the decoding system know the algorithm the encoding system used to produce the encoded stream.
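The prefix-chain traversal that such decoders perform can be sketched as follows, assuming that codes below 256 are literal characters and that each higher code maps to a hypothetical `(prefix_code, extension_char)` pair. Because the chain is walked from the newest code back to a literal, the characters come out in reverse order and must be reversed before output.

```python
def expand_code(code: int, entries: dict[int, tuple[int, str]]) -> str:
    """Return the string for `code` by walking its prefix chain.

    `entries` maps each non-literal code (>= 256) to a (prefix, extension)
    pair, where `prefix` is itself a code; codes below 256 are literals.
    """
    chars: list[str] = []
    while code >= 256:
        prefix, extension = entries[code]
        chars.append(extension)  # the chain yields characters last-to-first
        code = prefix
    chars.append(chr(code))      # every chain ends at a literal code
    chars.reverse()
    return "".join(chars)


entries = {256: (65, "B"), 257: (66, "A"), 258: (256, "A")}
word = expand_code(258, entries)
```

Here code 258 refers to code 256 (itself "A" followed by "B") plus the extension "A", so `expand_code(258, entries)` yields "ABA".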
For example, FIG. 1 shows a prior art dictionary-based decoder system 10 in which an input stream 12 is received by a code assembler (step 14). The code assembler provides a reference code 15; if the reference code is a literal code (step 16), the literal code 17 is output (step 18) to an output stream 20. A reference code is a code that implicitly encodes a string of output values by referencing previous literal and/or reference codes, whereas a literal code is a code that explicitly encodes a fixed string of output values. The code assembler (step 14) also outputs data 21, which includes the previous code and the first character of the current code, to a dictionary routine. The dictionary routine inserts the new code into the dictionary (step 22), and the dictionary is updated (step 24). The routine then looks up the reference code in the dictionary and advances the current code to the new code (step 26). The dictionary is then traversed for all literal codes (step 28), and the literal codes 29 are output (step 18) to the output stream 20. Reference codes 15 that are not literal codes (step 16) are passed to the dictionary routine, which looks up the reference code in the dictionary and advances the current code to the new code (step 26). The dictionary is then traversed for all literal codes (step 28), and the literal codes are output (step 18) to the output stream 20.
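As a rough sketch, a conventional flow of this kind corresponds to an LZW-style decoder like the one below. This is an assumed, simplified model rather than the exact system of FIG. 1: codes below 256 are treated as literal codes, each new dictionary entry is formed from the previous code and the first character of the current string (as with data 21), and every lookup expands a code by traversing its prefix chain. The function name `lzw_decode` is hypothetical.

```python
def lzw_decode(codes: list[int]) -> str:
    """Hypothetical LZW-style decoder using a prefix/extension dictionary."""
    entries: dict[int, tuple[int, str]] = {}  # code -> (prefix code, extension char)
    next_code = 256                           # next reference code to define

    def expand(code: int) -> str:
        # Traverse the prefix chain back to a literal code.
        chars = []
        while code >= 256:
            prefix, ext = entries[code]
            chars.append(ext)
            code = prefix
        chars.append(chr(code))
        return "".join(reversed(chars))

    output: list[str] = []
    previous = None
    for code in codes:
        if previous is None:
            if code >= 256:
                raise ValueError("first code must be a literal code")
            current = chr(code)
        elif code < next_code:
            # Known code: look it up, then add (previous code, first char of current).
            current = expand(code)
            entries[next_code] = (previous, current[0])
            next_code += 1
        elif code == next_code:
            # Code defined by this very step: previous string plus its own first char.
            first = expand(previous)[0]
            entries[next_code] = (previous, first)
            next_code += 1
            current = expand(code)
        else:
            raise ValueError(f"invalid code {code}")
        output.append(current)
        previous = code
    return "".join(output)


decoded = lzw_decode([65, 66, 256, 258])
```

The `code == next_code` branch handles the case in which the encoder emitted a code it had only just defined; the decoder reconstructs that entry from the previous string before expanding it. With this sketch, `lzw_decode([65, 66, 256, 258])` returns "ABABABA".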
As an example, FIG. 2A shows at 30 a coded input string of text for illustrative purposes. FIG. 2B shows at 32 a table listing each input code, its associated output string, and its associated new entry, which consists of a prefix and a suffix. In particular, the first reference code is an A, which is provided to the output string. The next reference code is a space, and a definition is created for the pair (A,<SPACE>). This definition is given a unique name (e.g., 256). Subsequent pairs are given unique names (e.g., 257–261). When a reference code such as 256 is encountered in the input string, the system records a new code (262) defined by the prefix 256 and the extension character P. In this fashion, the output string (shown at 34 in FIG. 2C) is developed by traversing the linked list, which may include many embedded references. The more compressed the code (that is, the more embedded references it contains), however, the longer it takes to traverse the linked list.
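The traversal cost can be quantified with a short sketch that counts the prefix links followed when expanding a code. The dictionary below is a hypothetical one built from a long run of the character A, in which each new code extends the previous code by one character; the helper name `chain_length` is likewise hypothetical.

```python
def chain_length(code: int, entries: dict[int, tuple[int, str]]) -> int:
    """Count prefix-link hops needed to expand `code` (codes < 256 are literals)."""
    hops = 0
    while code >= 256:
        code = entries[code][0]  # follow the prefix link
        hops += 1
    return hops


# Hypothetical dictionary for a long run of 'A': code 256 + k stands for
# k + 2 copies of 'A', each entry extending the previous one by one character.
entries = {256: (65, "A")}
for code in range(257, 266):
    entries[code] = (code - 1, "A")

print([chain_length(c, entries) for c in (256, 260, 265)])  # [1, 5, 10]
```

In this model, expanding a code for an n-character string takes n - 1 hops, so the codes that save the most space are also the slowest to decode.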
U.S. Pat. No. 5,058,144 discloses an encoding/decoding system that requires encoding a search tree that is used by the decoding system to output decoded symbols. The decoding process requires traversal of this tree. U.S. Pat. No. 5,153,591 discloses an encoding/decoding system that also requires the use of a search tree in the decoding system, and U.S. Pat. No. 5,243,341 discloses an encoding/decoding system that employs a second dictionary to preserve information prior to reset of the first dictionary.
U.S. Pat. No. 6,404,362 discloses an encoding/decoding system that seeks to eliminate the need to traverse data structures when decoding an input stream encoded with a self-building dictionary. In particular, it stores a dictionary whose entries consist of a prefix string code and an extension character, together with a fast, or finder, memory that contains the addresses and lengths of all sub-strings that have been encountered in the input stream. The decoding system first searches the finder memory for blocks of string memory that can be easily copied. If this search fails, the decoder builds a new sub-string from the dictionary, outputs this sub-string to the output buffer, and inserts its address and length into the finder memory. The system also employs a separate variable-length string memory to access blocks of sub-string data.
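The finder-memory idea can be illustrated with a simplified sketch. It is an assumption-laden model, not the structure disclosed in U.S. Pat. No. 6,404,362: the class name `FinderDecoder`, the `(prefix_code, extension_char)` dictionary layout, and the rule that only the full string of each emitted code is recorded in the finder are all hypothetical. On a finder hit the decoder copies a block from earlier in the output buffer; on a miss it builds the string from the dictionary and records where it was written.

```python
class FinderDecoder:
    """Hypothetical decoder with a 'finder' cache of (address, length) pairs."""

    def __init__(self, entries: dict[int, tuple[int, str]]):
        self.entries = entries                        # code -> (prefix code, extension char)
        self.finder: dict[int, tuple[int, int]] = {}  # code -> (address in output, length)
        self.output: list[str] = []                   # output buffer
        self.misses = 0                               # times the dictionary had to be walked

    def _build(self, code: int) -> list[str]:
        # Slow path: traverse the prefix chain back to a literal code.
        chars = []
        while code >= 256:
            prefix, ext = self.entries[code]
            chars.append(ext)
            code = prefix
        chars.append(chr(code))
        return chars[::-1]

    def emit(self, code: int) -> None:
        hit = self.finder.get(code)
        if hit is not None:
            # Fast path: block-copy the string from earlier in the output buffer.
            address, length = hit
            self.output.extend(self.output[address:address + length])
        else:
            self.misses += 1
            chars = self._build(code)
            # Record where this string lands so later occurrences can be copied.
            self.finder[code] = (len(self.output), len(chars))
            self.output.extend(chars)


decoder = FinderDecoder({256: (65, "B"), 257: (256, "C")})
for c in (257, 257, 256):
    decoder.emit(c)
```

After emitting codes 257, 257, and 256, the output buffer holds "ABCABCAB"; the second occurrence of 257 is served by a block copy, so the dictionary is traversed only twice.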
There is a need, therefore, for a decoding system for dictionary-based encoding/decoding systems that more efficiently and economically provides decoding of dictionary-based encoded data.
Further, there is a need for such a decoding system for dictionary-based encoding/decoding systems that does not require the use of complex structures such as search trees, history buffers, or second dictionaries.
There is further a need for a decoding system for dictionary-based encoding/decoding systems that does not require the use of a traditional dictionary that consists of prefix and extension codes.