1. Field of the Invention
The invention relates generally to data and image processing. More specifically, the invention relates to arranging binary encoded data of a compressed image in computer memory.
2. Description of the Related Art
In image and/or data compression, through a process known as binary encoding, a set of values, such as text or numerical data that are obtained or input externally, can be encoded into binary form (1s or 0s). One way of encoding is to simply convert each decimal number or code for text (such as ASCII numerical designations) into a fixed number of bits (word size). For instance, the numbers 23, 128, and 100 would be encoded into binary as the sequence: 00010101 1000000 01100100. This raw or pure binary code serves no further compression function, since it merely takes the data and represents it in binary. Such encoding is inefficient where the number of zeroes greatly outweigh the non-zero, and especially where such zero data values are consecutive, creating a large "run" of zeroes in binary. Several methods have been developed particularly in the field of digital communications to compress data during binary conversion. Among two widely-used such methods of binary encoding for image or data compression are Huffman Coding and Run-Length Encoding.
Classical Huffman Coding is a variable length coding technique which consists of coding each possible value y.sub.i (i=1, . . . , N) inside a given data set S of N possible data values by codewords of L.sub.i bits each. The goal behind Huffman Coding is to minimize .SIGMA.L.sub.i P(y.sub.i), where P(y.sub.i) is the probability of the value y.sub.i occurring in data set S that is to be encoded. The codewords are chosen in order to make them distinguishable from each other. The codewords have a variable length, for instance, for a data set S={0, 1, 2, 3, 4} the Huffman Coding may use the mapping {0=0, 1=10, 2=110, 3=1110, 4=1111}. If P(0)&gt;&gt;P(1)&gt;&gt;P(2)&gt;&gt;P(3)&gt;&gt;P(4), this technique may be more efficient than straight fixed length binary representation. The Huffman Coding is efficient primarily when coding data sets S with a small N or that have a small variance, since L.sub.i grows in size almost linearly with an increase in N, the number of values in the set. For this reason, a technique different from classical Huffman Coding known as Modified Huffman Coding has been developed and used in images that have larger N in their data sets or more variance.
Another technique known as Zero Run Length Coding (ZRLC) is a technique for encoding a data set containing a large number of consecutive or "runs" of zero values. ZRLC consists of encoding only the values different from zero (using Huffman Coding or some other coding) and then interleaving these codewords by a code that specifies the number of zeroes that, according to a manner known both to the coder and to the decoder, divides two consecutive non-zero values.
In traditional ZRLC, the encoded zero run data is structured using two segments: a run length and non-zero value. For instance, instead of coding the data stream: EQU {0 0 0 0 0 0 5 0 0 0 -6 78 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 1 45 0 0 0 0 0 0 0 0 0 23}
only the following data are coded: EQU {[6, 5][3, -6][0, 78][4, -12][11, 1][0, 45], [9, 23]}.
This code (where an_indicates a run length of zeroes) indicates that 6 zeroes followed by the value 5, then 3 zeroes followed by the value -6, then 0 zeroes followed by the value 78, . . . , etc. Recently, an enhanced zero run encoding technique has been developed which is set forth as the subject of related U.S. patent application entitled An Efficient Data Structure for Entropy Encoding Used in a DWT-Based High Performance Image Compression, Ser. No. 09/140,517, filed on Aug. 26, 1998 (hereinafter "ZRC patent"). In most ZRLC, the zero run length is represented by a flag and an L bit word following the flag. In the ZRC patent, a single flag of "0" with no L-bit word succeeding it is used to indicate a run of exactly 2.sup.L zeroes. When less than 2.sup.L zeroes are encountered, a flag of "1" and a word of L-bits encoding the length in the direct binary equivalent of the size is utilized. For instance, referring to the above example, if L=2, then the first six zeroes would be encoded as "0 110". The first 0 refers to the first 2.sup.2 or 4 zeroes while the bits following it "110" are composed of the flag "1" and the length "10" (whose decimal equivalent is 2) which encodes the final two zeroes of the six zero run. The data stream has a non-zero value 5 which is Huffman coded. Following the Huffman code for "5", the next three zeroes are encoded by "111" (flag of 1 and run-length of "11" (3 in decimal). This process continues with non-zero values encoded by a Huffman code and zero values encoded by an enhanced zero run length encoding scheme. This combination of encoding is a form of "entropy encoding". In entropy encoding information is represented in a minimum number of bits in accordance with content (entropy) of the whole data set.
In data sets such as JPEG (Joint Photographic Experts Group) or DWT (Discrete Wavelet Transform) transformed image data that contain significant numbers of zero values, particularly zero values that are in continuous "runs", entropy encoding can be defined to include separate zero run and non-zero value encoding. The structure for encoding the zero runs, discussed also in the ZRC patent is composed of a Flag F=1 and a word R of L bits for any run or portion thereof consisting of less than 2.sup.L zeroes or just contains the flag F=0 with no word R following it for any run or portion thereof. FIG. 1(a) shows one basic format 100 for entropy encoding that combines ZRLC with a variable length code for non-zero data values. In this format 100, a structure of the type (F, R) is followed by a variable length code V of M bits such as Huffman Coding of any non-zero values that follow the run of zeroes encoded by (F,R). The run, by definition, terminates when a non-zero value is encountered in the data set to be encoded. FIG. 1(b) is an example of the basic data format for the variable length coding technique known as Huffman coding. Referring to FIG. 1(a), the variable length code V, if it were a Huffman code would be composed of a "class code" and a "pointer" to a value within the class as typified in format 110 of FIG. 1(b). Each class code is a unique value which identifies a possible range of values for the value being encoding. Since zero values are encoded using ZRLC, unlike other Huffman codes, zero is excluded from the range of possible values. The table below shows an exemplary class code mapping for such Huffman coding:
Class Possible Values Class Code within class 1 11 -1, +1 2 0 -3, -2, +2, +3 3 10 -7, . . . -4, +4, . . . -7
Since all zero values are encoded using ZRLC, only non-zero values need to be represented by the Huffman code. Each class which identifies a range of possible values has a class code. Class 1 has only two possible values, -1 and +1 and thus can use a one-bit pointer value to indicate which of the two values is being encoded. Class 2 represents 4 values and thus, should use a pointer of two bits to refer to one of the values .+-.2 or .+-.3. Likewise, class 3 values can be pointed to by a three-bit pointer. The value "-1" would be encoded by "110" [class code of "11" and pointer of "0"]. Likewise the value "5" would be encoded by "10101"[class code of "10" and a pointer of "101"]. Advantageously, a negative value is represented by the one's complement of the positive value and hence, the value "-5" would be encoded by "10010" [class code of "10" and one's complement of the pointer to "5"]. Thus, using ZRLC the string of values "005" would be encoded by "1 10 10 101" where the leading "1" is the flag for the ZRLC and the adjacent "10" the binary equivalent of the run length "2". The "10 101" following it is the Huffman coding for "5" given a Huffman mapping such as that described above and a ZRLC technique such as that presented in the ZRC patent with a code length L=2.
Once data has been entropy encoded or otherwise binary encoded, devices such as digital cameras or computer systems need to then store or "pack" the data such that it is suitable for transmission and subsequent decoding. For instance, in a digital device, the encoded data may be packed into "words" of a particular size such as a byte (8 bits) each. This allows the data to be loaded easily into registers or mechanisms that have a fixed size. Typically, encoded data is packed in a left to right ordering format 210 as shown in FIG. 2(a). This ordering is not suitable for recently developed Intel architectures such as processors based upon MMX Technology (a product of Intel corporation). These processors follow the convention known in the art as "Little Endian" in which the memory byte ordering is right to left rather than left to right format 220 as shown in FIG. 2(b). With Little Endian, as shown in FIG. 2(b), a 4 byte (32-bit) register when stored into memory would be as follows. Starting from an offset location I, byte 0 would be stored in I+0, byte 1 in I+1, etc. with each byte having the most significant bit on the left side and the least significant bit on the right. Since entropy encoding is a variable length technique (the anticipated total length for encoding a set of data is unknown beforehand), the ability to shift out a block of bits all in one clock cycle can be used to advantageously reduce the amount of serial data transfers. However, considering the reverse nature of the register and memory storage within an MMX system, not all of the encoded data can be retrieved at variable length, some of it must be serially transferred and decoded bit-by-bit. Thus, there is needed an efficient data packing methodology for entropy encoded data which speeds and eases the decoding process by taking advantage of the characteristic properties of entropy encoded data.