Overview of the MQ-Coders and Decoders
Entropy coding is a core concept in many processes or devices for the compact representation of digital data. In the context of coding bit-values (0 or 1), entropy coding exploits any imbalance in the frequency of occurrence of bit-values (more 0's than 1's or vice-versa) to code the sequence of bit-values more compactly. For each bit-value coded or decoded, the entropy-coding component is supplied with a predicted bit value (termed MPS “more probable symbol”, the opposite bit value being termed LPS “less probable symbol”) together with a probability value P representing the level of confidence that the prediction is correct. When the prediction is correct, the bit-value is coded more compactly—the more confident the original prediction, the more compact the resulting coding. Conversely, when the prediction is incorrect, the bit-value must be coded using more than one bit—and the more confident the original prediction, the greater the penalty for an incorrect guess. However, the trade-off is such that the more confident the prediction, the more compact the overall coding is, even taking into account the penalty for mistaken predictions.
Let P be a value between 0.5 and 1 indicating the confidence of a given bit-value prediction. Values of P less than 0.5 do not occur because such a value would indicate that the opposite bit-value is more likely to occur. A value of P equaling 1 indicates perfect confidence, that is, a predicted value that is guaranteed to be correct. The number of bits required to code the bit-value in the event the prediction is correct is given by (lg P) (base 0.5 logarithm of P). The number of bits required to code the bit-value in the event the prediction is incorrect is given by (lg(1−P)). Taking probabilities of both correct and incorrect prediction into account, the overall expected number of bits required to code the bit-value is H=(P lg P+(1−P)lg(1−P)). Table 1 below shows the bits required to code correct and incorrect predictions, and the overall expected number of bits required, for various sample values of P.
TABLE 1
P      lg P      lg(1 − P)   H
0.5    1.0000    1.0000      1.0000
0.6    0.7370    1.3219      0.9710
0.7    0.5146    1.7370      0.8813
0.8    0.3219    2.3219      0.7219
0.9    0.1520    3.3219      0.4690
1.0    0.0000    infinity    0.0000
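The three quantities tabulated above can be recomputed directly from the formulas in the preceding paragraph. The following is a minimal sketch; the function names are illustrative and are not taken from the source. The base-0.5 logarithm of x is evaluated as the base-2 logarithm of 1/x.

```python
import math

def bits_if_correct(p):
    """Bits needed when the prediction is correct: base-0.5 logarithm of P."""
    return math.log2(1.0 / p)

def bits_if_incorrect(p):
    """Bits needed when the prediction is incorrect: base-0.5 logarithm of (1 - P)."""
    if p == 1.0:
        return math.inf  # an incorrect prediction cannot occur when P equals 1
    return math.log2(1.0 / (1.0 - p))

def expected_bits(p):
    """Expected bits per bit-value: H = P lg P + (1 - P) lg(1 - P)."""
    if p == 1.0:
        return 0.0  # the (1 - P) term vanishes; avoids computing 0 * infinity
    return p * bits_if_correct(p) + (1.0 - p) * bits_if_incorrect(p)

# Print one row per sample value of P.
for p in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(f"{p:.1f}  {bits_if_correct(p):.4f}  "
          f"{bits_if_incorrect(p):.4f}  {expected_bits(p):.4f}")
```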
The compactness of the coding is greatly enhanced by grouping the bit-values to be encoded or decoded into categories. The bit-values in each category are expected to have similar confidence values of predictability. Each bit-value to be encoded or decoded is assigned to one such category, called the context for that bit-value.
Thus an encoder/decoder pair exploiting such frequency imbalances for compact coding of data can be represented as in FIGS. 1A and 1B.
FIG. 1A shows an MQ encoder 102. For each input bit-value 104 to be encoded, additional control information 106 is input which directly or indirectly reflects the predicted bit-value and the confidence level P for the bit-value. The output from successive calls to the encoder is the encoded data stream 108 which may be transmitted or stored.
FIG. 1B shows an MQ decoder 103. For each bit-value to be decoded, additional control information 107 is input. In general this information must match exactly the additional control information previously supplied to the encoder 102 when encoding the symbol now to be decoded. Data is imported as needed from the coded data stream 105; in many cases the data needed to decode a given bit-value will have been imported from the coded data stream prior to the actual decoding operation for the bit-value. The decoded bit-value 109 is the output from the decoder 103.
To appreciate the decoding method of the invention, some background on encoding is useful. Accordingly, entropy encoding will now be discussed briefly.
FIG. 10 shows a known high-level procedure for compressing data incorporating an entropy encoder. From start step 1002, execution proceeds to step 1004, which performs any necessary initialization of state variables, look-up tables, etc., for the entropy encoder. Execution then proceeds to step 1006.
In step 1006, the next bit-value S to be encoded is determined, along with the necessary additional control information (“coding context”) cx. The details of this step as a rule are independent of the operation of the entropy coder and highly dependent on the specific type of data being compressed. Execution then proceeds to step 1008.
In step 1008, the entropy encoder is used to encode the given bit-value S using the additional control information cx. Coded data is output as necessary to the coded data stream 1014. Execution then proceeds to step 1010.
In step 1010 a determination is made as to whether all necessary data has been coded. If not, then execution proceeds to step 1006. Otherwise, execution proceeds to step 1012.
In step 1012, any additional coded data needed to ensure correct decoding of all encoded bit-values is flushed to the coded data stream 1014. Execution then proceeds to the end 1016 of the procedure.
FIG. 11 shows a known high-level procedure for expanding data incorporating an entropy decoder. From start step 1102, execution proceeds to step 1104, which performs any necessary initialization of state variables, look-up tables, etc., for the entropy decoder. This operation may import data from the coded data stream 1114 to determine initial values of the decoding state variables. Execution then proceeds to step 1106.
In step 1106, the control information (“coding context”) cx necessary for decoding the next bit-value is determined. The details of this step as a rule are independent of the operation of the entropy decoder and highly dependent on the specific type of data being expanded. Execution then proceeds to step 1108.
In step 1108, the entropy decoder is used to decode the next bit-value S using the additional control information cx. Coded data is imported as necessary from the coded data stream 1114. Execution then proceeds to step 1110.
In step 1110 a determination is made as to whether all necessary data has been decoded. If not, then execution proceeds to step 1106. Otherwise, execution proceeds to the end 1116 of the procedure.
The MQ-coder is one of a family of arithmetic entropy-coders. It is used in particular as part of the JPEG 2000 and JBIG2 image-coding standards ([4], [5]). The arithmetic decoding operation can be summarized as follows. The decoder maintains a code-stream value C which at any decoding step is allowed to range over an interval of values from Cmin to Cmax. At each step a split point Csplit between Cmin and Cmax is determined, dividing the interval from Cmin to Cmax into two generally unequal subintervals. The next decoded bit-value is determined to be 0 or 1 according to which subinterval contains the code-stream value C; that is, according to whether C is greater than or less than Csplit. As a rule, the larger subinterval should correspond to the predicted value, and should have a length close to P times the total distance from Cmin to Cmax. The endpoints of whichever subinterval contains the code value C become the values of Cmin and Cmax for the next bit-value to be decoded.
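The general decoding step just described can be sketched in idealized form as follows. This is an illustration only, not the MQ-coder itself: it uses floating-point values, the function name is hypothetical, and it assumes that the lower subinterval is assigned to LPS and the upper subinterval to MPS.

```python
def ideal_decode_step(c, c_min, c_max, p, mps):
    """Decode one bit-value from code value c lying in [c_min, c_max).

    Assumption of this sketch: the lower subinterval, of length
    (1 - p) times the total length, represents LPS; the upper
    subinterval, of length p times the total length, represents MPS.
    Returns (decoded bit-value, new c_min, new c_max).
    """
    c_split = c_min + (1.0 - p) * (c_max - c_min)
    if c >= c_split:
        # c lies in the upper subinterval: the prediction was correct.
        return mps, c_split, c_max
    # c lies in the lower subinterval: the prediction was incorrect.
    return 1 - mps, c_min, c_split


# Two successive steps with P = 0.75 and predicted value 0.
bit1, lo, hi = ideal_decode_step(0.3, 0.0, 1.0, 0.75, 0)
bit2, lo, hi = ideal_decode_step(0.3, lo, hi, 0.75, 0)
print(bit1, bit2, lo, hi)
```

In this sketch the interval shrinks with every decoded bit-value, which is why a practical coder needs a mechanism for rescaling its working values.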
The MQ-coder operates according to this general principle, but is designed with the goal of rapid encoding and decoding. In the MQ-coder Cmin is fixed at zero, and Cmax is specified by a variable A. The split point Csplit is determined by setting the length of the lower interval to a value Q (i.e., Csplit=Cmin+Q=Q) depending on the confidence level P but independent of the total interval length A=(Cmax−Cmin). This is an approximation to the ideal value A(1−P), made for the sake of speed. In order for the approximation to be effective, the total interval length A must be maintained between a minimum value Amin and a maximum value Amax=2Amin. With each bit-value decoded, as the interval from Cmin to Cmax is replaced by the chosen subinterval, the value of A decreases. When the value of A drops below Amin, the values of both A and C are repeatedly doubled (and data from the input buffer is used to fill in the lowest-order bit-values of C), a process termed renormalization.
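The renormalization operation can be sketched as below. The value chosen for Amin, the function name, and the convention of importing exactly one input bit per doubling are assumptions made for illustration; the sketch does not attempt to reproduce the byte-oriented input handling of the JPEG 2000 or JBIG2 standards.

```python
A_MIN = 0x8000  # assumed value of Amin for this sketch


def renormalize(a, c, bits):
    """Double A and C until A is no longer less than A_MIN.

    `bits` is an iterator over input bit-values; one bit is shifted into
    the lowest-order position of C for each doubling. An exhausted input
    is treated as supplying zeros (an assumption of this sketch).
    """
    while a < A_MIN:
        a <<= 1
        c = (c << 1) | next(bits, 0)
    return a, c


# A has fallen to 0x1800, so three doublings are needed to reach 0xC000.
print([hex(v) for v in renormalize(0x1800, 0x0123, iter([1, 0, 1]))])
```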
The approximation entailed in determining the length of the lower subinterval as a fixed number Q independent of the total interval length A means that the lower subinterval may exceed the upper subinterval in length, even though the ideal length A(1−P) of the lower subinterval never exceeds the ideal length AP of the upper subinterval (P never being less than ½). In the MQ-coder the shorter subinterval (which may be the upper or the lower) is always assigned to LPS and the longer subinterval is always assigned to MPS, meaning that the LPS is decoded if the code-stream value C lies in the shorter subinterval and the MPS is decoded if the code-stream value C lies in the longer subinterval.
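The assignment of MPS and LPS to the two subintervals can be sketched as a single comparison of the subinterval lengths. The function name and the sample values of A and Q are illustrative. When the two lengths are equal, the sketch assigns MPS to the upper subinterval, which matches the "not shorter than" wording used for the known decoding algorithm.

```python
def assign_subintervals(a, q):
    """Return (symbol of lower subinterval, symbol of upper subinterval).

    `a` is the total interval length and `q` the length of the lower
    subinterval, so the upper subinterval has length a - q. The shorter
    subinterval is assigned to LPS and the longer one to MPS.
    """
    upper = a - q
    if upper < q:
        return "MPS", "LPS"  # lower subinterval is the longer one
    return "LPS", "MPS"      # upper subinterval is at least as long


print(assign_subintervals(0xF000, 0x5601))  # upper subinterval is longer
print(assign_subintervals(0x8000, 0x5601))  # lower subinterval is longer
```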
State variables for the MQ-decoder (those whose values are needed to describe the state of the decoding process from one decoding operation to the next) are as shown in Table I.
TABLE I
A    total interval length
C    code value
An additional feature of the MQ-coder is integrated estimation of the value of P. For each bit-value to be encoded or decoded, an auxiliary parameter, the context, is passed to the coder. This context actually contains four pieces of information: the value Q to be used to locate the split point, the predicted bit-value MPS, and new context values NMPS and NLPS which replace the current context value in the event that the encoded or decoded bit-value is MPS or LPS, respectively. Each context value is associated with a specific value of the confidence level P, but this is not stored as it plays no direct role in the coding operation. In general, the new context value indicated by NMPS keeps the same value of MPS and has a lower value of Q (corresponding to a higher value of P). Thus in response to a successful prediction (indicated by a coded bit-value of MPS) the confidence level is increased for the next prediction. Conversely, the new context value indicated by NLPS usually keeps the same value of MPS and has a higher value of Q (corresponding to a lower value of P). In a few exceptional cases corresponding to values of P just slightly higher than ½, the context value indicated by NLPS reverses the value of MPS. Thus in response to a failed prediction (indicated by a coded bit-value of LPS) with very low confidence, the opposite bit-value is used for the next prediction.
An exception to the foregoing is that the context value is not altered if no renormalization operation takes place.
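The context mechanism can be sketched as a look-up table indexed by context value. The six-entry table below is hypothetical and far smaller than the tables of the actual standards; its Q values, the names ContextState, TABLE and next_context, and the transition pattern are chosen only to illustrate the behaviour described above, including the reversal of MPS at the lowest confidence level.

```python
from collections import namedtuple

# One entry per context value: Q, predicted bit-value, and successor contexts.
ContextState = namedtuple("ContextState", "q mps nmps nlps")

# Hypothetical table: entries 0-2 predict 0, entries 3-5 predict 1.
TABLE = [
    ContextState(q=0x5601, mps=0, nmps=1, nlps=3),  # lowest confidence; LPS flips MPS
    ContextState(q=0x3401, mps=0, nmps=2, nlps=0),
    ContextState(q=0x1801, mps=0, nmps=2, nlps=1),  # highest confidence in this table
    ContextState(q=0x5601, mps=1, nmps=4, nlps=0),  # lowest confidence; LPS flips MPS
    ContextState(q=0x3401, mps=1, nmps=5, nlps=3),
    ContextState(q=0x1801, mps=1, nmps=5, nlps=4),
]


def next_context(cx, decoded_bit, renormalized):
    """Return the context value to use after coding `decoded_bit` under `cx`."""
    if not renormalized:
        return cx  # the context value is altered only when renormalization occurs
    state = TABLE[cx]
    return state.nmps if decoded_bit == state.mps else state.nlps


print(next_context(0, 0, True))   # successful prediction: move to lower Q
print(next_context(0, 1, True))   # failed low-confidence prediction: MPS reversed
print(next_context(1, 0, False))  # no renormalization: context unchanged
```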
Thus in order to decode a single bit-value the following actions, or processing which achieves an equivalent result, need to be performed:
    1) The split point Csplit is calculated by obtaining the value Q corresponding to the given context;
    2) The code-value C is compared to Csplit to determine which subinterval is to be chosen;
    3) If the upper subinterval is chosen, then Csplit is subtracted from C to maintain the value of Cmin at zero;
    4) The value of A is updated to the length of whichever subinterval is chosen;
    5) The lengths of the two subintervals are compared to determine which represents MPS and which represents LPS;
    6) The new value of A is compared to Amin to determine whether renormalization is required;
    7) In the event renormalization is required, the context value is updated to either NMPS or NLPS.
These need not necessarily be performed as seven separate coding steps, because the comparisons entailed are not mutually independent. For example, since the value of A never exceeds 2*Amin, and since LPS always corresponds to the shorter of two subintervals, renormalization is always required in the event the coded symbol is LPS; conversely, if renormalization is not required, then the coded symbol must be MPS. Similarly, all values of Q are determined to be less than Amin. Since the length of the lower subinterval is Q, this means that renormalization is always required when the lower subinterval is chosen (whether or not this represents LPS).
A known decoding algorithm 200 is shown in FIG. 2. This procedure decodes a single bit-value S. The sole input parameter is the coding context cx. On entering the procedure, the values of A and C retain their values from initialization or previous decoding operations. In either case these reflect data previously imported by the decoder from the input data stream. The values of MPS, LPS, Q, NMPS, and NLPS are precalculated and stored in look-up tables indexed by the context value cx. Amin is a fixed constant value depending on the implementation.
From start step 202, execution proceeds to step 204, wherein the decoded bit-value S is provisionally set to the MPS value for the given context cx, Q is set to the Q-value for the given context cx, and the total interval length A is provisionally set to the length A-Q of the upper subinterval. Execution then proceeds to step 206.
In step 206 the value of C is compared to Q (which also equals the value Csplit of the split point). If C equals or exceeds Q then the upper subinterval is chosen and execution proceeds to step 208. If C is less than Q then the lower subinterval is chosen and execution proceeds to step 222.
In step 208 the upper subinterval has been chosen. In step 208 the value Q (which equals the length of the lower subinterval) is subtracted from C to maintain the value of Cmin (not directly represented in the algorithm) at zero. Execution then proceeds to step 210.
In step 210 the upper subinterval has already been chosen. A was already updated to the length of this subinterval in step 204. In step 210 the value of A is compared to Amin to determine whether renormalization is necessary. If A is less than Amin, then renormalization is necessary and execution proceeds to step 214. Otherwise, renormalization is not necessary. As remarked above, this also guarantees that the coded bit-value is MPS. Since the bit-value S has previously been set to MPS in step 204, execution proceeds directly to the end 212 of the procedure.
In step 214 the upper subinterval has been chosen and it is known that renormalization is required. In step 214 the value of A (set in step 204 to the length of the upper subinterval) is compared to Q (the length of the lower subinterval) to determine which subinterval is longer and thus corresponds to the bit-value MPS. If A is less than Q, then the upper subinterval is shorter than the lower subinterval and execution proceeds to step 216. Otherwise execution proceeds to step 218.
In step 216 the upper subinterval has been chosen, it is known that renormalization is required, and the upper subinterval is shorter than the lower and hence corresponds to LPS. Since the bit-value S was provisionally set to MPS in step 204, in step 216 it is reversed by subtraction from 1. The context value cx is also updated to the context value NLPS. Execution then proceeds to the renormalization step 220.
In step 218 the upper subinterval has been chosen, it is known that renormalization is required, and the upper subinterval is not shorter than the lower and hence corresponds to MPS. Since the bit-value S was provisionally set to MPS in step 204, in step 218 it is unchanged. The context value cx is updated to the context value NMPS. Execution then proceeds to the renormalization step 220.
Step 220 performs the renormalization operation, wherein the values of A and C are successively doubled (and the rightmost bits of C are filled with new data from the input stream 221) until the value of A is not less than Amin. The details of this process are not relevant to the present invention. Execution then proceeds to the end 212 of the procedure.
In step 222 the lower subinterval has been chosen. As remarked above, in this case renormalization is always required. The value of A (provisionally set to the length of the upper subinterval in step 204) is compared to Q (the length of the lower subinterval) to determine which is longer and thus corresponds to the bit-value MPS. If A is less than Q, then the upper subinterval is shorter than the lower subinterval and execution proceeds to step 226. Otherwise execution proceeds to step 224.
In step 226 the lower subinterval has been chosen, it is known that renormalization is required, and the upper subinterval is shorter than the lower; the lower subinterval hence corresponds to MPS. Since the bit-value S was provisionally set to MPS in step 204, in step 226 it is unchanged. The context value cx is updated to the context value NMPS. Execution then proceeds to step 228.
In step 224 the lower subinterval has been chosen, it is known that renormalization is required, and the upper subinterval is not shorter than the lower; the lower subinterval hence corresponds to LPS. Since the bit-value S was provisionally set to MPS in step 204, in step 224 it is reversed by subtraction from 1. The context value cx is also updated to the context value NLPS. Execution then proceeds to step 228.
In step 228 the lower subinterval has been chosen and it is known that renormalization is required. In step 228 the total interval length A is updated to the length Q of the lower subinterval. Execution then proceeds to the renormalization step 220.
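The flow of steps 204 through 228 can be sketched in code as follows, with each branch labelled by the step it corresponds to. The look-up table, the value of Amin, the function names, and the bit-at-a-time renormalization are assumptions made for illustration and are not taken from the figures or from the standards.

```python
A_MIN = 0x8000  # assumed value of Amin

# Hypothetical look-up table indexed by cx: (Q, MPS, NMPS, NLPS).
TABLE = [
    (0x5601, 0, 1, 3), (0x3401, 0, 2, 0), (0x1801, 0, 2, 1),
    (0x5601, 1, 4, 0), (0x3401, 1, 5, 3), (0x1801, 1, 5, 4),
]


def renormalize(a, c, bits):
    """Step 220: double A and C, importing one bit per doubling (simplified)."""
    while a < A_MIN:
        a, c = a << 1, (c << 1) | next(bits, 0)
    return a, c


def decode_200(a, c, cx, bits):
    """Decode one bit-value; returns (S, new cx, new A, new C)."""
    q, mps, nmps, nlps = TABLE[cx]
    s = mps                       # step 204: provisional S, A set to upper length
    a -= q
    if c >= q:                    # step 206: upper subinterval chosen
        c -= q                    # step 208
        if a >= A_MIN:            # step 210: no renormalization, S is MPS
            return s, cx, a, c    # end 212
        if a < q:                 # step 214: upper subinterval is the shorter
            s, cx = 1 - s, nlps   # step 216
        else:
            cx = nmps             # step 218
    else:                         # lower subinterval chosen
        if a < q:                 # step 222: upper subinterval is the shorter
            cx = nmps             # step 226
        else:
            s, cx = 1 - s, nlps   # step 224
        a = q                     # step 228
    a, c = renormalize(a, c, bits)
    return s, cx, a, c


# Common decoding path: C is at least Q and A stays at or above A_MIN.
print(decode_200(0xF000, 0x6000, 2, iter([])))
```

Passing the state in and out as plain values keeps the sketch free of hidden state; an implementation concerned with speed would more likely keep A and C in local or register variables.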
A fundamental assumption in entropy coding is that the bit-values 0 and 1 do not occur with equal frequency (otherwise, compression would be impossible). In the foregoing, therefore, the various branches of the algorithm are taken with generally unequal frequency. In particular the leftmost branch which includes steps 202-204-206-208-210-212 (in which the more probable symbol MPS is decoded) is likely to be executed with greater frequency than the others, approaching 100% in the case where the confidence level P is almost 1 (and the lower subinterval length Q is almost zero). This branch of the algorithm is thus termed the common decoding path (CDP). The rest of the algorithm (steps 214-228) is termed the non-common decoding path (non-CDP).
On the other hand, in the worst-case scenario where the confidence level P is 0.5, the so-called common decoding path is executed almost never or not at all—because renormalization is required for almost every decoded bit-value. The confidence level P depends on issues outside the MQ-coder: the inherent nature of the data and the sophistication of the algorithm controlling the MQ-coder.
Many efforts in the prior art to improve on Algorithm 200 focus on streamlining the common decoding path. One known approach to providing an improved decoding algorithm which can be used in place of the method shown in FIG. 2 is the decoding method 300 shown in FIG. 3. The method 300 utilizes a variable D (termed here the base value), which is not used in the FIG. 2 decoding method. The value D is used to reduce the two comparison operations entailed in the common decoding path of FIG. 2 to one in the method 300. D is defined to be the minimum of A−Amin and C. The variables A and C then for much of the time (between decoding operations and throughout the common decoding path) do not hold their “full” values of Algorithm 200 but rather “discounted” values: (discounted) A = (full) A − D; (discounted) C = (full) C − D. State variables for this modified algorithm are shown in Table II.
TABLE II
D    base value; minimum of (full) values of A−Amin and C
A    discounted total interval length
C    discounted code value
The resulting streamlined common decoding path is illustrated in FIG. 3 as steps 302-304-306-308 which are equivalent in purpose to steps 202-204-206-208-210-212 of the method 200.
The known improved decoding method 300 will now be discussed in detail. From start step 302, execution proceeds to step 304, wherein the decoded bit-value S is provisionally set to the MPS value for the given context cx, Q is set to the Q-value for the given context cx, and Q is subtracted from D. Since at this point both A and C hold their discounted values, this single subtraction operation effectively accomplishes both the subtraction of Q from A in step 204 and the subtraction of Q from C in step 208. Execution then proceeds to step 306.
In step 306, the value of D is compared to zero. Recall that D equals the minimum of (full) A−Amin and (full) C. Thus if D is not less than zero, then neither (full) A−Amin nor (full) C is less than zero. The condition (full) A−Amin≧0 is equivalent to A≧Amin in step 210. Moreover, since effectively Q has already been subtracted from C, the condition C≧0 is equivalent to C≧Q in step 206. Thus this single comparison operation effectively performs both comparisons needed in the common decoding path, and if D≧0, then the decoded bit-value is necessarily MPS and it is known that renormalization is not required. Execution then proceeds to the end 308 of the procedure.
If D<0 in step 306, then execution proceeds to the non-common decoding path in step 310.
Introduction of the new variable D also requires modification of the non-common decoding path. The prior art includes one method 400 for implementing the non-common decoding path, step 310 of FIG. 3, which is shown in FIG. 4.
The known non-CDP method 400 for use with the method 300, begins in start step 402. From start step 402, execution proceeds to step 450. In step 450, variables A and C are restored from their discounted values to their full values by adding D to each. Execution then proceeds to step 406.
Step 406 is equivalent to step 206. In step 406, the value of C is compared to zero. Since in this version, Q has previously in effect been subtracted from C in step 304, comparing C to zero here is equivalent to comparing C to Q in step 206. This comparison indicates whether the upper or lower subinterval is chosen. If C≧0, the upper subinterval is chosen and execution proceeds to step 414; otherwise, the lower subinterval is chosen and execution proceeds to step 452.
Step 414 is directly equivalent to step 214. In step 414, A is compared to Q to determine which of the upper or lower subinterval is larger. If A is less than Q, execution proceeds to step 416; otherwise execution proceeds to step 418.
Step 416 is directly equivalent to step 216. Here it has been determined that the upper subinterval is chosen and corresponds to LPS. The bit-value S, which in step 304 was provisionally set to MPS, is converted to LPS by subtraction from 1. Also, the context value cx is updated to NLPS. Execution then proceeds to the renormalization step 420.
Step 418 is directly equivalent to step 218. Here it has been determined that the upper subinterval is chosen and corresponds to MPS. The bit-value S, which in step 304 was provisionally set to MPS, is unchanged. The context value cx is updated to NMPS. Execution then proceeds to the renormalization step 420.
Step 420 is directly equivalent to step 220. This step performs the renormalization. Execution then proceeds to step 454.
Steps 454-456-458-460 have no equivalents in the known method 200. These steps prepare the values of D, A, and C for the end of this call to the decoding procedure. In step 454 D is set to the value A−Amin. Execution then proceeds to step 456.
In step 456 the value of C is compared to that of D. If C is less than D, then execution proceeds to step 458. Otherwise, D is already set to the minimum of A−Amin and C and execution proceeds to step 460.
In step 458, the value of D is set to that of C, thus making D equal to the minimum of A−Amin and C. Execution then proceeds to step 460.
In step 460, A and C are converted from their full values to their discounted values by subtracting D. Execution then proceeds to the return step 412 of the procedure.
Step 452 has no equivalent in Algorithm 200. In step 452, the value of Q is added to C. This undoes the subtraction of Q from C that previously was effectively performed in step 304. Execution then proceeds to step 422.
Step 422 is directly equivalent to step 222. In step 422, A is compared to Q to determine which of the upper or lower subinterval is larger. If A is less than Q, execution proceeds to step 426; otherwise execution proceeds to step 424.
Step 426 is directly equivalent to step 226. Here it has been determined that the lower subinterval is chosen and corresponds to MPS. The bit-value S, which in step 304 was provisionally set to MPS, is unchanged. The context value cx is updated to NMPS. Execution then proceeds to step 428.
Step 424 is directly equivalent to step 224. Here it has been determined that the lower subinterval is chosen and corresponds to LPS. The bit-value S, which in step 304 was provisionally set to MPS, is converted to LPS by subtraction from 1. Also, the context value cx is updated to NLPS. Execution then proceeds to step 428.
Step 428 is directly equivalent to step 228. In step 428 the total interval length A is updated to the length Q of the lower subinterval. Execution then proceeds to the renormalization step 420.
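The method 300 together with the non-CDP method 400 can be sketched in code as follows, with each line labelled by the step it corresponds to. As with any such sketch, the look-up table, the value of Amin, the function names, and the bit-at-a-time renormalization are assumptions made for illustration. The state is held as the triple (D, discounted A, discounted C).

```python
A_MIN = 0x8000  # assumed value of Amin

# Hypothetical look-up table indexed by cx: (Q, MPS, NMPS, NLPS).
TABLE = [
    (0x5601, 0, 1, 3), (0x3401, 0, 2, 0), (0x1801, 0, 2, 1),
    (0x5601, 1, 4, 0), (0x3401, 1, 5, 3), (0x1801, 1, 5, 4),
]


def renormalize(a, c, bits):
    """Step 420: double A and C, importing one bit per doubling (simplified)."""
    while a < A_MIN:
        a, c = a << 1, (c << 1) | next(bits, 0)
    return a, c


def discount(a, c):
    """Steps 454-460: compute D from full A and C, then discount both by D."""
    d = min(a - A_MIN, c)
    return d, a - d, c - d


def decode_300(state, cx, bits):
    """Decode one bit-value; returns (S, new cx, new state)."""
    d, a, c = state
    q, mps, nmps, nlps = TABLE[cx]
    s = mps                         # step 304: provisional S; Q subtracted from D
    d -= q
    if d >= 0:                      # step 306: the single comparison of the CDP
        return s, cx, (d, a, c)     # end 308
    a, c = a + d, c + d             # step 450: restore full values
    if c >= 0:                      # step 406: upper subinterval chosen
        if a < q:                   # step 414
            s, cx = 1 - s, nlps     # step 416
        else:
            cx = nmps               # step 418
    else:                           # lower subinterval chosen
        c += q                      # step 452: undo the subtraction of Q from C
        if a < q:                   # step 422
            cx = nmps               # step 426
        else:
            s, cx = 1 - s, nlps     # step 424
        a = q                       # step 428
    a, c = renormalize(a, c, bits)
    return s, cx, discount(a, c)


# Full A = 0xF000 and full C = 0x6000 give D = 0x6000; the CDP is then taken.
state = discount(0xF000, 0x6000)
print(decode_300(state, 2, iter([])))
```

In the usage example the common decoding path modifies only D, leaving the discounted A and C untouched, which is the saving this method aims for.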
This modified decoding method 300 of FIG. 3, which uses the non-CDP method of FIG. 4, entails a trade-off between a reduced number of operations in the common decoding path and an increased number of operations in the non-common decoding path. The actual difference in performance between this algorithm and the “standard” version 200 is largely a question of how often the common decoding path is executed versus the non-common decoding path, a question which, as earlier noted, depends on factors external to the MQ-coder. If the common decoding path is used for a large majority of bit-values decoded, then the modified version can manifest a significant speed advantage over the standard version. If the common decoding path is used less frequently, the modified algorithm may show less of a speed advantage or even be slower on average than the standard version.
While the decoding method 300 of FIG. 3 offers potential advantages over the method 200 of FIG. 2, there remains room for improvement with regard to how the non-CDP is implemented. In particular, benefits could be achieved if an improved alternative to the non-CDP method 400 could be used with the decoding method shown in FIG. 3.
To facilitate an understanding of some of the issues associated with implementing a fast decoding method in software, a brief review of some basic issues relating to fast software execution will now be presented.
Instruction pipelining and parallelism. Even an ostensibly serial-execution microprocessor such as a Pentium manifests a significant degree of parallelism in instruction execution—parallelism profitably enhanced by careful code design. Execution of a single microprocessor operation such as an addition entails performance of several sub-tasks. The instruction itself must be fetched from memory, then interpreted. Perhaps one of the argument values must be again fetched from memory before the operation is actually performed. In order to execute a sequence of operations O1, O2, O3, . . . quickly, the processor “pipelines” the processing. For example, while operation O1 is actually being executed, simultaneously the argument value for operation O2 may be fetched from memory, operation O3 may be in the process of interpretation, and operation O4 may be fetched from memory. The details of this pipelining vary from one processor to another, but the basic principle is widespread.
Modern microprocessors go even further by parallelizing execution, for example perhaps simultaneously executing operations O1 and O2, fetching arguments for operations O3 and O4, interpreting operations O5 and O6, and fetching operations O7 and O8. This parallelization is subject, however, to an important constraint. Operations O1 and O2 cannot be executed simultaneously if both must modify the same register or memory location (such a pair of operations is inherently sequential in nature). Careful code design, minimizing the immediate adjacency of such pairs of non-parallelizable operations, can theoretically increase execution speed by a factor of two over the worst case.
Branch-Prediction Penalty
Pipelining of operations requires special finesse in the event that the program “branches”—transfers control to a non-contiguous location in memory. Branches are a necessary feature for program loops and conditional execution and appear frequently in any but the most straightforward and uncomplicated programs.
A branch complicates the pipelining of operations because the operation to be executed two steps after operation O1 may not be the nearby operation O3 but an operation O100 at some distant memory location. It is therefore useless for the processor to interpret operation O3 simultaneously with the execution of O1. Microprocessors have grown increasingly sophisticated in their ability to pipeline operations separated by branches, but this sophistication entails unintended and undesired consequences if not accounted for in program design.
A program branch may be conditional or unconditional. An unconditional branch is simply a transfer of control which happens identically regardless of conditions. The unconditional branch is the easiest and was the earliest to be reconciled with instruction pipelining; the instructions fed into the pipeline are simply taken from the far end of the branch rather than immediately subsequent locations in memory.
A conditional branch may or may not take place depending on processor status when the branch instruction is reached. For example, the branch may or may not be executed depending on whether the result of the immediately preceding operation was positive, or negative, or an overflow. (In a flowchart a conditional branch is recognizable as associated with each decision “diamond” symbol.) Conditional branches are difficult to reconcile with instruction pipelining because, given the information available at the time the instruction is fed, it is impossible to predict at which of two possible locations the next instruction to be fed to the pipeline is located.
The approach taken is to attempt to predict whether the branch will be taken based on previous history executing the same body of code. The details of the method used to make this prediction vary from one processor to another and are of little importance; the important point is that when the prediction is correct there is no interruption to the instruction pipeline. On the other hand, an incorrect prediction entails a large time penalty.
Branch prediction is of greatest benefit when applied to a branch such as the end of a loop, which may be taken many times (once for each execution of the loop) and declined only once (when exiting the loop). The large time penalty for the final, incorrect prediction is thus offset by the many correct predictions which avoided interruption of the instruction pipeline.
Branch prediction is most problematic for a conditional branch which is repeatedly both taken and declined on an unpredictable basis.
In order to support computer processing of values, standardized methods of representing integer and other values using the binary values which are processed by computers have been developed. The following discussion applies to the Intel family of processors and to many other computers which internally represent numbers according to similar principles.
Most computers represent integer values in base-2 as a sequence of N bit values, where N is a number characteristic of the processor, 32 being a popular value. For purposes of illustration we consider the case N=8.
An integer value such as 57 is therefore represented as a sequence of eight bit-values:
00111001
which is to be interpreted as
0×2^7 + 0×2^6 + 1×2^5 + 1×2^4 + 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0 = 57.
An obvious difference between this representation and the set of actual integer values is that with N bit-values only 2^N different representations are possible, whereas the integers are infinite in number. The consequence of this is that different integer values sometimes share the same representation. For example, the value 256 can be written in base-2 as:
1×2^8 + 0×2^7 + 0×2^6 + 0×2^5 + 0×2^4 + 0×2^3 + 0×2^2 + 0×2^1 + 0×2^0,
but with only N=8 bit-values available for representation, the leftmost bit-value 1 cannot be represented; the 8-bit representation of this value thus becomes:
00000000
which is identical to the representation for zero. As a convention the 8-bit sequence 00000000 is almost always interpreted as zero rather than 256 or some other nonzero value.
More generally, any pair of integer values K and K+2^N share the same N-bit representation. This fact is exploited for the representation of negative numbers. If a negative number K lies in the range −2^(N−1), −2^(N−1)+1, −2^(N−1)+2, . . . , −1, then K+2^N lies in the range 2^(N−1), 2^(N−1)+1, 2^(N−1)+2, . . . , 2^N−1. Common practice when it is desired to represent both positive and negative values is to interpret binary values in the range 2^(N−1), 2^(N−1)+1, 2^(N−1)+2, . . . , 2^N−1 as negative values in the range −2^(N−1), −2^(N−1)+1, −2^(N−1)+2, . . . , −1 by subtracting 2^N from each. For example, to find the representation of the value −57, one first adds 2^8=256:
−57+256=199.
The binary representation for 199:
199 = 1×2^7 + 1×2^6 + 0×2^5 + 0×2^4 + 0×2^3 + 1×2^2 + 1×2^1 + 1×2^0,
(or 11000111) is interpreted as representing the value −57.
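The wraparound and the interpretation of negative values described above can be sketched in C for the case N=8. The helper names repr8 and signed_value8 are hypothetical and are introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the N=8 representation of an integer k.
   Conversion to an unsigned 8-bit type reduces k modulo 2^8, so k and
   k + 256 map to the same bit pattern. */
uint8_t repr8(int k)
{
    return (uint8_t)k;
}

/* Hypothetical helper: interpret an 8-bit pattern as a signed value.
   Patterns 128..255 denote negative values, obtained by subtracting 2^8. */
int signed_value8(uint8_t pattern)
{
    return (pattern >= 128) ? (int)pattern - 256 : (int)pattern;
}

/* Usage example: the values discussed in the text. */
void check_twos_complement(void)
{
    assert(repr8(256) == 0);              /* 256 shares the pattern of zero */
    assert(repr8(-57) == 199);            /* -57 + 256 = 199 = 11000111     */
    assert(signed_value8(199) == -57);
    assert(signed_value8(57) == 57);
    assert(repr8(57) == repr8(57 + 256)); /* K and K + 2^N coincide         */
}
```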
An important characteristic of this system of representation is that negative values are immediately distinguishable from non-negative values in that the leftmost bit-value is 1 for all negative values and is 0 for all non-negative values.
In order to support commonly used processing operations, general purpose processors normally include hardware to facilitate rapid execution of common arithmetic and logic operations. Operations which are normally supported in an execution-efficient manner include the right arithmetic shift (denoted here SAR), the bit-wise XOR operation, the right logical shift (SLR), and the bit-wise AND operation.
The right arithmetic shift (denoted here SAR) takes two input arguments: a first argument comprising an integer value with an N-bit representation and a second argument R comprising a non-negative integer value. The bit-value of the result at location J bits from the left equals the bit-value of the first input argument J-R positions from the left. If (J-R) is less than zero, then the output bit-value equals the leftmost bit-value of the first input argument.
For example, if N=8 and the first input argument has binary representation 11100111, then:
SAR(11100111,0)=11100111;
SAR(11100111,1)=11110011;
SAR(11100111,2)=11111001;
SAR(11100111,3)=11111100;
SAR(11100111,4)=11111110;
SAR(11100111,5)=11111111;
SAR(11100111,6)=11111111;
SAR(11100111,7)=11111111;
SAR(11100111,R)=11111111 for all R>7.
A particular special case of interest is SAR(X,N-1), in which all bit-values are identically 0 or 1, depending on whether the input argument X is negative (1 in the leftmost bit-position) or non-negative (0 in the leftmost bit-position).
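The definition of SAR for N=8 can be sketched in C as follows. The helper sar8 is hypothetical: it emulates the shift using unsigned operations only, because the C language leaves the result of applying >> to a negative signed value implementation-defined. A processor would normally perform the shift in a single instruction rather than in the manner shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: right arithmetic shift of an 8-bit pattern by r. */
uint8_t sar8(uint8_t x, unsigned r)
{
    /* All ones if the leftmost bit of x is 1, all zeros otherwise. */
    uint8_t sign_fill = (x & 0x80u) ? 0xFFu : 0x00u;

    if (r >= 8) {
        return sign_fill;  /* every position is a copy of the leftmost bit */
    }
    /* Shift right, then fill the r vacated leftmost positions with copies
       of the leftmost bit of the input. */
    return (uint8_t)((x >> r) | (uint8_t)(sign_fill << (8 - r)));
}

/* Usage example: the N=8 examples from the text. */
void check_sar8(void)
{
    const uint8_t x = 0xE7;            /* 11100111 */
    assert(sar8(x, 0) == 0xE7);        /* 11100111 */
    assert(sar8(x, 1) == 0xF3);        /* 11110011 */
    assert(sar8(x, 2) == 0xF9);        /* 11111001 */
    assert(sar8(x, 3) == 0xFC);        /* 11111100 */
    assert(sar8(x, 4) == 0xFE);        /* 11111110 */
    assert(sar8(x, 7) == 0xFF);        /* SAR(X,N-1), X negative */
    assert(sar8(0x67, 7) == 0x00);     /* SAR(X,N-1), X non-negative */
}
```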
The bit-wise AND operation takes as input two arguments, each with an N-bit representation. The bit-value of the result at location J bits from the left equals 1 only if the corresponding bit-values of both input arguments at location J bits from the left equal 1 and equals 0 otherwise. For example:
11100111 AND 01010101=01000101;
11100111 AND 11110000=11100000;
11100111 AND 00001111=00000111;
11100111 AND 11111111=11100111;
11100111 AND 00000000=00000000.
Two particular special cases of interest are when the second input argument has binary representation all zeros, in which case the result is likewise all zeros, and when the second input argument has binary representation all ones, in which case the result matches the first input argument exactly.
The bit-wise XOR operation takes as input two arguments, each with an N-bit representation. The bit-value of the result at location J bits from the left equals 1 if the corresponding bit-values of the two input arguments at location J bits from the left differ in value and equals 0 otherwise. For example:
11100111 XOR 01010101=10110010;
11100111 XOR 11110000=00010111;
11100111 XOR 00001111=11101000;
11100111 XOR 11111111=00011000;
11100111 XOR 00000000=11100111.
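The AND and XOR examples above can be checked with the corresponding C operators & and ^. The following is a minimal sketch; the function names and8, xor8 and check_and_xor are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrappers around the C bit-wise operators for N=8. */
uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t xor8(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }

/* Usage example: the N=8 examples from the text, written in hexadecimal. */
void check_and_xor(void)
{
    const uint8_t a = 0xE7;              /* 11100111 */

    assert(and8(a, 0x55) == 0x45);       /* AND 01010101 = 01000101 */
    assert(and8(a, 0xF0) == 0xE0);       /* AND 11110000 = 11100000 */
    assert(and8(a, 0x0F) == 0x07);       /* AND 00001111 = 00000111 */
    assert(and8(a, 0xFF) == a);          /* all ones: first argument unchanged */
    assert(and8(a, 0x00) == 0x00);       /* all zeros: result all zeros */

    assert(xor8(a, 0x55) == 0xB2);       /* XOR 01010101 = 10110010 */
    assert(xor8(a, 0xF0) == 0x17);       /* XOR 11110000 = 00010111 */
    assert(xor8(a, 0x0F) == 0xE8);       /* XOR 00001111 = 11101000 */
    assert(xor8(a, 0xFF) == 0x18);       /* XOR 11111111 = 00011000 */
    assert(xor8(a, 0x00) == a);          /* XOR 00000000 = 11100111 */
}
```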
The right logical shift (SLR) differs from the right arithmetic shift in only one respect. It takes two input arguments: a first argument comprising an integer value with an N-bit representation and a second argument R comprising a non-negative integer value. The bit-value of the result at location J bits from the left equals the bit-value of the first input argument J-R positions from the left. If (J-R) is less than zero, then the output bit-value equals zero (whereas for the right arithmetic shift it equals the leftmost bit-value of the first input argument).
For example, if N=8 and the first input argument has binary representation 11100111, then:
SLR(11100111,0)=11100111;
SLR(11100111,1)=01110011;
SLR(11100111,2)=00111001;
SLR(11100111,3)=00011100;
SLR(11100111,4)=00001110;
SLR(11100111,5)=00000111;
SLR(11100111,6)=00000011;
SLR(11100111,7)=00000001;
SLR(11100111,R)=00000000 for all R>7.
A particular special case of interest is SLR(X,N-1), which takes the value 1 or 0, depending on whether the input argument X is negative (1 in the leftmost bit-position) or non-negative (0 in the leftmost bit-position).
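For N=8, SLR corresponds to the C operator >> applied to an unsigned value. The following minimal sketch uses a hypothetical helper slr8 to reproduce the examples above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: right logical shift of an 8-bit pattern by r. */
uint8_t slr8(uint8_t x, unsigned r)
{
    /* C leaves a shift undefined when the count reaches the width of the
       promoted operand. Every count above 7 yields zero for N=8, so all
       such counts are handled here explicitly. */
    if (r >= 8) {
        return 0;
    }
    return (uint8_t)(x >> r);  /* vacated leftmost positions become 0 */
}

/* Usage example: the N=8 examples from the text. */
void check_slr8(void)
{
    const uint8_t x = 0xE7;            /* 11100111 */
    assert(slr8(x, 0) == 0xE7);        /* 11100111 */
    assert(slr8(x, 1) == 0x73);        /* 01110011 */
    assert(slr8(x, 2) == 0x39);        /* 00111001 */
    assert(slr8(x, 3) == 0x1C);        /* 00011100 */
    assert(slr8(x, 4) == 0x0E);        /* 00001110 */
    assert(slr8(x, 7) == 0x01);        /* SLR(X,N-1), X negative */
    assert(slr8(0x67, 7) == 0x00);     /* SLR(X,N-1), X non-negative */
}
```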
While the benefits of branch avoidance for the design of fast software are well known, in practice it has proven difficult to avoid the use of branches in software implementations of decoding and other types of complicated processes. For example, the above described known decoding method uses branches in the common decoding path as part of the known decoding process. Such branches can slow computer implementation of the known decoding method in the case where branch predictions turn out to be wrong.
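To illustrate what branch avoidance means in the simplest case, the following sketch selects one of two values according to the sign of a third, first with a conditional branch and then with only shift, AND and XOR operations. It is an illustration only and is not the decoding method discussed here; the function names are hypothetical, N=32 is assumed, and the all-ones or all-zeros mask (the same bit pattern as SAR(X,N-1)) is derived from a logical shift so that the C code stays within defined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* All ones if the leftmost bit of x is 1, all zeros otherwise.
   x >> 31 is 1 or 0 (a logical shift, since x is unsigned); subtracting it
   from zero wraps around to 0xFFFFFFFF or stays 0. */
uint32_t sign_mask32(uint32_t x)
{
    return (uint32_t)0 - (x >> 31);
}

/* Branching form: the outcome depends on a data-dependent test. */
uint32_t select_branching(uint32_t x, uint32_t if_negative, uint32_t otherwise)
{
    if (x & 0x80000000u) {
        return if_negative;
    }
    return otherwise;
}

/* Branch-free form: (if_negative XOR otherwise) AND mask is either the
   full difference pattern or zero, so the final XOR yields if_negative
   when the mask is all ones and otherwise when it is all zeros. */
uint32_t select_branch_free(uint32_t x, uint32_t if_negative, uint32_t otherwise)
{
    uint32_t mask = sign_mask32(x);
    return otherwise ^ ((if_negative ^ otherwise) & mask);
}

/* Usage example: both forms agree for negative and non-negative inputs. */
void check_select(void)
{
    assert(select_branch_free(0x80000000u, 11u, 22u) == 11u);
    assert(select_branch_free(5u, 11u, 22u) == 22u);
    assert(select_branching(0xFFFFFFC7u, 11u, 22u) ==
           select_branch_free(0xFFFFFFC7u, 11u, 22u));
}
```

Whether a compiler emits an actual branch instruction for either form depends on the compiler and the target processor, so any speed difference should be measured rather than assumed.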
In view of the above discussion, it should be apparent that there is a need for improved decoding methods which can be used to facilitate decoding of JPEG 2000 encoded image data and/or other image data which requires decoding processing similar to that required to decode JPEG 2000 image data. From an implementation standpoint, it is desirable that any new methods avoid or minimize the use of branches during decoding without introducing unnecessary delays or other processing requirements that significantly lessen any implementation advantages gained by reducing or avoiding branches during processing. Furthermore, from a customer acceptance perspective, it is desirable that any new decoding methods or procedures be compatible with existing general purpose processors such as Pentium and similar computers to avoid the need for customers to purchase special hardware or new processors.
As will be discussed below, a novel non-common decoding path method of the present invention can be used in combination with the method 300, in place of the known method 400, to provide a novel MQ-decoding method which is superior to the combination of methods 300 and 400.