Statistical coding is a coding technique useful for storing and transmitting large amounts of data. For example, the time required to transmit an image, such as a facsimile transmission of a document, is reduced when compression is used to decrease the number of bits required to recreate the image.
In statistical coding, one develops a probability estimate from conditioning data. Typically, the set of all coding decisions is partitioned based on the conditioning data, and statistics are developed for each partition member, or context. For example, coding decisions might be conditioned on neighboring pixels. An example of such a scheme is shown in FIGS. 1A and 1B. Referring to FIG. 1A, 10 prior pixels are used to condition pixel 101 (identified by the "?"). If each pixel has two possible values, the 10 pixels can take on 1024 different combinations of values. Thus, there are 1024 different contexts that can be used to condition the coding of pixel 101. Referring to FIG. 1B, a lookup table (LUT) 102 contains, for each possible combination of values of the 10 pixels, a probability estimate of how likely a pixel is to be in its most probable state (or least probable state). Therefore, LUT 102 contains 1024 context bins, each with an associated probability estimate. LUT 102 receives the 10 prior pixels at its input 103 and outputs probability estimate 104 in response thereto.
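The lookup described above can be sketched as follows. This is a minimal illustration, assuming binary pixels and a 10-pixel template as in FIGS. 1A and 1B; the packing order of the pixels and the initial probability value are arbitrary choices made here for illustration.

```python
# Sketch of context-based probability lookup for a binary image
# (hypothetical layout; the actual template geometry of FIG. 1A is
# not reproduced here).

def context_index(pixels):
    """Pack 10 binary pixel values into a single integer in [0, 1023]."""
    index = 0
    for p in pixels:
        index = (index << 1) | (p & 1)
    return index

# LUT 102: one probability estimate per context bin, here initialized
# to 0.5 (no prior knowledge) for all 1024 contexts.
lut = [0.5] * 1024

neighbors = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # the 10 prior pixels
estimate = lut[context_index(neighbors)]    # the probability estimate
```

In a working coder, the table entries would be adapted as pixels are coded, so that each context bin converges toward the observed statistics of its partition member.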
Two or more partitionings may be made to obtain two or more probability estimates for the same pixel. For example, one estimate might be conditioned on pixels to the left of the pixel being coded and another on pixels above it. The two estimates are then combined into a single probability estimate (i.e., a collaborative judgment). FIG. 2 is a block diagram illustrating such a system.
Referring to FIG. 2, two probability estimates are conditioned on two different groups of 8 pixels. One of the probability estimates, P₁, indicating the probability that pixel 200 is in its most probable state, is output from LUT 201 based on the values of the 8 pixels shown in pixel group 203. The other probability estimate, P₂, indicating the probability that pixel 200 is in its most probable state, is output from LUT 202 based on the values of the 8 pixels shown in pixel group 204. Because pixel groups 203 and 204 have 5 common pixels, 5 of the inputs to LUTs 201 and 202 are the same and are shared. The two probability estimates are combined by combining block 205 to generate a collaborative probability estimate 206.
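The two-LUT arrangement can be sketched as below. The pixel values and the split into shared and group-specific pixels are hypothetical; FIG. 2's actual template geometry is not reproduced, only the structure of two 8-pixel contexts that share 5 inputs.

```python
# Sketch of the two-LUT arrangement of FIG. 2: two 8-pixel context
# groups sharing 5 pixels, each indexing its own probability table.

def context_index(pixels):
    """Pack binary pixel values into an integer context index."""
    index = 0
    for p in pixels:
        index = (index << 1) | (p & 1)
    return index

lut1 = [0.5] * 256  # LUT 201: 2^8 context bins
lut2 = [0.5] * 256  # LUT 202: 2^8 context bins

shared = [1, 0, 1, 1, 0]  # 5 pixels common to both groups
group_203_only = [0, 1, 1]  # pixels unique to pixel group 203
group_204_only = [1, 0, 0]  # pixels unique to pixel group 204

p1 = lut1[context_index(shared + group_203_only)]  # estimate P1
p2 = lut2[context_index(shared + group_204_only)]  # estimate P2
```

Each table needs only 2⁸ = 256 bins rather than the 2¹¹ = 2048 bins a single 11-pixel context would require, which is the memory saving the following paragraph describes.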
The main advantage of using a collaborative judgment is that smaller conditioning partitions can be used. The use of smaller conditioning partitions implies faster adaptation because contexts are not "diluted" and the model is not "overfitted" with too many contexts. Smaller partitions also imply less hardware to store the probability estimates. In other words, the memory for storing the statistics and context (i.e., the context memory) may be smaller.
In the case where two probability estimates are being combined, one might assume that averaging the two estimates would yield the most accurate combined estimate. However, this simple combining technique is often applicable only when the conditioning on which one of the probability estimates is based is strongly related to the conditioning on which the other is based.
In the prior art, estimates are combined via the geometric mean of their odds, and the result is exaggerated when there is a consensus. See Kris Popat and Rosalind W. Picard, "Exaggerated Consensus in Lossless Image Compression," Proceedings of the IEEE International Conference on Image Processing, Vol. III, pp. 846-850, November 1994. That is, when both probability estimates are greater than 50% (i.e., there is a consensus) and the resulting combined probability estimate is greater than either of the two component estimates, the combined estimate is said to be exaggerated.
In Popat-Picard, the collaborative judgment, referred to as P_NET, is calculated from a raw combined probability, referred to as Q, according to the following:

    Q = √A / (1 + √A)

where A equals:

    A = P₁·P₂ / (1 - P₁ - P₂ + P₁·P₂)

P_NET is generated according to the following equation:

    P_NET = Q^Y(B) / (Q^Y(B) + (1 - Q)^Y(B))

where P₁ and P₂ are the two probability estimates and B measures the divergence between P₁ and P₂. The value Y(B) is an exaggeration exponent implemented with a table lookup developed from training images. The function mapping consensus to an exaggeration exponent is set by training using an ensemble of test data.
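The Popat-Picard combining rule described above can be sketched directly from the equations. In the reference the exaggeration exponent Y(B) comes from a trained lookup table; here a fixed value y is passed in purely for illustration.

```python
import math

def combine(p1, p2, y=1.0):
    """Combine two probability estimates per the Popat-Picard rule.

    p1, p2: component probability estimates (each strictly between 0 and 1)
    y: exaggeration exponent Y(B); a trained table supplies this in
       the reference, a constant is an illustrative stand-in here.
    """
    # A is the product of the two odds ratios:
    # P1*P2 / (1 - P1 - P2 + P1*P2) = [P1/(1-P1)] * [P2/(1-P2)]
    a = (p1 * p2) / (1.0 - p1 - p2 + p1 * p2)
    # Q converts the geometric-mean odds, sqrt(A), back to a probability.
    q = math.sqrt(a) / (1.0 + math.sqrt(a))
    # P_NET applies the exaggeration exponent to Q.
    return q ** y / (q ** y + (1.0 - q) ** y)
```

Note that with y = 1 and equal inputs the rule reduces to the geometric mean of the odds, so combine(0.7, 0.7, 1.0) returns 0.7; the exaggeration past both components under consensus comes from Y(B) > 1, e.g. combine(0.7, 0.7, 2.0) ≈ 0.845.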