1. Field of the Invention
This invention relates to the error resilience of image coding, in particular of the JPEG2000 image coding algorithm, and more specifically to methods for improving error resilience performance for digital images, JPEG2000 images and image video sequences.
2. Description of the Related Art
As the applications utilizing digital imagery continue to expand, the amount of digital imagery produced and processed grows rapidly. One of the key technologies fueling this growth is image compression. Since the amount of data in images can be quite large, images are almost never transmitted or stored without compression. Image compression aims to represent an image with as few bits as possible while preserving the desired level of quality. Compression can be used to reduce the channel bandwidth needed for transmission or the storage requirements of the image.
Standardization of image compression techniques facilitates easy exchange of images and significantly reduces the cost of specialized hardware and software used in image compression systems. It would be extremely difficult to browse the Internet if each web site used its own image compression technique. Different image compression software would need to be downloaded and installed each time a user visited a new site.
Arguably, the most successful image compression standard has been the JPEG (Joint Photographic Experts Group) standard, developed by a joint committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). A large fraction of the images published on the Internet have been compressed using JPEG.
Since the original JPEG standardization effort, there have been many advances in image compression technology. Several new image compression techniques have been introduced that offer not only improved compression performance but also enhanced functionality. Furthermore, the ISO/IEC joint committee identified several areas that current standards either fail to address or address unsatisfactorily. Thus, in 1996, the development of a new standard was initiated. This new standardization effort was scheduled to produce an International Standard in the year 2000; hence the standard was named JPEG2000. See ISO/IEC 15444-1, “JPEG2000 Image Coding System,” 2000 and D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Practice and Standards, Kluwer Academic Publishers, Massachusetts, 2002.
Besides providing state-of-the-art compression performance, JPEG2000 offers a number of functionalities that address the requirements of emerging imaging applications. JPEG2000 creates a framework where the image compression system acts more like an image processing system than a simple input-output storage filter. The decision on several key compression parameters such as quality or resolution can be delayed until after the creation of the codestream, and several different image products may be extracted from a single codestream. This new framework enhances the efficacy of existing imaging applications and enables emerging ones.
JPEG2000 utilizes a context-based arithmetic bitplane coder for entropy coding. The operation of this coder is highly dependent on the state of the system, and it is crucial to maintain synchronization between the encoder and the decoder. A single bit error in the arithmetically coded segments of the bitstream can destroy this synchronization, and could result in erroneous decompression.
Motion JPEG2000 is a part of the JPEG2000 standard for handling video. Motion JPEG2000 itself defines a file format that contains one or more motion sequences of JPEG2000 images, audio, timing information and metadata. Motion JPEG2000 uses no inter-frame coding. Each image in the video sequence is coded using JPEG2000. Other video formats may be used to handle sequences of JPEG2000 images.
Overview of JPEG2000
JPEG2000 is the new international image compression standard (ISO/IEC 15444-1, “JPEG2000 Image Coding System,” 2000). JPEG2000 offers state-of-the-art compression performance as well as improved functionality over previous compression standards. In particular, the error resilience properties of the JPEG2000 standard offer several mechanisms to combat errors in the codestream. Although the JPEG2000 standard specifies only the decoder and codestream syntax, a representative encoder is described to enable a more readable explanation of the algorithm and comprehension of the error resilience properties. A more balanced review can be found in M. W. Marcellin, M. J. Gormish, A. Bilgin and M. P. Boliek, “An overview of JPEG-2000”, Data Compression Conference, pp. 523-541, March 2000, Snowbird, Utah. For a comprehensive overview of JPEG2000, see D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Practice and Standards, Kluwer Academic Publishers, Massachusetts, 2002.
The basic block diagram of a JPEG2000 encoder 10 is illustrated in FIG. 1. Here, the input image 12 is first divided into non-overlapping rectangular tiles 14. If the image has multiple components, an optional component transform 16 can be applied to decorrelate the components. The samples of each component that fall into a particular tile are referred to as a tile-component. Each tile-component is then transformed using a wavelet transform 18 and the wavelet subbands are partitioned into several different geometric structures, as illustrated in FIG. 2. These geometric structures are instrumental in enabling low memory implementations and providing spatial random access, and contribute to the error resilience of the codestream.
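As an illustration of the tiling step only, the following minimal Python sketch computes the rectangle covered by each non-overlapping tile. It assumes, for simplicity, that the image and the tile grid both start at the origin; the standard also allows image and tile-grid offsets, which are ignored here. The function name `tile_rects` is hypothetical.

```python
def tile_rects(width, height, tile_w, tile_h):
    """Return (x0, y0, x1, y1) for every tile in raster order.

    x1 and y1 are exclusive; tiles on the right and bottom edges are
    clipped to the image, so they may be smaller than tile_w x tile_h.
    Assumes the image and the tile grid both start at (0, 0).
    """
    rects = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            rects.append((x0, y0, min(x0 + tile_w, width), min(y0 + tile_h, height)))
    return rects


# A 5 x 3 image split into 2 x 2 tiles gives a 3 x 2 grid of (partly clipped) tiles.
for rect in tile_rects(5, 3, 2, 2):
    print(rect)
```

Each tile-component produced this way is then transformed and coded independently of the other tiles.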
The smallest geometric structure in JPEG2000 is the codeblock. Codeblocks are formed by partitioning the wavelet subbands. As illustrated in FIG. 2, the codeblocks 28 of particular resolutions are grouped together to form precincts 30a, 30b, and 30c (lowest to highest resolution). Once the wavelet subbands are quantized 20, each codeblock is compressed individually using a bitplane coder 22. The bitplane coder makes up to three passes over each bitplane of a codeblock. These passes are referred to as coding passes or subbitplanes. The compressed data from each codeblock can be regarded as an embedded codeblock bitstream 24. The JPEG2000 codestream 26 is then generated 27 by including a possibly different number of coding passes from each individual codeblock bitstream 24, selected according to any desired criteria.
Bitplane Coding
As mentioned in the previous section, entropy coding is performed independently on each codeblock. This coding is carried out as context-dependent binary arithmetic coding of bitplanes. The arithmetic coder employed is the MQ-coder, as specified in the JBIG-2 standard (ISO/IEC 14492, “Information Technology—Lossy/Lossless Coding of Bi-Level Images,” 1999).
Consider a quantized codeblock to be an array of integers in sign-magnitude representation. Let q[n] denote the quantization index at location n=[n1,n2] of the codeblock. Let
$$\chi[n] \triangleq \operatorname{sign}(q[n]) \quad \text{and} \quad v[n] \triangleq |q[n]|$$
denote the sign and magnitude arrays, respectively. χ[n] is indeterminate when q[n] = 0; however, this causes no problems, since the sign need not be coded when q[n] = 0. Next, consider a sequence of binary arrays, each containing one bit from every coefficient. These binary arrays are referred to as bitplanes. One such array can store the signs of the coefficients, e.g., sign plane 32, as illustrated in FIG. 3a. Let the number of magnitude bitplanes for the current subband be denoted by Kmax. The topmost magnitude bitplane 34 contains the most significant bits (MSBs) of all the magnitudes. The next bitplane contains the next most significant bits of all the magnitudes, and so on until the final bitplane 36, which consists of the LSBs of all the magnitudes.
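The sign/magnitude split and the decomposition of magnitudes into bitplanes can be illustrated with a minimal Python sketch. It represents a codeblock as nested lists, stores the indeterminate sign of a zero index as 0 (a convention of this sketch only, since that sign is never coded), and uses hypothetical helper names.

```python
def sign_magnitude(q):
    """Split a 2-D array of quantization indices q[n] into chi[n] and v[n].

    chi[n] is +1 or -1; where q[n] == 0 the sign is indeterminate and is
    stored as 0 here, since it is never coded.
    """
    chi = [[(x > 0) - (x < 0) for x in row] for row in q]
    v = [[abs(x) for x in row] for row in q]
    return chi, v


def magnitude_bitplanes(v, k_max):
    """Return the k_max magnitude bitplanes, MSB plane (p = k_max - 1) first."""
    return [[[(x >> p) & 1 for x in row] for row in v]
            for p in range(k_max - 1, -1, -1)]


chi, v = sign_magnitude([[5, -3], [0, 2]])
print(chi)  # [[1, -1], [0, 1]]
print(magnitude_bitplanes(v, 3))
```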
Let
$$v^{(p)}[n] \triangleq \left\lfloor v[n] / 2^{p} \right\rfloor$$
denote the value generated by dropping p LSBs from v[n]. The “significance” of a sample at bitplane p is then defined by
$$\sigma^{(p)}[n] = \begin{cases} 1, & v^{(p)}[n] > 0 \\ 0, & v^{(p)}[n] = 0 \end{cases}$$
Thus, a sample at location n is said to be significant with respect to bitplane p if σ^(p)[n] = 1; otherwise, the sample is insignificant. For a given codeblock in the subband, the encoder first determines an appropriate number of bitplanes, K ≤ Kmax, that are required to represent the samples in the codeblock. Typically, K is selected as the smallest integer such that v[n] < 2^K for all n. The number of bitplanes (starting from the MSB) that are identically zero, Kmsbs = Kmax − K, is signaled to the decoder as side information. The Kmax of each subband is also available at the decoder. Then, starting from bitplane K−1, each bitplane is encoded in three passes (referred to as coding passes), except that the first coded bitplane, K−1, has only a clean-up pass, since no sample is significant before it.
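The definitions of v^(p)[n], σ^(p)[n], K and Kmsbs translate directly into a few lines of Python. This is a minimal sketch with hypothetical function names; it takes the codeblock magnitudes as a flat list and uses the "smallest K with v[n] < 2^K" rule described above.

```python
def v_p(v, p):
    """v^(p)[n]: the magnitude with its p least significant bits dropped."""
    return v >> p  # equals floor(v / 2**p) for non-negative integers


def sigma_p(v, p):
    """Significance of a sample with magnitude v with respect to bitplane p."""
    return 1 if v_p(v, p) > 0 else 0


def num_bitplanes(magnitudes):
    """Smallest K such that v < 2**K for every magnitude in the codeblock."""
    return max(magnitudes, default=0).bit_length()


def k_msbs(k_max, magnitudes):
    """Number of all-zero MSB bitplanes signalled as side information."""
    return k_max - num_bitplanes(magnitudes)


mags = [5, 3, 0, 2]
print(num_bitplanes(mags), k_msbs(8, mags))  # 3 5
```

Using `int.bit_length()` gives the smallest K directly, because a non-negative integer x satisfies x < 2^K exactly when its bit length is at most K.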
The scan pattern 38 followed for the coding of bitplanes within each codeblock (in all subbands) is shown in FIG. 3b. This scan pattern is followed in each of the three coding passes. The decision as to which pass a given bit is coded in is based on the significance of that bit's location and the significance of neighboring locations. All bitplane coding is done using context-dependent binary arithmetic coding, with the exception that run coding is sometimes employed in the third pass. Let σ[n] denote the “significance state” of a sample during the coding process. σ[n] is set to zero for all samples at the start of the coding process, and it is set to one as soon as the first non-zero magnitude bit of the sample is coded.
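The stripe-oriented scan of FIG. 3b can be sketched as follows. This minimal Python version assumes stripes four rows high, as in the figure, indexes samples as (n1, n2) = (row, column), and lets the last stripe be shorter when the codeblock height is not a multiple of four; the function name is hypothetical.

```python
def stripe_scan(height, width, stripe=4):
    """Visit order of (n1, n2) = (row, column) positions in a codeblock.

    Rows are grouped into stripes of `stripe` rows; each stripe is scanned
    column by column, top to bottom within a column. The last stripe may be
    shorter when the codeblock height is not a multiple of the stripe height.
    """
    order = []
    for top in range(0, height, stripe):
        for col in range(width):
            for row in range(top, min(top + stripe, height)):
                order.append((row, col))
    return order


# First stripe: rows 0-3 column by column; then the single-row stripe (row 4).
print(stripe_scan(5, 2))
```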
The first pass in a new bitplane is called the significance propagation pass. A bit is coded in this pass if its location is not significant but at least one of its eight-connected neighbors is significant. In other words, a bit at location n is coded in the significance propagation pass if σ[n] = 0 and
$$\sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} \sigma[n_1+k_1,\, n_2+k_2] \geq 1.$$
If a bit is coded in this pass and its value is 1, the sign bit of the current sample is coded and σ[n] is immediately set to 1.
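The membership test for the significance propagation pass is a direct transcription of the condition above. In this minimal Python sketch, `sigma` is the current significance-state array of one codeblock, and neighbors outside the codeblock are assumed to count as insignificant, consistent with codeblocks being coded independently. The function name is hypothetical.

```python
def in_sig_prop_pass(sigma, n1, n2):
    """True if the bit at (n1, n2) is coded in the significance propagation pass.

    sigma is the current significance-state array of the codeblock.
    Neighbours outside the codeblock are treated as insignificant.
    """
    if sigma[n1][n2]:
        return False  # already significant: not coded in this pass
    h, w = len(sigma), len(sigma[0])
    total = 0
    for k1 in (-1, 0, 1):
        for k2 in (-1, 0, 1):
            r, c = n1 + k1, n2 + k2
            if 0 <= r < h and 0 <= c < w:
                total += sigma[r][c]
    return total >= 1


sigma = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(in_sig_prop_pass(sigma, 0, 0), in_sig_prop_pass(sigma, 1, 1))  # True False
```

Including the k = (0, 0) term in the sum is harmless, since the test is only reached when σ[n] = 0.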
In JPEG2000, the probability estimate that is used to drive the arithmetic coder is selected depending on the context of the current bit. Furthermore, JPEG2000 employs different context models depending on the coding pass and the subband type. For significance coding, 9 different contexts are used. The context label κsig[n] is dependent on the significance states 40 of the eight-connected neighbors 42 of the current bit 44 as is illustrated in FIG. 4. Using these neighboring significance states, κsig[n] is formed from three quantities
$$\kappa_h[n] = \sigma[n_1,\, n_2-1] + \sigma[n_1,\, n_2+1]$$
$$\kappa_v[n] = \sigma[n_1-1,\, n_2] + \sigma[n_1+1,\, n_2]$$
$$\kappa_d[n] = \sum_{k_1 \in \{-1,1\}} \sum_{k_2 \in \{-1,1\}} \sigma[n_1+k_1,\, n_2+k_2]$$
Table 1 shows how κsig[n] is generated given κh[n], κv[n], and κd[n]. κsig[n] then determines the probability estimate that will be used in arithmetic coding.
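TABLE 1
Context Labels for Significance Coding.

| κsig[n] | LL and LH: κh[n] | LL and LH: κv[n] | LL and LH: κd[n] | HL: κh[n] | HL: κv[n] | HL: κd[n] | HH: κd[n] | HH: κh[n] + κv[n] |
|---|---|---|---|---|---|---|---|---|
| 8 | 2 | — | — | — | 2 | — | ≥3 | — |
| 7 | 1 | ≥1 | — | ≥1 | 1 | — | 2 | ≥1 |
| 6 | 1 | 0 | ≥1 | 0 | 1 | ≥1 | 2 | 0 |
| 5 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | ≥2 |
| 4 | 0 | 2 | — | 2 | 0 | — | 1 | 1 |
| 3 | 0 | 1 | — | 1 | 0 | — | 1 | 0 |
| 2 | 0 | 0 | ≥2 | 0 | 0 | ≥2 | 0 | ≥2 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

“—” denotes “don't care.”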
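Table 1 can be implemented as a small decision procedure. The Python sketch below first forms κh[n], κv[n] and κd[n] from the significance states (treating neighbors outside the codeblock as insignificant, an assumption of this sketch) and then maps them to κsig[n]. It exploits the fact, visible in the table, that the HL rows are the LL/LH rows with the roles of κh and κv swapped. Function names are hypothetical.

```python
def neighbour_sums(sigma, n1, n2):
    """Return (kappa_h, kappa_v, kappa_d) for the sample at (n1, n2).

    Neighbours outside the codeblock are treated as insignificant (assumption).
    """
    h, w = len(sigma), len(sigma[0])

    def s(r, c):
        return sigma[r][c] if 0 <= r < h and 0 <= c < w else 0

    kh = s(n1, n2 - 1) + s(n1, n2 + 1)
    kv = s(n1 - 1, n2) + s(n1 + 1, n2)
    kd = sum(s(n1 + a, n2 + b) for a in (-1, 1) for b in (-1, 1))
    return kh, kv, kd


def kappa_sig(kh, kv, kd, band):
    """Significance context label from Table 1; band is 'LL', 'LH', 'HL' or 'HH'."""
    if band == "HL":
        kh, kv = kv, kh  # HL rows mirror LL/LH with the roles of kh and kv swapped
    if band in ("LL", "LH", "HL"):
        if kh == 2:
            return 8
        if kh == 1:
            if kv >= 1:
                return 7
            return 6 if kd >= 1 else 5
        if kv == 2:
            return 4
        if kv == 1:
            return 3
        return 2 if kd >= 2 else kd  # kd of 0 or 1 maps directly to label 0 or 1
    if band == "HH":
        hv = kh + kv
        if kd >= 3:
            return 8
        if kd == 2:
            return 7 if hv >= 1 else 6
        if kd == 1:
            return 5 if hv >= 2 else 3 + hv  # hv = 0 -> 3, hv = 1 -> 4
        return 2 if hv >= 2 else hv          # hv = 0 -> 0, hv = 1 -> 1
    raise ValueError("unknown subband type: " + band)


sigma = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
kh, kv, kd = neighbour_sums(sigma, 1, 1)
print(kh, kv, kd, kappa_sig(kh, kv, kd, "LL"), kappa_sig(kh, kv, kd, "HH"))  # 1 1 2 7 7
```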
As stated earlier, the sign bit of a sample is coded immediately after its first non-zero bit. Similar to significance coding, sign coding also employs context-modeling techniques. JPEG2000 uses 5 contexts for sign coding. The context label κsign[n] is selected depending on the four-connected neighbors of the current sample. The neighborhood information is incorporated using the intermediate quantities
$$\bar{\chi}_h[n] = \chi[n_1,\, n_2-1]\,\sigma[n_1,\, n_2-1] + \chi[n_1,\, n_2+1]\,\sigma[n_1,\, n_2+1]$$
$$\bar{\chi}_v[n] = \chi[n_1-1,\, n_2]\,\sigma[n_1-1,\, n_2] + \chi[n_1+1,\, n_2]\,\sigma[n_1+1,\, n_2]$$
$\bar{\chi}_h[n]$ and $\bar{\chi}_v[n]$ are then truncated to the range −1 through 1 to form
$$\chi_h[n] = \operatorname{sign}(\bar{\chi}_h[n])\,\min\{1, |\bar{\chi}_h[n]|\}, \qquad \chi_v[n] = \operatorname{sign}(\bar{\chi}_v[n])\,\min\{1, |\bar{\chi}_v[n]|\}.$$
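The sign-coding context label κsign[n] and the sign-flipping factor χflip[n] are then given in Table 2. The binary symbol s that is arithmetic coded is defined as
$$s = \begin{cases} 0, & \chi[n]\,\chi_{\text{flip}}[n] = 1 \\ 1, & \chi[n]\,\chi_{\text{flip}}[n] = -1 \end{cases}$$

TABLE 2
Context Labels for Sign Coding.

| χh[n] | χv[n] | κsign[n] | χflip[n] |
|---|---|---|---|
| 1 | 1 | 14 | 1 |
| 1 | 0 | 13 | 1 |
| 1 | −1 | 12 | 1 |
| 0 | 1 | 11 | 1 |
| 0 | 0 | 10 | 1 |
| 0 | −1 | 11 | −1 |
| −1 | 1 | 12 | −1 |
| −1 | 0 | 13 | −1 |
| −1 | −1 | 14 | −1 |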
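Sign-context selection combines the two intermediate sums, their truncation, and a lookup in Table 2. The following Python sketch is illustrative only: it assumes `chi` holds +1/−1 signs (0 where unknown), treats neighbors outside the codeblock as contributing 0, and uses hypothetical names. The truncation sign(x)·min{1, |x|} is written as a clamp, which is equivalent for integers.

```python
# (chi_h, chi_v) -> (kappa_sign, chi_flip), transcribed from Table 2.
SIGN_TABLE = {
    (1, 1): (14, 1), (1, 0): (13, 1), (1, -1): (12, 1),
    (0, 1): (11, 1), (0, 0): (10, 1), (0, -1): (11, -1),
    (-1, 1): (12, -1), (-1, 0): (13, -1), (-1, -1): (14, -1),
}


def clamp1(x):
    """sign(x) * min(1, |x|) for integers."""
    return max(-1, min(1, x))


def sign_context(chi, sigma, n1, n2):
    """Return (kappa_sign, chi_flip) for the sample at (n1, n2)."""
    h, w = len(sigma), len(sigma[0])

    def contrib(r, c):
        # chi * sigma for an in-block neighbour; 0 outside the codeblock (assumption).
        return chi[r][c] * sigma[r][c] if 0 <= r < h and 0 <= c < w else 0

    chi_h = clamp1(contrib(n1, n2 - 1) + contrib(n1, n2 + 1))
    chi_v = clamp1(contrib(n1 - 1, n2) + contrib(n1 + 1, n2))
    return SIGN_TABLE[(chi_h, chi_v)]


def sign_symbol(sign, chi_flip):
    """Binary symbol s passed to the arithmetic coder."""
    return 0 if sign * chi_flip == 1 else 1


chi = [[0, 1, 0], [-1, 0, 1], [0, 1, 0]]
sigma = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
label, flip = sign_context(chi, sigma, 1, 1)
print(label, flip, sign_symbol(-1, flip))  # 11 1 1
```

Because horizontally opposite signs cancel in the sum before truncation, the example yields χh[n] = 0 and χv[n] = 1.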
The second pass is the magnitude refinement pass. In this pass, all bits from locations that became significant in a previous bitplane are coded. As the name implies, this pass refines the magnitudes of the samples that were already significant in previous bitplanes. The magnitude refinement context label, κmag[n], is selected as described in Table 3. In the table, σ̃[n] denotes the value of the significance state variable σ[n] delayed by one bitplane. In other words, like σ[n], σ̃[n] is initialized to zero at the beginning, but it is set to one only after the first magnitude refinement bit of the sample has been coded.
TABLE 3
Context Labels for Magnitude Refinement Coding.

| σ̃[n] | κsig[n] | κmag[n] |
|---|---|---|
| 0 | 0 | 15 |
| 0 | >0 | 16 |
| 1 | — | 17 |

“—” denotes “don't care.”
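Table 3 reduces to a three-way choice. The minimal Python sketch below (hypothetical name) also shows how the delayed state σ̃[n] changes the label for successive refinement bits of the same sample.

```python
def kappa_mag(sigma_tilde, kappa_sig_value):
    """Magnitude refinement context label, following Table 3."""
    if sigma_tilde:          # a refinement bit of this sample was already coded
        return 17
    return 16 if kappa_sig_value > 0 else 15  # first refinement bit


sigma_tilde = 0
labels = []
for _ in range(3):           # three successive refinement bits of one sample
    labels.append(kappa_mag(sigma_tilde, kappa_sig_value=2))
    sigma_tilde = 1          # set once the first refinement bit has been coded
print(labels)  # [16, 17, 17]
```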
The third and final pass is the clean-up pass, which codes any bits not coded in the first two passes. By definition, this pass is a “significance” pass, so the significance coding described for the significance propagation pass is used to code the samples in this pass. Unlike the significance propagation pass, however, this pass may also use a run-coding mode. Run coding occurs when all four locations in a column of the scan 39 (see FIG. 3b) are insignificant and each has only insignificant neighbors. A single bit is then coded to indicate whether the column is identically zero in the current bitplane. If it is not, the length of the zero run (0 to 3) is coded, and “normal” bit-by-bit coding resumes at the location immediately following the 1 that terminated the zero run.
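The run-coding decision in the clean-up pass can be sketched in two steps: an eligibility test for a four-sample column and the symbols emitted for an eligible column. This Python sketch is illustrative only; it treats neighbors outside the codeblock as insignificant, treats partial columns at the bottom of a codeblock as ineligible, and writes the run length as two bits, most significant first. These are stated assumptions of the sketch, and the sign coding of the sample that ends the run is not shown. Function names are hypothetical.

```python
def run_mode_eligible(sigma, top, col):
    """True if the four-sample column starting at row `top` may use run coding.

    Every sample in the column must be insignificant and have only
    insignificant eight-connected neighbours (neighbours outside the
    codeblock count as insignificant). Partial columns are treated as
    ineligible here (an assumption of this sketch).
    """
    h, w = len(sigma), len(sigma[0])
    if top + 4 > h:
        return False
    for r in range(top, top + 4):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, col + dc
                if 0 <= rr < h and 0 <= cc < w and sigma[rr][cc]:
                    return False
    return True


def run_mode_symbols(column_bits):
    """Symbols coded for an eligible column, plus the row index where normal coding resumes.

    column_bits holds the current-bitplane magnitude bits of the four samples.
    The run length (0 to 3) is written as two bits, MSB first (assumption).
    """
    if not any(column_bits):
        return [0], 4               # whole column is zero in this bitplane
    run = column_bits.index(1)      # number of zeros before the first 1
    return [1, run >> 1, run & 1], run + 1


print(run_mode_symbols([0, 0, 1, 0]))  # ([1, 1, 0], 3)
```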
Anatomy of JPEG2000 Codestreams
The structure of a JPEG2000 codestream is illustrated in FIG. 5. For the purposes of forming the codestream, compressed data from each precinct are arranged to form packets 46. There can be multiple packets for each precinct, corresponding to multiple “quality layers.” Each packet contains zero or more coding passes from each codeblock in the precinct. In this way, the coding passes of a codeblock can be partitioned across multiple packets. Packets play an important role in the organization of data within a JPEG2000 codestream. Each packet contains a header 48 and a body 50. The packet header describes the contribution of each codeblock in the precinct to the packet, and the body contains the coding passes of the codeblocks. Packets that belong to a particular tile are grouped together to form a tile-stream 52, and tile-streams are grouped together to form the JPEG2000 codestream 54. Like packets, tile-streams consist of a header 56 and a body 58. A tile-stream can be broken into multiple tile-parts; in this case, the first tile-part contains a tile header and the remaining tile-parts contain tile-part headers. There is also a main header 60 at the beginning of the codestream. The EOC (end of codestream) marker 62 denotes the end of the codestream.
Error Resilience
Entropy coding in JPEG2000 is achieved using a context-based arithmetic bitplane coder. The operation of this coder is highly dependent on the state of the system, and it is crucial to maintain synchronization between the encoder and the decoder. A single bit error in the arithmetically coded segments of the bitstream can destroy this synchronization, and could result in erroneous decompression. To combat this problem, several error resilience tools are provided within JPEG2000.
The partitioning of the codestream into different partition sets is the first line of defense. In terms of error resilience, this partitioning aims to confine an error in one partition set to that partition set and to prevent error propagation across partition set boundaries. This isolation occurs at several levels, since the codestream is organized in a hierarchical fashion. Each codestream starts with a main header. The main header contains critical information, such as image and codeblock sizes, and is essential for correct decompression of the codestream. If sections of the main header are unavailable at the decoder, the decoder may not be able to decode the codestream at all. Similarly, if a tile-stream header is lost during transmission, the decoder might be unable to determine several parameters required for correct decompression of that tile, which might result in discarding the entire tile. A similar scenario would occur if tile-parts were utilized and the first tile-part header was lost. However, if the header of any of the remaining tile-parts were lost, the decoder would still be able to decompress the earlier tile-parts, although the affected tile-part and the tile-parts that follow it in the current tile might need to be discarded. If a packet header is lost, all of the data in the current and subsequent packets of the corresponding precinct will have to be discarded. Since codeblocks are coded independently, errors do not propagate between codeblocks. However, since coding passes from one codeblock may appear in multiple packets, errors may propagate between packets even when the packet headers are not corrupt.
JPEG2000 provides a mechanism whereby the packet headers can be extracted from every packet and stored in the tile-part headers or the main header. This is referred to as packed packet headers. Packed packet headers can provide significant advantages for error resilience if the main and tile-part headers can be transmitted losslessly. Since the packet headers contain the lengths in bytes of the codeblock contributions (and, when codeword segments are terminated, the length of each terminated segment), the decoder can use this information to isolate errors.
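The way signaled lengths confine an error can be shown with a minimal Python sketch. It assumes that each coding pass is its own terminated segment (as with the RESTART mode discussed below) and that the segment lengths have already been read from an intact packet header; `segment_containing` is a hypothetical name. Given the byte offset at which an error is detected, the decoder can tell which segment, and therefore which coding pass, is affected, and where the next segment begins.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_containing(lengths, byte_offset):
    """Index of the segment that holds byte_offset within the concatenated body.

    lengths are the per-segment byte counts read from the packet header.
    """
    ends = list(accumulate(lengths))  # exclusive end offset of each segment
    if not ends or not 0 <= byte_offset < ends[-1]:
        raise ValueError("offset outside the packet body")
    return bisect_right(ends, byte_offset)


print(segment_containing([10, 5, 7], 12))  # 1: the error lies in the second segment
```

Zero-length segments are skipped naturally, because `bisect_right` returns the first segment whose end lies strictly beyond the offset.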
Error Detection and Resynchronization
To complement the hierarchical data partitioning, JPEG2000 provides several mechanisms for error detection and resynchronization. One such mechanism is the byte-stuffing procedure in JPEG2000. Through the use of this byte-stuffing procedure, the JPEG2000 arithmetic coder does not produce certain values (0xFF90 through 0xFFFF) inside coding passes. These values, called delimiting marker codes, are reserved for codestream markers. Unexpected detection of one of these values would indicate that an error has occurred.
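A decoder-side check for this condition is a simple byte scan. In the minimal Python sketch below (hypothetical name), any two-byte value from 0xFF90 through 0xFFFF found inside a segment that should contain only coded data is reported as a possible error location.

```python
def find_delimiting_markers(segment: bytes):
    """Offsets i at which segment[i:i+2] lies in the range 0xFF90-0xFFFF."""
    return [i for i in range(len(segment) - 1)
            if segment[i] == 0xFF and segment[i + 1] >= 0x90]


coded = bytes([0x12, 0xFF, 0x7F, 0xFF, 0x91, 0x00])
print(find_delimiting_markers(coded))  # [3]
```

The pair 0xFF 0x7F at offset 1 is not flagged, since only values from 0xFF90 upward are reserved for markers.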
Some of the error detection and resynchronization mechanisms are enabled by mode variations. These alternatives to the default mode of the codec provide additional capabilities in exchange for a small decrease in compression efficiency. Mode variations are controlled by flags that are signaled inside headers. The mode switches are listed in Table 4; not all of them are intended to enhance error resilience.
TABLE 4
Mode variations.

| Switch | Description |
|---|---|
| BYPASS | Selective MQ coder bypass |
| RESET | Reset context states |
| RESTART | Terminate and restart MQ coder |
| CAUSAL | Stripe-causal context information |
| ERTERM | Predictable termination |
| SEGMARK | Segmentation marker |
When the RESET mode is used, the context states (i.e. the probabilities used in arithmetic coding) are reset to their initial values at the end of each coding pass. If the RESET switch is not specified, this initialization occurs only prior to the first coding pass of a codeblock. Although this mode reduces the compression efficiency a little, it is useful in parallel processing applications.
The RESTART switch causes the MQ coder to be restarted at the beginning of each coding pass. Thus, when this mode is utilized, every coding pass has its own MQ codeword segment. The length of each of these segments is signaled in the packet header. Thus this mode reduces the compression efficiency and increases the amount of overhead. However, the RESTART mode is very useful for error resilience, since it increases the decoder's data recovery abilities.
The goal of the BYPASS mode is to reduce complexity at high rates with little loss in compression efficiency. When this mode is selected, the MQ coder is bypassed during the first (significance propagation) and second (magnitude refinement) coding passes when p < K−4. In other words, this mode is activated only after the tenth coding pass of each codeblock. Because, for most images, the symbols coded in the first two coding passes after the tenth coding pass do not exhibit highly skewed probabilities, this mode does not usually cause a considerable reduction in compression efficiency. When the MQ coder is bypassed, the binary symbols are stored in raw segments. To avoid the appearance of delimiting marker codes inside these raw segments, a simple bit-stuffing procedure is adopted. Inserting a raw segment into the bitstream requires the preceding MQ-coded segment to be terminated. Each raw segment must likewise be terminated before the start of the next MQ segment (clean-up coding pass). If the RESTART mode is selected as well, termination occurs at the end of every coding pass. The length of every terminated segment is signaled inside the relevant packet header.
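The pass schedule under BYPASS can be enumerated with a short Python sketch. It assumes, as described earlier, that the first coded bitplane (p = K−1) has only a clean-up pass and every later bitplane has three passes; under that assumption a codeblock has 3K−2 passes and the first raw pass is the eleventh, matching the "after the tenth coding pass" statement. The pass labels and function name are hypothetical.

```python
def coding_passes(K, bypass=False):
    """List (bitplane p, pass kind, coder) for every coding pass of a codeblock.

    Assumes the first coded bitplane (p = K - 1) has only a clean-up pass.
    With bypass=True, significance propagation and magnitude refinement
    passes with p < K - 4 are emitted as raw (MQ-bypassed) segments.
    """
    passes = []
    for p in range(K - 1, -1, -1):
        kinds = ["cleanup"] if p == K - 1 else ["sigprop", "magref", "cleanup"]
        for kind in kinds:
            raw = bypass and p < K - 4 and kind != "cleanup"
            passes.append((p, kind, "raw" if raw else "MQ"))
    return passes


for index, entry in enumerate(coding_passes(6, bypass=True), start=1):
    print(index, entry)  # passes 11, 12, 14 and 15 are raw
```

The clean-up passes always remain MQ-coded, which is why each raw segment must be terminated before the following clean-up pass.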
Another mode that is defined by the standard is the SEGMARK mode. If this mode is enabled, a four-symbol code (“1010”) is inserted at the end of the third coding pass of each bitplane. Since a bit error in any of the coding passes is likely to corrupt at least one of these symbols, the decoder can detect that an error has occurred, and discard the erroneous coding passes.
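On the decoder side, the SEGMARK check amounts to comparing the four symbols decoded after each clean-up pass with the expected "1010". This minimal Python sketch assumes the arithmetic decoding of those symbols has already been done (not shown) and reports the first bitplane whose marker is wrong; that bitplane and everything after it would then be treated as erroneous. The names are hypothetical.

```python
SEGMARK_SYMBOLS = (1, 0, 1, 0)


def first_corrupt_bitplane(decoded_markers):
    """Index of the first bitplane whose trailing segmentation symbols are wrong.

    decoded_markers[i] holds the four symbols decoded at the end of the
    clean-up pass of the i-th coded bitplane (MSB side first). Returns None
    when every marker decodes correctly.
    """
    for i, symbols in enumerate(decoded_markers):
        if tuple(symbols) != SEGMARK_SYMBOLS:
            return i
    return None


markers = [(1, 0, 1, 0), (1, 0, 1, 0), (1, 1, 1, 0), (1, 0, 1, 0)]
print(first_corrupt_bitplane(markers))  # 2
```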
The standard does not define a particular codeword termination policy that the encoder must adopt. In general, the encoder is free to terminate the codewords in any manner that results in correct decoding. However, when the ERTERM mode is utilized, the encoder adopts a predictable termination policy for each MQ and/or raw codeword segment. Because the termination is predictable, the decoder can detect that an error has occurred in a particular coding pass.
The CAUSAL mode was defined by the standard to allow parallel processing of coding passes. When the CAUSAL mode is utilized, the context formation process is modified slightly. Recall that the scan pattern of samples within a codeblock, as shown in FIG. 3b, was a 4-sample stripe-oriented scan. In CAUSAL mode, the samples within a current stripe 64 are encoded without depending on the values of future stripes 66. Thus, when the contexts are formed, all samples from future stripes are treated as insignificant. To clarify this, assume that the sample 67 at location n is being encoded, as illustrated in FIG. 6. As seen in the figure, assume that this sample belongs to the fourth row of a stripe. If the CAUSAL mode is utilized, the significance states σ[n1+1, n2+k2] are considered to be zero, for k2=−1,0,1.
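The CAUSAL rule can be expressed as a masking step applied whenever a neighbor's significance state is read for context formation. In this minimal Python sketch (hypothetical names), a neighbor in a later stripe than the current sample is treated as insignificant when `causal` is true, and neighbors outside the codeblock are treated as insignificant in either mode, which is an assumption of the sketch.

```python
def neighbour_sigma(sigma, n1, r, c, causal=False, stripe=4):
    """Significance of neighbour (r, c) as used to form contexts for a sample in row n1.

    Neighbours outside the codeblock count as insignificant. In CAUSAL mode,
    neighbours in a later stripe than row n1 also count as insignificant.
    """
    h, w = len(sigma), len(sigma[0])
    if not (0 <= r < h and 0 <= c < w):
        return 0
    if causal and r // stripe > n1 // stripe:
        return 0
    return sigma[r][c]


def kappa_v(sigma, n1, n2, causal=False):
    """Vertical neighbour count kappa_v[n] with optional stripe-causal masking."""
    return (neighbour_sigma(sigma, n1, n1 - 1, n2, causal)
            + neighbour_sigma(sigma, n1, n1 + 1, n2, causal))


ones = [[1] * 3 for _ in range(8)]
# Row 3 is the fourth row of the first stripe; row 4 starts the next stripe.
print(kappa_v(ones, 3, 1), kappa_v(ones, 3, 1, causal=True))  # 2 1
```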
Error Resilience Properties of JPEG2000
The transmission of images over lossy communication channels such as the Internet or wireless networks is a rapidly expanding market that offers additional technical challenges. When compressed images are transmitted over such channels, the compressed codestreams received at the decoder can contain errors. The decoder should be able to reconstruct images using these noisy codestreams. Many image compression methods rely on efficient entropy coders that are dependent on the state of the system. Such coders are very sensitive to errors in the codestream. The JPEG2000 standard does not define the behavior of the decoder when an error in the codestream is detected.
The operation of high performance image codecs over lossy communication channels requires careful design to avoid complete failure when the codestream gets corrupted. The design is heavily dependent on the loss characteristics of the channel. For example, suppose that the channel introduces erasures such that some of the transmitted data is not received by the decoder. Such a model is actually used to represent the transmission characteristics of the Internet. Assuming that a feedback channel is available, one approach could be to request the retransmission of lost data from the sender. This is how reliable communication is achieved over the Internet using the TCP/IP protocol. If the user receives a corrupted image (or data packet), the user discards the data and sends a message to the server that the data was corrupted. The server resends the data until it is received without error. The upside to this approach is that the image is eventually received error free. However, there are many downsides. First of all, this approach requires a feedback channel. In some cases, such a channel may not exist, or the decoder might be unwilling to contact the sender due to security concerns. If the channel experiences a lot of losses, the retransmission requests will further flood the channel and the effective bandwidth of the channel will be considerably lower. Furthermore, this approach introduces delay and may not be applicable for real-time applications.
The current practice with JPEG2000 images is to discard the current and all subsequent coding passes of a codeblock when an error is detected in a particular coding pass. As explained earlier, the bitplanes of each codeblock are compressed from the MSB to the LSB, with up to three coding passes per bitplane. If the decoder detects an error in a particular coding pass, the bitplanes reconstructed from earlier coding passes are kept, and the current and future coding passes belonging to the current codeblock are discarded. Other codeblocks are not affected. This results in poor quality in the region of the reconstructed image covered by the affected codeblock. This practice is adopted in all existing software implementations, including ISO/IEC JTC1/SC29/WG1 N2165, “JPEG2000 Verification Model 9.1 (Technical Description),” 2001; D. Taubman, “Kakadu: A comprehensive, heavily optimized, fully compliant software toolkit for JPEG2000 developers,” http://www.kakadusoftware.com; M. D. Adams, “JasPer,” http://www.ece.ubc.ca/~mdadams/jasper; and “JJ2000: An implementation of JPEG2000 standard in Java,” http://jj2000.epfl.ch, and no alternative to this approach is present in the literature.
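The discard policy described above can be sketched in Python to show how much of a codeblock survives an error. The sketch assumes per-pass error flags are already available (for example from ERTERM or SEGMARK checks) and that the first coded bitplane has a single clean-up pass while every later bitplane has three; the function names are hypothetical.

```python
from itertools import takewhile


def passes_kept(pass_ok):
    """Number of leading coding passes kept; pass_ok[i] is False if an error was detected in pass i."""
    return sum(1 for _ in takewhile(bool, pass_ok))


def complete_bitplanes(K, kept):
    """Bitplanes, counted from the MSB side, whose coding passes were all kept.

    Assumes the first coded bitplane has only a clean-up pass and every later
    bitplane has three passes.
    """
    if kept <= 0:
        return 0
    return min(K, 1 + (kept - 1) // 3)


ok = [True] * 7 + [False] + [True] * 8   # error detected in the eighth pass (K = 6)
kept = passes_kept(ok)
print(kept, complete_bitplanes(6, kept))  # 7 3
```

In the example, the eight passes that follow the error are discarded even though only one of them was flagged, which illustrates the quality loss that the current practice incurs.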