As more use is made of digital images and digital images become larger in size, the ability to communicate portions of an image is essential to efficient and rapid communication. Indeed, the Joint Photographic Experts Group (JPEG) recognized the importance of various access features in the design of the JPEG 2000 image compression standard. JPEG 2000 allows the relevant portions of an image to be located in the codestream by reading header information in the JPEG 2000 file, and thus provides efficient access to local files. However, much image communication takes place across a network of some sort, and many different clients may access portions of the same imagery. Thus, it is important to have a standard protocol for communicating portions of a compressed image across a network. The JPEG international standards committee is currently developing “JPEG 2000 Image Coding System—Part 9: Interactivity Tools, APIs and Protocols,” also called JPIP, which will eventually become a standard for communication of imagery across a network.
JPEG 2000 Background
Some important ideas from JPEG 2000 are shown in FIG. 3, which shows some portions of a JPEG 2000 system. An original image may be divided into regular tiles. These tiles are compressed independently. Thus, the coded data from the upper left tile is sufficient to reconstruct the upper left part of the image. Likewise any region can be reconstructed by obtaining the data from all the tiles that intersect that region and decoding them.
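The tile lookup described above can be illustrated with a short Python sketch that lists the tiles intersecting a requested region. It assumes a regular tile grid anchored at the image origin (no tile or image offsets), and the function name and parameters are hypothetical rather than taken from the standard.

```python
def tiles_for_region(img_w, img_h, tile_w, tile_h, x, y, w, h):
    """Return row-major indices of the tiles that intersect a region.

    Assumes the tile grid starts at (0, 0); JPEG 2000 also permits
    grid offsets, which this sketch ignores.
    """
    if w <= 0 or h <= 0:
        return []
    # Clip the region to the image bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, img_w), min(y + h, img_h)
    if x0 >= x1 or y0 >= y1:
        return []
    tiles_across = -(-img_w // tile_w)  # ceiling division
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [row * tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For a 512 by 512 image with 128 by 128 tiles, a 100 by 100 region at offset (100, 100) touches tiles 0, 1, 4, and 5, so only the coded data for those four tiles would need to be obtained.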
The JPEG 2000 encoder performs a two dimensional wavelet transform on each tile. This leads to a set of wavelet coefficients grouped in subbands. One of the subbands is a low resolution version of the image; it is exactly the same image as would be obtained by operating a particular low pass filter on the entire image and subsampling. Importantly, that subband can be combined with other subbands to form intermediate resolution versions of the image. Thus, a JPEG 2000 codestream provides access to the image at a low resolution and at resolutions successively larger by a factor of 2 horizontally and vertically, up to the full resolution of the image.
The wavelet transform coefficients are regrouped into “precincts.” These precincts form an alternative mechanism to access spatial regions of an image. To decode the upper left part of an image, the coded data for the precincts in the upper left part of each subband can be obtained and decoded. Because a wavelet transform coefficient affects multiple pixels in the image, and coefficients in different subbands affect different numbers of pixels, and each pixel is affected by multiple coefficients, it is necessary to carefully determine all the precincts that might be necessary to decode a particular region of the image.
Tiles and precincts can both be used when compressing an image. If both are used, the tile provides a coarse grain access to regions of the image, while the precincts provide access to more localized regions.
Finally, precincts are divided into code-blocks and these code-blocks of wavelet coefficients are compressed with multiple passes, starting with the most significant bit and proceeding to the less significant bits. These “coding-passes” remove statistical redundancy in a lossless manner. For example, large numbers of zero bits are replaced by much shorter codes. The fixed sized block of wavelet coefficients thus becomes a variable length segment of coded data. The first part of the segment is the most important part because it corresponds to the most significant bits of the wavelet coefficients. If the later parts of the coded data segment are not available to a decoder, an image can still be obtained; it is just of lower quality.
A “codestream” is defined in JPEG 2000 Part 1. It consists of a series of “marker segments” which are essentially two byte codes identifying the next portion of data in the codestream, followed by the data. There is a “main header” that starts with “SOC” and “SIZ” marker segments. The SIZ marker segment contains information about the width and height of the image components. The “COD” and “COC” marker segments contain parameters describing how the compressed data should be decoded. After the main header, there is a series of “tile-parts.” Each tile-part begins with a “SOT” marker segment that identifies the particular tile and part. The coded data for each tile-part is preceded by a “SOD” marker segment. The codestream contains all the entropy coded data for the image, and data describing the method which should be used to decode the coded data. The codestream contains information about the wavelet transform used, the size of tiles, precinct sizes, information about the number of resolutions, the order of packets in the file, etc. The codestream must contain all the parameters needed to decode the entropy coded data into image samples. The codestream may also contain information to provide rapid access to portions of the coded data, e.g., lengths of the packets.
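The marker segment layout described above can be sketched in Python as a walk over the main header. The sketch assumes a well-formed codestream held in memory and reads only the first fields of the SIZ marker segment; the function name and return shape are hypothetical.

```python
import struct

SOC, SIZ, SOT, SOD, EOC = 0xFF4F, 0xFF51, 0xFF90, 0xFF93, 0xFFD9
NO_LENGTH = {SOC, SOD, EOC}  # markers that carry no length field


def read_main_header(data):
    """Walk marker segments up to the first SOT.

    Returns (segments, size): segments is a list of (marker, payload)
    pairs, and size is the (width, height) of the image area on the
    reference grid taken from SIZ, or None if no SIZ was seen.
    """
    pos, segments, size = 0, [], None
    while pos + 2 <= len(data):
        (marker,) = struct.unpack_from(">H", data, pos)
        if marker == SOT:
            break  # the main header ends where the first tile-part begins
        pos += 2
        if marker in NO_LENGTH:
            segments.append((marker, b""))
            continue
        # The 2-byte length counts itself but not the marker.
        (length,) = struct.unpack_from(">H", data, pos)
        if length < 2:
            raise ValueError("corrupt marker segment length")
        payload = data[pos + 2:pos + length]
        if marker == SIZ:
            # Payload starts with Rsiz(2) Xsiz(4) Ysiz(4) XOsiz(4) YOsiz(4).
            _rsiz, xsiz, ysiz, xosiz, yosiz = struct.unpack_from(">HIIII", payload, 0)
            size = (xsiz - xosiz, ysiz - yosiz)
        segments.append((marker, payload))
        pos += length
    return segments, size
```

A fuller parser would also interpret the COD and COC payloads; this sketch only returns them as raw bytes.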
A “file-format” is a wrapper for one or more codestreams. JPEG 2000 Part 1 defines a simple “JP2” file format. JPEG 2000 Part 2 includes a definition of the “JPX” format to store more complex information. JPEG 2000 Part 3 defines the “MJ2” file format for motion JPEG 2000. JPEG 2000 Part 6 defines the “JPM” file format for compound documents. There are also file-formats not defined by the JPEG 2000 committee that could contain JPEG 2000 codestreams. For example, the DICOM file format is extensively used in the medical community. TIFF and PDF are file formats that already allow multiple compression types and could easily be extended to allow JPEG 2000 codestreams. File-formats typically provide “meta-data” associated with the image data contained in the codestream, e.g., audio, XML information, and image capture conditions. The file format also commonly contains information about the color space of the decoded image data.
JPIP Background
Requests
JPIP defines several parameters that can be used by a client in making a request. These requests indicate what sub-image the client is interested in. The requests may also include information about the compressed data already received by a client so the server will not repeat the data. The requests also may provide information about what meta-data the client is interested in and information to control the type of response given by the server. Some of these parameters include frame-size, region offset, and region size.
The frame-size parameter, which appears as “fsiz=128,128” in several examples, indicates the size of the entire image the client wishes to use to access a region. If there is no region size parameter (see below), the frame-size is simply the size of the image the client wants. For example, an image which is 512 by 512 samples and encoded with 4 levels of wavelet transform can be accessed with a frame size of 512 by 512, 256 by 256, 128 by 128, 64 by 64, or 32 by 32. In the first request to the server, the client may not know the actual frame-sizes available. In this case, the client can indicate a desired size and a rounding method, and the server will return the closest available size (and the other parameters will be scaled to match the actual frame size used).
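The frame-size arithmetic above can be sketched in Python as follows. The helper names and the rounding labels “down” and “up” are assumptions made for illustration; they are not the parameter values defined by JPIP.

```python
def available_frame_sizes(width, height, levels):
    """Frame sizes from full resolution down to the lowest resolution."""
    return [(-(-width // (1 << r)), -(-height // (1 << r)))
            for r in range(levels + 1)]


def pick_frame_size(sizes, want_w, want_h, rounding="down"):
    """Pick an available size for a requested one.

    "down" picks the largest size that fits inside the request and
    "up" the smallest size that covers it; both labels are
    placeholders for this sketch.
    """
    if rounding == "down":
        fits = [s for s in sizes if s[0] <= want_w and s[1] <= want_h]
        return max(fits) if fits else min(sizes)
    if rounding == "up":
        covers = [s for s in sizes if s[0] >= want_w and s[1] >= want_h]
        return min(covers) if covers else max(sizes)
    raise ValueError("unknown rounding: " + rounding)
```

For the 512 by 512 image with 4 levels of wavelet transform, available_frame_sizes returns the five sizes listed above, and a request for 200 by 200 resolves to 128 by 128 when rounding down or 256 by 256 when rounding up.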
The region offset parameter, which might be used as “roff=64,64”, indicates that the client is not interested in the entire image, but only in a region beginning at the offset specified by the two values given in the request. The offset is relative to the request frame-size.
The region parameter, which might be used as “rsiz=64,64”, indicates the size of the region desired by the client. Instead of providing the three parameters fsiz, rsiz, and roff, a client may indicate interest in file- or server-defined regions of interest with the “roi” parameter. The roi parameter can be used to request a named region identified in one of the file format boxes. The value may be set to “dynamic” as in “roi=dynamic” to indicate that the server should choose the appropriate region based on any knowledge of the client.
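A client might assemble the three window parameters into a query string along the following lines. This is a minimal Python sketch; the function name is hypothetical, and only the fsiz, roff, and rsiz parameters described above are handled.

```python
from urllib.parse import urlencode


def jpip_query(fsiz, roff=None, rsiz=None):
    """Build a query string from frame size, region offset, and region size."""
    params = [("fsiz", "%d,%d" % fsiz)]
    if roff is not None:
        params.append(("roff", "%d,%d" % roff))
    if rsiz is not None:
        params.append(("rsiz", "%d,%d" % rsiz))
    # Keep commas literal so the values read as in the examples above.
    return urlencode(params, safe=",")
```

For instance, jpip_query((128, 128), roff=(64, 64), rsiz=(64, 64)) yields “fsiz=128,128&roff=64,64&rsiz=64,64”, which requests the lower right quadrant of the image at a frame size of 128 by 128.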
Session Management
JPIP can operate without any memory in the server. To be efficient in this case, the client needs to provide a list of relevant received data with each query. This can be done with the “model”, “tpmodel”, or “needs” parameters. To begin a session, a client might issue a request for a channel-id with “cid=0”. The server response indicates a value that will be used for the channel. All future requests using the same channel will be assumed to have the same cache. Multiple sub-images of the same image can be requested and received in parallel by using different channels.
Responses
JPIP responses can take three main forms: complete image file, tile-part stream, or precinct stream. The complete image file return type is essentially like returning a custom file that has been generated on the basis of the request. The two stream return types are compound responses that consist of a series of small “messages.” Each message has a header that identifies its type, its index within that type, the offset of the message into the “data-bin” with that type and index, and the length of the message. The data-bins can be thought of as representing all information a client might ever request about a resource on the server. In response to a single request, a server might deliver a portion of several different data-bins. The messages typically do not contain a complete data-bin; instead, they contain a portion of the data-bin that has not yet been sent to the client. Small message sizes allow data for different portions of the image to be interleaved, and thus a region specified by the client can grow uniformly in quality. Likewise, image data may be interleaved with meta-data. Providing the lengths of each message allows the stream to be elegantly terminated at the end of any message. This allows a server to stop responding to one request and begin responding to a new request on the same communications channel.
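The message and data-bin model described above can be sketched as a small client-side cache in Python. The class names, field names, and the string labels used for data-bin types are assumptions for illustration; the actual message header encoding is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    bin_type: str   # e.g. "precinct" or "tile-header" (labels assumed)
    bin_index: int  # which data-bin of that type
    offset: int     # byte offset of this message within the data-bin
    data: bytes     # message body; the message length is len(data)


@dataclass
class ClientCache:
    bins: dict = field(default_factory=dict)

    def apply(self, msg):
        """Store a message body at its offset in the matching data-bin."""
        chunks = self.bins.setdefault((msg.bin_type, msg.bin_index), {})
        chunks[msg.offset] = msg.data

    def contiguous_prefix(self, bin_type, bin_index):
        """Return the bytes available from offset 0 without a gap."""
        chunks = self.bins.get((bin_type, bin_index), {})
        out = bytearray()
        for offset in sorted(chunks):
            if offset > len(out):
                break  # gap: later bytes cannot be used yet
            if offset + len(chunks[offset]) > len(out):
                out[offset:] = chunks[offset]
        return bytes(out)
```

Because each message names its own data-bin and offset, messages for different data-bins can be interleaved freely, and the cache can be consulted after any message boundary.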
Tile-Part Return Type
JPIP defines a way to return a sub-image as a set of tile-parts, called a JPT-STREAM as an abbreviation for JPIP tile-part stream. Tile-parts are defined in the JPEG 2000 Part 1 standard. Each tile-part contains part of the compressed data for a tile. Because a normal JPEG 2000 decoder must be able to handle tiles provided in any order, a JPIP client receiving tile-parts can concatenate them to obtain a legal JPEG 2000 codestream, which a standard decoder can use to produce an image. Because tile-parts are self-labeled parts of a JPEG 2000 codestream, a server can simply select portions of a pre-existing file and send those in response to a request. More complex servers could rewrite the file on the basis of the request into a different division of tile-parts; this can provide more efficient responses at the expense of server computation.
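The concatenation step can be sketched in Python as follows. The sketch assumes the client already holds the main header bytes and a set of complete tile-parts, each beginning with its SOT marker segment; the function names are hypothetical, and no attempt is made to repair the part counts when some parts of a tile are missing.

```python
import struct

SOT, EOC = 0xFF90, 0xFFD9


def parse_sot(tile_part):
    """Read the tile index and part number from a tile-part's SOT segment."""
    # SOT layout: marker(2) Lsot(2) Isot(2) Psot(4) TPsot(1) TNsot(1).
    marker, _lsot, tile, _psot, part, _count = struct.unpack_from(">HHHIBB", tile_part, 0)
    if marker != SOT:
        raise ValueError("tile-part does not start with SOT")
    return tile, part


def assemble_codestream(main_header, tile_parts):
    """Concatenate the main header, the tile-parts, and an EOC marker.

    Tile-parts are sorted by tile index and then by part number so that
    the parts of each tile appear in order.
    """
    ordered = sorted(tile_parts, key=parse_sot)
    return main_header + b"".join(ordered) + struct.pack(">H", EOC)
```

Sorting by tile index is a convenience of this sketch rather than a requirement, since a decoder must handle tiles in any order.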
Precinct Return Type
JPIP defines a way to return a sub-image as a set of byte ranges of precincts, also called a JPP-STREAM as an abbreviation for JPIP precinct stream. Precincts are defined in the JPEG 2000 Part 1 standard. A client must collect all byte ranges corresponding to the same precinct and pass these on to a special JPIP precinct decoder which understands the precinct identifiers. This special purpose decoder determines which precincts to decode to generate a sub-image. Because precincts can vary in size with each resolution and byte ranges of precincts have low overhead, this method provides very fine grain control on the progressive improvement of the image with succeeding requests.
FIG. 6 shows the structure of a precinct data-bin. If the entire precinct data-bin were present in a file, it would consist of a sequence of packet headers (PH in FIG. 6) and packet data (PD in FIG. 6). The packet header contains information about which code-blocks have data in the packet data portion of the packet, information about the number of bytes stored for each code-block, and the number of coding passes for each code-block. There is no information in the packet header to identify which component, position, resolution, tile or layer it belongs to. There is also no explicit information about the length of the packet header. The length of the packet data can be determined by adding up the lengths of data for all the code-blocks in the packet. The length of the packet header can only be determined by decoding it. The short lines in the PD boxes in FIG. 6 represent the division between code-blocks.
For a packet in the lowest resolution, the code-blocks are all for the LL subband. For other resolutions, there are 3 packets for each layer, one for the HL subband, one for the LH subband, and one for the HH subband. FIG. 6 also shows a higher resolution, and thus the label at the bottom shows three packets associated with each layer.
Data-bins are not necessarily delivered in complete packets. A JPP-STREAM consists of a sequence of messages. Each message indicates the type of data-bin information contained (mainheader, meta-data, precinct, or tile-header). There is also an indication if the message contains the final portion of data for the data-bin. Each message also indicates the starting position in the data-bin and the length of the message. Finally, each precinct data-bin contains an identifier that indicates for which tile, component, position, and resolution the data-bin is. These messages are shown on the top part of FIG. 6. Messages may be received in any order, but each message for the same precinct indicates a starting position and a length. There may be gaps in the data received as shown in FIG. 6.
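Since messages may arrive in any order and may leave gaps, a client needs to know which byte ranges of a data-bin are still missing. A minimal Python sketch of that bookkeeping follows; the function name and the representation of received ranges as (start, length) pairs are assumptions for illustration.

```python
def missing_ranges(received, total_length=None):
    """Return the gaps in one data-bin as (start, end) pairs, end exclusive.

    received is an iterable of (start, length) pairs as carried by the
    messages for that data-bin. total_length is the data-bin length if
    it is known, for example once the final portion has arrived.
    """
    gaps, covered_to = [], 0
    for start, length in sorted(received):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, start + length)
    if total_length is not None and covered_to < total_length:
        gaps.append((covered_to, total_length))
    return gaps
```

With messages covering bytes 0 to 9 and 20 to 24 of a precinct data-bin, the function reports the gap from 10 to 20; only the first ten bytes can be decoded until that gap is filled.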
Full Image Return Type
The simplest JPIP return type from the client point of view is a complete image of only the sub-image in which the client is interested. This can be obtained by having the server decode the relevant portion of the image and re-encode it in a format understandable by the client, e.g., the server could return a classic JPEG, a portable network graphics image (PNG) or a JPEG 2000 image. The advantage of returning a full image is that it can work with current clients without modification and without knowledge of JPEG 2000 or JPIP. However, this return type requires a lot of computation to be done by the server. If the client understands (can decode) JPEG 2000, it is often possible for a server to “parse” the JPEG 2000 image and return a legal JPEG 2000 file that only contains data for the part in which the client is interested. Even the parsed JPEG 2000 image return type is not efficient for multiple sub-image requests, because the same portion of the image may be returned several times.
Thus, JPEG 2000 provides a codestream, which can be accessed to extract data that pertains to a particular spatial region, a particular color component, a particular spatial resolution, and a particular quality layer. JPIP provides a way to make requests for a particular region, component, resolution and quality. JPIP also defines mechanisms to return subsets of a JPEG 2000 codestream in a way that the client can identify which portions of the data it has received, decode those portions and produce an image.
Anticipated Uses of JPIP
The editors of the JPIP standard collected a variety of potential uses for interactive image communication. These “use cases” are described in detail in ISO/IEC JTC 1/SC 29/WG 1 N2656, published Jul. 15, 2002. One key area of the application of JPIP is the browsing of a large image. Many images are several thousand pixels across, and some small portable display devices (e.g., cell phones, personal digital assistants, etc.) have displays of only 100 pixels by 100 pixels. JPIP provides a way to look at low resolution versions or portions of these large images. Systems have been envisioned with a global positioning system (GPS) device that automatically updates a small screen with high resolution data corresponding to the exact location of the screen. This data could be extracted from within a very large JPEG 2000 image.
There are several areas of applications for JPIP capabilities including aerial imagery, medical imagery, and pre-press imagery. The nature of the network also varies tremendously (e.g., error free vs. frequent errors, high bandwidth vs. low bandwidth). In the consumer market, it is anticipated that JPIP could be used to browse photos, and one can imagine digital cameras with JPIP servers. JPIP can be useful for monitoring applications where low resolution/quality images are observed under “normal” conditions, but high resolution/quality images are needed when something happens. In some cases (e.g., a security camera), multiple images are of interest.
It is anticipated that JPIP will be used with document imagery (mixed text and graphics). JPIP might contain ways to allow special purpose servers to be a part of a main JPIP server (i.e., plug-ins to the JPIP server for a specific type of imagery (e.g., a JPM document)). Alternatively, JPIP might be used inside a main server (e.g., an Apache server with a JPIP plug-in) or a PDF server that uses JPIP for transport of images within a PDF file. While JPIP contains special facilities for dealing with JP2, JPX, JPM, and MJ2 style file formats (all file formats with a similar “box” structure), JPIP will also be used to deliver codestreams stored inside other file formats (e.g., DICOM, SVG, remote sensing file formats, etc.).
Backward Compatibility by Transcoding Images to Serve Current Browsers
JPIP provides for migration from JPEG to JPEG 2000. Current browsers do not support JPEG 2000 images or JPIP without plug-ins. However, a server can be designed to convert JPEG 2000 images to JPEG images when portions of those images are requested. The current syntax to request a 256 pixel by 256 pixel scaled version of “image.jp2” converted to JPEG via HTTP is:
    GET /image.jp2?fsiz=256,256&type=image/jpg HTTP/1.1
    Host: www.server1.com
Such a request is automatically generated by browsers when encountering an image tag in an HTML file of the following form:
    <img src="http://www.server1.com/image.jp2?fsiz=256,256&type=image/jpg">
Thus, current browsers can access portions of JPEG 2000 files and display them without any plug-ins.
Capability Controlled Image Response
It is anticipated that the response the server provides to a particular window request may depend on several factors. For example, if the server determines the client has a limited resolution or limited color capabilities, the server might select the data transmitted in the response to match the capabilities. If the client requests a server determined region (by using the “roi=dynamic” request parameter), then the server can even choose the spatial portion of the image to return based on client capabilities. A request indicating the client can only process a 1 bit per component image and also has a particular vendor feature (vf) might look like:
    GET /image.jp2?fsiz=2048,2048&roi=dynamic&cap=config.mbit=1,vf.486574658767465A486574658767465A HTTP/1.1
    Host: www.server1.com
Systems for communicating portions of images existed before JPIP. For example, the Internet Imaging Protocol (IIP), Version 1.0.5, 6 Oct. 1997, provided access to FlashPix files. JPIP is expected to be much more efficient than IIP and FlashPix, in part because FlashPix stored multiple resolutions of the same image redundantly.
M. Boliek, G. K. Wu and M. J. Gormish, in “JPEG 2000 for Efficient Imaging in Client/Server Environment”, Proc. SPIE Conf. on App. of Digital Imaging, vol. 4472, pp. 212-223, Jul. 31-Aug. 3, 2001, San Diego, Calif., define a client/server environment to transport JPEG 2000 images and provide examples of JPEG 2000 byte usage in a “pan and zoom” session and describe a protocol based on Internet Imaging Protocol (IIP), Version 1.0.5, 6 Oct. 1997.
There are several other techniques for transport of sub-images listed in ISO/IEC JTC 1/SC29 WG1 N2602, 3 Jul. 2003, “TRUEW: Transport of Reversible and Unreversible Embedded Wavelets (A JPIP Proposal),” including submissions to the JPEG committee for the JPIP standard.
The current JPIP working draft is ISO/IEC JTC 1/SC29/WG 1 N2834, 28 Feb. 2003, “JPEG 2000 image coding system—Part 9: Interactivity tools, APIs and protocols—Working Draft version 3.0.”