JPEG 2000
FIG. 1A-D illustrate operations performed by a typical JPEG 2000 system. Referring to FIG. 1A, an original image may be divided into regular tiles, such as tiles 101. These tiles are compressed independently. Thus, the coded data from the upper left tile is sufficient to reconstruct the upper left part of the image. Likewise, any region can be reconstructed by obtaining the data from all the tiles that intersect that region and decoding them.
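Reconstructing a region from independently coded tiles only requires identifying which tiles intersect that region. The following is a minimal sketch, assuming a uniform tile grid anchored at the image origin (no image or tile offsets). The function name `tiles_for_region` is hypothetical.

```python
def tiles_for_region(x0, y0, width, height, tile_w, tile_h, image_w, image_h):
    """Return (column, row) indices of every tile that intersects a region.

    Assumes the tile grid starts at the image origin, i.e. no image or tile
    offsets, and that x0 and y0 are non-negative.
    """
    # Clip the region to the image; right and bottom edges are exclusive.
    x1 = min(x0 + width, image_w)
    y1 = min(y0 + height, image_h)
    if x1 <= x0 or y1 <= y0:
        return []  # region is empty or lies entirely outside the image
    cols = range(x0 // tile_w, (x1 - 1) // tile_w + 1)
    rows = range(y0 // tile_h, (y1 - 1) // tile_h + 1)
    return [(c, r) for r in rows for c in cols]


# A 512x512 image split into 128x128 tiles: a 100x100 region at (100, 100)
# straddles the corner shared by the four upper-left tiles.
print(tiles_for_region(100, 100, 100, 100, 128, 128, 512, 512))
# → [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Only the coded data for those four tiles would need to be fetched and decoded.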
A two-dimensional wavelet transform is performed on each tile. This leads to a set of wavelet coefficients grouped in subbands, such as subbands 102. One of the subbands is a low resolution version of the image: it is exactly the same image as would be obtained by applying a particular low-pass filter to the entire image and subsampling. Importantly, that subband can be combined with other subbands to form intermediate resolution versions of the image. This is shown in FIG. 1C. Thus, a JPEG 2000 codestream provides access to the image at low resolution and at successively larger resolutions, each larger by a factor of 2 horizontally and vertically, up to the full resolution of the image.
The wavelet transform coefficients are regrouped, or divided, into “precincts,” such as shown in FIG. 1B. These precincts form an alternative mechanism to access spatial regions of an image. Tiles and precincts can both be used when compressing an image. If both are used, the tiles provide coarse-grained access to regions of the image, while the precincts provide access to more localized regions.
Finally, precincts are divided into code-blocks, and these groups of wavelet coefficients are compressed with multiple passes, starting with the most significant bit and proceeding to the less significant bits. This is shown in FIG. 1D. These “coding-passes” remove statistical redundancy in a lossless manner. For example, large numbers of zero bits are replaced by much shorter codes. The fixed-size block of wavelet coefficients thus becomes a variable-length segment of coded data. The first part of the segment is the most important part because it corresponds to the most significant bits of the wavelet coefficients. If the later parts of the coded data segment are not available to a decoder, an image can still be obtained, but at lower quality.
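Why a truncated segment still yields an image can be illustrated by keeping only the top bit-planes of a block of coefficients. This is a simplified sketch: a real JPEG 2000 coder uses several coding passes per bit-plane and an arithmetic coder, none of which is modelled here. The helper `truncate_bitplanes` is hypothetical and only mimics the effect of discarding the tail of the coded data.

```python
def truncate_bitplanes(coeffs, planes_kept):
    """Keep only the `planes_kept` most significant bit-planes of each coefficient.

    Bit-planes are counted from the most significant bit of the largest magnitude
    in the block, as a code-block coder would; signs are preserved.
    """
    top = max(abs(c) for c in coeffs).bit_length()  # number of bit-planes in use
    dropped = max(top - planes_kept, 0)
    mask = ~((1 << dropped) - 1)                     # clears the low `dropped` bits
    return [(abs(c) & mask) * (1 if c >= 0 else -1) for c in coeffs]


block = [37, -12, 5, 0, 100]
for kept in (2, 4, 7):
    print(kept, truncate_bitplanes(block, kept))
```

With all seven bit-planes the block is reproduced exactly; with fewer, small coefficients collapse to zero first while the large ones stay approximately right, which is why the first part of each segment matters most.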
A “codestream” is defined in JPEG 2000 Part 1 as a series of “marker segments,” each consisting essentially of a two-byte code identifying the kind of data that follows, followed by that data. The codestream contains all the entropy coded data for the image, together with data describing the method that should be used to decode it. For example, the codestream contains information about the wavelet transform used, the size of tiles, precinct sizes, the number of resolutions, the order of packets in the file, etc. The codestream must contain all the parameters needed to decode the entropy coded data into image samples. The codestream may also contain information to provide rapid access to portions of the coded data, e.g., lengths of the packets.
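The marker segment structure can be walked without decoding any image data. A minimal sketch follows, assuming a well-formed main header that begins with SOC; it names only a handful of Part 1 markers and stops at the first tile-part (SOT). Most markers are followed by a two-byte length that counts itself and the parameters but not the marker. The function `main_header_markers` is hypothetical.

```python
import struct

# Markers that carry no length field, and names for a few common segments.
STANDALONE = {0xFF4F: "SOC", 0xFF93: "SOD", 0xFFD9: "EOC"}
NAMES = {0xFF51: "SIZ", 0xFF52: "COD", 0xFF5C: "QCD", 0xFF64: "COM", 0xFF90: "SOT"}


def main_header_markers(data):
    """Return (name, length) for each marker in the main header, up to the first SOT."""
    pos, found = 0, []
    while pos + 2 <= len(data):
        (marker,) = struct.unpack_from(">H", data, pos)
        if marker in STANDALONE:
            found.append((STANDALONE[marker], 0))
            pos += 2
            continue
        if pos + 4 > len(data):
            break  # truncated segment header
        # The 2-byte length counts itself and the parameters, but not the marker.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        name = NAMES.get(marker, hex(marker))
        found.append((name, length))
        if name == "SOT":
            break  # the first tile-part header ends the main header
        pos += 2 + length
    return found
```

Because every non-standalone segment states its own length, a reader can skip any marker it does not need, in the same spirit as the box structure of the file formats.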
A “file-format” is a wrapper for one or more codestreams. JPEG 2000 Part 1 defines a simple “JP2” file format. JPEG 2000 Part 2 defines the “JPX” format to store more complex information. JPEG 2000 Part 3 defines the “MJ2” file format for motion JPEG 2000.
File-formats typically provide “meta-data” associated with the image data contained in the codestream, e.g., audio, XML information, image capture conditions. The file format also commonly contains information about the color space of the decoded image data.
JPEG 2000 Part 6 defines the “JPM” file format for compound documents. There are also file-formats not defined by the JPEG 2000 committee that could contain JPEG 2000 codestreams. PDF has recently been updated to allow JPEG 2000 codestreams. PDF and JPM are good file formats for documents with multiple pages.
JPIP
JPIP stands for JPEG 2000 Interactive Protocol and will become an international standard, ISO/IEC 15444-9. The JPEG committee is still working on this document, which is nearing the final draft stage. The current document is “15444-9 (JPIP) FCD Study Text 0.1,” ISO/IEC JTC1/SC29/WG1 N2396.
JPIP defines a syntax that a client can use to request portions of an image from a server. JPIP also defines two new “media-types” that a server can use to return portions of a JPEG 2000 file or codestream. A key facet of JPIP is that it is meant to be interactive. Thus, there can be follow-up requests for more data from the same image, and the returned data should not have to repeat data the client has already received.
JPIP Requests
JPIP defines several parameters that can be used by a client in making a request. These requests indicate the sub-image in which the client is interested. The requests also may provide information about what meta-data the client is interested in and information to control the type of response given by the server. The most important parameters for purposes of this disclosure are frame size, region offset, and region size.
The frame size parameter appears as “fsiz=128,128” in several examples, and indicates the size of the entire image the client wishes to use to access a region. If there is no region size parameter (as described below), the frame-size is simply the size of the image the client wants. For example, an image that is 512 by 512 samples and encoded with 4 levels of wavelet transform can be accessed with a frame size of 512 by 512, 256 by 256, 128 by 128, 64 by 64, or 32 by 32. In the first request to the server, the client may not know the actual frame-sizes available. In this case, the client can indicate a desired size and a rounding method, and the server will return the closest available size (the other parameters will be scaled to match the actual frame size used).
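The set of frame sizes a server can honour follows directly from the number of wavelet levels, and a rounding rule picks one of them for an arbitrary request. The sketch below assumes zero image and tile offsets. The mode names "round-down", "round-up" and "closest" reflect our understanding of JPIP's rounding options, but the exact server behaviour (in particular how "closest" is measured) is an assumption. Both function names are hypothetical.

```python
def available_frame_sizes(width, height, levels):
    """Frame sizes reachable with `levels` wavelet decompositions, largest first.

    Assumes zero image and tile offsets, so each level halves the size, rounding up.
    """
    return [((width + (1 << r) - 1) >> r, (height + (1 << r) - 1) >> r)
            for r in range(levels + 1)]


def pick_frame_size(requested, sizes, rounding="round-down"):
    """Choose an available (width, height) for a requested frame size."""
    fx, fy = requested
    if rounding == "round-down":
        # Largest size that fits inside the request; otherwise the smallest size.
        fits = [s for s in sizes if s[0] <= fx and s[1] <= fy]
        return max(fits) if fits else min(sizes)
    if rounding == "round-up":
        # Smallest size that covers the request; otherwise the largest size.
        covers = [s for s in sizes if s[0] >= fx and s[1] >= fy]
        return min(covers) if covers else max(sizes)
    # "closest": smallest total difference over both dimensions.
    return min(sizes, key=lambda s: abs(s[0] - fx) + abs(s[1] - fy))


sizes = available_frame_sizes(512, 512, 4)
print(sizes)  # → [(512, 512), (256, 256), (128, 128), (64, 64), (32, 32)]
for mode in ("round-down", "round-up", "closest"):
    print(mode, pick_frame_size((200, 200), sizes, mode))
```

For the 512 by 512 example with 4 levels, a request for 200 by 200 would resolve to 128 by 128 when rounding down and to 256 by 256 when rounding up.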
The region offset parameter might be used as “roff=64,64” and indicates that the client is not interested in the entire image, but only in a region beginning at the offset specified by the two values given in the request. The offset is relative to the request frame-size.
The region size parameter might be used as “rsiz=64,64” and indicates the size of the region desired by the client. Instead of providing the three parameters fsiz, rsiz, and roff, a client may indicate interest in file or server defined regions of interest (ROI) with the “roi” parameter. The roi parameter can be used to request a named region identified in one of the file format boxes. The value may be set to “dynamic,” as in “roi=dynamic,” to indicate that the server should choose the appropriate region based on whatever it knows about the client.
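Because roff and rsiz are expressed relative to the requested frame size, a server must scale them to locate the region in the full-resolution image. The sketch below builds a query string from the three parameters and performs that scaling. The "target" field naming the image and the filename "page1.jp2" are illustrative assumptions, and both helper functions are hypothetical.

```python
from urllib.parse import urlencode


def jpip_query(target, fsiz, roff=None, rsiz=None):
    """Build a query string carrying the fsiz, roff and rsiz view-window fields."""
    fields = {"target": target, "fsiz": "%d,%d" % fsiz}
    if roff is not None:
        fields["roff"] = "%d,%d" % roff
    if rsiz is not None:
        fields["rsiz"] = "%d,%d" % rsiz
    return urlencode(fields, safe=",")  # keep the commas readable


def region_at_full_resolution(fsiz, roff, rsiz, full_size):
    """Scale a region expressed relative to frame size `fsiz` to full-image coordinates."""
    sx = full_size[0] / fsiz[0]
    sy = full_size[1] / fsiz[1]
    return (round(roff[0] * sx), round(roff[1] * sy),
            round(rsiz[0] * sx), round(rsiz[1] * sy))


print(jpip_query("page1.jp2", (128, 128), (64, 64), (64, 64)))
# → target=page1.jp2&fsiz=128,128&roff=64,64&rsiz=64,64
print(region_at_full_resolution((128, 128), (64, 64), (64, 64), (512, 512)))
# → (256, 256, 256, 256)
```

For a 512 by 512 image, the 64 by 64 region at offset (64, 64) of a 128 by 128 frame therefore corresponds to the lower-right quarter of the full image.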
Responses
JPIP responses can take three main forms: a complete image file, a tile-part stream (also called a JPT-stream), or a precinct stream (also called a JPP-stream). The complete image file return type is essentially a custom file generated on the basis of the request. The two stream return types are compound responses that consist of a series of small “messages.” Each message has a header, which identifies its type, its index within that type, the offset of the message into the “data-bin” with that type and index, and the length of the message. The “data-bins” can be thought of as representing all information a client might ever request about a resource on the server. In response to a single request, a server might deliver portions of several different data-bins. The messages typically do not contain a complete data-bin; instead, each contains the next portion of the data-bin that has not yet been sent to the client. Small message sizes allow data for different portions of the image to be interleaved, so a region specified by the client can grow uniformly in quality. Likewise, image data may be interleaved with meta-data. Because the length of each message is provided, the stream can be cleanly terminated at the end of any message. This allows a server to stop responding to one request and begin responding to a new request on the same communications channel.
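On the client side, the (type, index, offset, length) header is enough to rebuild each data-bin from interleaved messages. The sketch below keeps the contiguous prefix of every data-bin and holds messages that arrive ahead of a gap. It deliberately ignores JPIP's compact binary header encoding and any end-of-bin signalling; the class names and the bin labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    bin_class: str  # kind of data-bin, e.g. "precinct" or "meta" (illustrative labels)
    bin_id: int     # index of the data-bin within its class
    offset: int     # where this message's bytes start within the data-bin
    data: bytes     # message body; its length is len(data)


class DataBinCache:
    """Client-side store that reassembles data-bins from interleaved messages."""

    def __init__(self):
        self.bins = {}     # (class, id) -> bytearray with the contiguous prefix received
        self.pending = {}  # (class, id) -> messages that arrived ahead of a gap

    def add(self, msg):
        key = (msg.bin_class, msg.bin_id)
        buf = self.bins.setdefault(key, bytearray())
        waiting = self.pending.setdefault(key, [])
        waiting.append(msg)
        progressed = True
        while progressed:  # keep merging until no waiting message touches the prefix
            progressed = False
            for m in list(waiting):
                if m.offset <= len(buf):
                    buf.extend(m.data[len(buf) - m.offset:])  # skip bytes already held
                    waiting.remove(m)
                    progressed = True


cache = DataBinCache()
cache.add(Message("precinct", 3, 0, b"abc"))
cache.add(Message("meta", 0, 0, b"<xml"))    # meta-data interleaved with image data
cache.add(Message("precinct", 3, 5, b"fg"))  # arrives early; held until the gap fills
cache.add(Message("precinct", 3, 3, b"de"))
print(bytes(cache.bins[("precinct", 3)]))    # → b'abcdefg'
```

Since each message is self-describing, the server may stop after any message and the cache remains consistent; a later response simply continues each data-bin from where it left off.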
JPM
JPM is intended to provide storage for document images. One of the key benefits of JPM over other file formats commonly used to contain images compressed with JPEG 2000 is the ability to handle a large number of pages. In fact, the introduction to the JPM standard describes use of JPM to store an entire encyclopedia. Because a JPM file can reference other files, there can be many files storing the pages of the encyclopedia, yet it is possible to “browse” from page 1 to the end, with a user agent stepping through all the necessary files.
All of the data in a JPM file appears in a “box.” A box is an organizational unit that always contains a length and a type. From these two fields, any program or user accessing the file can determine what kind of box it is and where it ends. If the type is not understood by a decoder or is not needed for the particular task, the program or user can use the length to locate the next box. Many files can be decoded or otherwise used effectively without knowing all of the box types. Some boxes only appear inside other boxes. Once again, the length field can be used to skip a box that is unknown or unneeded for the current task, whether the box appears at the “file level” or inside another box. Some boxes contain data values that indicate the number of bytes from the beginning of the file to some data, usually another box or a portion of a compressed codestream. These data values are often referred to as “pointers” because they indicate the location of something else. They are also often referred to as “offsets” because they indicate the distance in bytes from the beginning of the file.
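The length-and-type rule makes a generic box walker very short. The sketch below follows the box header layout shared by the JPEG 2000 file formats as we understand it: a 4-byte big-endian length, a 4-byte type, and an 8-byte extended length when the length field is 1; a length of 0 means the box runs to the end of its enclosing data. The function `iter_boxes` and the type codes in the usage example are illustrative placeholders.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, box_offset, contents_offset, box_length) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack_from(">I4s", data, pos)
        header = 8
        if lbox == 1:    # a 64-bit extended length follows the type field
            (lbox,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif lbox == 0:  # the box runs to the end of the enclosing data
            lbox = end - pos
        if lbox < header:
            raise ValueError("corrupt box length at offset %d" % pos)
        yield tbox.decode("latin-1"), pos, pos + header, lbox
        pos += lbox      # skip the box whether or not its type is understood


# An outer box holding one sub-box, followed by a box the reader ignores.
inner = struct.pack(">I4s", 12, b"abcd") + bytes(4)
data = struct.pack(">I4s", 8 + len(inner), b"page") + inner + struct.pack(">I4s", 8, b"free")
for box_type, box_off, body_off, length in iter_boxes(data):
    print(box_type, box_off, length)
    if box_type == "page":  # descend only into boxes known to contain sub-boxes
        for sub in iter_boxes(data, body_off, box_off + length):
            print("  ", sub)
```

The same function handles file-level boxes and sub-boxes; descending is just a matter of passing the contents range of the enclosing box.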
Important Boxes
The JPM file format allows the use of many different box types. Some specific boxes are described in greater detail below.
Data Reference Box
The Data Reference box lists all external files necessary to fully decode the current file. It both provides a list of all other files that might be necessary to make full use of the JPM file and serves as a way to provide a short two-byte code for external data segments. Other data in a JPM file that makes use of data outside the JPM file includes only a data reference id number, which is the position of the external file within this list. Only the Data Reference box contains the actual “path” or “URL” to another file (such paths may be long and typically differ in length). This makes it easy to change external references from one file to another by updating only one box.
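The indirection through a data reference id can be sketched as a lookup into the Data Reference box's list. We assume here that id 0 means "this file" and that id n (n ≥ 1) selects the n-th entry of the list; please check that convention against the standard before relying on it. The function `resolve_location` and the URLs are hypothetical.

```python
def resolve_location(dref_id, offset, length, data_references, current_file):
    """Map (data reference id, offset, length) to (location, offset, length).

    Assumes id 0 means "this file" and id n >= 1 selects the n-th entry of the
    Data Reference box's list of URLs or paths.
    """
    if dref_id == 0:
        return current_file, offset, length
    return data_references[dref_id - 1], offset, length


# Hypothetical list of external files; objects store only the short ids.
refs = ["http://example.com/vol1.jpm", "http://example.com/vol2.jpm"]
objects = [(2, 1024, 300), (2, 4096, 120), (0, 512, 64)]
print([resolve_location(*o, refs, "index.jpm")[0] for o in objects])

# Moving volume 2 means editing one list entry; the objects are untouched.
refs[1] = "file:///archive/vol2.jpm"
print([resolve_location(*o, refs, "index.jpm")[0] for o in objects])
```

Every object that used id 2 now resolves to the new location, even though only the one list entry changed.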
Contiguous Codestream Box
The Contiguous Codestream box contains a complete codestream. The codestream is decoded by a normal JPM decoder to create a mask or image object. The Contiguous Codestream box itself does not indicate the purpose of the codestream; rather, other boxes may contain a reference to it. Thus, the same codestream can be used for multiple objects, perhaps on different pages.
Fragment Table Box
The Fragment Table box is an alternative way to address codestreams for layout objects. While the Contiguous Codestream box must contain the entire codestream in order, the Fragment Table box allows a codestream to be stored as small segments called fragments, each of which may be in a different location. The Fragment Table box contains a list of the fragments of the codestream. Each fragment must be contiguous, but the full codestream is obtained by concatenating all of the fragments, which are usually not adjacent to one another. The fragments can be stored within the same file as the Fragment Table box, in external files listed in the Data Reference box, or both. Fragments within the same file are commonly stored in Media Data boxes.
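Reassembling a fragmented codestream amounts to reading each listed byte range and concatenating the results in list order. In the sketch below each fragment is modelled as a (data reference id, offset, length) triple, and `open_location` is a hypothetical callback standing in for local file or network access; neither is the standard's exact field layout.

```python
def assemble_codestream(fragments, open_location):
    """Concatenate codestream fragments listed in a Fragment Table box.

    `fragments` is a list of (dref_id, offset, length) in codestream order;
    `open_location(dref_id)` returns the bytes of the file that dref_id names.
    """
    parts = []
    for dref_id, offset, length in fragments:
        data = open_location(dref_id)
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("fragment extends past the end of its file")
        parts.append(chunk)
    return b"".join(parts)


# One fragment inside this file (id 0) and one in an external file (id 1).
files = {0: b"XXabcYY", 1: b"defgh"}
print(assemble_codestream([(0, 2, 3), (1, 0, 3)], lambda ref: files[ref]))  # → b'abcdef'
```

The decoder sees a single codestream regardless of how many files or Media Data boxes its fragments are scattered across.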
Page Collection Box and Page Table Box
A Page Collection box is a way to group pages in a JPM file. The Page Collection box contains a Page Table box, which lists pages and other Page Collection boxes. By including other page collections, a very large “document” can be represented. For example, the highest-level page collection might represent a “library,” with each of its page collections representing a “book,” each next-level page collection representing a chapter, and the final Page Collection box listing the pages in the chapter. No image data is contained in the Page Collection and Page Table boxes; they simply contain pointers to Page boxes and to other Page Collection boxes.
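Browsing such a hierarchy from the first page to the last is a depth-first traversal. The sketch models each page collection as an in-memory list whose entries are either page identifiers or nested collections. This is a simplification: in the file, Page Table entries are offsets to boxes, which a reader would have to follow. The function `list_pages` and the page labels are hypothetical.

```python
def list_pages(collection):
    """Depth-first walk of nested page collections, yielding pages in reading order.

    A collection is modelled as a list whose entries are either page identifiers
    (strings) or nested collections (lists).
    """
    for entry in collection:
        if isinstance(entry, list):
            yield from list_pages(entry)  # descend into a book, chapter, ...
        else:
            yield entry


library = [
    [["book1-ch1-p1", "book1-ch1-p2"], ["book1-ch2-p1"]],
    [["book2-ch1-p1"]],
]
print(list(list_pages(library)))
# → ['book1-ch1-p1', 'book1-ch1-p2', 'book1-ch2-p1', 'book2-ch1-p1']
```

Because the traversal is lazy, a user agent can show the first page before the rest of the hierarchy has been fetched.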
Page Box and Page Header Box
The Page box contains a set of other boxes including the Page Header box and Layout Object boxes for the page. The Page Header box contains information about the page: width, height, orientation, and color.
Layout Object Box and Layout Object Header Box
The Layout Object box contains the Layout Object Header box, object boxes, and optionally additional boxes. The Layout Object Header box contains fields with information about the layout object: identifier, height, width, horizontal and vertical offset (of the image data), and a style.
Object Box and Object Header Box
The Object box contains the Object Header box and optionally additional boxes. The Object Header box contains fields with information about the object (mask or image). The fields are: type, codestream indicator, horizontal and vertical offsets (of the image data), offset of the Contiguous Codestream box or Fragment Table box, length of the Contiguous Codestream box or Fragment Table box, and a data reference number, which allows access to other files through the Data Reference box stored elsewhere in the file. The most important elements for this disclosure are the offset and length of the codestream boxes, because these elements need to be accessed to render the images.
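Given the offset and length fields, a reader can fetch exactly the referenced box and tell from its type whether it holds the codestream itself or a list of fragments. The sketch assumes the offset addresses the first byte of the box and that the box uses an ordinary 8-byte header. It uses "jp2c" and "ftbl" as the type codes for the Contiguous Codestream and Fragment Table boxes. `read_range` is a hypothetical helper that could be a local file read or an HTTP byte-range request.

```python
import struct

# Box type codes for the two kinds of codestream boxes an object can point to.
CODESTREAM_BOX_TYPES = {b"jp2c": "contiguous codestream", b"ftbl": "fragment table"}


def fetch_codestream_box(read_range, offset, length):
    """Fetch the box addressed by an Object Header box's offset and length fields.

    `read_range(offset, length)` is a hypothetical helper returning those bytes.
    Assumes the offset addresses the first byte of the box (8-byte header).
    """
    data = read_range(offset, length)
    lbox, tbox = struct.unpack_from(">I4s", data, 0)
    kind = CODESTREAM_BOX_TYPES.get(tbox, "other")
    return kind, data[8:lbox]  # box contents without the length/type header


blob = b"junk" + struct.pack(">I4s", 12, b"jp2c") + b"\xff\x4f\xff\x51"
kind, contents = fetch_codestream_box(lambda off, n: blob[off:off + n], 4, 12)
print(kind, contents.hex())
```

Since only the offset and length are needed, a remote client can retrieve the box for one object without downloading the rest of the file.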
Shared Data Entry Box
The Shared Data Entry box is designed to contain a box that is used repeatedly in the file. The box consists of an id value and the shared data. Instead of including a box multiple times in different places in the JPM file, a Shared Data Reference box can be placed at each of those places, and the data within the Shared Data Entry box is used in its place. The standard anticipates that the contents will be a box. However, the data within a Shared Data Entry box can be anything at all, since it is not read unless a Shared Data Reference box refers to it.
Shared Data Reference Box
A Shared Data Reference box consists solely of the length and type fields that all boxes contain, and an id field. The id field identifies the Shared Data Entry box that should be used in place of the Shared Data Reference box, at the location of the Shared Data Reference box. For example, if the same layout objects were used on multiple pages of a file, the Layout Object Header box might be stored once in a Shared Data Entry box; then, in each Layout Object box within each Page box that would normally contain a complete copy of the Layout Object Header box, a Shared Data Reference box can be used instead. The Shared Data Entry box that a Shared Data Reference box uses must appear in the same file as the Shared Data Reference box and before any Shared Data Reference box with the same id.
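A single pass in file order is enough to substitute shared data, because an entry must precede every reference to it. In the sketch, boxes are a flat list of (type, payload) pairs using the illustrative labels "shared-entry" (payload is an (id, data) pair) and "shared-ref" (payload is an id); these labels, the "lhdr" stand-in for a Layout Object Header box, and the function name are hypothetical, and box nesting is ignored for brevity.

```python
def resolve_shared_data(boxes):
    """Replace Shared Data Reference boxes with the data of the matching entry.

    `boxes` is a flat list of (type, payload) pairs in file order.
    """
    entries = {}
    resolved = []
    for box_type, payload in boxes:
        if box_type == "shared-entry":
            shared_id, data = payload
            entries[shared_id] = data  # stored for later use, not placed in the output
        elif box_type == "shared-ref":
            if payload not in entries:
                # An entry must appear before any reference with the same id.
                raise ValueError("reference to id %d precedes its entry" % payload)
            resolved.append(entries[payload])
        else:
            resolved.append((box_type, payload))
    return resolved


boxes = [
    ("shared-entry", (1, ("lhdr", "layout header A"))),
    ("page", "page 1"), ("shared-ref", 1),
    ("page", "page 2"), ("shared-ref", 1),
]
print(resolve_shared_data(boxes))
```

The ordering rule is what makes this single forward pass possible; without it a reader would have to scan ahead or make a second pass.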
JPM File Structure
FIG. 5 shows an outline of an exemplary JPM file, with details shown for its boxes and offsets. The file corresponds to a possible encoding of the two-page document shown on the left side of FIG. 4, described later. FIG. 5 shows one line for each box in the file, in the order the boxes appear in the file. Box names beginning at the left-hand side of FIG. 5 appear at the top level of the JPM file. Indented box names represent boxes that are sub-boxes of the nearest box above them that is closer to the left side of FIG. 5. Required boxes are shown, but many optional boxes are omitted to avoid obscuring the present invention. Some field names within a box are shown on the line after the box name and appear in FIG. 5 in italics. Most field names are not shown at all, but many of the fields that contain offsets are shown, each with an arrow pointing to the location the offset addresses.
The particular file in FIG. 5 has all offsets pointing to objects beyond the current object, except for the pointers to the primary Page Collection box. This is a legal order, but it is not required. For example, the Shared Data Entry boxes could immediately follow the Compound Image Header box, the Fragment Table box could follow the Shared Data Entry boxes, Page boxes could follow the Fragment Table boxes and Contiguous Codestream boxes, and/or the Page Collection box could appear last. In this ordering, all of the offsets would point to items that have already occurred in the file. Note that while the offsets have been drawn as pointers, they are actually offsets from the beginning of the file, not from the current location in the file. Thus, all pointers to the primary Page Collection box have the same value.
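Because offsets are measured from the beginning of the file, their values depend only on the lengths of the boxes that precede the target, not on where the pointer itself is stored. The sketch computes those absolute offsets for two orderings of the same hypothetical top-level boxes; the labels and lengths are made up for illustration.

```python
def absolute_offsets(boxes):
    """Map each top-level box label to its byte offset from the start of the file.

    `boxes` is a list of (label, length) pairs in file order; labels must be unique.
    """
    offsets, pos = {}, 0
    for label, length in boxes:
        offsets[label] = pos
        pos += length
    return offsets


# Hypothetical box lengths, in two legal orders for the same content.
order_a = [("header", 40), ("page-collection", 60), ("page-1", 300), ("codestream-1", 5000)]
order_b = [("header", 40), ("codestream-1", 5000), ("page-1", 300), ("page-collection", 60)]
for order in (order_a, order_b):
    print(absolute_offsets(order))
```

In the first ordering every pointer to the page collection stores 40; in the second every such pointer stores 5340. Reordering boxes therefore requires rewriting each stored offset, while moving a pointer alone does not.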
In FIG. 5, each text-like object has been encoded with a single codestream and stored in a Contiguous Codestream box. Each image-like object has been coded with JPEG 2000; the JPEG 2000 codestream has then been split into a pair of Shared Data Entry boxes and addressed with a Fragment Table box.
FIG. 6 shows information similar to FIG. 5, except that the specific order in the file is not shown (since this may vary) and only boxes at the top level of the file are shown. The pointers that are shown usually occur deeply nested inside the top-level boxes, as shown in FIG. 5. However, FIG. 6 illustrates the “structure” of the file more plainly. Many operations performed in a JPM client-server system may be understood by considering the top-level file boxes and the overall structure of the file. FIG. 6 also has the codestream stored in a Media Data box rather than in a Shared Data Entry box.
If an offset stored in the JPM file addresses the first byte of a box, the arrow is shown pointing to the top of the box. If an offset addresses the contents of a box, and thus points to bytes of the codestream rather than to the containing box, the arrow is shown pointing to the side of the box. In this case, different offsets may address different points within the same box. For example, the Fragment Table box with two arrows to the Media Data box would typically address a different range of the Media Data box for each fragment.