1. Field of the Invention
The present invention relates to a code processing apparatus and a code processing method for generating and processing files encoded with JPEG 2000 codes or JPEG 2000 family file formats.
2. Description of the Related Art
In recent years, wavelet transformation has increasingly been used in place of the DCT (Discrete Cosine Transform) employed by JPEG (Joint Photographic Experts Group) for performing frequency transformation in the field of image compression. JPEG 2000, which has become an international standard, is one example of a method using wavelet transformation.
FIG. 1 is a block diagram showing an overall flow of an encoding process using JPEG 2000, in which an image (image data) is divided into rectangular tiles (number of divided parts ≧1) and processed tile by tile.
In FIG. 1, each tile is transformed into color components such as brightness and chrominance (Block 1). In a case where the image data are represented by non-negative numbers (e.g., RGB data), a DC level shift for shifting the image data by half of the dynamic range is also performed. The components of the image data after the color transformation (hereinafter also referred to as “tile components” or simply “tiles”) are wavelet transformed to an arbitrary depth (Block 2). By performing the wavelet transformation process, the tiles are divided into sub-bands. A single wavelet transformation process (decomposition) divides a tile into four sub-bands: LL, HL, LH, and HH. By recursively applying the wavelet transformation to the LL sub-band, a single LL sub-band and plural HL, LH, and HH sub-bands are ultimately formed.
FIG. 3 is a schematic diagram showing the plural sub-bands obtained after performing the wavelet transformation process (decomposition) three times. In FIG. 3, the numeral indicated in front of the sub-band name (for example, the “3” of “3LL”) represents the decomposition level (the number of times the wavelet transformation has been performed). FIG. 3 also shows the relationship between decomposition levels and resolution levels.
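The sub-band structure described above can be sketched as a short enumeration. The function below is purely illustrative (the name and list representation are not part of JPEG 2000); it reproduces the naming convention of FIG. 3, in which each decomposition contributes HL, LH, and HH sub-bands and a single LL sub-band remains at the deepest level.

```python
def subband_names(decomposition_levels):
    """List the sub-bands produced by recursively wavelet-transforming
    the LL sub-band `decomposition_levels` times, using the naming of
    FIG. 3 (e.g. "3LL" = LL sub-band at decomposition level 3)."""
    names = [f"{decomposition_levels}LL"]
    # Each decomposition level d contributes its HL, LH, and HH sub-bands.
    for d in range(decomposition_levels, 0, -1):
        names += [f"{d}HL", f"{d}LH", f"{d}HH"]
    return names
```

For three decompositions this yields ten sub-bands (3 per level plus the final LL), matching FIG. 3.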
In a case of using the irreversible wavelet transformation called “9×7 transformation”, the wavelet coefficients of each sub-band are subjected to linear quantization (including normalization) (Block 3). Then, the sub-bands are subjected to an entropy coding process (Block 4). For the entropy coding process, each sub-band is divided into rectangular regions referred to as “precincts”. The three precincts located at the same spatial region of the sub-bands HL, LH, and HH are handled together as a single precinct partition, whereas a precinct obtained by dividing the LL sub-band is handled by itself as a single precinct partition. The precinct basically serves to indicate a predetermined part (position) in the image, and may have the same size as a sub-band. The precinct is further divided into rectangular regions referred to as “code-blocks”. FIG. 2 is a schematic diagram showing the relationship among the tile, the sub-band, the precinct, and the code-block. As shown in FIG. 2, the order of the physical sizes of the image, tile, sub-band, precinct, and code-block can be expressed as “image≧tile>sub-band≧precinct≧code-block”. After the image is divided in the manner described above, entropy encoding of the coefficients (bit-plane coding) is performed on each code-block in bit-plane order.
Then, the entropy coded data are grouped (organized) together to generate a packet (Block 5). The packet consists of a packet body formed by gathering a portion of the bit-plane codes from all the code-blocks included in a precinct (for example, bit-plane codes obtained from the MSB (Most Significant Bit) bit-plane to the third level bit-plane) and a packet header attached to the packet body. Since a portion of the bit-plane codes may be empty, the inside of the packet may in some cases be empty from a coding aspect. The packet header includes data of the codes (contents) included in the packet. Each packet is handled independently. In other words, a packet is a unit of a code.
Accordingly, a portion of all codes of the entire image is obtained by collecting the packets of all precincts (=all code-blocks=all sub-bands) (e.g. obtaining the codes of the wavelet coefficients of the entire image from the MSB bit-plane to the third level bit-plane). This obtained portion is referred to as a “layer”. Since the layer is basically a portion of the codes of the bit-planes of the entire image, image quality becomes higher as the number of decoded layers increases. That is, the layer is a unit of image quality. The codes of all of the bit-planes in the entire image can be obtained by collecting all of the layers.
The upper part of FIG. 4 shows an exemplary layer configuration in a case where “decomposition level=2”, and “precinct size=sub-band size”. The lower part of FIG. 4 shows packets included in several layers of the layer configuration (illustrated in a manner encompassed by thick lines in FIG. 4).
Then, in accordance with the generated packets and the manner in which the layers are divided, the packets are arranged in a predetermined order, and tags and tag data are added thereto. Accordingly, a final JPEG 2000 code (codestream) is formed (Block 6).
It can be understood from the above description that each packet has four characteristics (indices) indicating the component of the packet (hereinafter also indicated with symbol “C”), the resolution level (hereinafter also indicated with symbol “R”), the precinct of the packet (hereinafter also indicated as “position” or symbol “P”), and the layer (quality layer) (hereinafter also indicated with symbol “L”). The characteristics (indices) of the packet are hereinafter referred to as “progression characteristics”. A packet header is provided at the beginning of the packet. Although the packet header is followed by MQ codes (packet data), the packet header does not have the progression characteristics themselves written therein.
The arrangement of the packets is defined according to the order in which the progression characteristics of the packets (including the packet header and the packet data) are arranged. The order of the progression characteristics defining the packet arrangement is referred to as a progression order. Different progression orders can be obtained by using a collection of nested loops. FIG. 5 shows five different types of progression orders. The manner in which an encoder arranges (encodes) the packets according to the progression order and the manner in which a decoder interprets (decodes) the characteristics of the packets according to the progression order are described below.
For example, in a case where the progression order is LRCP, the packet arrangement (encoding) or interpretation (decoding) is conducted according to a “for loop” shown below.
for (layer) {
  for (resolution) {
    for (component) {
      for (precinct) {
        during encoding: packet arrangement
        during decoding: packet characteristic interpretation
      }
    }
  }
}
The packet header of each packet is written with data indicating whether the packet is empty, which code-blocks are included in the packet, the number of zero bit-planes of each of the included code-blocks, the number of coding passes (the number of bit-planes) of the codes of each of the included code-blocks, and the code length of each of the included code-blocks. However, data indicating, for example, a layer number or a resolution level are not written in the packet header. Therefore, in order to determine the resolution level or the layer of a packet during a decoding operation, it is necessary to generate a for loop (such as the one described above) based on the progression order written in, for example, a COD marker segment in the main header (for example, see FIGS. 17 and 18), identify the boundary of each packet according to the sum of the code lengths of the code-blocks included in the packet, and determine the part of the for loop in which the packet is handled. Accordingly, by simply reading out the code lengths indicated in the packet header, the next packet can be detected (that is, a given packet can be accessed) without having to decode the packet data (entropy codes).
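The packet-skipping described above amounts to simple offset arithmetic. The following sketch (the function and parameter names are hypothetical, not part of the standard's API) shows how the start of the next packet is located from the lengths recorded in a packet header, without decoding any entropy codes.

```python
def next_packet_offset(packet_offset, header_length, code_lengths):
    """Locate the start of the next packet: a packet occupies its
    header plus the code bytes of every code-block listed in that
    header, so summing the recorded code lengths skips the MQ codes
    entirely."""
    return packet_offset + header_length + sum(code_lengths)
```

An empty packet simply contributes no code-block bytes, so the sum degenerates to the header length alone.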
It is to be noted that the outermost part of the nest of the for loop is referred to as “outermost part of the progression order”.
FIG. 6 is a schematic diagram for describing a progressive by layer codestream such as an LRCP progression order codestream having a layer characteristic (L) situated at the outermost part of the progression order. In this example, the number of tile parts is one.
FIG. 7 shows an example of an arrangement of 36 packets according to an LRCP progression order (order of interpreting packets) in a case where the conditions are “image size=100×100 pixels”, “no tile division (one tile)”, “two layers”, “resolution level number=3 (0-2)”, “three components (0-2)”, and “precinct size=32×32”. It is to be noted that the number of tile parts in this example is one.
FIG. 8 is a schematic diagram for describing a progressive by resolution codestream such as an RLCP progression order codestream having a resolution characteristic (R) situated at the outermost part of the progression order. In this example, the number of tile parts is one.
FIG. 9 shows an example of an arrangement of 36 packets according to an RLCP progression order (order of interpreting packets) in a case where the conditions are “image size=100×100 pixels”, “no tile division (one tile)”, “two layers”, “resolution level number=3 (0-2)”, “three components (0-2)”, and “precinct size=32×32”. It is to be noted that the number of tile parts in this example is one.
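The LRCP and RLCP interpretations above can be sketched as nested loops that emit the same set of packets in different sequences. In the sketch below, the per-resolution precinct counts are a parameter; the counts (1, 1, 4) used in the usage note are an assumption chosen so that the conditions of FIGS. 7 and 9 (two layers, three resolution levels, three components) yield 36 packets, and are not taken directly from the figures.

```python
def packet_order(order, layers, resolutions, components, precincts_per_res):
    """Enumerate packets as (layer, resolution, component, precinct)
    index tuples in LRCP or RLCP progression order; precincts_per_res[r]
    gives the number of precincts at resolution level r."""
    packets = []
    if order == "LRCP":
        for l in range(layers):
            for r in range(resolutions):
                for c in range(components):
                    for p in range(precincts_per_res[r]):
                        packets.append((l, r, c, p))
    elif order == "RLCP":
        for r in range(resolutions):
            for l in range(layers):
                for c in range(components):
                    for p in range(precincts_per_res[r]):
                        packets.append((l, r, c, p))
    else:
        raise ValueError("only LRCP and RLCP are sketched here")
    return packets
```

With `packet_order("LRCP", 2, 3, 3, [1, 1, 4])` and the RLCP counterpart, both orders produce 36 packets containing the same index tuples, differing only in sequence; this is exactly the property that lets a decoder interpret packet characteristics from the progression order alone.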
The code (codestream) for each tile can be further divided into plural parts at the discontinuities of packets. These divided parts are referred to as “tile parts”. As described above, the number of tile parts for the packet arrangement shown in FIGS. 7 and 9 is one.
Besides packets, each tile part includes a header starting from a SOT (Start Of Tile-part) marker segment and terminating at a SOD (Start Of Data) marker. This header is hereinafter also referred to as a “tile-part header” (See FIGS. 6 and 8).
FIG. 10 is a schematic diagram showing an exemplary configuration of a SOT marker segment. It is to be noted that a part including a marker (in this example, an SOT marker) and parameters related to the marker is hereinafter also referred to as “marker segment”. FIG. 11 shows the content of the parameters of the SOT marker segment. According to FIG. 11, the length of a particular tile part can be detected (determined) by reading out the content of parameter “Psot” included in the SOT marker segment. Accordingly, by reading out the SOT marker segment, access can be made to codes (codestreams) in tile part units without having to decode the packet header.
In a case where the process of decoding the packet header is desired to be omitted, the length of each packet may be recorded in, for example, a PLT marker segment inside the tile-part header or a PLM marker segment inside the main header. Furthermore, in a case where it is desired that the process of searching for the SOT marker segment be omitted, the length of each tile part may be recorded in a TLM marker segment inside the main header. FIG. 12 shows an exemplary configuration of the TLM marker segment. FIG. 13 shows the content of the parameters of the TLM marker segment. According to FIG. 13, the length of a particular tile part (e.g. the ith tile part) can be detected (determined) by reading out the content of parameter “Ptlm (i)” included in the TLM marker segment.
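Once the Ptlm(i) values have been read from a TLM marker segment, the offset of each tile part follows from a running sum, since each tile part starts where the previous one ends. The following is a minimal sketch; the function name and arguments are illustrative.

```python
def tile_part_offsets(first_tile_part_offset, ptlm_lengths):
    """Compute the byte offset of each tile part from the Ptlm(i)
    lengths recorded in a TLM marker segment, given the offset of the
    first tile part in the codestream."""
    offsets, pos = [], first_tile_part_offset
    for length in ptlm_lengths:
        offsets.append(pos)
        pos += length
    return offsets
```

This is how the TLM marker segment makes searching for SOT marker segments unnecessary: the offsets are known before any tile part is read.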
As described above, the JPEG 2000 codestream can be accessed in units of packets as well as in units of tile parts. With the JPEG 2000 codestream, new codes (codestreams) can be generated by extracting only packets and tile parts required without having to decode the entire original codes. Furthermore, the JPEG 2000 codestream enables decoding of a necessary amount (number) of packets and tile parts in the original codes.
For example, in a case of displaying a large image stored in a server (PC) on a client (PC), the client (PC) can decode the image data of the image by receiving only the codes of necessary image quality, codes of necessary resolution, codes of an area (precinct) desired to be viewed, or codes of a component desired to be viewed from the server (PC).
This protocol of receiving only required portions of a JPEG 2000 codestream from a server is hereinafter also referred to as JPIP (JPEG 2000 Interactive Protocol). This protocol is currently being standardized. As for protocols for accessing a portion(s) of a multi-layer (multi-level) image, there are, for example, FlashPix (used for expressing multiple image resolutions) and IIP (Internet Imaging Protocol) used for accessing such FlashPix images (for example, see Japanese Laid-Open Patent Application No. 11-205786 and the website http://www.i3a.org/i_iip.html showing a standard reference of IIP). As a reference of JPIP, there is, for example, Japanese Laid-Open Patent Application No. 2003-23630 showing a JPIP cache model.
In the JPIP protocol, a methodology of designating a desired resolution of a particular image and a window size for actually depicting the image is proposed. In a case where a server (PC) receives such designation, the server (PC) may either use a method (system) of transmitting packets covering a particular area of the image with a designated resolution or a method (system) of transmitting tile parts covering a particular area of the image.
Next, an example of the latter JPIP system of transmitting tile parts is described (hereinafter also referred to as “JPT system”).
In a case where the JPT system is used, tile parts covering a particular area of an image are extracted from the tile parts of the entire image in the following manner. In this case, it is a premise that the server (PC) knows how the tile parts of the codes (codestreams) managed by the server (PC) itself are divided.
For example, in a case where packets of the RLCP progression order codestream corresponding to one tile and two layers as shown in FIG. 9 are divided at the boundaries (discontinuities) of all resolution levels (area where the resolution level is switched from one level to another), three tile parts (tile parts 0-2) can be obtained as shown in FIG. 14. Furthermore, in a case where the same packets of the RLCP progression order codestream are divided at the boundaries (discontinuities) of all resolution levels and the boundaries (discontinuities) of all layers, six tile parts (tile parts 0-5) can be obtained as shown in FIG. 15. Furthermore, in a case where the same packets of the RLCP progression order codestream are divided at the boundaries (discontinuities) of all resolution levels, the boundaries (discontinuities) of all layers, and the boundaries (discontinuities) of all components, eighteen tile parts (tile parts 0-17) can be obtained.
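The tile-part counts in the three cases above (3, 6, and 18 tile parts) follow from multiplying the number of values of each characteristic at whose boundaries the codestream is divided. The sketch below (names are illustrative) reproduces this under the conditions of FIG. 9: three resolution levels, two layers, three components.

```python
def tile_parts_per_tile(resolutions, layers, components,
                        split_at_layers=False, split_at_components=False):
    """Count tile parts per tile when the codestream is divided at all
    resolution-level boundaries, optionally also at all layer
    boundaries and all component boundaries."""
    n = resolutions                 # always split at resolution boundaries
    if split_at_layers:
        n *= layers
    if split_at_components:
        n *= components
    return n
```

For the FIG. 9 conditions this gives 3, 6, and 18 tile parts for the three dividing methods described above.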
When a client (PC) transmits a request to a server (PC) indicating that a resolution portion corresponding to 25×25 pixels is desired to be displayed in a 20×20 window size (in the example shown in FIG. 9, the “resolution part corresponding to 25×25 pixels” indicates the portion where the resolution level is 0 and the “20×20 window size” indicates 20×20 pixels among the pixels having a resolution level of 0), the server (PC) extracts tile parts covering resolution level 0 (tile part 0 in FIG. 14) and transmits the extracted tile parts together with main header data of the codestream. Since each tile part has an SOT marker segment and the length of the tile part can be determined, the boundaries of the tile parts can be distinguished.
As shown in FIGS. 14-16, the tile parts to be transmitted (i.e. from tile part no. x to tile part no. y) are determined according to two parameters. The first parameter is the progression order of the codestream and the second parameter is the method of dividing a codestream into tile parts (divided location).
The first parameter (progression order) can be easily determined since the progression order is written in a COD marker segment of a main header or a tile part header. However, the second parameter (dividing method) is not recorded in the JPEG 2000 codestream or the JPEG 2000 file format family. Therefore, conventionally, unless the dividing method is known beforehand, it becomes necessary to count the packets in the codestream one by one in order to select a desired tile part. However, such counting of packets is inefficient.
Accordingly, in Japanese Patent Application No. 2006-67482, the applicant of the present invention proposes a method of recording data indicating the dividing method in a COM marker segment (a marker segment allowed to freely include data, for example, vendor data, according to the JPEG 2000 codestream format) or recording the data in a UUIDBox or an XMLBox (boxes allowed to freely include data, for example, vendor data, according to the JPEG 2000 family file format).
In a case of a server using the JPT method (JPT server), the JPT server conducts a two-step operation including a calculation step of calculating the position (location) of each tile part and a transmission step of transmitting a particular tile part(s) when required. In one example, the server may be configured to i) calculate the positions of all of the tile parts (scanning of SOT marker segments) and ii) transmit a particular tile part(s) when required.
Therefore, the JPT server normally conducts the following step (hereinafter referred to as “Step A”) of:
for (with respect to all tile parts) {
  scan next SOT marker segment and record position and length of tile part
}
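Step A can be sketched as follows. The byte layout assumed here follows the SOT marker segment of FIG. 10 (the SOT marker 0xFF90 followed by Lsot, Isot, Psot, TPsot, and TNsot, so that Psot is a 4-byte big-endian field starting 6 bytes after the marker); the helper that fabricates a tile part is purely illustrative and stands in for real packet data.

```python
import struct

SOT = b"\xff\x90"  # SOT marker code

def scan_tile_parts(codestream, start):
    """Step A sketch: record (offset, length) of every tile part by
    hopping from one SOT marker segment to the next using Psot, the
    tile-part length measured from the start of the SOT marker."""
    parts, pos = [], start
    while pos < len(codestream) and codestream[pos:pos + 2] == SOT:
        (psot,) = struct.unpack(">I", codestream[pos + 6:pos + 10])
        parts.append((pos, psot))
        pos += psot            # jump directly to the next tile part
    return parts

def make_tile_part(tile_index, body_len):
    """Fabricate a dummy tile part for illustration: a 12-byte SOT
    marker segment (Lsot=10, Psot=12+body_len, TPsot=0, TNsot=1)
    followed by body_len filler bytes standing in for SOD and packets."""
    header = SOT + struct.pack(">HHIBB", 10, tile_index, 12 + body_len, 0, 1)
    return header + b"\x00" * body_len
```

Note that the scan never decodes a packet header: each Psot value alone tells the server where the next SOT marker segment begins.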
One reason that this step is used is that the tile parts required to be transmitted may change over time due to, for example, the window size designated by the user of the client (PC). Therefore, it is desirable that the server (PC) calculate and store the locations of all of the tile parts beforehand, prior to determining whether to transmit a tile part(s). Another reason is the arrangement of the tile parts in the codestream.
In using the JPT method, one important factor is the order in which the tile parts of the codestream are arranged. The method of arranging tile parts (tile part arrangement method) is described using an example of a codestream of a monochrome image (see FIG. 24) having a maximum resolution level of 2 and a layer number of 1, in which the codestream is divided into tile parts at the resolution level boundaries.
In this example, a tile index (characteristic) indicating which tile is selected (designated) may be added to the conventional R, L, C, P progression order, so that the tile parts are arranged in a “tile order to resolution level order” (this order having the tile index (tile characteristic) situated at an outermost part of the progression order is referred to as “highest tile order”) as shown in FIG. 25 or arranged in a “resolution level order to tile order” as shown in FIG. 26.
In a case where tile parts corresponding to, for example, resolution level 0 are requested from the JPT server, a total of four tile parts is sufficient for arranging the packets of a codestream having the tile part arrangement shown in FIG. 25, since the packets of the codestream can be arranged by simply obtaining the position and length data of the tile parts from the first tile part up to tile part “A” (as long as there is no change in the requested tile part(s)). In this case, even where the above-described Step A (scanning of all tile parts) is conducted, only six tile parts are scanned in total; therefore, only a few tile parts (in this case, two tile parts) are subject to SOT scanning unnecessarily.
On the other hand, in the case of the tile part arrangement shown in FIG. 26, a total of two tile parts is sufficient for arranging the packets of the codestream, since the packets can be arranged by obtaining the position and length data of the tile parts from the first tile part up to tile part “B” (as long as there is no change in the requested tile part(s)). Therefore, where the above-described Step A (scanning of all tile parts) is conducted, six tile parts are subject to SOT scanning, and thus ⅔ of the scanned tile parts do not actually need to be scanned.
In a case where the arrangement of the tile parts as shown in FIG. 26 is known beforehand, the following for loop for arranging the tile parts in a “resolution order to tile order” may be used.
for (resolution level = 0) {
  for (0 ≦ tile index ≦ 1) {
    scan the next SOT marker segment and record the position and length of tile part
  }
}
By using the above for loop, only two tile parts need to be scanned, preventing unnecessary scanning of tile parts. In order for this for loop to be used (or in order to calculate the number of necessary tile parts from the beginning and control the number of times the SOT marker segment is scanned), it is a premise that the arrangement of the tile parts can be determined (known beforehand). However, the arrangement of the tile parts in a codestream is not recorded in a conventional JPEG 2000 codestream or a JPEG 2000 family file format.
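When the tile part arrangement is known beforehand, the scan can stop after exactly the needed number of tile parts. The sketch below reuses the same SOT layout assumption as before (Psot as a 4-byte big-endian field 6 bytes after the 0xFF90 marker); the function name and the choice of `n` are illustrative.

```python
import struct

def scan_first_n_tile_parts(codestream, start, n):
    """Scan only the first n tile parts (e.g. n = number of tiles when
    resolution level 0 of a resolution-major arrangement such as
    FIG. 26 is requested), recording each tile part's position and
    length from the Psot field of its SOT marker segment."""
    parts, pos = [], start
    for _ in range(n):
        if codestream[pos:pos + 2] != b"\xff\x90":  # SOT marker expected
            break
        (psot,) = struct.unpack(">I", codestream[pos + 6:pos + 10])
        parts.append((pos, psot))
        pos += psot
    return parts
```

Compared with Step A, this bounds the number of SOT scans by the request itself rather than by the total tile-part count, which is precisely the saving that requires the arrangement to be recorded or otherwise known.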
It is to be noted that a characteristic (index) corresponding to the purpose of use is typically positioned at the outermost part of the progression order in the case of JPT or JPP (JPIP precinct). For example, in a case where the main purpose is to display a size-reduced image, the characteristic of R (resolution) is positioned at the outermost part of the progression order. In a case where the main purpose is to display a monochrome version of a color image, the characteristic of C (component) is positioned at the outermost part of the progression order. Furthermore, in JPT, the tile parts are typically divided at the boundaries of the characteristic corresponding to the above-described purpose.
Taking the above-described aspects into consideration, the applicant of the present invention proposed a method of recording data indicating the tile part dividing method and arrangement method in a COM marker segment of a codestream or a box such as UUIDBox or XMLBox of a JPEG 2000 family file format (See Japanese Patent Application No. 2006-77206).
It is also to be noted that, as a premise, each tile is divided into two or more tile parts at the boundaries of resolution levels so that the tile parts can be arranged in “resolution level order to tile order” as shown in FIG. 26. This is because such a tile part arrangement cannot be obtained in a case of one tile part per tile.