1. Field of the Invention
The present invention relates to an orthogonal transform technique used when encoding image data.
2. Description of the Related Art
Recently, advances in sensors, display devices, and editing tools have led to a growing number of HDR (High Dynamic Range) images in which each color component is expressed by more than eight bits. Further, the resolution has increased to so-called high vision or super high vision, in which the number of pixels per frame is 4K×2K or 8K×4K. As a result, the image data amount becomes very large. To store such image data or transfer it within a short time, a compression encoding technique (for example, Japanese Patent Laid-Open No. 2001-78190) is indispensable.
A standard method to efficiently compress HDR high-resolution image data is “JPEG XR” (for example, Japanese Patent Laid-Open No. 2006-197572). JPEG XR defines image data as being formed from a plurality of tiles, each tile being formed from a plurality of MBs (Macro Blocks). A JPEG XR stream is formed from encoded data of a plurality of tiles. JPEG XR defines two array formats for the lower-layer encoded data within the encoded data of one tile: a spatial mode and a frequency mode.
The spatial mode is a data structure where encoded data of MBs within a tile are aligned in the raster order for each macroblock. The stream of each MB is formed from a DC coefficient stream, LP coefficient stream, and HP coefficient stream in the order named (see FIG. 7B).
In contrast, the frequency mode is a data structure in which encoded data of a tile are aligned for each coefficient layer (DC, LP, or HP). More specifically, in the stream in the frequency mode, packets formed from the DC coefficient streams of the respective MBs come first, and packets formed from the LP coefficient streams follow. Then, packets formed from the streams of upper bits of HP coefficients follow, and finally, packets formed from the streams of lower bits (FLEX) of the HP coefficients follow (see FIG. 7A). Note that details of the DC, LP, and HP coefficients in JPEG XR will be explained in the processing sequence of orthogonal transform processing to be described later.
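The difference between the two array formats can be illustrated with a minimal sketch that lists stream labels only, not real encoded data. The function names `spatial_order` and `frequency_order` are hypothetical and are introduced solely for this illustration.

```python
def spatial_order(num_mbs):
    """Spatial mode: for each MB in raster order, emit DC, then LP, then HP."""
    return [(band, mb) for mb in range(num_mbs) for band in ("DC", "LP", "HP")]


def frequency_order(num_mbs):
    """Frequency mode: all DC, then all LP, then HP upper bits, then FLEX."""
    bands = ("DC", "LP", "HP", "FLEX")
    return [(band, mb) for band in bands for mb in range(num_mbs)]


print(spatial_order(2))
print(frequency_order(2))
```

In the spatial ordering each MB can be emitted as soon as it has been processed, whereas in the frequency ordering no LP data can be emitted until the DC data of the last MB of the tile is available.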
JPEG XR encoding processing includes pre-processes such as color conversion and sub-sampling, followed by orthogonal transform, quantization, coefficient prediction, and entropy encoding, in the order named (see FIG. 9). In entropy encoding, a coefficient prediction error is separated into upper and lower bits so that the number of significant (nonzero) data items that undergo variable length coding is made equal between MBs. The upper bits of the coefficient prediction error undergo variable length coding, and the resulting code is output to a stream. The lower bits are output directly to the stream as fixed length data, without variable length coding.
JPEG XR encoding processing is executed for each macroblock (MB) made up of 16×16 pixels. At this time, orthogonal transform is done for each small block of 4×4 pixels. In JPEG XR, this orthogonal transform is called PCT transform. PCT transform for one small block generates one small-block DC coefficient (HPdc to be described later) and 15 AC coefficients (HP coefficients). One macroblock includes 4×4 small blocks. Hence, 4×4 small-block DC coefficients and 4×4×15 (=240) AC coefficients are calculated from one macroblock. These AC coefficients are the “HP coefficients” described above.
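The separation into upper and lower bits can be sketched as follows. This is a simplified illustration that works on a non-negative magnitude only. The parameter `k` (the number of lower bits) and the function names are assumptions made for the example; how the split point is chosen and how signs are handled are not described here.

```python
def split_bits(magnitude, k):
    """Split a non-negative magnitude into (upper, lower) at bit position k.

    upper would go to variable length coding; lower is a k-bit field that
    would be written to the stream as is (fixed length).
    """
    upper = magnitude >> k
    lower = magnitude & ((1 << k) - 1)
    return upper, lower


def join_bits(upper, lower, k):
    """Inverse of split_bits."""
    return (upper << k) | lower


print(split_bits(45, 3))  # (5, 5): 0b101101 -> 0b101 and 0b101
print(split_bits(5, 3))   # (0, 5): the upper part is zero, so nothing significant remains
```

A larger `k` turns more upper parts into zero, which is one way the number of significant data items passed to variable length coding could be controlled.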
Then, PCT transform is performed again for 4×4 small-block DC coefficients, obtaining one DC coefficient and 15 AC coefficients. The former is the “DC coefficient” of the macroblock, and the latter is the “LP coefficient” of the macroblock.
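The two-stage structure described above can be sketched in code. Note that the 4×4 transform used here is a plain Hadamard transform chosen only as a stand-in; it is not the JPEG XR PCT transform. The function names are hypothetical. The sketch is meant to show only how one DC coefficient, 15 LP coefficients, and 240 HP coefficients arise from one macroblock.

```python
# Stand-in basis (Hadamard). This is NOT the JPEG XR PCT transform.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def transform_4x4(block):
    """Apply the stand-in transform H * X * H^T; return (dc, list of 15 ac)."""
    tmp = [[sum(H[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    out = [[sum(tmp[i][k] * H[j][k] for k in range(4)) for j in range(4)]
           for i in range(4)]
    flat = [v for row in out for v in row]
    return flat[0], flat[1:]


def transform_macroblock(mb):
    """mb is a 16x16 list of lists. Returns (dc, lp, hp)."""
    hpdc = [[0] * 4 for _ in range(4)]
    hp = []
    # First stage: one transform per 4x4 small block (16 in total).
    for by in range(4):
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            hpdc[by][bx], ac = transform_4x4(block)
            hp.extend(ac)
    # Second stage: the same transform applied to the 4x4 small-block DCs.
    dc, lp = transform_4x4(hpdc)
    return dc, lp, hp


mb = [[(x + y) % 256 for x in range(16)] for y in range(16)]
dc, lp, hp = transform_macroblock(mb)
print(len(lp), len(hp))  # 15 240
```

The HP coefficients are complete after the first stage, while the DC and LP coefficients become available only after the second stage has consumed all 16 small-block DC coefficients.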
FIG. 2 shows a conceivable arrangement for performing orthogonal transform to generate a stream in the frequency mode. The operation will be explained.
An image storage unit 206 stores digital image data obtained by an image sensor or the like. After the start of encoding, a DC coefficient is calculated to generate a packet formed from a DC coefficient stream.
More specifically, the digital image data stored in the image storage unit 206 is regarded as being stored as tiles, each obtained by dividing the image at a desired rectangular size, in order to perform encoding processing. Each tile is regarded as including MBs aligned in the raster order. The MB serves as the processing unit of encoding processing (see FIG. 8A). Further, the MB is regarded as including small blocks each of 4×4 pixels (FIG. 8B). Encoding processing is done for each small block.
Image data stored in the image storage unit 206 is sent to a first transforming unit 201 via a memory controller 205.
The first transforming unit 201 is a processing unit which executes lossless orthogonal transform (PCT transform) for each small block (4×4 pixels) of the sent MB to calculate the frequency coefficients of one direct current (DC) component (HPdc) and 15 alternating current (AC) components (HPs). The obtained coefficients are sent to the memory controller 205, and written back in the image storage unit 206. Since one MB includes 4×4 small blocks, this processing is executed 16 times. After processing of the first transforming unit 201 ends for one MB, 4×4 HPdc coefficient data and 240 HP coefficient data are stored in the image storage unit 206.
To calculate a DC coefficient, the memory controller 205 reads out, from the image storage unit 206, the 4×4 HPdc coefficient data which have been written and belong to the same MB, and sends them to a second transforming unit 202.
The second transforming unit 202 performs the same frequency conversion as that of the first transforming unit 201 for the 4×4 HPdc coefficient data belonging to the same MB, calculating one DC coefficient and 15 LP coefficients. The DC coefficient calculated by the second transforming unit 202 is sent from the orthogonal transforming unit to the next processing (quantizing unit) via a selector 203. In contrast, the 15 LP coefficients are written back in the image storage unit 206 via the memory controller 205. The LP coefficients are held until the DC coefficients of all MBs within the tile are output.
After the selector 203 outputs all the DC coefficients of all MBs within the tile, the memory controller 205 reads out the “LP coefficients” of each MB from the image storage unit 206, and outputs them to the selector 203. The selector 203 sends the received “LP coefficients” from the orthogonal transforming unit to the next processing (quantizing unit).
After the selector 203 outputs all LP coefficients within the tile, the “HP coefficients” of all MBs within the tile are read out from the image storage unit 206 and output from the selector 203. The selector 203 sends the received “HP coefficients” from the orthogonal transforming unit to the next processing (quantizing unit). At this time, a data-hierarchy controller 204 controls switching of output data in the selector 203.
As is apparent from the above description, the “HP coefficient” is calculated first, and then the “DC coefficient” and “LP coefficient” are calculated for each MB during calculation in JPEG XR orthogonal transform. However, DC coefficients, LP coefficients, and HP coefficients need to be aligned in the order named in an encoded stream for one tile in the frequency mode of JPEG XR. It will be understood that the order in encoding processing differs from the order of data in the encoded stream, so LP and HP coefficients calculated in respective calculation processes need to be written back in the image storage unit 206 and rearranged so that they are output from the orthogonal transforming unit in the stream order.
As described above, when generating a stream in the frequency mode, the conventional orthogonal transform method needs to rearrange coefficients so that they are output from the orthogonal transforming unit in an order in which they form a stream. For this purpose, calculated HP and LP coefficients need to be written back in a memory such as the image storage unit.
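The mismatch between the calculation order and the stream order can be made concrete with a small simulation of the arrangement of FIG. 2. This is a sketch under simplifying assumptions: the dictionary `storage` stands in for the image storage unit 206, only labels are moved rather than coefficient values, and the intermediate HPdc traffic is ignored. The function name is hypothetical.

```python
HP_PER_MB = 240  # 4x4 small blocks x 15 AC coefficients
LP_PER_MB = 15   # AC coefficients of the second-stage transform


def emit_frequency_mode(num_mbs):
    """Return (output order, number of coefficients written back to storage)."""
    storage = {"LP": [], "HP": []}  # stands in for image storage unit 206
    output = []
    for mb in range(num_mbs):
        # Calculation order within one MB: HP first, then DC and LP.
        storage["HP"].append(mb)   # held until all LP coefficients are out
        output.append(("DC", mb))  # the only band that can leave at once
        storage["LP"].append(mb)   # held until all DC coefficients are out
    output += [("LP", mb) for mb in storage["LP"]]
    output += [("HP", mb) for mb in storage["HP"]]
    written_back = num_mbs * (LP_PER_MB + HP_PER_MB)
    return output, written_back


order, written_back = emit_frequency_mode(3)
print(order)
print(written_back)  # 765
```

Every LP and HP coefficient of the tile passes through storage once before it is output, so the write-back volume grows in proportion to the tile size.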
For simplicity, assume that an image to be encoded is a monochrome image having only one color component. In JPEG XR, a tile which forms image data may be as large as the image data itself, that is, its size may equal the number of pixels of the image data. In this case, the total number of HP and LP coefficients written back in the image storage unit 206 in the frequency mode is almost equal to the number of pixels of the original image data (255 of the 256 coefficients of each MB are HP or LP coefficients). Also, the bit width per HP coefficient is larger by 5 bits than that of the image sample. The number of LP coefficients written back is about 1/16 of the number of pixels of the image data. Further, the bit width per LP coefficient is larger by 7 bits than that of the image sample.
This means that, first, the memory capacity used increases. Second, the memory read and write counts become very high, and the number of data bits read and written increases, impairing the performance of the overall encoding processing. This also raises the component cost of an apparatus that implements the encoding processing.
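The coefficient counts and bit widths given above can be tallied for a concrete case. The following sketch assumes a single-tile monochrome image of 4096×2048 pixels with 16-bit samples; these figures, like the function name, are assumptions chosen for illustration. The per-MB counts (240 HP and 15 LP coefficients) and the bit-width increases (5 and 7 bits) are taken from the description.

```python
def writeback_cost(width, height, sample_bits):
    """Count HP/LP coefficients and the bits written back for one tile.

    Assumes width and height are multiples of 16 and one color component.
    """
    mbs = (width // 16) * (height // 16)
    hp = mbs * 240                      # HP coefficients in the tile
    lp = mbs * 15                       # LP coefficients in the tile
    bits = hp * (sample_bits + 5) + lp * (sample_bits + 7)
    return hp, lp, bits


width, height, sample_bits = 4096, 2048, 16  # assumed example values
hp, lp, bits = writeback_cost(width, height, sample_bits)
pixels = width * height
print(hp + lp, pixels)                 # 8355840 8388608
print(bits / (pixels * sample_bits))   # written-back bits relative to the source image
```

Under these assumptions the written-back data is larger than the source image itself, and it must be both written and read once, which is the memory capacity and bandwidth burden described above.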