As a technique for improving image coding efficiency, a code amount (rate)/distortion optimization technique is available. The rate/distortion optimization technique obtains a generated code amount and an index value associated with image distortion for each of a plurality of sections constituting coded data, and minimizes the total distortion index value under the condition that the total code amount is equal to or less than a target value.
According to international standard JPEG2000 (ISO/IEC 15444) for still image coding established by standardization in ISO/IEC JTC1/SC29/WG1, the coefficient of each subband obtained by wavelet transform is segmented into rectangular regions called code blocks, and each rectangular region is independently coded. JPEG2000 codes each code block upon segmenting it into a plurality of passes, and it is contemplated that a generated code amount and image distortion index value are obtained on a pass basis, and a rate/distortion optimization technique is applied to coding. As a reference for the implementation of JPEG2000, a method of applying the rate/distortion optimization technique is disclosed (see, for example, Annex J Examples and guidelines of standard recommendation (ISO/IEC 15444-1) which will be referred to as non-patent reference 1 hereinafter).
Letting ni be a code truncatable point of a code block Bi, Ri(ni) be the code amount of the code block Bi when code truncation is performed at ni, and Di(ni) be a distortion index value, a total distortion index value D and total code amount R of an overall image can be represented by
$$D = \sum_i D_i(n_i), \qquad R = \sum_i R_i(n_i)$$
The object of rate/distortion optimization is to obtain a set of truncation points ni which minimize the total distortion index value D under the condition of a target total code amount Rmax or less, i.e., R≦Rmax.
This optimization problem can be solved by using a generalized Lagrange multiplier method (see, for example, “Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources”, Operations Research, vol. 11, pp. 399-417, 1963 which will be referred to as non-patent reference 2 hereinafter).
That is, the problem reduces to minimization of the following expression with respect to a given value λ. Note that the value λ is adjusted to make the total code amount R become equal to or less than Rmax.
$$\sum_i \left( D_i(n_i) + \lambda R_i(n_i) \right)$$
The minimization of the above expression reduces to the problem of the minimization of each code block. A simple algorithm for obtaining the code truncation point ni where Di(ni)+λRi(ni) is minimized with respect to the code block Bi will be described below.
FIG. 4 is a flowchart for explaining a sequence for the processing of determining the code truncation point ni with respect to the code block Bi whose effective coding pass count is ki_max. As shown in FIG. 4, first of all, the code truncation point ni is initialized to 0 (step S401). A variable k representing a code truncatable point of interest is set to 1 (step S402). With regard to the code truncatable point k of interest, a code amount increase ΔRi(k) and distortion index value decrease ΔDi(k) are obtained when the code truncation point of the code block Bi is moved from ni to k (step S403).
The ratio ΔDi(k)/ΔRi(k) of the distortion index value decrease to the code amount increase for the codes between the code truncation point ni and the code truncatable point k of interest is calculated and compared with 1/λ (step S404). If ΔDi(k)/ΔRi(k) is larger than 1/λ (Yes), the value of ni is updated to k (step S405). Subsequently, k is incremented by one to move the code truncatable point of interest to the next point (step S406). If it is determined in step S404 that ΔDi(k)/ΔRi(k) is equal to or smaller than 1/λ (No), the flow advances to step S406 without updating ni. The updated value of k is then compared with the coding pass count ki_max of this code block (step S407).
If k≦ki_max (No), the processing from step S403 is repeatedly performed for the updated value of k. If k>ki_max (Yes), the processing is terminated, and the value of ni at the end of the processing is taken as the code truncation point of the code block Bi of interest for the given λ.
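The sequence of FIG. 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `find_truncation_point`, the cumulative arrays `rates` and `distortions` (index 0 meaning that no pass is kept), and the sample numbers are all hypothetical. The comparison value of step S404 is passed in as `threshold`, so the sketch does not fix how that value is derived from λ.

```python
def find_truncation_point(rates, distortions, threshold):
    """Return the truncation point n for one code block (sketch of FIG. 4).

    rates[k]       -- assumed cumulative code amount when truncating at point k
    distortions[k] -- assumed remaining distortion index value at point k
    threshold      -- value compared with dD/dR in step S404
    Assumes rates is strictly increasing, so the rate increase is never zero.
    """
    n = 0                                        # step S401
    k_max = len(rates) - 1
    for k in range(1, k_max + 1):                # steps S402, S406, S407
        delta_r = rates[k] - rates[n]            # step S403: rate increase n -> k
        delta_d = distortions[n] - distortions[k]  # distortion decrease n -> k
        if delta_r > 0 and delta_d / delta_r > threshold:  # step S404
            n = k                                # step S405
    return n


# Hypothetical four-pass code block.
rates = [0, 10, 20, 30, 40]
distortions = [100, 60, 50, 20, 15]
n = find_truncation_point(rates, distortions, threshold=1.5)
```

With these hypothetical values, a threshold of 1.5 yields truncation point 3: pass boundary 2 is not selected because the gradient measured from point 1 to point 2 is only 1.0.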
Considering that the above algorithm is executed for various values of λ, the efficiency can be improved by setting code truncation point candidates for a code block in advance. Code truncation for a code block is performed on a coding pass basis. Basically, therefore, code truncation can be done at all coding pass boundaries. When, however, the above algorithm for determining the code truncation point ni is to be used, truncation candidate points are determined such that Si(k)=ΔDi(k)/ΔRi(k), which represents the rate/distortion gradient between code truncation points, decreases monotonically with k, and no coding pass boundary that violates this condition is selected as a code truncation point.
Consider, for example, a code block coded by four coding passes as shown in FIG. 10. FIG. 10 is a graph showing an example of the relationship between the rate of each pass and distortion of a code block. Basically, code truncation can be done at four pass boundaries indicated by code truncatable points 0 to 4 in FIG. 10. At truncatable point 2, the rate/distortion gradient does not decrease monotonically, and hence truncating the code of the code block at this point is not efficient. According to the above algorithm, therefore, this point is not selected as a code truncation point.
An algorithm for selecting code truncation candidate points from the boundaries between all coding passes will be described below. FIG. 5 is a flowchart for explaining the flow of the processing of selecting code truncation candidate points. In this case, a set of code truncation candidate points is represented by Ni.
First of all, as the initial state of the set Ni of code truncation candidate points, a set of boundaries between all the coding passes of a code block of interest is obtained (step S501). If, for example, the coding pass count of the code block Bi is represented by ki_max, Ni={1, 2, 3, . . . , ki_max}. A code truncation point p as a candidate determination target is set to 0 (step S502). In addition, as the next code truncation candidate point as a candidate determination target, k is set to 1 (step S503).
It is then checked whether the set value k belongs to the set Ni (step S504). If k belongs to Ni (Yes), the code amount increase ΔRi(k) and distortion index value decrease ΔDi(k) in a case wherein the truncation candidate point is moved from p to k are obtained, together with the rate/distortion gradient Si(k) in this section (step S505). If k does not belong to Ni (No), the flow shifts to step S508 (to be described later).
After the processing in step S505, it is checked whether p≠0 and Si(k)>Si(p) (step S506). If p≠0 and Si(k)>Si(p) (Yes), p is removed from the set Ni (step S510), and the flow returns to step S502. Otherwise (i.e., p=0 or Si(k)≦Si(p)), p is set to k (step S507), and the value of k is updated by being incremented by one (step S508).
Subsequently, k is compared with ki_max (step S509). If k≦ki_max (No), processing is performed for updated k from step S504. If k>ki_max (Yes), the processing is terminated, and the set Ni is set as a set of code truncation candidate points at this point of time. For example, in the case of the code block shown in FIG. 10 described above, code truncation candidate point set Ni={1, 3, 4}. At these truncation candidate points, the rate/distortion gradient monotonously reduces in accordance with k, as shown in FIG. 11. That is, FIG. 11 shows how passes are integrated by the above monotonous reduction processing.
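The candidate selection of FIG. 5 can be sketched as follows. The function name `select_candidates`, the cumulative input arrays, and the sample values are assumptions made for illustration; the values are chosen so that pass boundary 2 breaks the monotonic decrease, in the manner described for FIG. 10. The sketch assumes strictly increasing code amounts so that no gradient involves a division by zero.

```python
def select_candidates(rates, distortions):
    """Return (sorted candidate points, gradient per candidate), after FIG. 5.

    rates[k] / distortions[k] are assumed cumulative values at pass boundary k,
    with index 0 meaning that no pass is kept.
    """
    k_max = len(rates) - 1
    candidates = set(range(1, k_max + 1))        # step S501
    while True:
        p = 0                                    # step S502
        k = 1                                    # step S503
        slope = {}                               # Si(k) for the current scan
        removed = False
        while k <= k_max:                        # step S509
            if k in candidates:                  # step S504
                delta_r = rates[k] - rates[p]    # step S505
                delta_d = distortions[p] - distortions[k]
                slope[k] = delta_d / delta_r
                if p != 0 and slope[k] > slope[p]:   # step S506
                    candidates.remove(p)         # step S510
                    removed = True
                    break                        # back to step S502
                p = k                            # step S507
            k += 1                               # step S508
        if not removed:
            return sorted(candidates), slope


# Hypothetical four-pass code block.
points, gradients = select_candidates([0, 10, 20, 30, 40],
                                      [100, 60, 50, 20, 15])
```

For these hypothetical inputs the scan removes boundary 2 and returns the candidate points 1, 3, and 4 with gradients 4.0, 2.0, and 0.5, which decrease with k.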
The values of the rate/distortion gradient Si(k) and code amount Ri(k) are held in correspondence with k belonging to the code truncation candidate point set Ni obtained in the above manner, and the maximum value k satisfying Si(k)>λ is selected. As the value of λ decreases, the code truncation point lowers, and the number of codes to be truncated decreases. In contrast to this, as the value of λ increases, the code truncation point rises, and the number of codes to be truncated increases. The multiplier λ can be regarded as an image quality parameter. A search is then made for λ satisfying total code amount R=Rmax or R≈Rmax while decreasing the value of λ. Code truncation points of each code block are determined on the basis of λ, thereby realizing rate/distortion optimization.
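The selection of the maximum k satisfying Si(k)>λ within one code block can be illustrated as below. The table layout (a list of `(k, Si(k), Ri(k))` tuples ordered by k), the function name `choose_truncation`, and the sample values are hypothetical; returning point 0 with code amount 0 when no candidate qualifies is also an assumption of this sketch.

```python
def choose_truncation(candidates, lam):
    """Return (n, R) for the largest candidate k whose gradient exceeds lam.

    candidates -- list of (k, gradient, code_amount), ordered by k, with the
                  gradient decreasing along the list.
    """
    n, rate = 0, 0                    # assumed result when nothing qualifies
    for k, gradient, code_amount in candidates:
        if gradient > lam:
            n, rate = k, code_amount
        else:
            break                     # gradients only get smaller from here
    return n, rate


# Hypothetical candidate table for one code block.
table = [(1, 4.0, 10), (3, 2.0, 30), (4, 0.5, 40)]
results = [choose_truncation(table, lam) for lam in (5.0, 3.0, 1.0, 0.1)]
```

As λ goes from 5.0 down to 0.1, the selected point moves through 0, 1, 3, and 4, so fewer codes are truncated and the retained code amount grows from 0 to 40.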
A case wherein the rate/distortion optimization technique is applied to JPEG2000 will be described below. Since a specific coding method by JPEG2000 has been described in detail in the recommendation, only the flow of processing in a simple case will be roughly described below.
For the sake of simplicity, assume that the coding target image is 512×512 monochrome image data with each pixel consisting of eight bits (0 to 255). Letting x be the pixel position (coordinate) of each pixel of the coding target image in the horizontal direction, and y be the pixel position of each pixel in the vertical direction, the pixel value at a pixel position (x, y) is represented by P(x, y). The following JPEG2000 coding conditions are assumed: no tiling, two levels of discrete wavelet transform, the use of a 9×7 lossy filter (9-7 irreversible filter), a code block size of 64×64, and the formation of a code sequence with one layer. Various other conditions, such as options for entropy coding, are also required, but they are not discussed here.
FIG. 2 is a block diagram showing the arrangement of an image coding apparatus which performs general JPEG2000 coding. Referring to FIG. 2, reference numeral 200 denotes an image data input unit; 201, a discrete wavelet transform unit; 202, a coefficient quantization unit; 203, a code block segmenting unit; 204, a code block coding unit; 205, a code sequence forming unit; 206, a code sequence storage unit; 207, a code block information storage unit; and 208, a code output unit.
First of all, pixel values P(x, y) constituting coding target image data are sequentially input from the image data input unit 200. The image data input unit 200 performs DC level shifting of the input data from 0 to 255 into data P′(x, y) from −128 to 127 by subtracting the intermediate value 128 from each input pixel value P(x, y), and outputs the resultant data to the discrete wavelet transform unit 201.
The discrete wavelet transform unit 201 stores the input data P′(x, y) after DC level shifting in an internal buffer, as needed, and executes two-dimensional discrete wavelet transform. Two-dimensional discrete wavelet transform is performed by applying one-dimensional discrete wavelet transform in the horizontal and vertical directions. The discrete wavelet transform unit 201 uses a 9×7 lossy filter for one-dimensional wavelet transform.
FIGS. 3A to 3C are views for explaining the subbands of a coding target image to be processed by two-dimensional discrete wavelet transform. First of all, the discrete wavelet transform unit 201 applies one-dimensional discrete wavelet transform to a coding target image like the one shown in FIG. 3A in the vertical direction to decompose the image into a low-frequency subband L and high-frequency subband H. One-dimensional discrete wavelet transform is then applied to each subband in the horizontal direction to decompose the respective subbands into four subbands LL, HL, LH, and HH, as shown in FIG. 3C.
The discrete wavelet transform unit 201 repeatedly applies two-dimensional discrete wavelet transform to the subband LL obtained by the above two-dimensional discrete wavelet transform. This makes it possible to decompose the coding target image into seven subbands LL, HL1, LH1, HH1, HL2, LH2, and HH2.
FIG. 6 is a view for explaining the seven subbands obtained by performing two-dimensional discrete wavelet transform twice. As shown in FIG. 6, on the decoding side, an image can be reconstructed in ¼ size in both the horizontal and vertical directions by decoding the coefficients of the subband LL. In addition, an image can be reconstructed in ½ size in the horizontal and vertical directions by also decoding the coefficients of the subbands HL1, LH1, and HH1. An image equal in size to the original image can be played back by further decoding the subbands HL2, LH2, and HH2. The subband LL will be referred to as resolution level 0; LH1, HL1, and HH1, resolution level 1; and LH2, HL2, and HH2, resolution level 2.
In the following description, a coefficient in each subband is represented by C(Sb, x, y) where Sb represents the type of subband, i.e., one of LL, LH1, HL1, HH1, LH2, HL2, and HH2, and (x, y) represents a coefficient position (coordinates) in the horizontal and vertical directions when the coefficient position at the upper left corner in each subband is represented by (0, 0).
The coefficient quantization unit 202 quantizes the coefficient C(Sb, x, y) of each subband, generated by the discrete wavelet transform unit 201, by using a quantization step delta(Sb) determined for each subband. Letting Q(Sb, x, y) be a quantized coefficient value, the quantization processing performed by the coefficient quantization unit 202 can be represented by:
Q(Sb, x, y) = sign{C(Sb, x, y)} × floor{|C(Sb, x, y)| / delta(Sb)}
where sign{I} is a function representing the sign of an integer I, which returns 1 when I is positive, and −1 when I is negative, and floor{R} is the maximum integral value that does not exceed a real number R.
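The quantization formula can be written out as a short sketch. The function name `quantize` and the sample values are assumptions; the text does not define the sign function for zero, and the sketch returns 0 in that case, which is the product the formula gives regardless of the sign chosen.

```python
import math


def quantize(coefficient, delta):
    """Quantize one subband coefficient with step size delta (dead-zone form)."""
    if coefficient > 0:
        sign = 1
    elif coefficient < 0:
        sign = -1
    else:
        sign = 0  # assumption: the source leaves sign{0} undefined
    return sign * math.floor(abs(coefficient) / delta)


# Hypothetical coefficients quantized with a hypothetical step of 2.0.
samples = [quantize(c, 2.0) for c in (7.9, -7.9, 0.5, -0.5, 0.0)]
```

Because the magnitude is floored before the sign is restored, values are rounded toward zero: 7.9 and −7.9 with a step of 2.0 become 3 and −3, and any coefficient whose magnitude is below the step becomes 0.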
The code block segmenting unit 203 stores the quantized coefficient Q(Sb, x, y) of each subband, output from the coefficient quantization unit 202, in an internal buffer (not shown), as needed, and segments each subband into rectangles called code blocks, each having a predetermined size, and extracts them. Code block segmentation is performed by segmenting each subband into 64×64 blocks with reference to the upper left corner of the subband. With this operation, each of the subbands LL, HL1, LH1, and HH1 is segmented into four code blocks, and each of the subbands HL2, LH2, and HH2 is segmented into 16 code blocks.
Note that the respective code blocks are assigned non-redundant identification numbers i (0 to 63) to be specified in the form of Bi like B0, B1, B2, . . . , B63. In addition, the identification numbers i are assigned to the code blocks in order of resolution level, assigned in order of the subbands HL, LH, and HH within the same resolution level, and assigned in raster scan order within the same subband. FIG. 7 is a view showing how code block segmentation is performed by the code block segmenting unit 203. Referring to FIG. 7, the solid lines indicate the boundaries between the subbands, and the dotted lines indicate the boundaries between the code blocks. Each rectangle defined by the dotted lines or solid lines is a code block.
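The code block counts and the numbering order can be checked with a small sketch. The function name `enumerate_code_blocks` and the tuple layout `(i, subband, row, column)` are hypothetical, and the sketch assumes image dimensions that are divisible by two at every decomposition level, as in the 512×512 example.

```python
def enumerate_code_blocks(width=512, height=512, levels=2, block=64):
    """List code blocks as (i, subband, row, column) in the numbering order
    given in the text: by resolution level, then HL, LH, HH, then raster scan.
    """
    blocks = []

    def add_subband(name, sub_width, sub_height):
        rows = -(-sub_height // block)       # ceiling division
        cols = -(-sub_width // block)
        for row in range(rows):              # raster scan within the subband
            for col in range(cols):
                blocks.append((len(blocks), name, row, col))

    # Resolution level 0 holds only the LL subband.
    add_subband("LL", width >> levels, height >> levels)
    for level in range(1, levels + 1):
        shift = levels - level + 1           # subbands halve at each level
        for band in ("HL", "LH", "HH"):
            add_subband(f"{band}{level}", width >> shift, height >> shift)
    return blocks


blocks = enumerate_code_blocks()
per_subband = {}
for _, name, _, _ in blocks:
    per_subband[name] = per_subband.get(name, 0) + 1
```

For the 512×512 example this gives 64 code blocks in total, four in each of LL, HL1, LH1, and HH1 and 16 in each of HL2, LH2, and HH2, consistent with the identification numbers 0 to 63.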
The code block coding unit 204 expresses the absolute value of the quantized coefficient value Q(Sb, x, y) (to be simply referred to as a “coefficient value” hereinafter) in a code block Bi extracted by the code block segmenting unit 203 in natural binary notation, performs binary arithmetic coding preferentially in the bit plane direction from the most significant bit to the least significant bit, and stores the coded data of the code block in the code sequence storage unit 206. Each bit plane is coded in three passes, except for the most significant bit plane. Note that segmentation to passes and a specific coding method in each pass should comply with the recommendation.
The code block coding unit 204 obtains the code amount increase ΔRi(k) and distortion index value decrease ΔDi(k) of a pass of interest for each pass coding operation, forms the table shown in FIG. 8, and stores it in an internal buffer (not shown). FIG. 8 is a view showing an example of the information of the code block Bi formed inside the code block coding unit 204. Note that as distortion index values, mean square errors, weighted mean square errors derived by assigning a weight for each subband, or the like are used. A scheme of deriving a distortion index value decrease for each coefficient for each of the three types of passes is described in non-patent reference 1 (Annex J of the recommendation) or the like.
When coding of all the passes is completed for the code block Bi of interest and a table like the one shown in FIG. 8 is completed, the algorithm for selecting code truncation candidate points, shown in FIG. 5 described above, is executed to obtain a code truncation candidate point set Ni exhibiting a monotonous reduction in Si(k) from the set of all coding pass boundaries. Subsequently, as shown in FIG. 9, the element count NP of the candidate set Ni, the pass number k at each code truncation candidate point, the rate/distortion gradient Si(k), and the code amount Ri(k) are stored in the code block information storage unit 207. That is, FIG. 9 is a view showing an example of the information of code truncation candidate points stored in the code block information storage unit 207.
When the code block coding unit 204 completes coding all the code blocks, the code sequence forming unit 205 searches for λ with which total code amount R=Rmax or R≈Rmax by referring to the truncation candidate point information of each code block which is stored in the code block information storage unit 207, forms a final code sequence by collecting codes of a portion that satisfies Si(k)>λ, and outputs it.
FIG. 14 is a flowchart for explaining the flow of the processing of determining λ in the code sequence forming unit 205. The processing of determining the threshold λ by the code sequence forming unit 205 will be described below with reference to FIG. 14. In the following description, S is introduced as a variable representing a threshold, and the value of the variable S at the end of the processing is set as λ.
First of all, the code sequence forming unit 205 obtains a minimum value Smin and maximum value Smax of Si(k) by referring to the truncation candidate point information of all the code blocks which is stored in the code block information storage unit 207 (step S1401). The variable S representing a threshold is then set to Smax obtained in step S1401 (step S1402). Subsequently, the value of the variable S is slightly decreased by subtracting a predetermined threshold change width ΔS from the variable S (step S1403).
A variable i representing a code block number is set to 0, and a cumulative code amount R is initialized to 0 (step S1404). A maximum value k satisfying Si(k)>S is obtained by referring to the truncation candidate point information of the code block Bi which is stored in the code block information storage unit 207, and is set as a code truncation point ni of the code block Bi (step S1405). Since the value of Si(k) is monotonously reduced in order of the truncation candidate points of the code block Bi, ni can be obtained by sequentially comparing Si(k) in order of the candidate points.
A code amount Ri(ni) of the code block Bi at the truncation point ni obtained in step S1405 is added to the cumulative code amount R (step S1406). In addition, i is incremented by one (step S1407) and is compared with 64 (step S1408). If i is equal to 64 (Yes), the flow advances to step S1409. If i is not equal to 64 (No), the flow shifts to step S1405 to perform code amount addition with respect to the next code block.
If it is determined in step S1408 that i is equal to 64, i.e., the code amounts of all the code blocks have been accumulated for the threshold S, the cumulative code amount R is compared with the target code amount Rmax (step S1409). If R<Rmax (Yes), the flow advances to step S1410. Otherwise (No), the flow shifts to step S1411. In step S1411, since the cumulative code amount has reached or exceeded the target code amount with the current threshold S, ΔS is added to the threshold to return it to the immediately preceding threshold S, and the processing is terminated.
In step S1410, the threshold S is compared with the minimum value Smin obtained in step S1401. If S>Smin (Yes), the flow returns to step S1403 to slightly decrease the value of S. Thereafter, the processing up to step S1409 is performed again. If it is determined in step S1410 that S≦Smin (No), this processing is terminated. The threshold S at the end of the processing is then selected as the threshold λ.
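The threshold search of FIG. 14 can be sketched as follows. The names `block_rate` and `search_threshold`, the per-block table layout `(k, Si(k), Ri(k))`, and the two-block sample data are assumptions made for illustration; the step comments follow the flow as described in the text. The sample values are chosen to be exactly representable so that adding ΔS back in step S1411 restores the preceding threshold without rounding error.

```python
def block_rate(candidates, s):
    """Code amount of one block at the largest candidate whose gradient > s."""
    rate = 0
    for _, gradient, code_amount in candidates:   # step S1405
        if gradient > s:
            rate = code_amount
        else:
            break                                 # gradients decrease with k
    return rate


def search_threshold(blocks, r_max, delta_s):
    """Lower the threshold S in steps of delta_s until the total code amount
    is no longer below r_max or S has reached the minimum gradient."""
    gradients = [g for cands in blocks for _, g, _ in cands]
    s_min, s_max = min(gradients), max(gradients)   # step S1401
    s = s_max                                       # step S1402
    while True:
        s -= delta_s                                # step S1403
        total = 0                                   # step S1404
        for cands in blocks:                        # steps S1405 to S1408
            total += block_rate(cands, s)           # step S1406
        if not total < r_max:                       # step S1409
            return s + delta_s                      # step S1411
        if not s > s_min:                           # step S1410
            return s


# Two hypothetical code blocks with (k, gradient, cumulative code amount).
blocks = [
    [(1, 4.0, 10), (3, 2.0, 30), (4, 0.5, 40)],
    [(1, 3.0, 20), (2, 1.0, 50)],
]
lam = search_threshold(blocks, r_max=70, delta_s=0.5)
total_at_lam = sum(block_rate(cands, lam) for cands in blocks)
```

With these hypothetical tables and a target of 70, the search stops at λ=1.0, where the total code amount is 50. Every iteration rescans the candidate table of every code block, which is the repeated table access that makes this search expensive.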
The code sequence forming unit 205 reads out codes of a portion that satisfies Si(k)>λ from each code block from the code sequence storage unit 206 with respect to the threshold λ obtained by the above processing, and forms a JPEG2000 code sequence by adding information (main header, tile header, packet header, and the like) in accordance with the format of a JPEG2000 code sequence, and outputs the code sequence to the code output unit 208.
The code output unit 208 outputs the JPEG2000 coded data formed by the code sequence forming unit 205 to the outside of the apparatus. The code output unit 208 is implemented by a storage medium such as a hard disk, magnetooptic disk, or memory, an interface with a network, or the like.
In order to find the value of λ with which R≈Rmax by the above method, it is necessary to repeat the processing of obtaining the total code amount R with various values of λ and comparing it with the target code amount Rmax. Obtaining the total code amount R therefore requires a high computation cost and a long processing time, because, for example, the memory storing the code truncation candidate point information Si(k) and Ri(k) must be accessed many times and Si(k) must be compared with λ each time.
Furthermore, there is no known method of performing rate/distortion optimization processing for moving images at high speed.