Sub-band signal processing, in which a high-resolution video signal is divided into two or more frequency bands and encoded hierarchically, has been widely proposed. Such processing hierarchically encodes, for example, a high-definition (HD) image and a standard-definition (SD) image whose resolution is down-sampled to half that of the HD image.
FIG. 1 explains sub-band division using the Z transform. A decomposition filter 170 includes a low-frequency decomposition filter 110, a high-frequency decomposition filter 120 and down-samplers 112 and 122 for 2:1 down-sampling. The low-frequency decomposition filter 110 and the high-frequency decomposition filter 120 can be expressed as A(Z) and B(Z), respectively, using the Z transform.
The decomposition filter 170 divides a 2N-point input signal X(Z) 100 into an N-point low-frequency signal L(Z) 104 and an N-point high-frequency signal H(Z) 105.
The composition filter 180 includes up-samplers 113 and 123 for 1:2 sampling up, a low-frequency composition filter 130 and a high-frequency composition filter 140. The low-frequency composition filter 130 and the high-frequency composition filter 140 can be expressed as P(Z) and Q(Z), respectively, using Z transform.
1:2 up-sampling is performed by inserting zeros into the N-point low-frequency signal L(Z) 104 and the N-point high-frequency signal H(Z) 105. The outputs of the low-frequency composition filter 130 and the high-frequency composition filter 140 are then added to obtain a 2N-point composition signal Y(Z) 101. If the filters meet the perfect reconstruction filter bank conditions, the input signal X(Z) and the composition signal Y(Z) match completely except for a fixed delay.
In order to meet the perfect reconstruction filter bank conditions, it is necessary to meet the following Equations 1 and 2.
$$P(Z) \cdot A(Z) + Q(Z) \cdot B(Z) = 2 \cdot Z^{-L} \quad \text{(Equation 1)}$$
$$P(Z) \cdot A(-Z) + Q(Z) \cdot B(-Z) = 0 \quad \text{(Equation 2)}$$
If each filter has a finite tap length and only real coefficients, the following conditions can be derived from the above-described conditions.
$$P(Z) \cdot A(Z) - P(-Z) \cdot A(-Z) = 2 \cdot Z^{-L} \quad \text{(Equation 3)}$$
$$B(Z) = C \cdot P(-Z) \quad \text{(Equation 4)}$$
$$Q(Z) = -(1/C) \cdot A(-Z) \quad \text{(Equation 5)}$$
In the above equations, C is an arbitrary constant and L is an appropriate fixed delay. According to Equations 4 and 5, a perfect reconstruction filter bank is fully specified by choosing either A(Z) or Q(Z) and either P(Z) or B(Z).
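The conditions above can be checked numerically. The following is a minimal sketch, not a filter taken from this document: it assumes a commonly cited 5/3-tap low-pass pair for A(Z) and P(Z) and C = 1, derives B(Z) and Q(Z) from Equations 4 and 5, and then evaluates the distortion term P(Z)·A(Z) + Q(Z)·B(Z) and the alias term P(Z)·A(−Z) + Q(Z)·B(−Z). The helper names (`polymul`, `polyadd`, `negate_z`, `scale`) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction as F

def polymul(a, b):
    """Multiply two polynomials in Z^-1 given as coefficient lists."""
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def polyadd(a, b):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [F(0)] * (n - len(a))
    b = b + [F(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def negate_z(a):
    """Coefficients of a(-Z): the Z^-k term changes sign when k is odd."""
    return [(-c if k % 2 else c) for k, c in enumerate(a)]

def scale(a, s):
    """Multiply every coefficient by the scalar s."""
    return [s * c for c in a]

# Assumed example filters (causal form), not specified by the source text.
A = [F(-1, 8), F(2, 8), F(6, 8), F(2, 8), F(-1, 8)]  # low-frequency decomposition
P = [F(1, 2), F(1), F(1, 2)]                         # low-frequency composition
C = F(1)                                             # arbitrary constant

B = scale(negate_z(P), C)        # Equation 4: B(Z) = C * P(-Z)
Q = scale(negate_z(A), -1 / C)   # Equation 5: Q(Z) = -(1/C) * A(-Z)

# Distortion term: should be a single coefficient 2 at delay L.
distortion = polyadd(polymul(P, A), polymul(Q, B))
# Alias term: should vanish identically.
alias = polyadd(polymul(P, negate_z(A)), polymul(Q, negate_z(B)))

print([str(c) for c in distortion])
print([str(c) for c in alias])
```

For this assumed pair the only non-zero distortion coefficient sits at Z^-3, so the fixed delay L is 3 samples. Exact rational arithmetic is used so that the zero alias term is not blurred by floating-point rounding.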
Conventionally, sub-band filters such as the symmetric short kernel filter (SSKF) and the Daubechies 9/7-tap filter are known as filters meeting the above-described conditions, and various sub-band encoding methods are adopted by JPEG-2000 and the like. For a perfect reconstruction filter bank used for band division in image encoding, one in which each of the above-described A(Z), B(Z) and Q(Z) has a linear phase is conventionally used.
There are two types of linear phase filter: those with an odd number of taps and those with an even number of taps. First, if the number of taps is 2N+1 (odd), where N is a natural number, a filter H(Z) with 2N+1 taps can be expressed as follows using the Z transform.
$$H(Z) = \sum_{k=0}^{2N} h(k) \cdot Z^{-k}$$
The coefficients h(k) of the above equation have the following property.
$$h(k) = h(2N - k) \quad (k = 0, \ldots, N-1) \quad \text{(Equation 6)}$$
FIGS. 2A and 2B illustrate the pixel positions before and after the filtering process of the odd tap filter and the even tap filter.
As illustrated in FIG. 2A, the pixel position 210 after the filtering process of the odd tap filter is the same as the pixel position 202 before the filtering process. Specifically, in this case, the group delay is 0 pixels. Examples of perfect reconstruction filter banks meeting this condition include the SSKF (3, 5) tap filter and the Daubechies (9, 7) tap filter, which are adopted by JPEG-2000 and the like.
If the number of taps of a filter is 2N (even), where N is a natural number, the coefficients h(k) of the 2N tap filter have the following property.
$$h(k) = h(2N - 1 - k) \quad (k = 0, \ldots, N-1) \quad \text{(Equation 7)}$$
As illustrated in FIG. 2B, the pixel position 220 after the filtering process of the even tap filter is the point that internally divides the adjacent pixels 211 and 212 in the ratio 1:1, that is, the midpoint between the pixels 211 and 212. In this case, the group delay is ½ pixel. Examples of perfect reconstruction filter banks meeting this condition include the SSKF (4, 4) tap filter and the like.
Patent document 1: Japanese Patent Laid-open Publication No. H7-107445
Patent document 2: Japanese Patent Laid-open Publication No. H6-343162
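The relationship between tap count, coefficient symmetry and output pixel position described for the odd tap and even tap filters can be sketched as follows. This is an illustration under stated assumptions: the two coefficient sets are arbitrary symmetric examples, not filters from this document, and the function names are hypothetical. A symmetric filter with T taps has a group delay of (T − 1)/2 samples; once the filter is centered, only the fractional part remains as a sub-pixel offset.

```python
from fractions import Fraction

def is_linear_phase(h):
    """True if h(k) == h(T - 1 - k), the symmetry of Equations 6 and 7."""
    return all(h[k] == h[-1 - k] for k in range(len(h) // 2))

def group_delay(h):
    """Group delay in samples of a symmetric FIR filter: (taps - 1) / 2."""
    if not is_linear_phase(h):
        raise ValueError("filter is not symmetric")
    return Fraction(len(h) - 1, 2)

# Assumed example coefficients, chosen only because they are symmetric.
odd_taps = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # 2N+1 taps, N = 1
even_taps = [Fraction(1, 2), Fraction(1, 2)]                  # 2N taps, N = 1

# Sub-pixel offset of the output relative to the input pixel grid.
odd_offset = group_delay(odd_taps) % 1
even_offset = group_delay(even_taps) % 1

print(odd_offset, even_offset)
```

The odd tap case yields an offset of 0, so the output lies on an input pixel, while the even tap case yields ½, so the output lies midway between two input pixels.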
FIGS. 3A and 3B illustrate brightness and chroma pixels in a 4:2:2 format, and brightness and chroma pixels in a 4:2:0 format.
As illustrated in FIG. 3A, the brightness pixel 300a and chroma pixel 301a in the top field of a 4:2:2 format are disposed in the same position and the brightness pixel 302a and chroma pixel 303a in the bottom field are also disposed in the same position.
As illustrated in FIG. 3B, the chroma pixel 305a in the top field of a 4:2:0 format is disposed in a position obtained by internally dividing the brightness pixels 304a and 304b in the ratio 1:3 downward in the vertical direction. The chroma pixel 307a in the bottom field is disposed in a position obtained by internally dividing the brightness pixels 306a and 306b in the ratio 3:1 downward.
However, when the top and bottom fields are each divided into a low-frequency signal L(Z) and a high-frequency signal H(Z) with 2:1 down-sampling using the same filter bank, in order to sub-band-divide the vertical component of an interlaced image, the interlace scanning line position structure of the low-frequency signal is no longer maintained.
That is, the distance from the top-field pixels to the bottom-field pixels is no longer equal to the distance from the bottom-field pixels to the top-field pixels. The details are described in D-334 “Problems and Countermeasures of Current TV/HDTV Compatible Encoding” of the proceedings of the 1992 IEICE Spring Conference.
Several countermeasures have conventionally been proposed. Patent document 1 describes that the position deviation of a pixel after the sub-band division can be eliminated by dividing the band into three sub-bands.
In Patent document 2, pixels in the top field are sub-band divided into a low-frequency signal and a high-frequency signal by a perfect reconstruction filter bank with an even number of taps, and pixels in the bottom field are sub-band divided by a perfect reconstruction filter bank with an odd number of taps. In this way, the scanning line position structure of the top and bottom fields can be maintained. Thus, the distance from the top-field pixels to the bottom-field pixels and the distance from the bottom-field pixels to the top-field pixels are equal.
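The line-spacing argument can be illustrated with a simplified geometric model. The sketch below assumes that frame lines are numbered 0, 1, 2, ... from the top, that the top field occupies the even lines and the bottom field the odd lines, that an even tap filter places each down-sampled line midway between a pair of field lines, and that an odd tap filter places it on one line of the pair. The function names and the `keep` parameter are hypothetical.

```python
from fractions import Fraction

def decimated_positions(field_lines, even_taps, keep=0):
    """Vertical positions of the low-band lines after 2:1 down-sampling.

    even_taps=True : each output lies midway between a pair of field lines.
    even_taps=False: each output lies on one line of the pair (index keep).
    """
    pairs = list(zip(field_lines[0::2], field_lines[1::2]))
    if even_taps:
        return [Fraction(a + b, 2) for a, b in pairs]
    return [Fraction(p[keep]) for p in pairs]

def field_gaps(top, bottom):
    """Gaps between consecutive lines when the two fields are interleaved."""
    merged = sorted(top + bottom)
    return [b - a for a, b in zip(merged, merged[1:])]

top = [0, 2, 4, 6, 8, 10, 12, 14]      # top-field lines (frame line numbers)
bottom = [1, 3, 5, 7, 9, 11, 13, 15]   # bottom-field lines

# Same (odd tap) filter on both fields: spacing alternates and is unequal.
same_filter = field_gaps(decimated_positions(top, False),
                         decimated_positions(bottom, False))

# Even taps on the top field, odd taps on the bottom field: spacing is uniform.
mixed = field_gaps(decimated_positions(top, True),
                   decimated_positions(bottom, False, keep=1))

print([str(g) for g in same_filter])
print([str(g) for g in mixed])
```

Under these assumptions the same-filter case produces gaps that alternate between 1 and 3 frame lines, whereas the even/odd combination produces a uniform gap of 2 frame lines.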
Next, pixel positions at the time of down-sampling in the case where sub-band division is performed using an odd tap filter and an even tap filter are explained with reference to FIGS. 4A and 4B.
If even tap filtering explained in FIG. 2B is applied to the top field, as to brightness pixels, brightness pixels 402a and 402b after down-sampling are generated at points obtained by internally dividing each of the brightness pixels 400a and 400b of the original resolution in the ratio 1:1, and brightness pixels 400c and 400d in the ratio 1:1. Similarly, as to chroma pixels, a chroma pixel 403 after down-sampling is generated at a point obtained by internally dividing the chroma pixels 401a and 401b in the ratio 1:1.
If odd tap filtering explained in FIG. 2A is applied to the bottom field, as to brightness pixels, brightness pixels 406a and 406b after down-sampling are generated in the same positions as the brightness pixels 404b and 404d of the original resolution. Similarly, as to chroma pixels, a chroma pixel 407 after down-sampling is generated in the same position as the chroma pixel 405b of the original resolution.
However, the method described in Patent document 2 cannot meet the positional conditions of brightness and chroma pixels in the vertical direction of the 4:2:0 interlace format illustrated in FIG. 3B, which are specified by video encoding standards such as H.264/MPEG-4 Part 10 of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the like.
Specifically, if sub-band division is applied to an image in the top field of the 4:2:0 format using an even tap filter, as illustrated in FIG. 4A, the chroma pixel 403 after down-sampling is generated in a position obtained by internally dividing the brightness pixels 402a and 402b after down-sampling in the ratio 3:5 downward. This does not coincide with the position obtained by internally dividing brightness pixels in the ratio 1:3, which is the home position of a chroma pixel in the top field of the 4:2:0 format illustrated in FIG. 3B.
If sub-band division is applied to an image in the bottom field using an odd tap filter, as illustrated in FIG. 4B, the chroma pixel 407 after down-sampling is generated in a position obtained by internally dividing the brightness pixels 406a and 406b after down-sampling in the ratio 7:1 downward. This does not coincide with the position obtained by internally dividing brightness pixels in the ratio 3:1, which is the home position of a chroma pixel in the bottom field of the 4:2:0 format illustrated in FIG. 3B.
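The 3:5 and 7:1 ratios can be reproduced with a short piece of arithmetic. The sketch below assumes frame lines numbered from 0, with top-field brightness lines at 0, 2, 4, 6 and bottom-field brightness lines at 1, 3, 5, 7, and places the chroma pixels at the 1:3 and 3:1 internal division points described for FIG. 3B. The helper names `divide` and `internal_ratio` are hypothetical.

```python
from fractions import Fraction

def divide(upper, lower, m, n):
    """Point that internally divides upper..lower in the ratio m:n downward."""
    return Fraction(n * upper + m * lower, m + n)

def internal_ratio(upper, lower, point):
    """Ratio (m, n), in lowest terms, in which point divides upper..lower."""
    r = Fraction(point - upper) / Fraction(lower - point)
    return (r.numerator, r.denominator)

# Top field, even tap filter (FIG. 4A).
top_chroma_a = divide(0, 2, 1, 3)                 # between brightness lines 0 and 2
top_chroma_b = divide(4, 6, 1, 3)                 # between brightness lines 4 and 6
top_luma_a = Fraction(0 + 2, 2)                   # midpoint of lines 0 and 2
top_luma_b = Fraction(4 + 6, 2)                   # midpoint of lines 4 and 6
top_chroma_out = (top_chroma_a + top_chroma_b) / 2
top_ratio = internal_ratio(top_luma_a, top_luma_b, top_chroma_out)

# Bottom field, odd tap filter (FIG. 4B): outputs stay on the lower input pixels.
bottom_chroma_b = divide(5, 7, 3, 1)              # between brightness lines 5 and 7
bottom_luma_a, bottom_luma_b = 3, 7               # retained brightness lines
bottom_ratio = internal_ratio(bottom_luma_a, bottom_luma_b, bottom_chroma_b)

print(top_ratio, bottom_ratio)
```

Under these assumptions the top field gives (3, 5) rather than the required (1, 3), and the bottom field gives (7, 1) rather than the required (3, 1), matching the mismatch described above.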
Furthermore, since the above-described method requires different filters for the top and bottom fields, the amplitude-frequency characteristics of the top and bottom fields after the filtering process cannot be completely matched.