In recent years, apparatuses conforming to systems such as MPEG have become widespread both for information distribution (delivery) at broadcast stations and the like and for information reception in general homes. Such apparatuses handle image information as digital information and, for the purpose of efficient transmission and/or storage, compress it by an orthogonal transform such as the discrete cosine transform and by motion compensation, making use of the redundancy specific to image information.
In particular, MPEG 2 (ISO/IEC 13818-2) is defined as a general-purpose image encoding system. It is a standard covering both interlaced scanning images and sequential scanning images, and both standard-resolution images and high-definition images, and is at present widely used in a broad range of professional and consumer applications. With the MPEG 2 compression system, a code quantity (bit rate) of 4 to 8 Mbps is assigned to, e.g., an interlaced scanning image of standard resolution having 720×480 pixels, and a code quantity (bit rate) of 18 to 22 Mbps is assigned to an interlaced scanning image of high resolution having 1920×1088 pixels, so that a high compression factor and satisfactory picture quality can be realized.
MPEG 2 was mainly directed to high-picture-quality encoding suited to broadcasting, and did not support code quantities (bit rates) lower than those of MPEG 1, i.e., encoding with a higher compression factor. However, the need for such an encoding system is expected to increase in the future with the popularization of portable terminals. In response, the MPEG 4 encoding system was standardized. Its image encoding part was approved as the international standard ISO/IEC 14496-2 in December 1998.
Further, in recent years, the standardization of H.26L (ITU-T Q6/16 VCEG), originally aimed at image encoding for television conferencing, has been in progress. It is known that H.26L requires a larger amount of computation for encoding/decoding than the conventional encoding systems of MPEG 2 or MPEG 4, but realizes higher encoding efficiency. Moreover, at present, as a part of the MPEG 4 activities, standardization that is based on H.26L and also takes in functions not supported by H.26L is being carried out as the Joint Model of Enhanced-Compression Video Coding.
Meanwhile, one of the element technologies by which H.26L realizes high encoding efficiency is motion prediction/compensation based on variable block sizes. At present, seven kinds of prediction/compensation block sizes are defined, as shown in FIG. 1.
Moreover, H.26L prescribes motion prediction/compensation processing of high accuracy, such as ¼ pixel accuracy or ⅛ pixel accuracy. This motion prediction/compensation processing will be described first.
The motion prediction/compensation processing of ¼ pixel accuracy determined in the H.26L is shown in FIG. 2. In generating a predictive picture of ¼ pixel accuracy, FIR filters respectively having 6 taps in the horizontal and vertical directions are first used to generate pixel values of ½ pixel accuracy on the basis of pixel values stored in the frame memory. Here, as coefficients of the FIR filter, the coefficients indicated by the following formula (1) are determined.
{1, −5, 20, 20, −5, 1}/32  (1)
Further, a predictive picture of ¼ pixel accuracy is generated by linear interpolation on the basis of the generated predictive picture of ½ pixel accuracy.
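The two stages described above can be sketched in one dimension as follows. This is only an illustrative sketch and not the H.26L reference processing: the function names, the clamping of sample indices at the ends of the row, the round-to-nearest division by 32 and the clipping to the 8-bit range are all assumptions introduced for the example. Only the tap values of formula (1) and the use of linear interpolation for the ¼ position come from the description.

```python
# Minimal 1-D sketch: 6-tap half-pel filter of formula (1), then linear
# interpolation for the quarter-pel position.
TAPS = (1, -5, 20, 20, -5, 1)  # formula (1); the sum of products is divided by 32


def half_pel(samples, i):
    """Estimate the value midway between samples[i] and samples[i + 1]."""
    last = len(samples) - 1
    acc = 0
    for k, coeff in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), last)  # assumption: clamp indices at the edges
        acc += coeff * samples[j]
    value = (acc + 16) >> 5  # assumption: round to nearest when dividing by 32
    return min(max(value, 0), 255)  # assumption: clip to the 8-bit range


def quarter_pel(samples, i):
    """Estimate the value at i + 1/4 from the integer and half-pel values."""
    return (samples[i] + half_pel(samples, i) + 1) >> 1  # rounded average


row = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel(row, 3))     # 35, midway between 30 and 40
print(quarter_pel(row, 3))  # 33, rounded average of 30 and 35
```

In two dimensions the same filter would be applied in the horizontal and vertical directions, which is why the number of reference pixels read grows with the tap count.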
Further, at the H.26L, for the purpose of performing motion prediction/compensation of ⅛ pixel accuracy, filter banks shown in the following formula (2) are prescribed.
1/8: {−3, 12, −37, 485, 71, −21, 6, −1}/512
2/8: {−3, 12, −37, 229, 71, −21, 6, −1}/256
3/8: {−6, 24, −76, 387, 229, −60, 18, −4}/512
4/8: {−3, 12, −39, 158, 158, −39, 12, −3}/256
5/8: {−4, 18, −60, 229, 387, −76, 24, −6}/512
6/8: {−1, 6, −21, 71, 229, −37, 12, −3}/256
7/8: {−1, 6, −21, 71, 485, −37, 12, −3}/512  (2)
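As a consistency check on formula (2), the filter bank can be written out as a table and two properties verified: each set of taps sums to its divisor (so a flat area is reproduced unchanged), and the filter for phase p/8 is the mirror image of the filter for phase (8−p)/8. The sketch below is illustrative only. The names, the alignment of the eight taps on samples i−3 to i+4, and the edge clamping are assumptions made for the example and are not taken from the H.26L text.

```python
# Filter bank of formula (2): phase (in eighths) -> (taps, divisor).
FILTER_BANK = {
    1: ((-3, 12, -37, 485, 71, -21, 6, -1), 512),
    2: ((-3, 12, -37, 229, 71, -21, 6, -1), 256),
    3: ((-6, 24, -76, 387, 229, -60, 18, -4), 512),
    4: ((-3, 12, -39, 158, 158, -39, 12, -3), 256),
    5: ((-4, 18, -60, 229, 387, -76, 24, -6), 512),
    6: ((-1, 6, -21, 71, 229, -37, 12, -3), 256),
    7: ((-1, 6, -21, 71, 485, -37, 12, -3), 512),
}


def has_unity_gain(phase):
    """True if the taps sum to the divisor, i.e. a constant signal is preserved."""
    taps, divisor = FILTER_BANK[phase]
    return sum(taps) == divisor


def is_mirror_of_opposite(phase):
    """True if phase p/8 is the reversed filter of phase (8 - p)/8."""
    taps, divisor = FILTER_BANK[phase]
    other_taps, other_divisor = FILTER_BANK[8 - phase]
    return taps == other_taps[::-1] and divisor == other_divisor


def eighth_pel(samples, i, phase):
    """Estimate the value at i + phase/8 (unrounded, for illustration)."""
    taps, divisor = FILTER_BANK[phase]
    last = len(samples) - 1
    acc = 0
    for k, coeff in enumerate(taps):
        j = min(max(i - 3 + k, 0), last)  # assumption: taps span i-3 .. i+4, clamped
        acc += coeff * samples[j]
    return acc / divisor


print(all(has_unity_gain(p) for p in FILTER_BANK))         # True
print(all(is_mirror_of_opposite(p) for p in FILTER_BANK))  # True
print(eighth_pel([100] * 10, 4, 3))                        # 100.0
```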
It is to be noted that, in the image compressed information, the accuracy of the motion vector is specified by the MotionRelation field in RTP (Real-time Transport Protocol).
As stated above, the existing H.26L prescribes motion prediction/compensation processing using a filter determined in advance, as shown in formula (1) or (2). In addition, as described in “Adaptive Interpolation Filter for Motion Compensated Hybrid Video Coding”, T. Wedi, Picture Coding Symposium 2001, pp. 49-52 (hereinafter referred to as literature 1), the use of an adaptive filter corresponding to the input image is also being considered at present.
In concrete terms, literature 1 proposes adaptive optimization of motion prediction/compensation processing as described below. As the first step, a filter determined in advance is used to determine the motion vector d(k) which minimizes the predictive error. Subsequently, as the second step, filter coefficients H(k) which minimize the predictive error with respect to the motion vector d(k) determined at the first step are determined. Motion compensation processing is then performed using the filter coefficients H(k) and the motion vector d(k) determined in this way. According to literature 1, in a simulation experiment using the test sequences “Mobile” and “Foreman” of CIF size, an encoding gain of the order of 1.0 to 1.5 dB is obtained by this technique as compared to the case where a filter determined in advance is used.
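The two-step procedure can be illustrated with a deliberately small one-dimensional toy. It is not the method of literature 1 itself: the 2-tap filter, the integer-only displacement search, the sum-of-squared-differences error measure, the closed-form least-squares solution and all names below are assumptions chosen to keep the sketch short. It only shows the order of operations, namely a motion search with a fixed filter followed by a filter fit for the motion vector found.

```python
def ssd(cur, ref, d, h):
    """Sum of squared prediction errors for displacement d and filter taps h."""
    total = 0.0
    for n, c in enumerate(cur):
        pred = sum(hk * ref[n + d + k] for k, hk in enumerate(h))
        total += (c - pred) ** 2
    return total


def step1_motion(cur, ref, candidates, fixed_h):
    """Step 1: with a predetermined filter, pick the displacement of least error."""
    return min(candidates, key=lambda d: ssd(cur, ref, d, fixed_h))


def step2_filter(cur, ref, d):
    """Step 2: for that displacement, fit 2 taps by least squares (normal equations)."""
    a00 = a01 = a11 = b0 = b1 = 0.0
    for n, c in enumerate(cur):
        x0, x1 = ref[n + d], ref[n + d + 1]
        a00 += x0 * x0
        a01 += x0 * x1
        a11 += x1 * x1
        b0 += x0 * c
        b1 += x1 * c
    det = a00 * a11 - a01 * a01  # assumed non-zero for this toy data
    return ((b0 * a11 - b1 * a01) / det, (a00 * b1 - a01 * b0) / det)


# Toy data: the current row is the reference shifted by 2 and blended 0.75/0.25.
ref = [3, 9, 1, 14, 6, 0, 11, 4, 13, 2, 8, 5]
cur = [0.75 * ref[n + 2] + 0.25 * ref[n + 3] for n in range(8)]

fixed = (1.0, 0.0)  # predetermined filter: plain integer-position copy
d = step1_motion(cur, ref, range(4), fixed)
h = step2_filter(cur, ref, d)
print(d)                                         # 2
print(ssd(cur, ref, d, h) < ssd(cur, ref, d, fixed))  # True
```

On this toy data the fitted taps come out close to (0.75, 0.25), so the error after the second step is far below the error obtained with the fixed filter alone.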
Here, H.26L, similarly to MPEG 2, includes provisions relating to the B picture. A method for bi-directional prediction using the B picture in H.26L is shown in FIG. 3. As shown in FIG. 3, the B2 picture and the B3 picture use the I1 picture and the P4 picture as reference pictures, and the B5 picture and the B6 picture use the P4 picture and the P7 picture as reference pictures.
Moreover, in the image compressed information, the use of each picture is specified by PTYPE in the picture header, as shown in FIG. 4. As shown in FIG. 4, when the value of Code number is 0 or 1, use of the P picture is designated; when it is 2, use of the I picture is designated; and when it is 3 or 4, use of the B picture is designated. In this instance, when the value of Code number is 0, only the immediately preceding picture is used for prediction, whereas when it is 1, plural past pictures are used for prediction. Further, when the value of Code number is 3, the immediately preceding and immediately following pictures are used for prediction, whereas when it is 4, plural past and future pictures are used for prediction. As stated above, multiple frame prediction can be used in the B picture as well as in the P picture.
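The PTYPE assignments described above can be summarized as a small lookup table. The sketch below is only a restatement of that description in code form. The table and function names are hypothetical, and the actual bit-level encoding of PTYPE in the picture header is not shown.

```python
# Code number of PTYPE -> (picture type, pictures used for prediction).
PTYPE_TABLE = {
    0: ("P", "immediately preceding picture only"),
    1: ("P", "plural past pictures"),
    2: ("I", "none"),
    3: ("B", "immediately preceding and immediately following pictures"),
    4: ("B", "plural past and future pictures"),
}


def picture_type(code_number):
    """Return 'I', 'P' or 'B' for a PTYPE code number."""
    return PTYPE_TABLE[code_number][0]


def uses_multiple_frame_prediction(code_number):
    """True for the code numbers that designate plural reference pictures."""
    return code_number in (1, 4)


for code in sorted(PTYPE_TABLE):
    kind, refs = PTYPE_TABLE[code]
    print(code, kind, refs)
```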
Further, in H.26L, the use of the B picture permits realization of time scalability. Namely, since the B picture is never used as a reference picture, a B picture can be discarded without performing its decoding processing.
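The reference structure of FIG. 3 and the resulting time scalability can be sketched as follows. This is an illustrative sketch under assumptions: pictures are represented by labels such as "I1" or "B2" listed in display order, each B picture is taken to refer to the nearest preceding and following I or P picture (which reproduces the B2/B3 and B5/B6 assignments described above), and the function names are hypothetical.

```python
def b_picture_references(sequence):
    """Map each B picture to its (past, future) reference pictures."""
    refs = {}
    for i, name in enumerate(sequence):
        if not name.startswith("B"):
            continue
        # Nearest I or P picture before and after the B picture.
        past = next(p for p in reversed(sequence[:i]) if p[0] in "IP")
        future = next(p for p in sequence[i + 1:] if p[0] in "IP")
        refs[name] = (past, future)
    return refs


def drop_b_pictures(sequence):
    """Time scalability: keep only I and P pictures."""
    return [p for p in sequence if not p.startswith("B")]


display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
refs = b_picture_references(display_order)
print(refs["B2"])                      # ('I1', 'P4')
print(refs["B6"])                      # ('P4', 'P7')
print(drop_b_pictures(display_order))  # ['I1', 'P4', 'P7']
```

Because no picture lists a B picture among its references, the reduced sequence remains decodable at a lower frame rate.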
Furthermore, five kinds of predictive modes are prescribed for the B picture: the direct predictive mode, the Forward predictive mode, the Backward predictive mode, the Bi-directional predictive mode and the intra predictive mode. It is to be noted that while the direct predictive mode and the bi-directional predictive mode both perform bi-directional prediction, they differ in that the bi-directional predictive mode uses separate motion vector information for the forward direction and the backward direction, whereas the direct predictive mode reads its motion vector information from the corresponding macro block in the future predictive frame.
The macro block type (MB_Type) prescribed for the B picture in H.26L is shown in FIG. 5. In FIG. 5, in the Prediction Type column corresponding to each Code_number, Forward indicates the forward direction type, Backward indicates the backward direction type, Bi-directional indicates the bi-directional type, and intra indicates the type within a picture (frame), and a description such as “16×16” following the type indicates the size of the prediction block as shown in FIG. 1. Moreover, in the columns intra_pred_mode, Ref_frame, Blk_size, MVDFW and MVDBW, the information marked with “X” is defined for the corresponding Prediction Type. For example, MVDFW and MVDBW respectively indicate forward motion vector information and backward motion vector information. In addition, for the block size field Blk_size in the Bi-directional mode, the relationship between Code_number and Block Size shown in FIG. 6 is prescribed.
However, while the use of bi-directional prediction in the B picture as shown in FIG. 3 realizes higher encoding efficiency than the I/P pictures, it also requires a larger amount of computation and a larger number of memory accesses than the I/P pictures.
In particular, in the case where the H.26L system is used, interpolation processing using a filter of 6 taps or 8 taps as indicated by formula (1) or (2) is performed in the prediction/compensation processing, so there is the problem that the amount of computation and the number of memory accesses become very large as compared to the case where the MPEG 2 system is used.
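The growth in memory accesses can be made concrete with a rough count. For a separable T-tap filter applied horizontally and vertically, predicting an N×N block at a fractional position needs a reference area of (N + T − 1) × (N + T − 1) pixels, and bi-directional prediction needs two such areas. The sketch below is a back-of-the-envelope estimate only. The function names are hypothetical, the 2-tap case is used as a stand-in for the bilinear half-pixel interpolation of MPEG 2 (stated here as background knowledge, not taken from the text above), and caching, alignment and integer-position shortcuts are ignored.

```python
def reference_pixels(block_size, taps):
    """Reference pixels read to interpolate one block_size x block_size block."""
    side = block_size + taps - 1
    return side * side


def fetch_cost(block_size, taps, bidirectional=False):
    """Total reference pixels read; two reference areas for a bi-directional block."""
    areas = 2 if bidirectional else 1
    return areas * reference_pixels(block_size, taps)


for taps in (2, 6, 8):
    print(taps,
          fetch_cost(16, taps),                      # 289, 441, 529
          fetch_cost(4, taps),                       # 25, 81, 121
          fetch_cost(16, taps, bidirectional=True))  # 578, 882, 1058
```

The relative overhead is largest for small blocks: a 4×4 block with the 8-tap filter reads 121 reference pixels to predict 16, against 25 with a 2-tap filter.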