1. Field of the Invention
The present invention relates to a moving image encoding apparatus and control method, and a computer program.
2. Description of the Related Art
The digitization of multimedia-related information has accelerated in recent years and has been accompanied by increasing demand for video information of higher image quality. A specific example that can be mentioned is the transition from conventional SD (Standard Definition) of 720×480 pixels to HD (High Definition) of 1920×1080 pixels in broadcast media. However, this demand for higher image quality has led simultaneously to an increase in quantity of data and, as a result, compression encoding techniques and decoding techniques that surpass conventional capabilities have been sought.
In response to such demand, the activities of ITU-T SG16 and ISO/IEC JTC1/SC29/WG11 have moved forward the standardization of compression encoding techniques using interframe prediction, which utilizes the correlation between images. One of these techniques is H.264/MPEG-4 Part 10 (AVC) (referred to below as “H.264”), an encoding scheme said to implement the most highly efficient encoding available at present. The specifications of encoding and decoding according to H.264 are disclosed in the specification of Japanese Patent Laid-Open No. 2005-167720, by way of example.
One technique newly introduced by H.264 is a technique whereby a reference image used in interframe prediction encoding is selected from among a plurality of images (this shall be referred to as “multiple reference interframe prediction” below).
According to such conventional encoding schemes as MPEG-1, MPEG-2 and MPEG-4 (referred to simply as “MPEG encoding schemes” below), forward-prediction and backward-prediction functions are available in instances where motion prediction is carried out. Forward prediction is a prediction scheme in which an image frame situated later in terms of time is predicted from an image frame situated earlier in terms of time. Backward prediction is a prediction scheme in which an image frame situated earlier in terms of time is predicted from an image frame situated later in terms of time. For example, with backward prediction, an image frame whose encoding was skipped earlier can be predicted based upon the present image frame.
According to forward prediction and backward prediction in this MPEG encoding, often an image immediately before or after an image to undergo processing is used as a reference frame to which reference is made when motion prediction is performed. The reason for this is that in many cases there is a high degree of correlation between the image to be processed and an image that is nearby in terms of time.
With an MPEG encoding scheme, however, there can be occasions where there is a large change between images, as when camera motion such as panning or tilting is fast during shooting of moving images, or in the case of an image immediately after a cut change. In such cases the correlation between images is small, even for images close together temporally, and there is the possibility that the advantage of motion-compensated prediction cannot be exploited.
One approach that solves this problem is multiple reference interframe prediction employed in H.264. With this prediction scheme, not only a temporally close image but also a temporally distant image is used in a prediction. If it is likely to improve encoding efficiency over that obtained with use of a nearby image, a distant image is utilized as the reference frame.
Thus, with H.264, motion-compensated prediction can be performed by selecting, from a plurality of images, an image for which the error between an input image and an image already encoded is smallest and utilizing the selected image as the reference frame. As a result, when a moving image is subjected to compression encoding, it is possible to achieve efficient encoding even if the motion of a camera shooting a moving picture image is fast or even in a case where a cut change has occurred.
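The selection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the error is the sum of absolute differences (SAD) computed over whole frames without any motion search; an actual H.264 encoder would typically evaluate candidates per block with motion estimation. The names `Frame`, `sad` and `select_reference` are hypothetical and do not come from the H.264 specification.

```python
from typing import List, Sequence

# Hypothetical frame representation: rows of luma sample values.
Frame = Sequence[Sequence[int]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized frames."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def select_reference(input_frame: Frame, encoded_frames: List[Frame]) -> int:
    """Return the index of the already-encoded frame with the smallest error.

    Assumption: error is frame-level SAD, with no motion compensation.
    """
    if not encoded_frames:
        raise ValueError("at least one encoded frame is required")
    errors = [sad(input_frame, f) for f in encoded_frames]
    # Index of the minimum error; the first such frame wins on ties.
    return min(range(len(errors)), key=errors.__getitem__)


# Scene A, then a cut change to scene B, then a return to scene A.
scene_a = [[10, 10], [10, 10]]
scene_b = [[200, 200], [200, 200]]
current = [[12, 10], [10, 11]]  # resembles scene A

# The temporally nearest frames (scene B) correlate poorly with the input,
# so the temporally distant scene A frame is chosen as the reference.
chosen = select_reference(current, [scene_a, scene_b, scene_b])
```

In this toy case the frame at index 0, the one furthest away in time, yields the smallest error, which corresponds to the situation after a cut change described above.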
However, if computations for selecting frames for which the error with respect to an input image is small are performed with regard to all images already encoded, the amount of computation increases in proportion to the number of frames referred to, and the time required for encoding becomes enormous. Further, in the case of a mobile device such as a video camcorder, an increase in computation load leads to an increase in battery consumption. Consequently, the effect upon available shooting time cannot be ignored.
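The proportional growth in computation can be made concrete with a rough count. The sketch below assumes an exhaustive (full-search) block-matching scheme with 16×16 blocks and a search range of ±16 pixels; these parameters and the function name `full_search_sad_ops` are illustrative assumptions, not values taken from this specification or from H.264.

```python
def full_search_sad_ops(width: int, height: int, num_refs: int,
                        block: int = 16, search_range: int = 16) -> int:
    """Estimate the number of per-pixel difference operations for one frame.

    Assumptions: exhaustive search over every integer displacement within
    +/- search_range, square blocks, and frame dimensions padded up to a
    whole number of blocks.
    """
    blocks_x = -(-width // block)   # ceiling division
    blocks_y = -(-height // block)
    candidates = (2 * search_range + 1) ** 2  # displacements tried per block
    pixels_per_block = block * block
    return num_refs * blocks_x * blocks_y * candidates * pixels_per_block


# Cost of searching one reference frame versus five, for an HD image.
one_ref = full_search_sad_ops(1920, 1080, num_refs=1)
five_refs = full_search_sad_ops(1920, 1080, num_refs=5)
ratio = five_refs // one_ref
```

Under these assumptions the operation count scales linearly with `num_refs`, so referring to five encoded frames costs five times as much as referring to one, which is the increase in load that the paragraph above describes.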