Field of the Invention
The present invention relates to a technique for encoding a moving image.
Description of the Related Art
In image capturing apparatuses such as digital cameras and digital camcorders, a CCD sensor or a CMOS sensor is adopted as an image sensor. A color filter array (hereinafter referred to as a CFA) is provided on the surface of the sensor, so that only one color component is detected per pixel. Using the CFA yields image data (hereinafter referred to as RAW image data) in a Bayer array, in which R (red), G0 (green), B (blue) and G1 (green) pixels are arranged in a cyclic pattern as shown in FIG. 2. Because human vision is highly sensitive to luminance, a general Bayer array allocates twice as many pixels to the green component, which contains most of the luminance information, as to the red component or to the blue component, as shown in FIG. 2. Since the RAW image data holds information on only one color component per pixel, red, green and blue information for each pixel is generated by processing called demosaicing. Generally, image data of RGB signals obtained by demosaicing, or of YUV signals obtained by converting the RGB signals, is encoded and recorded on a recording medium such as a memory card. However, image data obtained by demosaicing has three color components per pixel and therefore requires three times the data amount of the RAW image data. For this reason, a method for directly encoding and recording RAW image data before demosaicing has been proposed.
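The sampling and data-amount relationships above can be sketched in Python. This is a minimal illustration only: the 2x2 cell layout (R and G0 on even rows, G1 and B on odd rows) is an assumption standing in for FIG. 2, and `cfa_color`, `mosaic` and `count_components` are hypothetical helpers.

```python
# Hypothetical 2x2 Bayer cell layout (an assumption; FIG. 2 may differ):
#   even row: R  G0
#   odd row:  G1 B
def cfa_color(row, col):
    """Return the single color component the CFA passes at (row, col)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G0"
    return "G1" if col % 2 == 0 else "B"


def mosaic(rgb):
    """Sample an H x W image of (r, g, b) tuples into RAW data: one value per pixel."""
    raw = []
    for y, line in enumerate(rgb):
        out = []
        for x, (r, g, b) in enumerate(line):
            c = cfa_color(y, x)
            out.append(r if c == "R" else b if c == "B" else g)
        raw.append(out)
    return raw


def count_components(height, width):
    """Count how many RAW pixels carry each color component (G0 and G1 counted as G)."""
    counts = {"R": 0, "G": 0, "B": 0}
    for y in range(height):
        for x in range(width):
            counts[cfa_color(y, x)[0]] += 1
    return counts


if __name__ == "__main__":
    h, w = 4, 6
    print(count_components(h, w))   # {'R': 6, 'G': 12, 'B': 6}
    raw_samples = h * w             # one sample per pixel before demosaicing
    demosaiced_samples = h * w * 3  # three samples per pixel after demosaicing
    print(demosaiced_samples / raw_samples)  # 3.0
```

Green receives twice as many samples as red or blue, and demosaicing triples the number of stored samples relative to the RAW data.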
For example, Japanese Patent Laid-Open No. 2011-41144 describes a method for performing encoding after separating RAW image data into four planes, namely, R, G0, B and G1 planes. When a moving image is recorded, the data amount is very large, so there is a demand for more efficient compression encoding. To address this, Japanese Patent Laid-Open No. 2014-17647 describes a method for efficiently performing encoding using motion-compensated prediction encoding.
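A four-plane separation of the kind described in Japanese Patent Laid-Open No. 2011-41144 might look like the following sketch. It assumes the same hypothetical 2x2 layout (R G0 / G1 B) and an even image height and width; `split_planes` is an illustrative name and is not taken from that publication.

```python
def split_planes(raw):
    """Split Bayer RAW data into R, G0, G1 and B planes, each H/2 x W/2.

    Assumed layout per 2x2 cell: even row R G0, odd row G1 B.
    """
    planes = {"R": [], "G0": [], "G1": [], "B": []}
    for y in range(0, len(raw), 2):
        top, bottom = raw[y], raw[y + 1]
        planes["R"].append(top[0::2])      # even columns of even rows
        planes["G0"].append(top[1::2])     # odd columns of even rows
        planes["G1"].append(bottom[0::2])  # even columns of odd rows
        planes["B"].append(bottom[1::2])   # odd columns of odd rows
    return planes


print(split_planes([[1, 2, 3, 4], [5, 6, 7, 8]]))
# {'R': [[1, 3]], 'G0': [[2, 4]], 'G1': [[5, 7]], 'B': [[6, 8]]}
```

Each of the four planes has a quarter of the RAW pixel count, so the two green planes together carry twice as many samples as the R plane or the B plane.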
In the method described in Japanese Patent Laid-Open No. 2014-17647, the G component is divided into two types of frames, namely, G0 and G1 frames, and motion-compensated prediction is then performed. Therefore, the number of G frames is twice the number of R frames or B frames, and the processing amount for the G frames is likewise twice that for the R frames or the B frames. Moreover, the G0 frame and the G1 frame may or may not be image data of the same time, so a new method must be established for decoding the G frames and the R and B frames at different timings when decoding is performed with a conventional decoding apparatus that uses an encoding scheme such as MPEG, H.264 or HEVC.
A conventional encoding apparatus that uses an encoding scheme such as MPEG, H.264 or HEVC can perform encoding with the luminance/color-difference sampling ratio set to 4:2:2 (referred to as a YCC 422 array).
The present inventors focused on the fact that, when G0 and G1 of RAW image data in a Bayer array are arranged side by side as shown in FIG. 3, the number of G pixels in the horizontal direction is twice the number of R pixels and twice the number of B pixels. They therefore conceived of a method of inputting the image data with its pixels arranged in a YCC 422 array, treating G0 and G1 as Y, R as Cr, and B as Cb. Because the image data is then equivalent to a YCC 422 array, the numbers of R, B and G frames are the same, enabling control similar to that of conventional encoding and decoding apparatuses.
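The rearrangement into a YCC 422 array can be sketched as follows, assuming the hypothetical R G0 / G1 B layout used in the earlier sketches. The order in which G0 and G1 are interleaved within a packed Y row is also an assumption: here each green sample keeps its Bayer column index, which under this layout reproduces the diagonal references described for FIG. 4. `pack_ycc422` is an illustrative name.

```python
def pack_ycc422(raw):
    """Pack Bayer RAW data into Y (G0/G1), Cb (B) and Cr (R) planes in a 4:2:2 layout.

    Assumed layout per 2x2 cell: even row R G0, odd row G1 B.
    Each packed Y row takes, for every Bayer column, the green sample from
    whichever of the two Bayer rows holds green there, so Y keeps the full
    RAW width while Cb and Cr have half of it.
    """
    y_plane, cb_plane, cr_plane = [], [], []
    for r in range(0, len(raw), 2):
        top, bottom = raw[r], raw[r + 1]
        # Odd columns: G0 from the even row; even columns: G1 from the odd row.
        y_plane.append([top[x] if x % 2 else bottom[x] for x in range(len(top))])
        cr_plane.append(top[0::2])     # R samples become Cr
        cb_plane.append(bottom[1::2])  # B samples become Cb
    return y_plane, cb_plane, cr_plane
```

For a 2 x 4 RAW block `[[1, 2, 3, 4], [5, 6, 7, 8]]`, Y is `[[5, 2, 7, 4]]`, Cb is `[[6, 8]]` and Cr is `[[1, 3]]`. Y has twice the width of Cb and of Cr with the same height, which is the 4:2:2 proportion, and one packed frame carries all four color components.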
Here, consider a case in which the horizontal component of a motion vector of the Y (G) component is an odd number at integer precision, so that G0 and G1 refer to each other as in FIG. 4 (in FIG. 4, the motion vector is (−1, 0)). When this motion vector is mapped back to the Bayer array, G0 and G1 are positioned diagonally from each other, so G0 refers to G1 in the lower-left direction and G1 refers to G0 in the upper-left direction. Therefore, when this concept is applied to the R and B components, it is not uniquely determined whether to refer to the pixel in the upper-left direction, the lower-left direction, or the vertical direction.
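The diagonal references can be checked with a short sketch that maps a packed Y position and its motion vector back to Bayer coordinates. It relies on the same assumed layout and packing as the earlier sketches (odd packed columns hold G0 from the even Bayer row, even packed columns hold G1 from the odd Bayer row); the helper names are hypothetical. A displacement is reported as (rows, columns), with positive rows pointing down.

```python
def packed_to_bayer(py, px):
    """Map a packed Y coordinate (row py, column px) to its Bayer (row, col).

    Assumption: odd packed columns hold G0 (even Bayer row),
    even packed columns hold G1 (odd Bayer row).
    """
    row = 2 * py + (0 if px % 2 else 1)
    return row, px


def bayer_displacement(py, px, mv):
    """Bayer-domain (rows, cols) displacement of the reference for the G sample at (py, px)."""
    mvx, mvy = mv
    src = packed_to_bayer(py, px)
    ref = packed_to_bayer(py + mvy, px + mvx)
    return ref[0] - src[0], ref[1] - src[1]


print(bayer_displacement(0, 1, (-1, 0)))  # (1, -1): G0 refers to G1 lower-left
print(bayer_displacement(0, 2, (-1, 0)))  # (-1, -1): G1 refers to G0 upper-left
print(bayer_displacement(0, 3, (-2, 0)))  # (0, -2): even vector stays on G0
```

An odd horizontal component flips the Bayer row parity, so the same packed vector points down-left for G0 and up-left for G1, whereas an even horizontal component keeps the reference on the same green type.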
Therefore, assume that the motion vectors of R and B have a horizontal component that is half that of G and a vertical component that is the same as that of G. In this case, when the R and B components need to refer to the pixel in the lower-left or upper-left direction, they cannot refer to the image that the G component refers to, and the encoding efficiency deteriorates.
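The consequence for R and B can be sketched by deriving the chroma vector in the 4:2:2 manner (horizontal component halved, vertical unchanged) and converting it to Bayer units. This assumes that R and B samples repeat every two Bayer rows and every two Bayer columns, as in the layout assumed in the earlier sketches; the function names are hypothetical, and `Fraction` is used only to keep the half-sample value exact.

```python
from fractions import Fraction


def chroma_mv_422(luma_mv):
    """Derive the R/B (chroma) vector: horizontal halved, vertical unchanged."""
    mvx, mvy = luma_mv
    return Fraction(mvx, 2), Fraction(mvy)


def chroma_bayer_displacement(chroma_mv):
    """Convert a chroma vector to a Bayer-domain (rows, cols) displacement.

    Assumption: one chroma sample step spans two Bayer columns horizontally,
    and one chroma row (equal to one packed Y row) spans two Bayer rows.
    """
    cx, cy = chroma_mv
    return 2 * cy, 2 * cx


mv_c = chroma_mv_422((-1, 0))
rows, cols = chroma_bayer_displacement(mv_c)
print(float(mv_c[0]), float(mv_c[1]))  # -0.5 0.0
print(int(rows), int(cols))            # 0 -1
```

For the G vector (−1, 0), the R/B vector is (−1/2, 0), which corresponds to a shift of one Bayer column to the left with no vertical shift. The G samples, by contrast, were also displaced by one row up or down, so under these assumptions the R and B prediction lands at a half-sample position between horizontally adjacent R (or B) samples rather than at the diagonal position used by the G component.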