Encoding methods such as JPEG, MPEG1, and MPEG2 have been established as techniques for high-efficiency encoding of images. Manufacturers have developed and commercialized shooting apparatuses, such as digital cameras and digital video cameras, as well as DVD recorders, that record images using these encoding methods. Users can easily view the recorded images on these apparatuses, on personal computers, or on DVD players.
Encoding methods for moving images that achieve higher compression than MPEG1 and MPEG2 have also been studied. In recent years, an encoding method called H.264/MPEG-4 Part 10 (hereinafter referred to as H.264) has been standardized jointly by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization (ISO).
A typical overall configuration of a moving image compression encoding apparatus conforming to H.264 will be described with reference to the block diagram of FIG. 11. The apparatus comprises a camera unit 200, a subtraction unit 2001, an integer transformation unit 2002, a quantization unit 2003, an entropy encoder 2004, an inverse quantization unit 2005, an inverse integer transformation unit 2006, an adder 2007, frame memories 2008 and 2012, an intra prediction unit 2009, switches 2010 and 2015, a de-blocking filter 2011, an inter prediction unit 2013, and a motion detector 2014. Image data input from the camera unit 200 is divided into blocks, encoding processing is performed on each block, and the encoded data is output. The encoding processing of H.264 is described below.
First, the subtraction unit 2001 subtracts prediction image data from the image data input from the camera unit 200 and outputs differential image data. The generation of the prediction image data is discussed later. The integer transformation unit 2002 performs an orthogonal transformation, such as a DCT, on the differential image data output from the subtraction unit 2001 and outputs transformation coefficients. The quantization unit 2003 then quantizes the transformation coefficients using a predetermined quantization parameter. The entropy encoder 2004 receives the quantized transformation coefficients from the quantization unit 2003, entropy-codes them, and outputs the result as encoded data.
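To make the forward path concrete, the following Python sketch applies it to a single 4×4 block: the differential data is formed by subtraction, transformed with the 4×4 forward integer core transform matrix that H.264 uses as an integer approximation of the DCT, and then quantized. The quantizer is an assumption made for brevity: a plain uniform quantizer with a fixed step size, not the per-position scaling derived from the quantization parameter in the standard. All function names are hypothetical.

```python
# Forward 4x4 integer core transform matrix of H.264 (an integer approximation of the DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def subtract(block, prediction):
    """Subtraction unit: differential (residual) data = input - prediction."""
    return [[x - p for x, p in zip(rb, rp)] for rb, rp in zip(block, prediction)]


def forward_transform(residual):
    """Integer transformation: Y = CF * X * CF^T."""
    return matmul(matmul(CF, residual), transpose(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer (assumption): round |c| / step to nearest, keep the sign."""
    out = []
    for row in coeffs:
        q_row = []
        for c in row:
            q = (abs(c) + step // 2) // step
            q_row.append(q if c >= 0 else -q)
        out.append(q_row)
    return out


# Usage: a flat 4x4 block that is 2 levels brighter than its prediction.
block = [[10] * 4 for _ in range(4)]
prediction = [[8] * 4 for _ in range(4)]
levels = quantize(forward_transform(subtract(block, prediction)), step=4)
print(levels[0])  # [8, 0, 0, 0]; every other row is all zeros
```

Because the residual in this example is flat, all of its energy lands in the single DC coefficient, which is why the transform step helps compression.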
The quantized transformation coefficients output by the quantization unit 2003 are also used to generate the prediction image data. The inverse quantization unit 2005 inverse-quantizes these quantized transformation coefficients. The inverse integer transformation unit 2006 then applies an inverse integer transformation, such as an inverse DCT, to the inverse-quantized coefficients and outputs the result as decoded differential image data. The adder 2007 adds the decoded differential image data to the prediction image data and outputs the sum as reconstruction image data.
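The local decoding loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: inverse quantization assumes a uniform quantizer with a fixed step, and the inverse transform is the exact rational inverse of Y = CF · X · CF^T (computed with `fractions.Fraction`), not the scaled integer inverse transform specified by H.264. The function names are hypothetical.

```python
from fractions import Fraction

# H.264 4x4 forward core transform matrix.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
# Squared norms of the rows of CF: CF * CF^T = diag(4, 10, 4, 10).
ROW_NORMS = [4, 10, 4, 10]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dequantize(levels, step):
    """Inverse quantization for a uniform quantizer with the given step (assumption)."""
    return [[lv * step for lv in row] for row in levels]


def inverse_transform(coeffs):
    """Exact inverse of Y = CF*X*CF^T: X = CF^T * N^-1 * Y * N^-1 * CF, N = diag(ROW_NORMS)."""
    scaled = [[Fraction(coeffs[i][j], ROW_NORMS[i] * ROW_NORMS[j]) for j in range(4)]
              for i in range(4)]
    x = matmul(matmul(transpose(CF), scaled), CF)
    return [[round(v) for v in row] for row in x]


def reconstruct(decoded_residual, prediction):
    """Adder: reconstruction = decoded residual + prediction, clipped to the 8-bit range."""
    return [[min(255, max(0, r + p)) for r, p in zip(rr, rp)]
            for rr, rp in zip(decoded_residual, prediction)]


# Usage: quantized levels of a flat residual of +2, decoded against a flat prediction of 8.
levels = [[8, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
prediction = [[8] * 4 for _ in range(4)]
recon = reconstruct(inverse_transform(dequantize(levels, 4)), prediction)
print(recon[0])  # [10, 10, 10, 10]
```

The encoder runs this decoding itself so that its prediction is built from the same reconstructed data a decoder will have, rather than from the original input.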
The reconstruction image data is recorded in the frame memory 2008. When de-blocking filter processing is performed, the reconstruction image data is recorded in the frame memory 2012 after passing through the de-blocking filter 2011; when it is not performed, the reconstruction image data is recorded in the frame memory 2012 without passing through the de-blocking filter 2011. The switch 2010 is a selection unit that selects whether to perform de-blocking filter processing. Of the reconstruction image data, data that may be referenced in subsequent predictions is temporarily stored as reference frame data in the frame memory 2008 or 2012. The de-blocking filter 2011 removes noise such as block distortion at block boundaries.
The intra prediction unit 2009 performs intra-frame prediction using the image data recorded in the frame memory 2008 and generates prediction image data. The inter prediction unit 2013 performs inter-frame prediction using the reference frame data recorded in the frame memory 2012, based on the motion vector information detected by the motion detector 2014, and generates prediction image data. The motion detector 2014 detects motion vectors in the input image data and outputs information on the detected motion vectors to the inter prediction unit 2013 and the entropy encoder 2004. The switch 2015 is a selection unit that selects whether to use intra prediction or inter prediction: it selects the output of either the intra prediction unit 2009 or the inter prediction unit 2013 and supplies the selected prediction image data to the subtraction unit 2001 and the adder 2007. This concludes the description of the image compression encoding apparatus shown in FIG. 11.
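The text does not say how the switch 2015 chooses between the two predictions. As one plausible criterion, assumed here purely for illustration, the following Python sketch picks whichever prediction gives the lower sum of absolute differences (SAD) against the input block. Practical encoders commonly use a rate-distortion cost instead; `sad` and `select_prediction` are hypothetical names.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(x - p) for rb, rp in zip(block, prediction) for x, p in zip(rb, rp))


def select_prediction(block, intra_pred, inter_pred):
    """Hypothetical switch: pick the prediction with the lower SAD (ties go to intra)."""
    if sad(block, inter_pred) < sad(block, intra_pred):
        return "inter", inter_pred
    return "intra", intra_pred


block = [[10, 12], [14, 16]]
intra = [[13, 13], [13, 13]]   # e.g. a flat, DC-style intra prediction
inter = [[10, 12], [14, 15]]   # e.g. a motion-compensated prediction from a reference frame
mode, pred = select_prediction(block, intra, inter)
print(mode)  # inter
```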
The operation of the motion detector 2014 in H.264 will now be discussed. In H.264, as shown in FIG. 12, a reference frame with high encoding efficiency can be selected from a plurality of reference frames (RF1 to RF5) for each macro block in the current frame (CF), and the frame to be used can be designated. Two or more reference frames may be selected for a single macro block in the current frame CF, and different macro blocks within the same frame may use different reference frames.
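The per-macro-block reference frame selection can be illustrated with a brute-force Python sketch that searches every candidate reference frame over a small displacement window and keeps the candidate with the lowest SAD. Full search, the SAD cost, the tiny frame and block sizes, and all function names are assumptions for illustration; real encoders typically use faster search strategies and a cost that also accounts for motion vector bits.

```python
def sad_at(cur, ref, bx, by, dx, dy, size):
    """SAD between the size x size block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def best_reference(cur, refs, bx, by, size, search):
    """Full search over every reference frame and every displacement in [-search, search].

    Returns (cost, ref_index, (dx, dy)) for the lowest-SAD candidate.
    """
    height, width = len(cur), len(cur[0])
    best = None
    for idx, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                # Skip displacements that would read outside the reference frame.
                if not (0 <= by + dy and by + dy + size <= height
                        and 0 <= bx + dx and bx + dx + size <= width):
                    continue
                cost = sad_at(cur, ref, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, idx, (dx, dy))
    return best


# Usage: RF1 is a blank frame; RF2 holds the current content shifted right by one pixel.
cur = [[x + 8 * y for x in range(8)] for y in range(8)]
rf1 = [[0] * 8 for _ in range(8)]
rf2 = [[0] + row[:-1] for row in cur]
print(best_reference(cur, [rf1, rf2], bx=2, by=2, size=4, search=2))  # (0, 1, (1, 0))
```

The returned index is zero-based, so `1` refers to the second candidate (RF2 here). The cost of this search grows linearly with the number of candidate reference frames.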
As shown in FIG. 13B, one macro block in the image of FIG. 13A can be divided into macro block partitions of 16×16, 16×8, 8×16, or 8×8 pixels. A motion vector and a reference frame can be obtained separately for each macro block partition. In the 8×8 case, each macro block partition can be further divided into sub macro block partitions of 8×4, 4×8, or 4×4 pixels, as shown in FIG. 13C. Motion vectors can also be determined with 1/4-pixel accuracy using 6-tap FIR filter processing (Japanese Patent Laid-Open No. 2004-328633).
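In H.264, luma samples at half-pixel positions are interpolated with the 6-tap FIR filter (1, -5, 20, 20, -5, 1)/32, and quarter-pixel samples are obtained by a rounded average of neighbouring integer and half-pixel samples. The Python sketch below shows this in one dimension only (along a row). Clamping indices at the row ends is a simplification chosen for this sketch, the two-dimensional centre half-pixel position is not covered, and the function names are hypothetical.

```python
# Luma interpolation taps for half-sample positions in H.264.
TAPS = (1, -5, 20, 20, -5, 1)


def clip255(v):
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Uses row[i - 2] .. row[i + 3]; indices past either end are clamped
    (edge replication), a simplification made for this sketch.
    """
    n = len(row)
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), n - 1)
        acc += tap * row[j]
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right (rounded average)."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel(ramp, 3), quarter_pel(ramp, 3))  # 35 33
```

Because the negative taps can push the filtered value outside the sample range near sharp edges, the result is clipped to 0..255.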
As described above, H.264 introduces the technique shown in FIGS. 13A to 13C: a plurality of pixel block shapes are prepared for predictive encoding, and motion detection is executed on fine pixel blocks. The finer the pixel blocks, the greater the number of blocks to be predictively encoded. Further, as shown in FIG. 12, selecting a reference frame with high encoding efficiency from a plurality of frames requires inter-frame prediction to be performed on every candidate reference frame, which further increases the processing load.
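The growth in workload can be estimated with simple arithmetic. The Python sketch below counts how many block motion searches one macro block requires if every listed shape is searched in every candidate reference frame. It assumes exhaustive evaluation with no early termination, five reference frames as in FIG. 12, and, for the per-frame figure, a 1920×1088 frame size chosen only as an example; `block_searches` is a hypothetical name.

```python
# Number of blocks of each shape needed to tile one 16x16 macro block.
BLOCKS_PER_MACROBLOCK = {
    "16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4,  # macro block partitions (FIG. 13B)
    "8x4": 8, "4x8": 8, "4x4": 16,               # sub macro block partitions (FIG. 13C)
}


def block_searches(shapes, num_ref_frames):
    """Motion searches per macro block if every listed shape is tried in every reference frame."""
    return num_ref_frames * sum(BLOCKS_PER_MACROBLOCK[s] for s in shapes)


all_shapes = list(BLOCKS_PER_MACROBLOCK)
print(block_searches(all_shapes, 5))   # 205
print(block_searches(["16x16"], 1))    # 1
# A 1920x1088 frame holds (1920 // 16) * (1088 // 16) = 8160 macro blocks.
print(block_searches(all_shapes, 5) * (1920 // 16) * (1088 // 16))  # 1672800
```

Under these assumptions, searching all seven shapes in five reference frames costs 205 times as many block searches as searching a single 16×16 block in one reference frame.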
However, depending on the state of the camera unit that outputs the image data to be compressed, motion detection using every kind of pixel block shape may be unnecessary. For example, when the image is out of focus, or when the camera unit pans or tilts, the entire image is blurred or contains extreme motion. In such cases, correct motion information cannot be detected even by motion detection using fine pixel block shapes.
Similarly, when an image is dark, or when the image data contains much noise because of a high camera amplifier gain, correct motion information often cannot be detected even by motion detection using fine pixel block shapes. Likewise, immediately after the camera unit starts up, the angle of view and the exposure level are often unstable, so fine-block motion detection frequently fails to find correct motion information as well.
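The camera states listed above could, for example, be mapped to a restricted set of block shapes. The following Python sketch is a hypothetical illustration only, not a method described in this text: the `CameraState` fields, the threshold values, and the choice to fall back to 16×16 blocks alone are all assumptions.

```python
from dataclasses import dataclass

ALL_SHAPES = ("16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4")
COARSE_SHAPES = ("16x16",)

# Hypothetical thresholds, chosen only for illustration.
MAX_GAIN_DB = 18.0
MIN_MEAN_LUMA = 40.0
STARTUP_FRAMES = 30


@dataclass
class CameraState:
    """Hypothetical snapshot of the camera unit's status."""
    in_focus: bool
    panning_or_tilting: bool
    gain_db: float
    mean_luma: float          # average luminance of the frame, 0-255
    frames_since_startup: int


def candidate_shapes(state):
    """Return the block shapes worth searching for the given camera state."""
    motion_unreliable = (
        not state.in_focus
        or state.panning_or_tilting
        or state.gain_db > MAX_GAIN_DB
        or state.mean_luma < MIN_MEAN_LUMA
        or state.frames_since_startup < STARTUP_FRAMES
    )
    return COARSE_SHAPES if motion_unreliable else ALL_SHAPES


steady = CameraState(True, False, 6.0, 120.0, 300)
panning = CameraState(True, True, 6.0, 120.0, 300)
print(len(candidate_shapes(steady)), candidate_shapes(panning))  # 7 ('16x16',)
```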
In video camera systems, for which demand is growing and which record high-quality image data in a more compact form using such an encoding algorithm, unnecessary motion detection directly increases battery consumption in these mobile devices. This seriously limits the achievable shooting time. Further, when the encoding algorithm is implemented in software, unnecessary motion detection needlessly increases processing time.