1. Field of the Invention
The present invention relates to a motion vector detection apparatus, motion vector detection method, image encoding apparatus, image encoding method, and computer program.
2. Description of the Related Art
In recent years, digitization of multimedia-related information has been advancing rapidly, and demand for higher video image quality is increasing accordingly. As a practical example, there is currently an ongoing transition of broadcast media from conventional standard definition (SD) at 720×480 pixels to high definition (HD) at 1920×1080 pixels. However, the demand for higher image quality leads to a concomitant increase in digital data size and creates a need for compression encoding and decoding techniques exceeding the conventional capabilities.
To meet such demands, standardization of a compression encoding scheme using inter-frame prediction that exploits correlation among images has been implemented by the ITU-T SG16 and ISO/IEC JTC1/SC29/WG11.
As compression encoding schemes for moving images, standards such as MPEG-1, 2, and 4 and H.264 are available. In the compression encoding processing for a moving image, each original image included in the moving image is divided into predetermined regions called blocks, and motion-compensated prediction and DCT processing are applied to the divided blocks as units.
In the case of motion-compensated prediction, the size of a block used as a unit of compression encoding processing is 16 pixels (horizontal)×16 lines (vertical) (such block is called a macroblock) in the MPEG1 and MPEG2 schemes. Upon applying frame prediction to one macroblock, one motion vector including two components in the horizontal and vertical directions is assigned. Upon applying field prediction, two motion vectors each including two components in the horizontal and vertical directions are assigned.
When motion-compensated prediction is applied using macroblocks as units, as in MPEG1 and MPEG2, motion vector assignment processing adopts a macroblock 601 as a processing unit, as shown in FIG. 6A. All pixels in this macroblock 601 are then represented by only one motion vector 602, and the moving image is processed based on the respective horizontal and vertical motion amounts of this motion vector 602.
The MPEG4 scheme comprises a mode (to be referred to as 8×8 mode hereinafter) that applies motion compensation to blocks (small blocks) 603 each having 8 pixels (horizontal)×8 lines (vertical) as units, as shown in FIG. 6B, in addition to motion-compensated prediction for respective macroblocks. Using this 8×8 mode, when a macroblock 604 of 16 pixels (horizontal)×16 lines (vertical) includes a plurality of motions, motion vectors that resemble the actual motion more closely can be obtained than when one motion vector is assigned to the macroblock 604.
For example, upon examining a case in which a background (tower and sun) remains stationary, and a vehicle is moving to the left, as shown in FIG. 7A, the vehicle and background part have different motions. In such images, when a single macroblock 701 includes a part of the vehicle and background together, as shown in FIG. 7B, the motion prediction efficiency can be improved by assigning a motion vector 702 to the vehicle and a motion vector 703 to the background. Upon dividing into small blocks, since one macroblock includes four small blocks, the number of motion vectors is 1 per small block, and 4 per macroblock.
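The difference between one motion vector per macroblock and one per small block can be illustrated with a minimal sketch. It assumes a full search with the sum of absolute differences (SAD) as the matching criterion, which the description above does not specify; the function names, the search range, and the synthetic frames are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, size, search_range=3):
    """Try every displacement in [-search_range, search_range] and return
    (best_sad, (dx, dy)). The vector points from the current block to its
    best match in the reference frame (an assumed sign convention)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 32x32 frames: columns 0..15 moved by 2 pixels, the rest is stationary.
ref = [[(7 * x + 13 * y) % 251 for x in range(32)] for y in range(32)]
cur = [[ref[y][x + 2] if x < 16 else ref[y][x] for x in range(32)]
       for y in range(32)]

# The macroblock at (8, 8) straddles both motions.
one_vector = full_search(cur, ref, 8, 8, 16)
four_vectors = [full_search(cur, ref, bx, by, 8)
                for by in (8, 16) for bx in (8, 16)]
```

In this synthetic case no single vector matches the whole 16×16 macroblock, so `one_vector` has a nonzero SAD, while each of the four 8×8 small blocks finds an exact match with its own vector.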
H.264/MPEG-4 PART10 (AVC) (to be simply referred to as H.264 hereinafter) is an encoding scheme that realizes high-efficiency encoding in the present circumstances. The encoding and decoding specifications of H.264 are disclosed, for example, in Japanese Patent Laid-Open No. 2005-167720.
Among the various techniques introduced by H.264 is one, depicted in FIG. 2A, which prepares a plurality of different pixel block partitions for use in predictive encoding so that a motion amount can be detected in a smaller pixel unit, and which reduces the code size by selecting the partition with the minimum prediction error. Such a partition is called a macroblock partition.
The macroblock partition will be described in detail below with reference to FIG. 2A. H.264 defines the 16×16-pixel size used in MPEG2 as a macroblock type 201 having the maximum block size. Based on this, a macroblock partition to be used in predictive encoding can be selected from a total of four different macroblock partitions, namely the type 201 and the partitions 202 to 204 shown in FIG. 2A. Note that selecting a macroblock partition involves selecting the block size to be used in predictive encoding from 16×16 pixels, 16×8 pixels, 8×16 pixels, and 8×8 pixels.
Furthermore, the macroblock partition 204 with the size of 8×8 pixels shown in FIG. 2A can be divided into smaller sub-macroblocks. In this case, the macroblock partition 204 can be divided into one of four different types of sub-macroblock partitions using sub-macroblocks having a block size of 4×4 pixels at minimum, as denoted by 205 to 208 in FIG. 2A. Selecting a sub-macroblock partition involves selecting the sub-macroblock size to be used in predictive encoding from 8×8 pixels, 8×4 pixels, 4×8 pixels, and 4×4 pixels.
That is, a macroblock partition to be used in predictive encoding is selected from a total of 19 different types. Of these types, three are the macroblock partitions 201 to 203 in FIG. 2A, which cannot be divided into sub-macroblock partitions. The remaining 16 types are given by the product of the number of blocks (=4) in the macroblock partition 204 in FIG. 2A, each of which can be divided into sub-macroblock partitions, and the number of sub-macroblock partitions (=4) denoted by 205 to 208 in FIG. 2A.
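The count of 19 types can be reproduced with a short worked example. It only restates the arithmetic given above; the constant and function names are illustrative and do not come from the H.264 specification.

```python
# Block sizes (width, height) in pixels, as listed for FIG. 2A.
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MACROBLOCK_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def count_partition_types():
    """Return (non_divisible, divisible, total) following the counting in the text."""
    # Partitions that cannot be divided into sub-macroblock partitions.
    non_divisible = [p for p in MACROBLOCK_PARTITIONS if p != (8, 8)]
    # A 16x16 macroblock holds four 8x8 blocks, each with four sub-partition choices.
    blocks_in_8x8_partition = (16 // 8) * (16 // 8)
    divisible = blocks_in_8x8_partition * len(SUB_MACROBLOCK_PARTITIONS)
    return len(non_divisible), divisible, len(non_divisible) + divisible
```

Calling `count_partition_types()` yields 3 non-divisible types, 16 types from the 8×8 partition, and 19 in total.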
In this way, by applying motion-compensated prediction by dividing each macroblock into smaller sub-macroblocks, motion vectors that fit actual motions can be flexibly expressed. However, since additional information such as vector information and the like is required for each divided sub-macroblock, dividing into sub-macroblocks is not always efficient for encoding. Therefore, it is necessary to encode by selecting combinations of sub-macroblocks with optimal sizes from blocks with various sizes.
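The trade-off between prediction error and the additional vector information can be sketched as follows. The cost model (prediction error plus a weighted number of bits for the motion vectors), the weight `lam`, and the value of `bits_per_vector` are assumptions made purely for illustration; the description above does not define a cost function.

```python
def select_partition(candidates, bits_per_vector=12, lam=4.0):
    """Pick the partition with the lowest assumed cost.

    candidates: dict mapping a partition name to a tuple
                (prediction_error, number_of_motion_vectors).
    The cost is error + lam * vectors * bits_per_vector (hypothetical model).
    """
    def cost(item):
        error, n_vectors = item[1]
        return error + lam * n_vectors * bits_per_vector

    name, _ = min(candidates.items(), key=cost)
    return name


# Finer division wins only when it lowers the error by more than its overhead.
large_gain = select_partition({"16x16": (900, 1), "8x8": (700, 4)})
small_gain = select_partition({"16x16": (900, 1), "8x8": (850, 4)})
```

Under these assumed numbers, the 8×8 division is chosen when it reduces the error from 900 to 700, but the single 16×16 block is kept when the error only falls to 850, because the overhead of three extra vectors outweighs the gain.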
In H.264, an intra-frame (intra) prediction mode is available, and the pixel values of a macroblock can be predicted from image information within the same frame. This mode can perform prediction either in units of a macroblock having a size of 16 pixels (horizontal)×16 lines (vertical) (block 211) or in units of blocks each having a size of 4 pixels (horizontal)×4 lines (vertical) (blocks 212).
Furthermore, as shown in FIG. 5, H.264 can select reference frames with high encoding efficiency from a plurality of reference frames RF1 to RF5 for respective macroblocks in a frame to be encoded (CF), and can designate frames to be used for respective blocks. Hence, even macroblocks in the identical frame CF to be encoded may select different reference frames. In this manner, H.264 sets a plurality of search layers for motion vector detection using a plurality of reference frames.
As a result, motion information is searched for using a smaller image unit, thus improving the motion information precision.
However, MPEG2 has only one type of macroblock, while H.264 has 19 types of partitions. Therefore, intensive arithmetic operations are required to evaluate motion vectors for all blocks included in each partition and to select a combination of optimal block sizes from sub-macroblocks with varying sizes. For this reason, encoding apparatuses are required to have a larger hardware scale and must perform processing using high-speed clocks, which frustrates reductions in apparatus size and power consumption. Since H.264 can perform motion vector detection using a plurality of search layers (a plurality of reference frames), intensive arithmetic operations are required if all the partitions are to be evaluated for respective search layers.
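The growth in arithmetic load can be estimated with a rough count of block searches per macroblock. The sketch assumes an exhaustive evaluation in which every block of every block size is searched in every reference frame, and it takes five reference frames from RF1 to RF5 in FIG. 5; an actual encoder need not work this way, and the names used are hypothetical.

```python
# Every block size (width, height) that appears in the partitions of FIG. 2A.
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_per_macroblock(width, height, mb=16):
    """Number of width x height blocks needed to tile one mb x mb macroblock."""
    return (mb // width) * (mb // height)


def searches_per_macroblock(num_reference_frames):
    """Block searches per macroblock when all sizes are tried in all frames."""
    per_frame = sum(blocks_per_macroblock(w, h) for w, h in BLOCK_SIZES)
    return per_frame * num_reference_frames
```

Under these assumptions, one reference frame requires 1+2+2+4+8+8+16 = 41 block searches per macroblock, and five reference frames require 205, compared with a single search for one 16×16 macroblock in one reference frame.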
In the case of mobile devices such as video camcorders, an increase in arithmetic load increases the battery power consumed to drive the device, which results in a shorter recording time.