The present invention relates to a method and apparatus for detecting motion and, more particularly, to a method and apparatus for detecting motion of a picture occurring between different image frames, i.e., occurring with a lapse of time. In particular, the invention relates to a method and apparatus for use with a video encoder that encodes moving images using motion compensation.
ITU-T (International Telecommunication Union Telecommunication Standardization Sector) recommends H.261 and H.262 as methods for encoding, storing, and transmitting image signals. ISO (International Organization for Standardization) recommends MPEG-1 (11172-2) and MPEG-2 (13818-2). These methods adopt interframe prediction for motion compensation in encoding image signals. Video signals are known to be highly redundant. Motion compensation interframe prediction is a technique for removing such redundancy. In this technique, each image frame to be compressed is divided into plural blocks (hereinafter referred to as target compressing blocks), and the target compressing blocks are handled separately. In particular, an image frame processed prior to the target compressing block to be compressed is referred to as a reference image frame. A search area in which motion is detected is set close to the position of the target compressing block within the reference image frame. Plural blocks (hereinafter referred to as reference image blocks) having the same size as the target compressing block are extracted from the search area. These reference image blocks are searched for the reference image block closest to the target compressing block.
Then, the difference between the target compressing block and the found reference image block is encoded, thus producing a compressed code sequence for the target compressing block. Motion detection is the process of searching for this closest reference image block. A motion vector is the displacement on the viewing screen between a target compressing block and the corresponding reference image block found. Usually, such motion detection is performed by a video encoder.
Motion detection is described in various papers, literature, patent specifications, and so on. The most common procedure starts with calculating the absolute values of the pixel-by-pixel differences between the target compressing block (also referred to as the target block) and every reference image block within the reference image frame. The sum of the absolute values of the differences is calculated for each reference image block. The reference image block providing the least sum is found. The displacement on the viewing screen between the detected reference image block and the target block is expressed in terms of horizontal and vertical displacement components, thus producing a motion vector. This procedure is known as the full-search procedure.
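The full-search procedure described above can be sketched as follows. This is a minimal illustration and not the implementation of any particular encoder. The function names `sad` and `full_search`, the list-of-rows frame layout, and the symmetric `search_range` parameter are assumptions made for this example.

```python
def sad(target, ref, ox, oy):
    """Sum of absolute differences between `target` and the block of `ref`
    whose top-left pixel is at column `ox`, row `oy`."""
    return sum(
        abs(target[y][x] - ref[oy + y][ox + x])
        for y in range(len(target))
        for x in range(len(target[0]))
    )


def full_search(target, ref, tx, ty, search_range):
    """Return ((dx, dy), sad) for the best-matching reference block.

    (tx, ty) is the top-left position of the target block in its own frame.
    Candidates are displaced by -search_range..+search_range on each axis
    (a symmetric range is an assumption of this sketch).
    """
    n_rows, n_cols = len(target), len(target[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ox, oy = tx + dx, ty + dy
            # Skip candidates that fall outside the reference frame.
            if ox < 0 or oy < 0:
                continue
            if oy + n_rows > len(ref) or ox + n_cols > len(ref[0]):
                continue
            cost = sad(target, ref, ox, oy)
            # Keep the candidate with the least sum found so far.
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Hypothetical data: an 8x8 reference frame and a 2x2 target block that
# matches the reference block one pixel to the right of position (2, 2).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
target = [row[3:5] for row in ref[2:4]]
vector, cost = full_search(target, ref, 2, 2, 2)
```

Every candidate evaluates every pixel of the target block, so the cost grows with the product of the number of candidates and the block size.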
In the full-search procedure, the absolute values of the differences between all pixels contained in the target block and all pixels contained in every reference block within a reference image frame are calculated, and the sum of these absolute values must be calculated for each reference block. Therefore, the amount of calculation is exorbitant, and a high computational speed is necessary.
A sub-sampling procedure as shown in FIGS. 14-16 is considered as a method of detecting motion with a reduced amount of calculation. In this method, a target block 100 whose motion is to be detected within an encoded image frame is sub-sampled both horizontally and vertically at intervals of one pixel, i.e., every other pixel is retained. Thus, a sub-sampled target block 101 is obtained. A search area 201 in which motion is detected is set within a reference image frame 200 shown in FIG. 15. The differences between the pixels of all the reference blocks in the search area 201 within the frame 200 and the sampled pixels of the target block are taken. Using these differences, the degrees of approximation of images are found. The reference image block having the highest degree of approximation is found. The displacement on the viewing screen between this reference image block and the target block is represented as a motion vector.
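A minimal sketch of the sub-sampling procedure is given below for illustration. The names `subsample` and `subsampled_search`, the sum of absolute differences as the measure of approximation, and the symmetric `search_range` are assumptions of this example and are not taken from FIGS. 14-16.

```python
def subsample(block, step=2):
    """Keep every `step`-th pixel of a block horizontally and vertically."""
    return [row[::step] for row in block[::step]]


def subsampled_search(target, ref, tx, ty, search_range, step=2):
    """Return ((dx, dy), cost) using only the sampled pixels of the target."""
    sampled_target = subsample(target, step)
    n_rows, n_cols = len(target), len(target[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ox, oy = tx + dx, ty + dy
            # Skip candidates that fall outside the reference frame.
            if ox < 0 or oy < 0:
                continue
            if oy + n_rows > len(ref) or ox + n_cols > len(ref[0]):
                continue
            # Take the candidate block, then keep the same pixel positions
            # that were kept in the target block.
            block = [row[ox:ox + n_cols] for row in ref[oy:oy + n_rows]]
            candidate = subsample(block, step)
            cost = sum(
                abs(t - r)
                for t_row, r_row in zip(sampled_target, candidate)
                for t, r in zip(t_row, r_row)
            )
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Hypothetical data: a 4x4 target block that matches the reference block
# one pixel to the right of position (3, 3) in a 10x10 reference frame.
ref = [[10 * y + x for x in range(10)] for y in range(10)]
target = [row[4:8] for row in ref[3:7]]
vector, cost = subsampled_search(target, ref, 3, 3, 2)
```

With `step=2`, each candidate compares 4 pixels of a 4x4 block instead of 16, but successive candidates are still shifted by one pixel, so the candidates together touch nearly every pixel of the search area.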
A multi-stage search method for finding a motion vector is also conceivable. In this method, the sub-sampling procedure is not applied to every reference image block within a reference image frame. Instead, reference image blocks are taken at intervals of 2 pixels horizontally and vertically, so that the number of reference image blocks within each reference image frame is reduced to one-fourth. The degrees of similarity of all of these reference blocks are found. The reference image block having the highest degree of similarity is detected. The displacement on the viewing screen between the detected reference image block and the target block is represented by horizontal and vertical displacement components, thus producing a motion vector.
The aforementioned full-search method requires a large amount of calculation, which makes it difficult to widen the search area or to detect motion in real time. In the sub-sampling procedure, the number of pixels within one motion detection block is reduced to one-fourth, for example, and therefore the amount of calculation can be reduced to approximately one-fourth. However, as far as reading reference image blocks from the search area is concerned, every pixel within the search area must still be read out. Where a memory is attached externally to a motion-detecting LSI or processor and reference image frames are stored in this memory, a serious problem arises if the memory has a limited bandwidth (transfer efficiency): because every pixel within the search area is read out, a large amount of data needs to be transferred between the memory and the motion-detecting LSI.
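The trade-off described above can be illustrated with a rough count. The helper `search_costs` and its parameters are hypothetical and introduced only for this example; the block size of 4 pixels and the 6 candidate displacements per axis are illustrative figures, and real encoders use larger values.

```python
def search_costs(block_size, displacements, subsample=1):
    """Return (pixel_comparisons, search_area_pixels) for one target block.

    block_size    -- width and height of the square target block, in pixels
    displacements -- number of candidate displacements on each axis
    subsample     -- keep every `subsample`-th pixel of the block on each axis
    """
    samples_per_axis = -(-block_size // subsample)  # ceiling division
    comparisons = displacements ** 2 * samples_per_axis ** 2
    # Width of the search area on each axis. As the text notes, the whole
    # area is read out even when the block itself is sub-sampled.
    span = block_size + displacements - 1
    return comparisons, span ** 2


full = search_costs(4, 6)              # (576, 81)
sampled = search_costs(4, 6, subsample=2)  # (144, 81)
```

In this count, sub-sampling cuts the number of pixel comparisons to one-fourth while the size of the search area that is transferred from memory stays the same.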
In the multi-stage search method, the amount of data transferred between the motion-detecting LSI and the memory is reduced. However, it is necessary to control the motion detection operation in stepwise fashion. This complicates the circuitry and control operation. As a result, a long time is required to detect motion.
It is a first object of the present invention to provide a method and apparatus for detecting motion at an improved rate while preventing deterioration of the motion detection efficiency.
It is a second object of the invention to provide a method and apparatus for transmitting data between a memory and a motion-detecting portion at an improved efficiency by reducing the amount of data read from a search area within a reference image frame.
Other objects and features of the invention will be understood from the following description and accompanying drawings.
A motion detection method for achieving the above-described objects of the present invention starts with dividing a target block into plural blocks (hereinafter referred to as divided target blocks) at different pixel positions. Each of the divided target blocks is allocated to one of the motion vectors representing motions of the target block. A reference image block is extracted from a reference area within a reference image frame. Pixel data are extracted from each divided target block. The degree of similarity between each of the divided target blocks and one reference image block, which is used in common for these divided target blocks, is calculated. The divided target block having the highest degree of similarity is found. The vector allocated to the divided target block having the highest degree of similarity is detected as the optimum motion vector.
The principle and the operation of the present invention are described by referring to FIGS. 4-6. As shown in FIG. 4, a target block 100 which is within an encoded image frame and whose motion should be detected is divided into four (4) divided target block units 101-104 that are in different pixel positions and indicated by white circles, white triangles, black triangles, and black circles, respectively. Specifically, the divided target block unit 101 includes 4 pixels starting from the left upper corner of the target block, the 4 pixels being spaced from each other by 2-pixel displacement positions horizontally and by 2-line displacement positions vertically. Similarly, the divided target block unit 102 includes 4 pixels starting from the second pixel as counted from the pixel at the left upper corner, the 4 pixels being spaced from each other by 2-pixel displacement positions horizontally and by 2-line displacement positions vertically. The divided target block unit 103 includes 4 pixels starting from the first pixel on the second line as counted from the pixel at the left upper corner, the 4 pixels being spaced from each other by 2-pixel displacement positions horizontally and by 2-line displacement positions vertically. The divided target block unit 104 includes 4 pixels starting from the second pixel on the second line as counted from the pixel at the left upper corner, the 4 pixels being spaced from each other by 2-pixel displacement positions horizontally and by 2-line displacement positions vertically.
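The division of the target block into four units can be sketched as follows. The function name `divide_target_block` and the use of a (column offset, line offset) pair as the key for each unit are assumptions of this example; the pairs (0, 0), (1, 0), (0, 1), and (1, 1) correspond to the units 101, 102, 103, and 104, respectively.

```python
def divide_target_block(block):
    """Split a block into four units by starting pixel position.

    Each unit keeps every second pixel horizontally and every second line
    vertically, starting at column offset px and line offset py.
    """
    units = {}
    for py in (0, 1):
        for px in (0, 1):
            units[(px, py)] = [row[px::2] for row in block[py::2]]
    return units


# Hypothetical 4x4 target block whose pixels are numbered 0..15 line by line.
target_block = [[4 * y + x for x in range(4)] for y in range(4)]
units = divide_target_block(target_block)
# units[(0, 0)] is [[0, 2], [8, 10]]    (unit 101)
# units[(1, 0)] is [[1, 3], [9, 11]]    (unit 102)
# units[(0, 1)] is [[4, 6], [12, 14]]   (unit 103)
# units[(1, 1)] is [[5, 7], [13, 15]]   (unit 104)
```

Every pixel of the target block belongs to exactly one unit, so no pixel is discarded by the division.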
Each of the divided target block units 101-104 is compared with the corresponding reference block extracted from the reference area in the reference image. As shown in FIG. 5(a), a search area 201 in which motion is detected is established within a reference image frame 200, and pixels are present in the search area 201. As shown in a reference image block 202 of FIG. 5(b), a pixel position in a reference frame that is identical in relative position with the position of the pixel at the left upper corner in a target block within the present image frame is taken as the origin. It is assumed that motion vectors are detected within horizontal positions of −3 to +2 pixel displacement positions (in the X-direction) and within vertical positions of −3 to +2 pixel displacement positions (in the Y-direction). In the figure, the senses of the arrows indicate the positive (+) direction.
Those pixels which are within the reference image frame 200 and necessary for motion detection are pixels in −3 to +5 pixel displacement positions in the X-direction and on −3 to +5 lines in the Y-direction as indicated by the search area 201. The reference image block 203 is appropriately extracted from this area. The target area 202 is obtained by causing the target area 201 to move 2-pixel displacement position horizontally and 2-line displacement position vertically. The origin of the reference image block 203 is at the left upper corner. The reference image block 203 is one block within the target area 202. There are 9 reference image blocks within the target area 202, and each reference image block is a square block extending over 2-pixel displacement positions horizontally and 2-pixel displacement positions vertically. These pixels are used for calculations of approximations. These reference image blocks 203, 204, 205, 206, and so on are read out in turn.
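The relationship between the 9 reference image blocks and the detection range can be checked with a short enumeration. This sketch assumes that the origins of the 9 reference image blocks lie at −2, 0, and +2 pixel displacement positions in each direction, which is an inference and is not stated explicitly in the text. The function name `covered_vectors` is likewise hypothetical.

```python
def covered_vectors(origins=(-2, 0, 2), offsets=(0, 1)):
    """List the motion vectors tested by the given reference block origins.

    A divided unit that starts at offset (px, py) in the target block,
    compared with a reference block whose origin is (rx, ry), tests the
    vector (rx - px, ry - py).
    """
    return sorted(
        {
            (rx - px, ry - py)
            for ry in origins
            for rx in origins
            for py in offsets
            for px in offsets
        }
    )


vectors = covered_vectors()
horizontal = sorted({vx for vx, _ in vectors})  # [-3, -2, -1, 0, 1, 2]
vertical = sorted({vy for _, vy in vectors})    # [-3, -2, -1, 0, 1, 2]
```

Under this assumption, 9 reference image blocks combined with 4 divided units give 36 distinct vectors, which is every vector in the range of −3 to +2 in both directions.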
Signals derived from the 4 divided target block units 101-104 are compared in turn with signals derived from the reference image blocks 203, 204, 205, 206, etc. read out in turn as mentioned previously.
Signals produced from the detection block units 101-104 are compared with signals derived from one reference image block in the manner described below. One example of processing for comparing the reference image block 203 with the block units 101-104 is illustrated in FIG. 6. As shown in this figure, each one of the block units 101-104 is composed of pixels existing at different positions within the target block 100. Therefore, if they are compared with the same reference image block 203, they simultaneously produce data about degrees of similarity of four vectors (0,0), (−1,0), (0,−1), and (−1,−1). In this case, the block unit 101 corresponds to the vector (0,0). The block 102 corresponds to the vector (−1,0). The block 103 corresponds to the vector (0,−1). The block 104 corresponds to the vector (−1,−1). Similarly, data about degrees of similarity to the reference image blocks 204, 205, 206, and so forth are derived.
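The comparison of the four block units with one common reference image block can be sketched as follows. This is an illustration under stated assumptions: the sum of absolute differences is used as the measure of similarity (a smaller sum meaning a higher degree of similarity), the reference image block is taken as every second pixel starting at its origin, and the target block has even dimensions. The function name `unit_sads` and its parameters are hypothetical.

```python
def unit_sads(target, ref, tx, ty, rx=0, ry=0):
    """Compare the four divided units with one shared reference block.

    (tx, ty) is the position in `ref` that corresponds to the left upper
    corner of the target block; (rx, ry) is the origin of the reference
    block relative to that position. Returns {motion_vector: sad}.
    """
    n_rows, n_cols = len(target), len(target[0])
    # The reference block is read once: every second pixel from its origin.
    ref_block = [
        [ref[ty + ry + y][tx + rx + x] for x in range(0, n_cols, 2)]
        for y in range(0, n_rows, 2)
    ]
    results = {}
    for py in (0, 1):
        for px in (0, 1):
            unit = [row[px::2] for row in target[py::2]]
            cost = sum(
                abs(u - r)
                for u_row, r_row in zip(unit, ref_block)
                for u, r in zip(u_row, r_row)
            )
            # The unit starting at (px, py) tests the vector (rx-px, ry-py).
            results[(rx - px, ry - py)] = cost
    return results


# Hypothetical data: the target block equals the part of the reference
# frame displaced by (-1, -1) from position (4, 4).
ref = [[10 * y + x for x in range(10)] for y in range(10)]
target = [row[3:7] for row in ref[3:7]]
sads = unit_sads(target, ref, 4, 4)
best_vector = min(sads, key=sads.get)  # vector with the smallest sum
```

One read of the reference block yields four results, one for each of the vectors (0,0), (−1,0), (0,−1), and (−1,−1). In this example the unit corresponding to the vector (−1,−1) matches exactly.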