Motion estimation is a widely used method in image processing, for example for the prediction or interpolation of moving objects in a predicted or interpolated image. To compute each part of the image, the occlusion problem has to be resolved. Besides detecting the occlusion area(s) of the image, the location of the foreground or background area(s) also needs to be identified so that the reference used to predict the occlusion areas can be determined.
Occlusion areas are parts of a scene that are visible in one frame and become invisible in a neighboring frame of multiple consecutive images due to blockage by foreground object(s). Each of the consecutive images can be a picture captured by an imaging system or an interpolated image based on captured pictures. For consecutive images captured by a camera, the presence of the occlusion areas is caused by changes in the relative position between the objects and the camera. When parts of the image become invisible, at least a portion of a background object is covered by a foreground object located closer to the camera due to the projection. Vice versa, when the foreground object is located farther away from the camera in the scene, parts of the background become uncovered. The parts of the background object that become covered or uncovered are referred to as occlusion areas. When areas become covered or uncovered, uniquely matched areas in neighboring images cannot be found by motion estimation. Therefore, the occlusion problem requires special care; otherwise, artifacts may occur in temporal interpolation.
In a conventional method, occlusion detection is based on pixel differences associated with two motion vectors between two consecutive images (U.S. patent Ser. No. 7,995,793). One of the two motion vectors may be zero, which corresponds to a background that is stationary relative to the camera. FIG. 1 illustrates an example of occlusion detection based on two motion vectors between two consecutive frames, Frame (t) and Frame (t+1) captured by an imaging system. Frame (t+δ) is to be temporally interpolated based on Frame (t) and Frame (t+1). Two motion vectors (i.e., MV1 and MV2) are determined to describe the motion between the two frames, where MV1=0 corresponds to zero motion for the background areas.
Frame (t+δ) is formed by projecting Frame (t) to Frame (t+1) according to one of the two motion vectors. For example, area 111 in Frame (t+δ) is formed by projecting area 101 in Frame (t) to area 121 in Frame (t+1) according to motion vector MV1, since a good match between area 101 and area 121 can be found using motion vector MV1. Similarly, area 112 in Frame (t+δ) is formed by projecting area 102 in Frame (t) to area 122 in Frame (t+1) according to motion vector MV2, since a good match can be found between area 102 and area 122 using motion vector MV2. Nevertheless, for area 113 in Frame (t+δ), neither of the motion vectors results in a good match between the corresponding areas in Frame (t) and Frame (t+1). In other words, the pixel differences are large for the corresponding areas in Frame (t) and Frame (t+1) associated with area 113 in Frame (t+δ), regardless of which of the two motion vectors is used.
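The two-vector test described above can be sketched on a one-dimensional scanline. This is only an illustration under stated assumptions, not the method of the cited patent: the sum of absolute differences (SAD) as the pixel-difference measure, the fixed threshold, and the names `sad` and `classify_block` are all hypothetical choices made for this sketch.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel runs."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def classify_block(frame_t, frame_t1, x, size, candidates, delta, threshold):
    """Return the index of the best candidate motion vector, or None if occluded.

    x is the block start in the interpolated Frame (t+delta). Each candidate
    vector is split so that the block projects back into Frame (t) and
    forward into Frame (t+1). The threshold is an assumed tuning parameter.
    """
    best_index, best_cost = None, None
    for index, mv in enumerate(candidates):
        back = round(delta * mv)   # displacement from Frame (t) to Frame (t+delta)
        src = x - back             # block start in Frame (t)
        dst = x + (mv - back)      # block start in Frame (t+1)
        if src < 0 or dst < 0 or src + size > len(frame_t) or dst + size > len(frame_t1):
            continue  # projection leaves the frame; skip this candidate
        cost = sad(frame_t[src:src + size], frame_t1[dst:dst + size])
        if best_cost is None or cost < best_cost:
            best_index, best_cost = index, cost
    if best_cost is None or best_cost > threshold:
        return None  # no candidate gives a good match: treat as occlusion
    return best_index


# Toy scanlines: stationary background (MV1 = 0), foreground moving by MV2 = 4 pixels.
frame_t = [0, 10, 20, 30, 200, 210, 220, 230, 80, 90, 100, 110, 120, 130, 140, 150]
frame_t1 = [0, 10, 20, 30, 40, 50, 60, 70, 200, 210, 220, 230, 120, 130, 140, 150]
labels = [classify_block(frame_t, frame_t1, x, 2, [0, 4], 0.5, 20) for x in range(0, 16, 2)]
```

In this toy data, the blocks starting at pixels 4 and 10 of the interpolated scanline match under neither vector, which corresponds to the situation of area 113.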
Besides detecting the location of an occlusion area, the reference in the neighboring image should be determined for the prediction of the occlusion area. As the image content in the occlusion area is from the corresponding area to be covered or becoming uncovered, the location of the corresponding area should be identified. The corresponding area newly to be covered or becoming uncovered is adjacent to the background area in the neighboring image used to predict the background area adjacent to the occlusion area in the current frame. Therefore, the location of the corresponding area can be determined indirectly by determining the location of either the foreground area or the background area adjacent to the occlusion area.
FIG. 2 illustrates an example of reference blocks used to form each block of an interpolated frame. Frame 210 is interpolated based on neighboring frames 200 and 220. Frames 200 and 220 are two consecutive frames captured by an imaging system. In this example, left objects correspond to background areas and right objects correspond to the foreground areas in frame 200 and frame 220. As shown in FIG. 2, blocks b0 to b7 are located in the background area and blocks b8 to b19 are located in the foreground area of frame 200. Blocks c0 to c5 are located in the background area and blocks c12 to c19 are located in the foreground area of frame 220. Blocks c6 to c11 in frame 220 are located in an area to be covered or becoming uncovered by the foreground area. If the motion estimation is performed from frame 200 to frame 220, blocks c6 to c11 are located in a newly uncovered area of frame 220. Otherwise, blocks c6 to c11 are becoming covered if the motion estimation is performed from frame 220 to frame 200. The matched blocks in frame 200 and frame 220 can be found for the interpolation of the blocks in frame 210. The relationship between matched blocks in frame 200 and frame 220 can be determined by using motion estimation techniques. For example, motion vector MVB may be derived for the background area and MVF may be derived for the foreground area, as shown by the bidirectional arrows in FIG. 2. Block-based motion estimation may be used; the technique is well known in the art, and therefore the details are not described here. It is noted that the length of each arrow has no relation to the length of any motion vector. In this example, blocks c0, c1, c2, c3, c4 and c5 in frame 220 are matched with blocks b2, b3, b4, b5, b6 and b7 in frame 200, respectively. Similarly, blocks c12, c13, c14, c15, c16, c17, c18 and c19 are matched with blocks b8, b9, b10, b11, b12, b13, b14 and b15, respectively. For the area to be covered or becoming uncovered in frame 220, no block in frame 200 can be matched with blocks c6, c7, c8, c9, c10 and c11.
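The block correspondences of FIG. 2 can be reproduced with a toy model in which every block carries a content label instead of pixels. The labels (`bgK`, `fgK`) and the helper `match_blocks` are hypothetical and exist only for this sketch; an actual block-based motion estimator would compare pixel arrays using a matching cost rather than test labels for equality.

```python
# Hypothetical content labels: "bgK" is background content, "fgK" is foreground content.
frame_200 = [f"bg{k}" for k in range(8)] + [f"fg{k}" for k in range(12)]     # b0..b19
frame_220 = [f"bg{k}" for k in range(2, 14)] + [f"fg{k}" for k in range(8)]  # c0..c19


def match_blocks(reference, current):
    """Map each block index in `current` to its matching index in `reference`, or None."""
    position = {content: index for index, content in enumerate(reference)}
    return [position.get(content) for content in current]


matches = match_blocks(frame_200, frame_220)
# Blocks of frame 220 that have no counterpart in frame 200 (the occlusion-related area).
unmatched = [c for c, b in enumerate(matches) if b is None]
```

Under these assumed labels, c0 to c5 map to b2 to b7, c12 to c19 map to b8 to b15, and c6 to c11 remain unmatched because their content is hidden behind the foreground in frame 200.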
Frame 210 is interpolated based on frames 200 and 220. Each pair of matched blocks in frame 200 and frame 220 is used as two reference blocks to form one interpolated block of frame 210. Blocks in the foreground or the background of the interpolated frame can be formed based on the corresponding two reference blocks in these two neighboring frames. However, occlusion blocks a7, a8 and a9 in frame 210 can only be predicted based on the corresponding area to be covered or becoming uncovered in frame 220, as no matched blocks can be found in frame 200. The occlusion blocks a7, a8 and a9 should be predicted based on blocks c6, c7 and c8, respectively. The relation between occlusion blocks a7, a8 and a9 and the corresponding reference blocks c6, c7 and c8 can be given by motion vector MVOB, which is shown by the dashed arrows in FIG. 2. MVOB is the same as the motion vector giving the relationship between background areas of frame 210 and frame 220, such as the motion vector representing the relationship from a6 to c5. As shown in FIG. 2, the reference area in frame 220 is adjacent to the background area used to predict the neighboring background area of the occlusion area in frame 210.
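The difference between blocks with two references and occlusion blocks with a single reference can be sketched as follows. The text does not state how the two reference blocks are combined, so the equal-weight average used here is an assumption, and `interpolate_block` is a hypothetical helper name.

```python
def interpolate_block(ref_prev, ref_next):
    """Form an interpolated block from whichever reference blocks are available.

    ref_prev / ref_next are pixel lists from the previous and next frame,
    or None when no matched block exists in that frame.
    """
    if ref_prev is not None and ref_next is not None:
        # Matched pair: combine both references (equal weights assumed here).
        return [(p + n) / 2 for p, n in zip(ref_prev, ref_next)]
    available = ref_prev if ref_prev is not None else ref_next
    if available is None:
        raise ValueError("at least one reference block is required")
    # Occlusion block: only one frame shows this content, so copy it.
    return list(available)


matched = interpolate_block([10, 20], [30, 40])   # two-reference case
occluded = interpolate_block(None, [5, 6])        # e.g. a7 predicted from c6 only
```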
In FIG. 2, the size of the motion vector MVB or MVF is just indicative of the matched blocks in frame 200 and frame 220. The motion vector MVOB is indicative of the occlusion blocks and the corresponding reference blocks in frame 210 and frame 220. As is well known in the art of motion estimation, block matching is performed in two-dimensional space and the motion vector usually consists of a horizontal component and a vertical component. Each of the blocks corresponds to a two-dimensional pixel array. The one-dimensional drawing in FIG. 2 is used to simplify the illustration. Furthermore, while the blocks ai, bi and ci are drawn aligned vertically, this does not imply that the motion vector size is measured in units of the block size. For example, FIG. 2 may correspond to the match in the horizontal direction, and each block may be 8 pixels wide. While block bi is matched with block ci+4 in the foreground area as shown in FIG. 2, this does not necessarily imply that the horizontal motion vector is 32 pixels (i.e., 4×8 pixels).
FIG. 3 illustrates another example of reference blocks used to form each block of an interpolated frame. Different from the example shown by FIG. 2, right objects correspond to background areas and left objects correspond to the foreground area in frame 300 and frame 320. Therefore, blocks b0 to b7 are located in the foreground area and blocks b8 to b19 are located in the background area of frame 300. Blocks c0 to c5 are located in the foreground area and blocks c12 to c19 are located in the background area of frame 320.
Frame 310 is interpolated based on the two captured frames 300 and 320. Occlusion blocks a7, a8 and a9 should be predicted based on blocks c9, c10 and c11, respectively. The relationship between occlusion blocks a7, a8 and a9 and blocks c9, c10 and c11 can be given by motion vector MVOB. In this example, MVOB is the same as the motion vector giving the relationship between background areas of frame 310 and frame 320, such as the motion vector indicating the relationship from a10 to c12. The reference area used to predict the occlusion blocks in frame 310 corresponds to the area to be covered or becoming uncovered in frame 320. As shown in FIG. 3, the reference area in frame 320 is adjacent to the background area used to predict the neighboring background area of the occlusion area in frame 310. To calculate the occlusion blocks a7, a8 and a9, the location of the reference blocks (blocks c9, c10 and c11) should be determined.
If the location of the background area or the foreground area adjacent to the occlusion area is determined, the location of the area used as reference of the occlusion area can be determined indirectly. As shown in FIGS. 2 and 3, the area used as reference of the occlusion area is to be covered or becoming uncovered by the foreground area. The reference area of the occlusion area is adjacent to the reference area of the neighboring background area of the interpolated frame. The prediction can be performed by using the motion vector indicating the relationship from the neighboring background area in the current frame to the corresponding background area in the reference frame. Therefore, it is desirable to explore a method to determine the background area or the foreground area adjacent to the occlusion area for the prediction of the current image.
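The indirect determination described above can be illustrated with the block indices of FIGS. 2 and 3. The sketch works in units of drawn blocks purely for illustration (the drawn alignment does not reflect actual motion vector sizes), and the function name `occlusion_references` is hypothetical. It assumes that the background block adjacent to the occlusion area, together with its reference block, is already known.

```python
def occlusion_references(occlusion_blocks, background_block, background_reference):
    """Map each occlusion block index to its reference block index.

    The offset from the neighboring background block (in the interpolated
    frame) to its reference block (in the neighboring frame) plays the role
    of MVOB and is reused for every occlusion block.
    """
    offset = background_reference - background_block
    return {block: block + offset for block in occlusion_blocks}


# FIG. 2: background block a6 is predicted from c5, so the offset is -1 block.
fig2 = occlusion_references([7, 8, 9], background_block=6, background_reference=5)
# FIG. 3: background block a10 is predicted from c12, so the offset is +2 blocks.
fig3 = occlusion_references([7, 8, 9], background_block=10, background_reference=12)
```

With these inputs, the FIG. 2 case yields references c6, c7 and c8, and the FIG. 3 case yields c9, c10 and c11, in agreement with the reference blocks named for each figure.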