Research and development has been conducted on techniques for generating, from an input video sequence including multiple objects, a video sequence or a still image from which at least one of the objects has been removed. These are basic techniques commonly used in a wide range of applications, such as generating background images in image processing for monitoring cameras, generating sprite images in video coding, and completing occluded regions when generating multi-eye stereoscopic video images, as well as editing video sequences.
In editing a video sequence, a typical procedure for generating a video sequence with an object removed is as follows. First, a user specifies an object to be removed from among the multiple objects included in an input video sequence. Then, image processing is executed to complete the image (pixel values) of the region (occluded region) of another object that is occluded by the specified object.
Space-time completion is one such technique for completing an image in an occluded region. Space-time completion is based on the assumption that the image in the occluded region can be found in a picture at a different time in the input video sequence. Specifically, space-time completion involves searching pictures at different times in the input video sequence for a region that matches the occluded region (matching region), and replicating the image in the matching region found by the search onto the occluded region (see, for example, Non-Patent Literature 1, or NPL 1).
First, the technique in NPL 1 involves setting a space-time window that encloses an occluded region (region to be removed) in a temporal image included in a video sequence. Then, the technique involves searching the multiple pictures included in the input video sequence for the matching region whose color and motion best match the color and motion in the space-time window. Then, the image in the matching region found by the search is replicated onto the occluded region. Hence, the technique in NPL 1 makes it possible to appropriately complete the image in the occluded region, even when the occluded region belongs to a dynamic object, as long as the matching region is found in a different picture in the video sequence.
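The window search and replication described above can be illustrated with the following minimal sketch. It is not the implementation of NPL 1; it is a simplified illustration under stated assumptions. The video is assumed to be a small grayscale sequence held as nested lists, the occluded region is assumed to be given as a set of (t, y, x) positions, "motion" is approximated by a plain forward temporal difference, and the search is exhaustive. The names complete_hole, margin, and motion_weight are hypothetical and chosen only for this example.

```python
from itertools import product


def complete_hole(video, hole, margin=1, motion_weight=1.0):
    """Fill occluded voxels by copying from the best-matching space-time window.

    video: list of frames, each a list of rows of grayscale numbers.
    hole:  set of (t, y, x) positions forming the occluded region.
    Returns a new video; the input is left unchanged.
    """
    dims = (len(video), len(video[0]), len(video[0][0]))

    # Space-time window: bounding box of the hole, grown by `margin`
    # in t, y and x, and clipped to the video bounds.
    lo = [max(min(v[i] for v in hole) - margin, 0) for i in range(3)]
    hi = [min(max(v[i] for v in hole) + margin, n - 1)
          for i, n in enumerate(dims)]
    window = list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))
    known = [v for v in window if v not in hole]
    known_set = set(known)

    def cost(dt, dy, dx):
        # Compare only the known (non-occluded) part of the window.
        total = 0.0
        for t, y, x in known:
            a = video[t][y][x]
            b = video[t + dt][y + dy][x + dx]
            total += (a - b) ** 2  # color term
            # Motion term (assumed proxy: forward temporal difference),
            # used only where the next voxel in time is also known.
            if (t + 1, y, x) in known_set:
                ma = video[t + 1][y][x] - a
                mb = video[t + 1 + dt][y + dy][x + dx] - b
                total += motion_weight * (ma - mb) ** 2
        return total

    # Offsets that keep the shifted window inside the video.
    ranges = [range(-a, n - b) for a, b, n in zip(lo, hi, dims)]
    best, best_cost = None, float("inf")
    for dt, dy, dx in product(*ranges):
        if dt == 0:
            continue  # following the text, search pictures at a different time
        if any((t + dt, y + dy, x + dx) in hole for t, y, x in window):
            continue  # candidate must not overlap the occluded region
        c = cost(dt, dy, dx)
        if c < best_cost:
            best, best_cost = (dt, dy, dx), c
    if best is None:
        raise ValueError("no candidate window free of occluded voxels")

    # Replicate the matching region onto the occluded region.
    dt, dy, dx = best
    out = [[row[:] for row in frame] for frame in video]
    for t, y, x in hole:
        out[t][y][x] = video[t + dt][y + dy][x + dx]
    return out
```

In this sketch, a candidate window is accepted only when it contains no occluded position, so the replicated pixel values always come from visible image content. When no such window exists in the sequence, the function raises an error, which corresponds to the case in which the matching region is not found in a different picture.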