Photo editing apps are used to erase distracting scene elements, adjust object positions, recover image content in occluded image areas, and so on. In general, these and many other image editing operations require automated hole-filling, which is also referred to as inpainting or image completion. Hole-filling techniques used by photo editing apps generally do not provide satisfactory results. Such techniques often provide hole-filling results that are inconsistent with the local texture outside of the hole and/or that do not fit well with the global structure of the image. Traditional patch-based hole filling methods can recreate textures accurately, but the results often fail to match the global structure of the image. For example, a hole may be filled such that a tree trunk or other shape in the hole that should be relatively straight, like the other tree trunks in the image, is instead wavy or has irregular side edges. As another example, a hole may be filled such that the edge of a tree trunk or other shape just inside the hole boundary is not aligned with the edge of that tree trunk just outside the hole boundary.
Existing neural network-based techniques used for texture synthesis also do not address the global structure issue encountered in hole filling. Recently, neural network-based approaches have been introduced for texture synthesis and image stylization. These approaches generally take a noise vector as input and train a generator network to generate different textures based on texture examples from other images. However, using such techniques to fill holes in an image does not produce hole content consistent with the global structure of the image. Thus, neither patch-based techniques nor these neural network-based techniques adequately address the issue of matching the image's global structure, and their hole filling results are often unrealistic.
On the other hand, global structure-based techniques used to fill holes are able to estimate the global structure for a hole well, but the predicted texture is often not consistent with the textures outside the hole. For example, encoder-decoder neural networks can be used to identify high-level features that represent the global structure of an image and to use those features to create content that fills a hole in the image in a way that is consistent with that global structure. However, the patterns and other details of other portions of the image are not reproduced in the filled hole region. The hole filling results of such techniques are thus also often unrealistic.
In sum, existing hole filling solutions that use patch-based and neural network-based approaches do not account for the different problems posed by the local texture and global structure of images. None of the existing hole filling techniques consistently creates hole content that matches both the global structure and the local texture of the other portions of the image. Accordingly, these techniques often fail to provide realistic hole filling results.