Image inpainting is the process of reconstructing a part of an image and has many applications in computer graphics and vision, such as photo restoration, object removal, content reshuffling, panorama stitching, and image style transfer. To illustrate, an image having a content hole can be reconstructed. Specifically, image inpainting approximates the missing content and fills in the hole.
A number of approaches to image inpainting have been developed in the industry. A first approach relies on a patch-based scheme. Under this scheme, missing content in a region (e.g., a hole) of the image is initialized and iteratively updated by matching it with known content in other regions of the image. The matching relies on patches between the missing and known regions. In this way, the missing content is patched from the known content.
More specifically, for each patch inside the region of missing content, a similar patch outside the region is found, where the similarity relates to current red, green, blue (RGB) estimates of the missing content and RGB values of the known content. The RGB estimates inside the region are updated based on the found similar patches, and the search and update process is repeated until convergence.
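The search-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular product: the function name `inpaint_patch_based`, the mean-color initialization, the exhaustive search over source patches, the sum-of-squared-differences similarity, and the rule of copying only the center pixel of the best match are all choices made here for brevity.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)
Image = List[List[Pixel]]
Mask = List[List[bool]]             # True marks a pixel inside the hole


def ssd(a: Pixel, b: Pixel) -> float:
    """Sum of squared RGB differences between two pixels."""
    return sum((ca - cb) ** 2 for ca, cb in zip(a, b))


def inpaint_patch_based(image: Image, mask: Mask, radius: int = 1,
                        max_iters: int = 20, tol: float = 1e-6) -> Image:
    """Fill masked pixels by repeatedly copying from the best-matching known patch.

    Hypothetical helper written for illustration; all parameters are assumptions.
    """
    height, width = len(image), len(image[0])
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    hole = [(y, x) for y in range(height) for x in range(width) if mask[y][x]]
    est = [list(row) for row in image]  # working copy; the input is not modified
    if not hole:
        return est

    # Candidate source patches: fully inside the image and fully known.
    sources = [(y, x)
               for y in range(radius, height - radius)
               for x in range(radius, width - radius)
               if not any(mask[y + dy][x + dx] for dy, dx in offsets)]
    if not sources:
        raise ValueError("no fully known patch of this size exists in the image")

    # Initialize the hole with the mean color of the known pixels.
    known = [image[y][x] for y in range(height) for x in range(width)
             if not mask[y][x]]
    mean = tuple(sum(p[c] for p in known) / len(known) for c in range(3))
    for y, x in hole:
        est[y][x] = mean

    def patch_cost(prev: Image, ty: int, tx: int, sy: int, sx: int) -> float:
        """Compare the current estimate around (ty, tx) with the known patch at (sy, sx)."""
        total = 0.0
        for dy, dx in offsets:
            py, px = ty + dy, tx + dx
            if 0 <= py < height and 0 <= px < width:  # skip offsets outside the image
                total += ssd(prev[py][px], image[sy + dy][sx + dx])
        return total

    for _ in range(max_iters):
        prev = [list(row) for row in est]  # snapshot of the current RGB estimates
        for y, x in hole:
            # Search step: exhaustive scan for the most similar known patch.
            by, bx = min(sources, key=lambda s: patch_cost(prev, y, x, s[0], s[1]))
            # Update step: copy the center pixel of the best match.
            est[y][x] = image[by][bx]
        change = max(abs(a - b) for y, x in hole
                     for a, b in zip(est[y][x], prev[y][x]))
        if change < tol:  # estimates stopped moving: treat as converged
            break
    return est
```

Calling `inpaint_patch_based(image, mask)` with a nested list of RGB tuples and a Boolean mask of the same shape returns a filled copy of the image. Practical patch-based systems typically use faster approximate search and blend overlapping patches; the exhaustive scan here is only meant to make the search and update steps visible.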
The patch-based approach can provide high-resolution results for the missing content under certain scenarios. However, this approach tends to fail when the missing content is actually different from the known content, and it tends to over-blur the results. That is because, in the intermediate iterations, the RGB estimates inside the region are not accurate and consequently the predicted visual content is not yet sharp. Furthermore, the iterative process can be time-consuming and computationally burdensome given the number of iterations and the number of patches and comparisons needed in each iteration.
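To give a rough sense of the computational burden, the cost of the iterative process can be estimated under the assumption of an exhaustive search, in which every patch in the hole is compared against every candidate patch in the known region. The symbols below are introduced here for illustration only: T is the number of iterations, N_h the number of patches in the hole, N_s the number of candidate source patches, and r the patch radius, so that each patch holds (2r+1)^2 pixels with three color channels.

```latex
C \;\approx\; T \cdot N_h \cdot N_s \cdot 3\,(2r+1)^2
```

As a purely hypothetical example, a 100 x 100 pixel hole in a 512 x 512 image with 7 x 7 patches (r = 3) and T = 10 gives roughly 10 x 10^4 x 2.5 x 10^5 x 147, or about 3.7 x 10^12 per-channel comparisons. Approximate search strategies can reduce this figure substantially, but the count shows why the number of iterations, patches, and comparisons dominates the running time.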
To illustrate, consider an example of an image showing a house facade that includes a window and that is missing another window and a door. The “window hole” could be accurately patched by using content of the shown window. However, the “door hole” is less accurately patched because no door content is available from the image. In this case, the “door hole” may be inaccurately filled with the content of the shown window. Hence, the patch-based approach is typically limited to repeating known content elsewhere in the image and does not generate new content.
In another approach, deep learning is used. In particular, a neural network is trained to predict a missing region. The capability of the neural network to draw on content learned from its training data avoids the first limitation of the patch-based approach. However, the predicted results tend to be of low resolution and are limited to the specific training domains. For example, if the neural network is trained on faces, predicting the content of the “door hole” in the above example image is likely to be inaccurate. Hence, the deep learning approach is difficult to generalize beyond the specific training domain. Extending this approach to work on arbitrary images and to provide high-resolution results necessitates training over the entire space of plausible images and learning a huge number of parameters, which is not tractable.