Pair image generation refers to generating a pair of corresponding images in two different modalities, such as a face with different attributes, a character in different fonts, or a color image and the corresponding depth image. Multimodal image generation refers to generating two or more corresponding images of different modalities. The generation of multimodal images has a wide range of applications. For example, multimodal images can be used to render novel pairs of corresponding images for movies and computer games. As another example, a method described in U.S. Pat. No. 7,876,320 synthesizes two or more face images, or at least one face image and one face graphic or face animation, to create a fictional face image.
A number of methods use one-to-one correspondences between images in different modalities to generate a multimodal digital image. Examples of those methods include a deep multi-modal Boltzmann method and a coupled dictionary learning method. Some methods use physical models to generate corresponding images in two different modalities, as in image super-resolution or image deblurring. However, in the general case, determining the one-to-one correspondences between images in different modalities is challenging.
Accordingly, there is a need to generate a multimodal digital image without relying on one-to-one correspondence between different modalities in the training data.