Field of the Invention
This invention relates to deep neural networks, and in particular, it relates to a data augmentation method for generating labeled training data for deep neural networks using style transfer.
Description of Related Art
Artificial neural networks are used in various fields such as machine learning, and can perform a wide range of tasks such as computer vision, speech recognition, etc. An artificial neural network is formed of interconnected layers of nodes (neurons), where each neuron has an activation function which converts the weighted input from other neurons connected with it into its output (activation). In a learning (or training) process, training data are fed into the artificial neural network and the adaptive weights of the interconnections are updated through the learning process. After a neural network is trained, data to be processed is inputted to the network to generate processing results. Training data is formed of data of the same nature as the data to be processed (e.g. images) along with labels that indicate what the processing result should be for each input data. For example, if a neural network is being trained to recognize cats and dogs in images, the training data would include images containing cats or dogs along with a label for each training image indicating whether it contains a cat or a dog.
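As a minimal illustration of the neuron model and weight update described above, the following Python sketch computes the activation of a single neuron and performs one training update against a label. The sigmoid activation, the bias term, the squared-error objective, the learning rate, and all function names are assumptions chosen for illustration; they are not taken from any particular network discussed in this document.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Convert the weighted input from connected neurons into an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def sgd_step(inputs, weights, bias, target, lr=0.5):
    """One gradient-descent update of the adaptive weights.

    Assumes a sigmoid neuron and the loss 0.5 * (output - target) ** 2,
    where target is the label of the training sample.
    """
    out = neuron_output(inputs, weights, bias)
    # Derivative of the loss with respect to the weighted sum.
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


if __name__ == "__main__":
    x, w, b = [1.0, 2.0], [0.0, 0.0], 0.0
    before = neuron_output(x, w, b)
    w, b = sgd_step(x, w, b, target=1.0)
    after = neuron_output(x, w, b)
    print(before, after)  # the output moves toward the label 1.0
```

A full network stacks many such neurons in layers and propagates the error backward through all of them; the single-neuron case only shows the direction of each weight update.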
Training a deep neural network (DNN) requires a large amount of labeled training data to avoid model overfitting and to improve model generalizability. To fully utilize existing labeled training data, it is a common practice to augment the training data (a practice known as data augmentation) by applying label-preserving transformations to the original training dataset to generate additional training data. For example, for training a DNN for the task of image recognition, an existing training image can be cropped around the labeled object, geometrically transformed (translation, rotation, scaling, shearing, lens distortion, etc.), transformed in color or intensity, and/or corrupted with various types of noise to generate a new training image with the same label. Data augmentation enlarges the pool of training data without the need to provide additional training labels. The additional training data produced by such methods is “general” in the sense that it is not transformed into a particular “style”.
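The label-preserving transformations described above can be sketched as follows. This is an illustrative Python example only: it assumes a grayscale image stored as a list of rows of 0-255 integers, and the function names, shift amounts, intensity factor, and noise level are hypothetical choices rather than values from any cited method.

```python
import random


def clamp(value):
    """Round and keep a pixel value inside the 0-255 range."""
    return min(255, max(0, round(value)))


def translate(image, dx, fill=0):
    """Shift the image right by dx pixels (left if dx < 0), padding with fill."""
    width = len(image[0])
    dx = max(-width, min(width, dx))
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in image]
    return [row[-dx:] + [fill] * (-dx) for row in image]


def scale_intensity(image, factor):
    """Multiply every pixel by factor (an intensity transformation)."""
    return [[clamp(p * factor) for p in row] for row in image]


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


def augment(sample, rng):
    """Return new (image, label) pairs; the label is copied unchanged."""
    image, label = sample
    variants = [
        translate(image, rng.choice([-1, 1])),
        scale_intensity(image, 1.2),
        add_noise(image, 5.0, rng),
    ]
    return [(variant, label) for variant in variants]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = ([[0, 10, 20], [30, 40, 50]], "cat")
    for image, label in augment(sample, rng):
        print(label, image)
```

Each original sample yields three new samples here, and no new labeling work is required because every variant inherits the label of its source image.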
Depending on the capacity of the DNN model, the number of training samples needed to avoid overfitting can be on the order of millions. In practice, to train a DNN for a particular task, e.g., to recognize particular objects, the DNN may be first pre-trained on a very large training dataset (referred to as a general dataset, such as ImageNet, which is a publicly available image database in which the images have been labeled), and then trained (fine-tuned) on an application-specific training dataset, which is typically much smaller (referred to as a custom dataset). This approach is called transfer learning. Usually the custom dataset needs additional labeling. Depending on the similarity between the custom dataset and the general dataset, the size of the custom dataset can be in the thousands. Manually labeling the custom dataset is costly, tedious, and often error-prone.
It has been proposed to use synthetic images rendered from 3D CAD models for data augmentation. One challenge for this approach is to generate photorealistic images. For example, X. Peng et al., “Synthetic to Real Adaptation with Generative Correlation Alignment Networks,” arXiv preprint arXiv:1701.05524v3, 18 Mar. 2017, describes a Deep Generative Correlation Alignment Network (DGCAN) for synthesizing images using a domain adaptation algorithm.
C. Charalambous et al., “A data augmentation methodology for training machine/deep learning gait recognition algorithms,” arXiv preprint arXiv:1610.07570v1, 24 Oct. 2016, describes “a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation” for gait recognition (Abstract).
U.S. Pat. Appl. Pub. No. 2015/0379422, entitled “Data augmentation based on occlusion and inpainting,” describes “Augmenting a dataset in a machine learning classifier . . . One example is a system including a training dataset with at least one training data, and a label preserving transformation including an occluder, and an inpainter. The occluder occludes a selected portion of the at least one training data. The inpainter inpaints the occluded portion of the at least one training data, where the inpainting is based on data from a portion different from the occluded portion.” (Abstract.)
Style transfer is a type of image transformation that transforms an input image into an output image that has the semantic content of the input image but the “style” of a reference image. For example, L. A. Gatys et al., “A Neural Algorithm of Artistic Style,” arXiv preprint arXiv:1508.06576v2, 2 Sep. 2015 (“Gatys et al. 2015”), describes a deep neural network model that can transform an arbitrary image into an image having a particular artistic style, for example, the style of a van Gogh painting. “The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images.” (Abstract.)
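The separation of content and style in Gatys et al. 2015 can be illustrated with the loss terms that approach optimizes: a content term comparing feature maps directly, and a style term comparing Gram matrices (channel-to-channel correlations) of feature maps. The Python sketch below is a toy illustration under stated assumptions: the feature maps are small hand-written lists (one list of spatial activations per channel) rather than activations of a real convolutional network, only a single layer is used, and the function names and the weights alpha and beta are hypothetical.

```python
def gram_matrix(features):
    """Inner products between every pair of channels.

    features: list of channels, each a flat list of spatial activations.
    Spatial arrangement is summed out, which is why the result captures
    style (texture statistics) rather than content (layout).
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def content_loss(generated, content):
    """Half the squared error between the two feature maps."""
    return 0.5 * sum((g - c) ** 2
                     for g_row, c_row in zip(generated, content)
                     for g, c in zip(g_row, c_row))


def style_loss(generated, style):
    """Normalized squared error between the two Gram matrices."""
    channels, positions = len(generated), len(generated[0])
    g_gen, g_sty = gram_matrix(generated), gram_matrix(style)
    squared = sum((a - b) ** 2
                  for row_a, row_b in zip(g_gen, g_sty)
                  for a, b in zip(row_a, row_b))
    return squared / (4.0 * channels ** 2 * positions ** 2)


def total_loss(generated, content, style, alpha=1.0, beta=1000.0):
    """Weighted sum minimized over the generated image (weights illustrative)."""
    return (alpha * content_loss(generated, content)
            + beta * style_loss(generated, style))


if __name__ == "__main__":
    features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    # Same activations at different spatial positions.
    shuffled = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]
    print(content_loss(shuffled, features))  # nonzero: the layout changed
    print(style_loss(shuffled, features))    # zero: channel statistics match
```

Rearranging the spatial positions changes the content term but leaves the Gram matrix untouched, which is the sense in which the two representations are separable.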
J. Johnson et al., “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” arXiv preprint arXiv:1603.08155v1, 27 Mar. 2016 (“Johnson et al. 2016”), describes a style transfer method that can generate artistic images similar to the results of Gatys et al. 2015 but is said to be three orders of magnitude faster. “We train feed-forward transformation networks for image transformation tasks, but rather than using per-pixel loss functions depending only on low-level pixel information, we train our networks using perceptual loss functions that depend on high-level features from a pretrained loss network. During training, perceptual losses measure image similarities more robustly than per-pixel losses, and at test-time the transformation networks run in real-time.” (Page 2.)
F. Luan et al., “Deep Photo Style Transfer,” arXiv preprint arXiv:1703.07511v3, 11 Apr. 2017 (“Luan et al. 2017”), describes a deep-learning approach to photographic style transfer that can faithfully transfer the reference style in a broad variety of scenarios, including transfer of the time of day, weather, season, and artistic edits. The approach can suppress distortion and yield satisfying photorealistic style transfer.
U.S. Pat. No. 9,576,351, entitled “Style transfer for headshot portraits,” describes a method for transferring the style for headshot portraits. U.S. Pat. No. 9,594,977, entitled “Automatically selecting example stylized images for image stylization operations based on semantic content,” describes a method for “content-based selection of style examples used in image stylization operations. For example, training images can be used to identify example stylized images that will generate high-quality stylized images when stylizing input images having certain types of semantic content.” (Abstract.)