Advances in machine learning have enabled computing devices to solve complex problems in many fields. Machine learning is typically classified as supervised or unsupervised. In supervised machine learning, the training data includes an indication of the ultimate desired result that a model is being trained to generate. For example, for supervised machine learning of a neural network that determines whether input data indicates a failure condition, the training data would be “labeled,” such as by including a column “Failure? (Yes/No).” Thus, use of supervised machine learning places a condition on the training data: the data must be labeled.
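As a minimal sketch of the labeled-data idea above (the dataset, the sensor names, and the threshold “model” are all hypothetical illustrations, not from the source), the following shows training data in which each example carries a “Failure? (Yes/No)” label, and a trivial supervised training step that uses those labels:

```python
# Hypothetical labeled training data for a failure-detection model.
# Each row pairs input features (temperature, vibration) with the
# desired result the model is being trained to produce: a failure label.
training_data = [
    ((70.0, 0.1), "No"),
    ((72.0, 0.2), "No"),
    ((95.0, 0.9), "Yes"),
    ((98.0, 1.1), "Yes"),
]

def train_threshold(data):
    """Trivial supervised 'training': pick the temperature threshold
    that correctly classifies the most labeled examples."""
    best_t, best_correct = None, -1
    candidates = sorted(temp for (temp, _), _ in data)
    for t in candidates:
        correct = sum(
            ("Yes" if temp >= t else "No") == label
            for (temp, _), label in data
        )
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

threshold = train_threshold(training_data)

def predict(temp):
    return "Yes" if temp >= threshold else "No"
```

The labels are what make this supervised: without the “Yes”/“No” column, there would be nothing for the training step to fit the threshold against.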
In contrast, unsupervised machine learning can be performed using unlabeled data. A common benchmark used to compare machine learning algorithms is digit recognition on the MNIST database of handwritten digits. Each training sample in the MNIST database is an image of a handwritten digit (zero to nine). The training images are labeled, but the labels are not used in an unsupervised machine learning scenario. Instead, the unlabeled images are input into an unsupervised machine learning algorithm, such as an autoencoder. Assuming that each image includes P pixels, training the autoencoder “teaches” the autoencoder to perform two tasks. First, the autoencoder learns how to encode an input image into a specified number of features, such as Q features, where Q is typically less than P. Second, the autoencoder learns how to decode a feature vector of Q features to generate a “reconstructed” image having P pixels. In a perfect reconstruction, decoding the Q features generated by encoding an input image results in a reconstructed image that is identical to that input image. Once training is completed, the autoencoder can encode an input image to generate a compressed representation and then decode the compressed representation to approximately recover the original input image, ideally with minimal reconstruction error.
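The encode-to-Q-features / decode-to-P-pixels structure described above can be sketched as follows. This is a toy illustration, not the source’s implementation: it assumes a purely linear autoencoder, tiny synthetic “images” with P=16 pixels and Q=2 features, and plain gradient descent on the reconstruction error; no labels are used anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q, n = 16, 2, 200  # P pixels per image, Q latent features, n images

# Synthetic unlabeled "images": each lies near a 2-D subspace, so a
# Q=2 bottleneck can reconstruct it well.
basis = rng.normal(size=(Q, P))
X = rng.normal(size=(n, Q)) @ basis

W_enc = rng.normal(scale=0.1, size=(P, Q))  # encoder: P pixels -> Q features
W_dec = rng.normal(scale=0.1, size=(Q, P))  # decoder: Q features -> P pixels

lr = 0.01
for _ in range(3000):
    Z = X @ W_enc        # encode each image to Q features
    X_hat = Z @ W_dec    # decode back to a "reconstructed" image of P pixels
    err = X_hat - X      # reconstruction error
    # Gradient-descent updates that shrink the mean squared error.
    W_dec -= lr * Z.T @ err / n
    W_enc -= lr * X.T @ (err @ W_dec.T) / n

# After training, reconstruction error is small relative to the data
# variance (per-pixel variance here is about 2.0).
mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

Note that training minimizes only the mismatch between input and reconstruction; the labels that MNIST happens to carry play no role, which is what makes the procedure unsupervised.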
Autoencoders can be useful to “memorize” data characteristics and to output replicas of input data. However, autoencoders are not generally suitable for generative problems, where the output data needs to be similar to, but not identical to, the input data. For example, consider a video game in which a player races a vehicle through various urban and rural terrain. When trees are automatically generated to fill out a forest, or pedestrians are automatically generated to populate sidewalks, the player would find it boring if every tree and every pedestrian looked the same. A variational autoencoder (VAE) is one way of solving such generative problems. In a VAE, randomness is introduced during training. The encoder of the VAE deterministically produces a mean and a variance, which define a probability distribution in a latent space. During training, the mean and variance are used to randomly sample from a Gaussian distribution to obtain an encoded vector, which is then (deterministically) decoded. During evaluation (i.e., after training is completed), the VAE is used either to encode data (in which case only the mean produced by the encoder is used) or to decode a given vector. Thus, returning to the video game trees example described above, slightly different trees may be generated by randomly sampling different vectors in the latent space and providing those vectors to the decoder, which decodes them into trees. Since the inputs to the decoder are slightly different, the outputs will be slightly different as well.
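The mean-and-variance sampling described above can be sketched as follows. This is an illustrative toy, not a trained model: the encoder and decoder are untrained random linear maps, the function names are hypothetical, and the sampling step uses the common “reparameterization” form z = mean + sqrt(variance) * noise.

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 16, 2  # P input pixels, Q latent features

# Hypothetical (untrained) encoder heads and decoder.
W_mu = rng.normal(scale=0.1, size=(P, Q))      # encoder head: mean
W_logvar = rng.normal(scale=0.1, size=(P, Q))  # encoder head: log-variance
W_dec = rng.normal(scale=0.1, size=(Q, P))     # decoder

def encode(x):
    """Deterministically map an input to a mean and variance in latent space."""
    return x @ W_mu, np.exp(x @ W_logvar)

def sample_latent(mu, var):
    """Randomly sample an encoded vector from the Gaussian N(mu, var)."""
    return mu + np.sqrt(var) * rng.normal(size=mu.shape)

def decode(z):
    """Deterministically decode a latent vector back to P values."""
    return z @ W_dec

x = rng.normal(size=P)       # a stand-in "input image"
mu, var = encode(x)

# Training-time path: decode a random sample drawn around the mean.
recon_train = decode(sample_latent(mu, var))

# Evaluation-time encoding: only the mean is used.
recon_eval = decode(mu)

# Generative use (the video game trees example): decoding two different
# random latent vectors yields two similar-but-different outputs.
tree_a = decode(rng.normal(size=Q))
tree_b = decode(rng.normal(size=Q))
```

Because `tree_a` and `tree_b` come from different latent vectors, their decoded outputs differ, which is exactly the variety the generative setting requires.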