Neural networks are frequently employed in the field of machine learning to approximate unknown functions that can depend on a large number of inputs. Typically, a neural network (or simply “network”) comprises a number of interconnected units that send messages to one another. Each unit has associated weight values that can be tuned based on labeled training data, that is, input data for which the corresponding outputs (“labels”) are known. In this way, the network may be adapted (or “trained”) to produce a suitable output when subsequently presented with unlabeled input data.
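As a minimal sketch of weight tuning, the following Python example trains a single unit on four labeled examples. The sigmoid activation, the gradient-descent update, the learning rate, the epoch count, the logical-OR data set, and the names `train_unit` and `predict` are all assumptions chosen for illustration; none of them is prescribed by the description above.

```python
import math

def sigmoid(z):
    """Squash a real-valued activation into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """Output of one unit: sigmoid of the weighted sum of its inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_unit(examples, epochs=2000, lr=0.5):
    """Tune the unit's weights and bias on (inputs, label) pairs.

    Assumed update rule: stochastic gradient descent on the
    cross-entropy loss, whose gradient with respect to the weighted
    sum is simply (output - label).
    """
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = predict(weights, bias, inputs) - label
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias

# Labeled training data (illustrative): inputs paired with known outputs.
labeled = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_unit(labeled)

# An input that was not part of the training data.
unlabeled_output = predict(weights, bias, (0.9, 0.1))
```

After training, the unit's output falls below 0.5 for the input labeled 0 and above 0.5 for the inputs labeled 1, and it can then be applied to an input it was never shown, such as `(0.9, 0.1)`.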
In some cases, training a neural network on labeled training data may result in a network that is so closely tuned to the training data that it does not produce meaningful results on unlabeled data that is similar, but not identical, to the labeled training data. This can be due to the network “overfitting” its tunable parameters to apparent characteristics of the training data that are, in actuality, sampling noise. A network that has overfit to the training data in this way may not be applicable to subsequent unlabeled data.
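The effect can be illustrated with a small Python sketch. It uses polynomial fitting rather than a neural network, purely as a stand-in for a model with many tunable parameters. The underlying function y = x, the fixed noise values, the held-out points, and all function names are assumptions made for illustration.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes exactly through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line (two tunable parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Training data: y = x plus fixed, hand-picked sampling noise (illustrative).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [x + n for x, n in zip(train_x, noise)]

# Held-out data: similar to, but different from, the training inputs.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)  # noise-free values of the underlying function y = x

slope, intercept = fit_line(train_x, train_y)

# Six-parameter model: degree-5 polynomial through all six training points.
poly_train_mse = mse([lagrange_predict(train_x, train_y, x) for x in train_x], train_y)
poly_test_mse = mse([lagrange_predict(train_x, train_y, x) for x in test_x], test_y)

# Two-parameter model: straight line.
line_train_mse = mse([slope * x + intercept for x in train_x], train_y)
line_test_mse = mse([slope * x + intercept for x in test_x], test_y)
```

The degree-5 polynomial reproduces the training data, noise included, and so has a lower training error than the line, yet its error on the held-out points is larger. This is the pattern described above: parameters tuned to sampling noise fit the training data closely but carry over poorly to new data.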