Machine learning models are widely employed to process large amounts of input data to generate or extract information of interest therefrom, such as, for example, descriptive or predictive information. Example uses for machine learning models include, for example, data mining, pattern recognition, “spam” identification, audio transcription, and so on.
Generally, a machine learning model may be a supervised learning model or an unsupervised learning model. A supervised learning algorithm or model is an algorithm that is initially trained using a training or sample data set, in which each sample specifies one or more input values and one or more output values that are caused or produced by the input values. Such data samples are typically termed “labeled” data samples due to the explicit association of the output values with the input values of the samples. Once the supervised learning algorithm has been trained by processing the sample data set, operational data, in which the resulting output value for each of the one or more outputs is currently unknown, is then provided as input to the trained algorithm to generate the one or more output values for each operational data unit. Types of supervised learning models may include, but are not limited to, artificial neural networks (ANNs), Bayesian networks, and symbolic machine learning algorithms.
In unsupervised learning models, the training data is “unlabeled,” such that an explicit label or output value is not associated with any of the training data samples. Instead, all observed values of the training data samples may be presumed to be caused by a set of hidden or “latent” variables or values. However, both input and output variables or values may be provided to an unsupervised learning algorithm as observed values to determine a relationship between the inputs and outputs, even though the inputs are not considered in an unsupervised learning algorithm to cause or produce the outputs. Generally, in operation after the training phase, unsupervised learning models are employed to discover hidden structures or key features of operational data, or cluster together similar instances of operational data. Types of unsupervised learning models may include some ANNs, vector quantization algorithms, cluster analysis algorithms, and outlier detection algorithms.
Hybrid approaches, such as semi-supervised learning models or algorithms, may employ both labeled and unlabeled data for training purposes. In such models, a relatively large amount of unlabeled data and a relatively small amount of labeled data are often employed during the training phase.
More recently, both training data sets and operational data units for typical machine learning algorithms have greatly increased in size, causing the overall processing time for development and deployment of such models using these large data sets to increase significantly.