Machine learning defines models that can be used to predict occurrence of an event, for example, from sensor data or signal data, or recognize/classify an object, for example, in an image, in text, in a web page, in voice data, in sensor data, etc. Machine learning algorithms can be classified into three categories: unsupervised learning, supervised learning, and semi-supervised learning. Unsupervised learning does not require that a target (dependent) variable y be labeled in training data to indicate occurrence or non-occurrence of the event or to recognize/classify the object. An unsupervised learning system predicts the label, target variable y, in training data by defining a model that describes the hidden structure in the training data. Supervised learning requires that the target (dependent) variable y be labeled in training data so that a model can be built to predict the label of new unlabeled data. A supervised learning system discards observations in the training data that are not labeled. While supervised learning algorithms are typically better predictors/classifiers, labeling training data often requires a physical experiment or a statistical trial, and human labor is usually required. As a result, it may be very complex and expensive to fully label an entire training dataset. A semi-supervised learning system only requires that the target (dependent) variable y be labeled in a small portion of the training data and uses the unlabeled training data in the training dataset to define the prediction/classification (data labeling) model.
Prior information is usually considered as an important information resource in machine learning and has been widely utilized for enhancing the prediction performance in machine learning such as Bayesian statistics, non-parametric Bayesian models, etc. For example, in medical image diagnosis, prior information about the distribution of a disease and a survival rate may help doctors make better decisions using a machine learning model. As another example, in sentiment analysis, prior polarity scores may be used to improve a classification performance. In the natural language processing, prior information plays an important role in the generative statistical model and Bayesian inference such as latent Dirichlet allocation.