In machine learning, computers are programmed to execute mathematical algorithms that identify potentially interesting features of data, such as meaningful patterns. These algorithms build models that learn feature representations from large data sets. Once the feature representations are learned, the trained model can be used to classify new instances of data.
Shallow models, such as Gaussian Mixture Models, Dynamic Bayesian Networks, Conditional Random Fields, Maximum Entropy models, and Support Vector Machines, have been used for event detection in speech. Hierarchical deep networks, linear deep networks such as And-Or Graphs, and non-linear deep networks such as Boltzmann Machines and Neural Networks have been used in vision, speech, and natural language processing.
Deep learning refers to a machine learning approach for learning representations of data using a model architecture that applies multiple non-linear transformations. A “representation” may refer to a mathematical construct used to identify or communicate something about a piece of data (e.g., a “feature” of the data) in a more structured way. For example, in computer vision, the visual content of a digital image can be represented at a “low level” by a vector of intensity values per pixel, or at a higher, more abstract level as a set of edges or regions of interest.
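To make the distinction between low-level and higher-level representations concrete, the following is a minimal sketch, assuming a tiny hand-made 4x4 grayscale image and a simple intensity-difference threshold (the function names, the image, and the threshold of 50 are illustrative, not part of any described system). The same image is represented once as a flat vector of 16 pixel intensities and once as a short list of edge locations.

```python
# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

def to_intensity_vector(img):
    """Low-level representation: one intensity value per pixel, row by row."""
    return [p for row in img for p in row]

def horizontal_edges(img, threshold=50):
    """Higher-level representation: (row, col) positions where intensity
    jumps by more than `threshold` between column col and col + 1."""
    edges = []
    for r, row in enumerate(img):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) > threshold:
                edges.append((r, c))
    return edges

vec = to_intensity_vector(image)
edges = horizontal_edges(image)
print(len(vec))  # 16
print(edges)     # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

The 16 raw intensities collapse into four edge positions that together describe a single vertical boundary between columns 1 and 2, which is the kind of more structured description a learned higher-level representation aims to capture.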
Deep learning architectures can include generative, discriminative, or hybrid models. Hybrid models include both a generative component and a discriminative component. Approaches to developing hybrid models include joint methods, iterative methods, and staged methods. Joint methods can optimize a single objective function that combines generative and discriminative energy terms. Iterative methods train the generative and discriminative models in alternation, so that each model influences the other. In staged methods, the generative and discriminative models are trained separately, with the discriminative model being trained on feature representations learned by the generative model; classification is performed after the training samples are projected into a fixed-dimensional space induced by the generative model.
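The staged approach can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the method of any particular system: the generative model is assumed to be a 1-D, two-component Gaussian mixture fit by EM without using labels, the fixed-dimensional space is assumed to be the vector of posterior component probabilities (responsibilities), and the discriminative model is assumed to be logistic regression. The toy data and all function names are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def responsibilities(x, means, variances, weights):
    """Project x into the fixed K-dimensional space of posterior component
    probabilities (computed in log space for numerical stability)."""
    logs = [math.log(w) + log_gaussian(x, m, v)
            for m, v, w in zip(means, variances, weights)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm(xs, k=2, iters=50):
    """Stage 1 (generative, unsupervised): fit a 1-D Gaussian mixture with EM."""
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = [responsibilities(x, means, variances, weights) for x in xs]  # E-step
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-3)  # floor to avoid collapse
            weights[j] = nj / len(xs)
    return means, variances, weights

def train_logistic(features, labels, lr=0.5, epochs=500):
    """Stage 2 (discriminative): logistic regression by stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = b + sum(wi * fi for wi, fi in zip(w, f))
            g = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, f):
    return 1 if b + sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

xs = [-0.5, -0.2, 0.0, 0.3, 0.1, 4.6, 5.0, 5.2, 5.5, 4.8]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
means, variances, weights = fit_gmm(xs)  # labels are not used here
feats = [responsibilities(x, means, variances, weights) for x in xs]
w, b = train_logistic(feats, ys)
print([predict(w, b, responsibilities(x, means, variances, weights)) for x in (0.2, 5.1)])
# [0, 1]
```

Because the two stages are trained separately, the generative model can be fit on unlabeled data, and the classifier only ever sees the K-dimensional projection rather than the raw inputs.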
Restricted Boltzmann Machines (RBMs) can form the building blocks of deep network models. Such deep networks can be trained using the Contrastive Divergence (CD) algorithm. RBMs can be stacked together to form deeper networks known as Deep Boltzmann Machines (DBMs), which capture more complex feature representations. Temporal models based on deep networks include Conditional RBMs (CRBMs) and Temporal RBMs (TRBMs). Conditional Random Fields (CRFs) can be used to label sequential data. CRFs can utilize arbitrary features and model non-stationarities. Hidden Conditional Random Fields (HCRFs) are an extension of CRFs that include hidden states.
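As an illustration of how Contrastive Divergence trains a single RBM building block, the following is a minimal pure-Python sketch of CD-1 (one Gibbs step) for a binary RBM. The layer sizes, learning rate, epoch count, toy data, and the class and method names are illustrative assumptions; the sketch does not cover stacking into DBMs or the temporal variants. It uses hidden probabilities rather than binary samples when accumulating the update statistics, which is a common simplification.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class RBM:
    """Binary-binary Restricted Boltzmann Machine (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: data (positive) statistics minus statistics after
        a single Gibbs step (negative phase)."""
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))  # reconstruction
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.a)):
            for j in range(len(self.b)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        """Mean squared error of a deterministic (mean-field) reconstruction."""
        total = 0.0
        for v in data:
            rec = self.visible_probs(self.hidden_probs(v))
            total += sum((vi - ri) ** 2 for vi, ri in zip(v, rec))
        return total / (len(data) * len(data[0]))

data = [[1, 1, 0, 0], [0, 0, 1, 1]] * 4
rbm = RBM(n_visible=4, n_hidden=2, seed=0)
before = rbm.reconstruction_error(data)
for epoch in range(300):
    for v in data:
        rbm.cd1_update(v)
after = rbm.reconstruction_error(data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

In a stacked (deep) model, the hidden probabilities of a trained RBM would typically serve as the visible data for the next RBM, with each layer trained in turn in the same way.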