Machine Learning
Two of the most successful general approaches to transforming signals, such as speech, image and video signals, are model-based methods and neural networks. Each offers important advantages and disadvantages.
Model-Based Methods
A main advantage of the model-based methods, such as probabilistic graphical models, is that models can incorporate prior knowledge and intuition to reason at the problem level in devising inference procedures. Important assumptions about problem constraints can often be incorporated into the model-based approach. Examples include constraints such as the linear additivity of audio signals, e.g. speech signals, and visual occlusion in image processing, as well as more subtle statistical assumptions such as conditional independence, latent variable structure, sparsity, low-rank covariances, and so on. By hypothesizing and testing different problem-level constraints, insight into the nature of the problem can be gained and used to improve the modeling assumptions.
Unfortunately, inference in probabilistic models can be computationally intractable. Approximate methods, such as loopy belief propagation (BP) and variational approximations, can derive iterative procedures to infer the latent variables of interest. However, despite greatly improving the situation, such iterative methods are often still too slow for time-sensitive applications, such as real-time speech or video processing. In such cases, rigorous discriminative optimization of the models can be challenging because it may involve bi-level optimization, where optimization of the parameters of the model depends on an iterative inference procedure.
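As an illustration only, the bi-level structure described above can be written in the following general form. All symbols here are assumptions introduced for this sketch and are not taken from the text: θ denotes the model parameters, x_i an input signal, y_i^* a reference output, ϕ the latent variables, F_θ an inference objective, g an estimation model, and D a discriminative loss.

```latex
\begin{aligned}
\min_{\theta}\quad & \sum_{i} D\bigl(y_i^{*},\, g(x_i, \hat{\phi}_i(\theta), \theta)\bigr) \\
\text{subject to}\quad & \hat{\phi}_i(\theta) = \arg\min_{\phi} F_{\theta}(x_i, \phi)
\end{aligned}
```

The outer problem optimizes the parameters θ, while the inner problem is the inference itself. When the inner problem has no closed-form solution and is instead solved by an iterative procedure, the outer objective depends on θ through that procedure, which is what makes the optimization difficult.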
FIG. 2A shows a prior art model-based method. An inference procedure f 200 iterates 202 K times on input signals x_i 201 using parameters 203 to infer intermediate variables ϕ_i. Then, an estimation model g 204 is applied to obtain output y_i 205.
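The flow of FIG. 2A can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure of any particular model: the call signatures of f and g, the initial value `phi0`, and the toy functions `f_step` and `g_readout` (one gradient step on a simple quadratic objective, followed by a scaling) are hypothetical and chosen only to make the example runnable.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def run_model_based(
    x: float,
    f: Callable[[float, float, Params], float],
    g: Callable[[float, float, Params], float],
    theta: Params,
    K: int,
    phi0: float = 0.0,
) -> float:
    """Iterate the inference procedure f K times, then apply the estimation model g."""
    phi = phi0  # assumed initial value of the intermediate variable
    for _ in range(K):
        phi = f(x, phi, theta)  # one inference update
    return g(x, phi, theta)


def f_step(x: float, phi: float, theta: Params) -> float:
    """Hypothetical update: one gradient step on 0.5*(phi - x)**2 + 0.5*lam*phi**2."""
    grad = (phi - x) + theta["lam"] * phi
    return phi - theta["step"] * grad


def g_readout(x: float, phi: float, theta: Params) -> float:
    """Hypothetical estimation model: scale the inferred variable by a gain."""
    return theta["gain"] * phi


theta = {"lam": 1.0, "step": 0.25, "gain": 2.0}
y_one_iteration = run_model_based(3.0, f_step, g_readout, theta, K=1)
y_many_iterations = run_model_based(3.0, f_step, g_readout, theta, K=50)
```

In this toy setting the intermediate variable converges toward x / (1 + lam) as K grows, so the quality of the output depends on how many iterations can be afforded, which is the speed limitation noted above.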
Neural Networks
Neural networks are formulated such that the inference is defined as a finite closed-form expression, organized into layers, which are typically executed in sequence. Typically, a neural network includes an input layer, one or more hidden layers, and an output layer. If the number of hidden layers is large, then the neural network is called a deep neural network, and the layers are learned incrementally. Discriminative training of the networks can be used to optimize speed versus accuracy trade-offs.
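For comparison, the layered, closed-form character of neural network inference can be sketched as follows. This is an illustrative sketch only: the fully connected layers, the tanh activation on the hidden layer, the linear output layer, and the specific weights are assumptions made for the example.

```python
import math
from typing import List, Tuple

# A layer is a (weight matrix, bias vector) pair; weights are stored row by row.
Layer = Tuple[List[List[float]], List[float]]


def dense(W: List[List[float]], b: List[float], h: List[float]) -> List[float]:
    """Affine map W @ h + b, written out for plain Python lists."""
    return [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Execute the layers in sequence; the number of steps is fixed by the architecture."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = dense(W, b, h)
        if idx < len(layers) - 1:  # hidden layers use tanh, the output layer is linear
            h = [math.tanh(v) for v in h]
    return h


layers: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer: 2 inputs -> 2 units
    ([[1.0, 1.0]], [0.5]),                   # output layer: 2 units -> 1 output
]
out = forward([0.0, 0.0], layers)
```

Unlike the iterative procedure of a model-based method, the computation here has no convergence loop: its cost is determined by the number and size of the layers.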
One well-known disadvantage is that conventional neural networks are closer to mechanisms than problem-level formulations, and can be considered essentially “black-box” methods. Therefore, it is very difficult to incorporate prior knowledge about the real-world signals and the goal of the transformation into the network. Moreover, even with a working neural network, it is often not clear how it actually achieves its results. Therefore, discovering how to modify the network to achieve better results is not straightforward. Another example of this disadvantage is that only a limited set of activation functions that perform the computation of each layer have been investigated, and it is not clear how to choose the best activation function to solve a particular problem, or how to design a new activation function that is best suited to solve a particular problem.