Classification of incomplete data is an important problem in machine learning that has previously been tackled from both the biological and computational perspectives. The proposed solutions to this problem are closely tied to the literature on inference with incomplete data, generative models for classification problems, and recurrent neural networks.
Recurrent Neural Networks (RNNs) are connectionist computational models that utilize distributed representation and the nonlinear dynamics of their units. Information in RNNs is propagated and processed in time through the states of their hidden units, which makes them appropriate tools for sequential information processing. There are two broad types of RNNs: stochastic energy-based RNNs with symmetric connections, and deterministic ones with directed connections.
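As an illustration of how information is carried forward in time through hidden unit states, the following is a minimal sketch of a deterministic RNN with directed connections. The update rule h_t = tanh(W_in x_t + W_rec h_{t-1}), the function names rnn_step and run_rnn, and the weight values are assumptions chosen for illustration only and are not taken from the disclosure.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec):
    """One state update (assumed form): h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    h_new = []
    for i in range(len(h_prev)):
        total = sum(w_in[i][j] * x[j] for j in range(len(x)))
        total += sum(w_rec[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(sequence, w_in, w_rec, hidden_size):
    """Process a sequence and return the hidden state after every time step."""
    h = [0.0] * hidden_size  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states

# Hypothetical weights: one input unit, two hidden units.
W_IN = [[0.5], [-0.3]]
W_REC = [[0.1, 0.4], [-0.2, 0.3]]

# A single input pulse followed by zeros: the later states are nonzero only
# because the recurrent connections retain information about the pulse.
states = run_rnn([[1.0], [0.0], [0.0]], W_IN, W_REC, hidden_size=2)
for t, h in enumerate(states):
    print(t, h)
```

In this sketch the input is nonzero only at the first time step, so any activity at later steps reflects information held in the hidden state.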
RNNs are known to be Turing-complete computational models and universal approximators of dynamical systems. They are especially powerful tools for dealing with long-range statistical relationships in a wide variety of applications, ranging from natural language processing to financial data analysis. Additionally, RNNs have been shown to be very successful generative models for data completion tasks.
Despite their immense potential as universal computers, RNNs are difficult to train because of the inherent difficulty of learning long-term dependencies and because of convergence issues. However, recent advances suggest promising approaches to overcoming these issues, such as using better nonlinear optimizers or utilizing a reservoir of coupled oscillators. Nevertheless, RNNs remain computationally expensive in both the training and the test phases. The idea behind the method disclosed in this patent is to imitate recurrent processing in a network and exploit its power while avoiding the expensive energy minimization in training and the computationally heavy sampling in test.
Generative models are used to randomly generate observable data, using the learnt probabilistic structure encoded in their hidden variables. In contrast to discriminative models, generative models specify a joint probability distribution over the observed data and the corresponding class labels. For example, Restricted Boltzmann Machines are generative RNN models. Mixture models are perhaps the most widely used generative tools, and Expectation Maximization has become the standard technique for estimating the corresponding statistical parameters, i.e., the parameters of a mixture of subpopulations in the training data. Given the parameters of the subpopulation distributions, new data can be generated through sampling methods.
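The estimation of mixture parameters by Expectation Maximization, followed by generation of new data through sampling, can be sketched as follows. This is a minimal illustration under stated assumptions: a one-dimensional mixture of two Gaussians, synthetic training data, and hypothetical function names (fit_mixture_em, sample_mixture). It is not the procedure of the disclosed method.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture_em(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative assumption)."""
    means = [min(data), max(data)]  # crude initialization
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)
            weights[k] = nk / len(data)
    return weights, means, variances

def sample_mixture(weights, means, variances, n, rng):
    """Generate new data: pick a component, then sample from its Gaussian."""
    samples = []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1
        samples.append(rng.gauss(means[k], math.sqrt(variances[k])))
    return samples

rng = random.Random(0)
# Synthetic training data from two subpopulations centered at 0 and 5.
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances = fit_mixture_em(data)
new_points = sample_mixture(weights, means, variances, 5, rng)
print(weights, means, variances)
print(new_points)
```

The E-step assigns each training point a soft membership in each subpopulation, and the M-step updates the subpopulation parameters from those memberships; sampling then uses only the fitted parameters.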
Classification under incomplete data conditions is a well-studied problem. Imputation is commonly used as a pre-processing tool before the standard classification algorithms are applied. The Mixture of Factor Analyzers approach assumes multiple clusters in the data, estimates the statistical parameters of these clusters, and uses them to fill in the missing feature dimensions. Thus, in the imputation stage, the missing feature values are filled in with values sampled from a pre-computed distribution. Here, multiple imputation assumes that the data come from a mixture of distributions and is capable of capturing variation in the data. Sampling from a mixture of factor analyzers and filling in the data is effectively very similar to the insertion of feedback information in a neural network from a higher layer of neurons onto a lower layer of neurons.
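The imputation stage described above can be sketched as follows. As a simplifying assumption, each cluster is modeled here as a Gaussian with a diagonal covariance rather than as a factor analyzer, and the cluster parameters are hypothetical pre-computed values. Missing entries are marked with None, a cluster is drawn according to its posterior probability given the observed entries, and the missing entries are sampled from that cluster. Repeating the draw gives multiple imputations.

```python
import math
import random

def observed_log_likelihood(x, mean, var):
    """Log-density of the observed entries of x under a diagonal Gaussian."""
    total = 0.0
    for value, m, v in zip(x, mean, var):
        if value is not None:
            total += -0.5 * math.log(2.0 * math.pi * v) - (value - m) ** 2 / (2.0 * v)
    return total

def impute_once(x, clusters, rng):
    """Return one completed copy of x; None marks a missing entry."""
    logs = [math.log(c["weight"]) + observed_log_likelihood(x, c["mean"], c["var"])
            for c in clusters]
    top = max(logs)
    post = [math.exp(value - top) for value in logs]  # stabilized posterior
    norm = sum(post)
    post = [p / norm for p in post]
    chosen = rng.choices(clusters, weights=post, k=1)[0]
    return [value if value is not None else rng.gauss(m, math.sqrt(v))
            for value, m, v in zip(x, chosen["mean"], chosen["var"])]

# Hypothetical pre-computed cluster parameters (not from the source).
CLUSTERS = [
    {"weight": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"weight": 0.5, "mean": [5.0, 5.0], "var": [1.0, 1.0]},
]

rng = random.Random(1)
incomplete = [4.8, None]  # second feature is missing
imputations = [impute_once(incomplete, CLUSTERS, rng) for _ in range(20)]
average_fill = sum(c[1] for c in imputations) / len(imputations)
print(imputations[:3], average_fill)
```

Because the observed entry lies close to the second cluster, nearly all draws fill the missing entry with values around that cluster's mean, while the spread across draws reflects the variation that multiple imputation is meant to capture.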
Previously, both feedforward and recurrent neural network methods were proposed for denoising of images, i.e., recovering original images from corrupted versions. Multi-layer perceptrons were trained using backpropagation for denoising tasks, as an alternative to energy-based undirected recurrent neural networks (Hopfield models). Recurrent neural networks were trained for denoising of images by forming continuous attractors. A convolutional neural network was employed that takes an image as input and outputs the denoised image. The weights of the convolutional layers were learned through backpropagation of the reconstruction error. Denoising was used as a means to design a better autoencoder recurrent neural network. Pseudo-likelihood and dependency network approaches solve the data completion problem by learning conditional distributions that predict a data component using the rest of the components. These two approaches resemble the method disclosed in this patent in their maximum likelihood estimation of the missing data components, i.e., k-means clustering and imputation of the cluster center. However, none of the prior art proposes an iterative procedure that starts from a high-level class decision at the back end of a neural network and propagates this information back into the network to choose the optimum sample, in a maximum likelihood sense, from a mixture of distributions. One patent discloses a method for imputing unknown data components in a data structure using statistical methods. In another patent, a tensor factorization method is used to perform multiple imputation in retail sales data. A further patent discloses a neural network method to denoise images that are compressed and decompressed.
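The k-means clustering and cluster-center imputation mentioned above can be illustrated with a minimal sketch. The training points, the initial centers, and the function names (kmeans, impute_with_center) are assumptions made for illustration, and the sketch does not reproduce any cited approach or the disclosed iterative procedure. A missing entry, marked None, is filled with the corresponding coordinate of the cluster center nearest to the sample on its observed dimensions.

```python
def squared_distance(a, b, dims):
    """Squared Euclidean distance restricted to the given dimensions."""
    return sum((a[d] - b[d]) ** 2 for d in dims)

def kmeans(points, centers, n_iter=20):
    """Plain k-means starting from the supplied initial centers."""
    dims = range(len(points[0]))
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: squared_distance(p, centers[k], dims))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            [sum(p[d] for p in g) / len(g) for d in dims] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def impute_with_center(x, centers):
    """Fill missing entries (None) from the center nearest on observed dims."""
    observed = [d for d, v in enumerate(x) if v is not None]
    best = min(centers, key=lambda c: squared_distance(x, c, observed))
    return [v if v is not None else best[d] for d, v in enumerate(x)]

# Hypothetical complete training data forming two clusters.
training = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers = kmeans(training, centers=[training[0], training[3]])
completed = impute_with_center([4.7, None], centers)
print(centers, completed)
```

Unlike sampling from a mixture, this fill is deterministic: every sample assigned to the same cluster receives the same imputed value for a given missing dimension.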