Some embodiments of the presently disclosed subject matter relate to machine learning techniques, and more particularly to deep neural networks (DNN) such as deep convolutional neural networks (CNN). In particular, some embodiments relate to a method of classifying unique/rare cases by reinforcement learning in neural networks. Such rare cases may be defined as situations or scenarios that differ significantly from all previously learned data and have little chance of occurring in everyday situations.
The method of some embodiments is useful especially in the field of human-assisted or autonomous vehicles that use a camera or a depth sensor, such as a Lidar sensor, to detect and avoid obstacles and thereby navigate safely through environments.
A related art publication, “Distilling the Knowledge in a Neural Network”, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, arXiv:1503.02531, proposes a way of compressing the knowledge of an ensemble of models (which yields a lower level of false alarms due to the need for agreement between all models) into a single model using a special compression technique. The process of transferring knowledge from an ensemble of models to a single model is called “distillation”. It can be done by using the class probabilities produced by the ensemble of models as “soft targets” for training a small model. Training can be done on entirely unlabeled data or on the original training set. Although knowledge can be transferred using completely unlabeled data, this method requires keeping the same number of classes in the target network as in the initial ensemble of networks. Therefore, even if the knowledge of how to avoid selected false alarms through common agreement, acquired by an ensemble of networks, were transferred to a single network, it would remain bounded within a local neighborhood of the feature/decision space of networks trained on labelled data samples. Thus, it would not be able to cope efficiently with previously unseen data samples at the classification stage.
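For illustration only, the soft-target mechanism described above may be sketched as follows. This is a minimal NumPy sketch, not the cited publication's implementation; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # softened class probabilities; higher temperature flattens the distribution
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits_list, temperature=4.0):
    # average the softened class probabilities of an ensemble of teacher models
    probs = [softmax(l, temperature) for l in teacher_logits_list]
    return np.mean(probs, axis=0)

def distillation_loss(student_logits, soft_targets, temperature=4.0):
    # cross-entropy between the student's softened predictions and the soft targets
    log_p = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-np.sum(soft_targets * log_p, axis=-1).mean())
```

Note that the ensemble's soft targets carry the inter-class similarity structure that hard labels discard, which is what makes training on unlabeled data possible; the class set, however, is fixed by the ensemble.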
Another related art publication, “Generative Adversarial Networks”, Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, arXiv:1406.2661, discloses the use of an adversarial process for the simultaneous training of a generative and a discriminative model. The generative model should model the data distribution of previously labelled data samples in order to be able to synthesize new data samples. The discriminative model is trained to estimate the probability that a new sample came from the training data set rather than from the generative model. This repetitive process corresponds to a minimax two-player game, in which reinforcement leads to an improvement of the discriminative model. The major drawback of this system is twofold. First, the two models operate in the same feature space bounded by the labelled data samples, which leads to very limited improvement in the classification of samples that are substantially different from those present in the labelled data set. Even though the generative model produces novel samples, they can be interpreted as linear combinations of features learned from the labelled samples, so it cannot model genuinely novel data. Second, while at initial stages this method could locally improve the decision space, it will not be able to efficiently explore unknown areas of that space due to the weak feedback from the discriminative model's evaluation, which should lead to fast saturation of the reinforcement process.
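The two objectives of the minimax game described above may be sketched as follows, with D(x) denoting the discriminator's probability that a sample x is real. This NumPy sketch computes only the loss values; model updates are omitted, and the generator loss uses the non-saturating heuristic from the cited publication.

```python
import numpy as np

_EPS = 1e-12  # avoid log(0)

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))); written here as a loss to minimize
    return float(-(np.log(d_real + _EPS) + np.log(1.0 - d_fake + _EPS)).mean())

def generator_loss(d_fake):
    # non-saturating heuristic: G maximizes log D(G(z)) instead of
    # minimizing log(1 - D(G(z))), which gives stronger early gradients
    return float(-np.log(d_fake + _EPS).mean())
```

At the game's equilibrium the discriminator outputs 0.5 everywhere, so its loss settles at 2·log 2; a discriminator that separates real from generated samples well has a much lower loss.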
Another related art publication, “Explaining and Harnessing Adversarial Examples”, Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, arXiv:1412.6572, discloses that deep neural networks can be easily fooled by adding small perturbations at specific image locations, which is a potential source of false alarms. It is proposed to search for a minimal distortion of the initial image samples that leads to false classification by generating slight deviations on them; the modified images are called adversarial examples. They can be generated by back-propagating the classifier's loss to the input, which requires two passes of the network (one forward, one backward) for every image sample. This method can be considered another form of data augmentation. However, since the method is limited to data augmentation from existing labelled data samples, it cannot solve the problem of misclassification of objects that have a different visual appearance, such as rare/unseen/unique objects.
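The perturbation step described above corresponds to the fast gradient sign method of the cited publication. The following NumPy sketch applies it to a simple logistic classifier, for which the input gradient can be written in closed form; the classifier and all parameter values are illustrative assumptions, not the publication's setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, w, b, y):
    # gradient of the binary cross-entropy loss w.r.t. the input x,
    # for the logistic model p = sigmoid(w . x + b) with label y in {0, 1}
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

def fgsm(x, w, b, y, epsilon=0.25):
    # fast gradient sign method: step each input component by epsilon
    # in the direction that increases the classifier's loss
    return x + epsilon * np.sign(input_gradient(x, w, b, y))
```

Because the sign of the gradient is taken, the perturbation has a fixed per-component magnitude epsilon, yet it reliably moves the sample toward the decision boundary.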
Another related art publication, “The Virtues of Peer Pressure: A Simple Method for Discovering High-Value Mistakes”, Shumeet Baluja, Michele Covell, Rahul Sukthankar, CAIP 2015, proposes a method to efficiently discover input instances that are misclassified by neural networks. It is proposed to train a basket of N similarly trained “peer networks” that provide consistency pressure on each other. When an example is found for which a single network disagrees with all the other networks, which are consistent in their prediction, that example is considered a potential mistake. It is also proposed to generate mistakes by modifying original data using data augmentation transformations such as translation, rescaling, and color change. All the potential mistakes are later added to the training data set in order to improve classification accuracy. A further version of this technique supposes that, instead of geometrical transformations of original images, the peer networks could be applied to classify an image time series of an object in video; the frames on which the networks disagree the most could then be considered mistakes used to augment the training data set. In this way, video provides a much richer source of data. Although the degree of variability of data augmentation could be extended by using object tracking, it would still be limited to collecting data in a local space originating from training on labelled data samples. Since the peer networks learned in a similar manner on similar labelled data, they will not be able to consistently classify previously unseen/rare/unique data. We should expect their confident agreement on data samples visually similar to previously learned data, and disagreement or non-confident agreement on new data samples. Therefore, consistent improvement of the feature/decision space for those new data samples cannot be expected, and that improvement is the key action required to improve classification accuracy.
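The selection rule described above, a single dissenter against an otherwise unanimous basket of peers, may be sketched as follows. This is a minimal NumPy illustration of the criterion only; the function name and input layout are assumptions, not the cited publication's code.

```python
import numpy as np

def high_value_mistakes(peer_preds):
    # peer_preds: array of shape (n_networks, n_samples) holding each
    # network's predicted class label for each sample.
    # A sample is flagged when exactly one network disagrees while the
    # remaining n_networks - 1 peers all agree on a single label.
    n_nets, n_samples = peer_preds.shape
    flags = []
    for j in range(n_samples):
        labels, counts = np.unique(peer_preds[:, j], return_counts=True)
        flags.append(n_nets >= 3 and len(labels) == 2 and counts.min() == 1)
    return np.array(flags)
```

Samples flagged this way (and, in the video variant, the frames of maximal disagreement) would then be added to the training set.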
The related art publications propose solutions to the problem of misclassification of rare/unseen/unique data via various kinds of data augmentation, where new data samples (misclassified by current deep neural networks) are generated by modifying existing labelled data samples or by using various time instances of unlabeled data samples (visually similar to previously learned data samples) obtained via tracking in video. Although a deep neural network can thereby largely avoid misclassification of previously unseen but visually similar data samples, it cannot consistently improve accuracy on visually dissimilar data samples that were not used during training. One solution could be based on using extremely large datasets of labelled data to minimize the number of visually dissimilar unseen data samples, but this would be an exponentially expensive data collection task, hardly possible in practice.