Applications for machine learning may be used in devices (e.g., a mobile device, a security device, and an autonomous vehicle) that provide object detection, classification and recognition in the area surrounding the device. Sequential modeling with long-term correlation has been a challenge to implement in machine learning. A typical approach, such as a hidden Markov model (HMM), requires an exponential increase in the number of states to improve long-term dependency requiring significant computing resources. Recent advances in implementations of deep neural networks have significantly improved the performance of language modeling, machine translation, machine vision and speech recognition. Among the various neural network models, a recurrent neural network (RNN) may capture long term dependency in sequential data with a simple recurrent mechanism. The architecture and computing resource complexity required to implement the RNN is significantly more efficient as compared with implementation of a traditional HMM.
However, proper training of an RNN network is extremely difficult due to vanishing/exploding gradients. The vanishing gradient problem is a difficulty found in training an RNN network with gradient-based learning methods and backpropagation. With a vanishing gradient, the error signal decreases exponentially and the early layers train very slowly. Instability may also occur with an exploding gradient in which the error signal increases exponentially. After many heuristic approaches have been applied to solve the RNN training problem, there have been a few successful deep neural network training architectures. Among them, a long short-term memory (LSTM) network is one of the most popular deep neural network architectures. The LSTM network provides learnable gate networks that may provide additional gradient paths in time. Therefore, depending on the state of the gate networks, some gradient paths may survive much longer than others, which may resolve the vanishing/exploding gradient problems.
Although the LSTM network provides promising results in many areas of machine learning and artificial intelligence, the LSTM network is based on a first-order recurrent network architecture. A first-order recurrent network architecture has limitations in modelling very long-term dependency of sequential data.