Computing the learning parameters (i.e., the network weights) of deep neural networks gives rise to very complex and difficult nonlinear optimization problems. These problems are non-convex and possess a large number of saddle points and local minima. Currently, the most widely used optimization algorithms in deep learning are first order methods, especially Stochastic Gradient Descent (SGD) methods. However, SGD cannot take advantage of curvature information and, as a result, converges very slowly and only to first order critical points. This means that a local minimum may never be reached.
Recently, a second order optimization method referred to as “Hessian Free Deep Learning” has been proposed that can efficiently solve the optimization problems arising in deep learning architectures. Hessian Free Deep Learning uses the Conjugate Gradient (CG) method to solve the Newton equations iteratively, which requires only Hessian-vector products and never the explicit Hessian matrix. This makes it possible to solve the large optimization problems arising in many different deep learning architectures.
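As a rough illustration of the idea, the following minimal sketch solves the Newton equations H d = -g with CG while touching the Hessian only through products H v. It is not the published Hessian Free Deep Learning implementation: the test function, the names `grad`, `hess_vec`, and `newton_step_cg`, and the use of central finite differences for H v are all assumptions made for this example.

```python
def grad(w):
    # Gradient of the hypothetical convex test function
    # f(x, y) = x^4 + x^2 + x*y + 2*y^2 (chosen only for illustration).
    x, y = w
    return [4.0 * x**3 + 2.0 * x + y, x + 4.0 * y]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hess_vec(grad_fn, w, v, eps=1e-6):
    # Central finite difference: H v ~ (g(w + eps*v) - g(w - eps*v)) / (2*eps).
    # The Hessian matrix itself is never formed.
    g_plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]

def newton_step_cg(grad_fn, w, max_iter=50, tol=1e-12):
    # Solve H d = -g by conjugate gradients, starting from d = 0.
    g = grad_fn(w)
    d = [0.0] * len(w)
    r = [-gi for gi in g]          # residual of H d = -g at d = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:               # squared residual norm small enough
            break
        hp = hess_vec(grad_fn, w, p)
        curv = dot(p, hp)          # curvature p^T H p along the search direction
        if curv <= 0.0:
            break                  # non-positive curvature: plain CG stops here
        alpha = rs / curv
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

# At w = (1, 1) the exact Newton step is (-23/55, -63/55).
step = newton_step_cg(grad, [1.0, 1.0])
```

The `curv <= 0.0` branch is where this basic scheme gives up: it detects non-positive curvature but does not exploit it.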
A major limitation of the Hessian Free Deep Learning algorithm is that it cannot easily incorporate negative curvature information into the optimization algorithm. Negative curvature is crucial when developing algorithms with guarantees of convergence to critical points that satisfy second order optimality conditions. It allows optimization algorithms to escape from saddle points and local maxima when a local minimum is sought. Note that SGD has no means of distinguishing between saddle points, local minima, and local maxima, since the first order optimality conditions are satisfied at all of these points.
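A small sketch can make this concrete. The function below is a hypothetical example, not taken from the method discussed here: f(x, y) = x^2 + y^4/4 - y^2/2 has a saddle point at the origin and local minima at (0, 1) and (0, -1). Plain gradient descent started at the saddle never moves, whereas a small step along a direction of negative curvature lets the same iteration reach a minimum. The direction `v` is assumed to be known here.

```python
def f(w):
    x, y = w
    return x**2 + 0.25 * y**4 - 0.5 * y**2

def grad(w):
    x, y = w
    return [2.0 * x, y**3 - y]

def gradient_descent(w, lr=0.1, steps=200):
    # Plain (deterministic) gradient descent with a fixed step size.
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def curvature_along(w, v, eps=1e-5):
    # v^T H v estimated from a central finite difference of the gradient.
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    hv = [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]
    return sum(vi * hvi for vi, hvi in zip(v, hv))

saddle = [0.0, 0.0]

# The gradient vanishes at the saddle, so a first order method stays put.
stuck = gradient_descent(saddle)

# The Hessian at the origin is diag(2, -1); v = (0, 1) is the eigenvector
# belonging to the negative eigenvalue (assumed known in this sketch).
v = [0.0, 1.0]
curvature = curvature_along(saddle, v)   # close to -1, i.e. negative

# A small step along v leaves the saddle; descent then reaches a minimum.
nudged = [wi + 0.1 * vi for wi, vi in zip(saddle, v)]
escaped = gradient_descent(nudged)
```

In practice the direction `v` is not available for free, which is why estimating it cheaply matters.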
Calculating negative curvature is not an easy task: it requires computing or estimating the left-most eigenpairs (i.e., the smallest eigenvalues and their corresponding eigenvectors) of the Hessian matrix. Estimates of these eigenpairs can be obtained while running the CG method or the Lanczos method. However, the simultaneous estimation of eigenpairs and solution of a non-convex problem in a deep learning framework has not yet been well explored.
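For illustration, here is a minimal sketch of how the Lanczos method can estimate the left-most eigenpair using only Hessian-vector products. The 3x3 matrix `H` is a hypothetical stand-in for a Hessian (its eigenvalues are -sqrt(2), 0, and sqrt(2)), all function names are assumptions for this example, and no reorthogonalization is performed, so the sketch is only suitable for a small number of Lanczos steps.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lanczos(matvec, q0, m):
    # Build an orthonormal Krylov basis and the tridiagonal matrix T,
    # stored as its diagonal (alphas) and off-diagonal (betas).
    norm = math.sqrt(dot(q0, q0))
    q, q_prev, beta = [x / norm for x in q0], [0.0] * len(q0), 0.0
    alphas, betas, basis = [], [], []
    for _ in range(m):
        basis.append(q)
        w = matvec(q)
        alpha = dot(w, q)
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:           # Krylov space is invariant: stop early
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas[:len(alphas) - 1], basis

def count_below(alphas, betas, x):
    # Sturm sequence count: number of eigenvalues of T smaller than x.
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300             # avoid division by zero in the next step
        if d < 0.0:
            count += 1
    return count

def leftmost_eigenpair(matvec, q0, m):
    alphas, betas, basis = lanczos(matvec, q0, m)
    bound = max(abs(a) for a in alphas) + 2.0 * max([abs(b) for b in betas] + [0.0])
    lo, hi = -bound - 1.0, bound + 1.0
    for _ in range(200):           # bisection for the smallest eigenvalue of T
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    s = [1.0]                      # eigenvector of T via its three-term recurrence
    for j in range(len(alphas) - 1):
        prev = betas[j - 1] * s[j - 1] if j > 0 else 0.0
        s.append(((theta - alphas[j]) * s[j] - prev) / betas[j])
    # Ritz vector: combine the Lanczos basis vectors with the weights s.
    v = [sum(sj * qj[i] for sj, qj in zip(s, basis)) for i in range(len(q0))]
    norm = math.sqrt(dot(v, v))
    return theta, [x / norm for x in v]

H = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

def hess_vec(v):
    # Stand-in for a Hessian-vector product; H is never needed elsewhere.
    return [dot(row, v) for row in H]

theta, v = leftmost_eigenpair(hess_vec, [1.0, 2.0, 3.0], m=3)
```

A negative `theta` signals negative curvature, and the returned `v` is the corresponding direction along which the objective can be decreased.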