This specification relates to training machine learning models.
Machine learning models can be trained using a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) process. That is, a machine learning model training system may train a machine learning model by performing multiple iterations of a gradient descent training procedure that uses an L-BFGS process to determine values of the model parameters, i.e., values at which a cost function of the model parameters attains a minimum or a maximum.
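By way of illustration only, the following is a minimal sketch of how an L-BFGS process may find parameter values that minimize a cost function. It is not taken from this specification. The function names (`lbfgs_minimize`, `mse_cost_and_grad`), the history size, the backtracking line search, and the toy linear model with a mean squared error cost are all assumptions chosen for the example. The search direction is computed with the standard two-loop recursion over a short history of parameter and gradient differences.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_minimize(cost_and_grad, w0, history=5, max_iters=100, tol=1e-8):
    """Minimize a cost function with L-BFGS (illustrative sketch only).

    cost_and_grad(w) is assumed to return (cost, gradient) for parameters w.
    """
    w = list(w0)
    cost, grad = cost_and_grad(w)
    pairs = deque(maxlen=history)  # most recent (s, y, rho) triples

    for _ in range(max_iters):
        if dot(grad, grad) ** 0.5 < tol:
            break

        # Two-loop recursion: apply the implicit inverse-Hessian
        # approximation to the gradient without forming a matrix.
        q = list(grad)
        alphas = []
        for s, y, rho in reversed(pairs):  # newest pair first
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if pairs:
            s, y, _ = pairs[-1]
            gamma = dot(s, y) / dot(y, y)  # initial scaling
            q = [gamma * qi for qi in q]
        for (s, y, rho), a in zip(pairs, reversed(alphas)):  # oldest first
            b = rho * dot(y, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        direction = [-qi for qi in q]

        # Backtracking line search with the Armijo sufficient-decrease
        # test (an assumed choice; other line searches may be used).
        step = 1.0
        slope = dot(grad, direction)
        while True:
            w_new = [wi + step * di for wi, di in zip(w, direction)]
            cost_new, grad_new = cost_and_grad(w_new)
            if cost_new <= cost + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5

        # Store the new curvature pair only if it keeps the
        # approximation positive definite.
        s = [a - b for a, b in zip(w_new, w)]
        y = [a - b for a, b in zip(grad_new, grad)]
        sy = dot(s, y)
        if sy > 1e-10 * dot(y, y):
            pairs.append((s, y, 1.0 / sy))
        w, cost, grad = w_new, cost_new, grad_new

    return w, cost


def mse_cost_and_grad(w):
    """Mean squared error of the toy model y = w[0] * x + w[1]."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
    n = len(xs)
    residuals = [w[0] * x + w[1] - y for x, y in zip(xs, ys)]
    cost = sum(r * r for r in residuals) / n
    grad = [2.0 * sum(r * x for r, x in zip(residuals, xs)) / n,
            2.0 * sum(residuals) / n]
    return cost, grad


weights, final_cost = lbfgs_minimize(mse_cost_and_grad, [0.0, 0.0])
print(weights, final_cost)
```

In this sketch the parameter values should approach a slope of 2 and an intercept of 1, the values used to generate the toy data. To find a maximum rather than a minimum, the same routine may be applied to the negated cost function.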