09h00 - 10h00
Using second-order information in training large-scale machine learning models
We will give a broad overview of recent developments in using deterministic and stochastic second-order information to speed up optimization methods for problems arising in machine learning. Specifically, we will show how such methods tend to perform well in the convex setting but often fail to improve over simple methods, such as stochastic gradient descent, when applied to large-scale nonconvex deep learning models. We will discuss the difficulties faced by quasi-Newton methods that rely on stochastic first-order information and by Hessian-free methods that use stochastic second-order information.
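To illustrate the Hessian-free idea mentioned in the abstract, here is a minimal sketch (not the speaker's method): a Newton-CG step on a toy least-squares problem, where the Hessian is never formed and Hessian-vector products are approximated by a finite difference of the gradient. The problem, tolerances, and iteration counts are illustrative assumptions.

```python
import numpy as np

# Toy convex problem: minimize 0.5 * ||A w - b||^2 (illustrative choice)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def grad(w):
    # Gradient of 0.5 * ||A w - b||^2
    return A.T @ (A @ w - b)

def hess_vec(w, v, eps=1e-6):
    # Hessian-vector product via a finite difference of the gradient:
    # H v ~ (grad(w + eps v) - grad(w)) / eps -- the core trick that lets
    # Hessian-free methods use curvature without materializing H.
    return (grad(w + eps * v) - grad(w)) / eps

def newton_cg_step(w, g, iters=25, tol=1e-12):
    # Approximately solve H d = -g with conjugate gradient,
    # touching H only through Hessian-vector products.
    d = np.zeros_like(g)
    r = -g.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hess_vec(w, p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

w = np.zeros(10)
for _ in range(5):
    w += newton_cg_step(w, grad(w))

# On this convex quadratic the gradient norm should be driven near zero;
# on nonconvex deep models, stochastic gradients and indefinite curvature
# make this step far less reliable, as the talk discusses.
print(np.linalg.norm(grad(w)))
```

In practice (e.g., in deep learning frameworks) the Hessian-vector product is computed exactly with automatic differentiation rather than finite differences; the finite-difference form is used here only to keep the sketch dependency-light.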