Title
Stochastic Gradient Descent: Marrying Theory with Practice
Speaker
Prateek Jain, MSR India
Abstract

Stochastic Gradient Descent (SGD) is the workhorse of most modern ML-based solutions. The method was first introduced in 1951 (as the Robbins-Monro algorithm) and has since generated tremendous interest and impact, especially for training deep neural networks. However, there is a significant gap between the practical versions of SGD and the stylized versions used for theoretical analysis. In this tutorial, we will highlight some of these gaps and present a few recent results that attempt to bridge them. In particular, we will discuss how to rigorously understand the following practical variants of standard SGD: a) SGD with mini-batching, b) SGD with acceleration, c) SGD with random reshuffling, and d) SGD with the last-point iterate.
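As a concrete point of reference (not part of the tutorial materials), the sketch below contrasts the stylized SGD typically analyzed in theory with the practical variants named above, on a simple least-squares objective. All function and parameter names here are illustrative assumptions, not the speaker's code.

import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=10, batch_size=8,
                      reshuffle=True, return_last=True):
    """SGD for min_w ||Xw - y||^2 / (2n), illustrating mini-batching,
    random reshuffling, and the last-iterate vs. averaged-iterate choice.
    (Acceleration/momentum is omitted here for brevity.)"""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    steps = 0
    for _ in range(epochs):
        if reshuffle:
            # Random reshuffling: sample WITHOUT replacement each epoch,
            # as done in practice.
            order = np.random.permutation(n)
        else:
            # With-replacement sampling, as assumed in classical analyses.
            order = np.random.randint(0, n, size=n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Mini-batch gradient of the least-squares loss.
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
            w_sum += w
            steps += 1
    # Theory often analyzes the averaged iterate; practice usually
    # reports the last one.
    return w if return_last else w_sum / steps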

The tutorial is based on joint work with Praneeth Netrapalli, Sham Kakade, Rahul Kidambi, Dheeraj Nagaraj, and Aaron Sidford.