Surrogate loss functions are widely used in machine learning. For many machine learning problems, the ideal objective or loss function is computationally hard to optimize, so one instead works with a (usually convex) surrogate loss that can be optimized efficiently. What are the fundamental design principles for such surrogate losses, and what statistical behavior do the resulting algorithms exhibit?
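As a concrete illustration of the target-versus-surrogate distinction, the sketch below uses binary classification, where the target is the 0-1 loss and two standard convex surrogates are the hinge and logistic losses. It assumes labels in {-1, +1} and writes each loss as a function of the margin m = y * f(x). The function names are illustrative. Counting a margin of exactly 0 as an error and using base 2 for the logistic loss are conventions chosen here so that both surrogates upper-bound the 0-1 loss.

```python
import math


def zero_one_loss(margin: float) -> float:
    """Target 0-1 loss of the margin m = y * f(x), y in {-1, +1}.
    Convention (assumed here): a margin of exactly 0 counts as an error."""
    return 1.0 if margin <= 0 else 0.0


def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - m): a convex surrogate for the 0-1 loss."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin: float) -> float:
    """Logistic loss log2(1 + exp(-m)), computed in a numerically stable way.
    Base 2 is chosen so the loss equals 1 at m = 0 and upper-bounds 0-1 loss."""
    if margin >= 0:
        return math.log1p(math.exp(-margin)) / math.log(2)
    return (-margin + math.log1p(math.exp(margin))) / math.log(2)


if __name__ == "__main__":
    for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
        print(f"m={m:+.1f}  0-1={zero_one_loss(m):.0f}  "
              f"hinge={hinge_loss(m):.3f}  logistic={logistic_loss(m):.3f}")
```

The 0-1 loss is a discontinuous step in the margin, which is what makes it hard to optimize. Each surrogate is convex in the score, so empirical risk minimization over a linear model becomes a convex problem.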
This tutorial will provide answers to some of these questions. In particular, we will review the theories of "proper" surrogate losses and "calibrated" surrogate losses, both of which yield statistically consistent learning algorithms for the true learning objective. We will also discuss recent results that connect these two classes of surrogate losses via the framework of property elicitation, establishing that every calibrated surrogate can effectively be viewed as a proper surrogate that estimates a suitable "sufficient" property of the data distribution. With these tools, designing a statistically consistent learning algorithm reduces to identifying a suitable "sufficient" property for the target learning problem and then designing a proper surrogate loss that estimates this property.
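The standard binary-classification example makes the "sufficient property" idea concrete. This is a minimal sketch, assuming labels in {-1, +1} and a single input x with conditional probability eta = P(y = +1 | x). The sketch minimizes the expected (conditional) surrogate risk over a scalar score f by brute-force grid search. That search method and all function names are illustrative choices, not methods from the tutorial.

- **Logistic loss:** the minimizer is f* = log(eta / (1 - eta)), so sigmoid(f*) recovers eta itself. This loss is proper for the full class probability.
- **Hinge loss:** the minimizer is sign(2 * eta - 1), which recovers only the mode. That mode is exactly the property the 0-1 loss needs.

In both cases sign(f*) matches the Bayes-optimal classifier sign(eta - 1/2).

```python
import math


def logistic(m: float) -> float:
    """Natural-log logistic loss log(1 + exp(-m)), numerically stable."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def hinge(m: float) -> float:
    """Hinge loss max(0, 1 - m)."""
    return max(0.0, 1.0 - m)


def conditional_risk(loss, eta: float, f: float) -> float:
    """Expected surrogate loss of score f at a point where P(y=+1 | x) = eta."""
    return eta * loss(f) + (1.0 - eta) * loss(-f)


def minimize_on_grid(loss, eta: float, lo: float = -5.0, hi: float = 5.0,
                     steps: int = 10001) -> float:
    """Hypothetical helper: approximate argmin of the conditional risk by grid search."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda f: conditional_risk(loss, eta, f))


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


if __name__ == "__main__":
    for eta in (0.1, 0.3, 0.7, 0.9):
        f_log = minimize_on_grid(logistic, eta)
        f_hin = minimize_on_grid(hinge, eta)
        print(f"eta={eta:.1f}  logistic f*={f_log:+.3f} (sigmoid={sigmoid(f_log):.3f})  "
              f"hinge f*={f_hin:+.3f}")
```

The logistic minimizer carries strictly more information (the whole of eta) than the 0-1 loss requires. The hinge minimizer carries just enough information (the side of 1/2 on which eta falls). Either is sufficient for consistency with respect to the 0-1 loss. The point eta = 1/2 is excluded above because the hinge risk is flat on [-1, 1] there.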