Sanjay Shakkottai, UT Austin
Title: On the Throughput vs Accuracy Trade-Off for Streaming Unsupervised Classification
We study an online and streaming unsupervised classification system. Our setting consists of a collection of classifiers (with unknown confusion matrices) each of which can classify one sample per unit time, and which are accessed by a stream of unlabeled samples. Each sample is dispatched to one or more classifiers, and depending on the labels collected from these classifiers, may be sent to other classifiers to collect additional labels. The labels are continually aggregated. Once the aggregated label has high enough accuracy (a pre-specified threshold for accuracy) or the sample is sent to all the classifiers, the now labeled sample is ejected from the system. For any given pre-specified threshold for accuracy, the objective is to sustain the maximum possible rate of arrival of new samples, such that the number of samples in memory does not grow unbounded.
In this talk, we characterize the Pareto-optimal region of accuracy and arrival rate, and discuss an algorithm that can operate at any point within this region. Our algorithm uses queueing-based routing and scheduling approaches combined with online tensor decomposition method to learn the hidden parameters, and provides Pareto-optimality guarantees. We finally verify our theoretical results through simulations on two ensembles formed using AlexNet, VGG, and ResNet deep image classifiers. Based on joint work with Soumya Basu, Steven Gutstein and Brent Lance.