Depth Separation in Learning via Representation Costs
Poster, 37th Annual Conference on Learning Theory (COLT), Edmonton, Canada
Talk, Brigham Young University Applied Math Seminar, Provo, Utah
Poster, Midwest Machine Learning Symposium, Chicago, Illinois
Talk, University of Chicago Computational and Applied Mathematics Student Seminar, Chicago, Illinois
A fundamental question in the theory of neural networks is the role of depth. Empirically, deeper networks tend to perform better than shallow ones, but the reasons behind this phenomenon are not well understood. In this talk I will discuss the role of depth in a simplified setting where most of the layers have linear activations. Specifically, the regularization induced by training a network with many linear layers followed by a single ReLU layer under weight decay is equivalent to a function-space penalty that encourages the network to select a low-rank function, i.e., one that varies only along a low-dimensional active subspace.
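
To illustrate the mechanism behind penalties of this type, the following is a standard identity for deep linear factorizations under weight decay (a sketch of the underlying idea, not necessarily the exact function-space penalty from the talk): minimizing the summed squared Frobenius norms of L linear factors with a fixed end-to-end map W yields a Schatten quasi-norm on W,

% Sketch: weight decay on L linear factors inducing a Schatten-(2/L)
% quasi-norm on the end-to-end map W, where sigma_j(W) are its singular values.
\[
  \min_{W_L \cdots W_1 = W} \; \sum_{i=1}^{L} \|W_i\|_F^2
    \;=\; L \, \|W\|_{S_{2/L}}^{2/L}
    \;=\; L \sum_{j} \sigma_j(W)^{2/L}.
\]

As L grows, the exponent 2/L shrinks toward zero, so the penalty increasingly counts nonzero singular values, approximating a rank penalty and biasing the learned map toward a low-dimensional active subspace.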