Publications
Feature learning
Iteratively reweighted kernel machines efficiently learn sparse functions
L. Zhu, D. Davis, D. Drusvyatskiy, and M. Fazel. Preprint.
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
N. Mallinar, D. Beaglehole, L. Zhu, A. Radhakrishnan, P. Pandit, and M. Belkin. ICML 2025 (Oral).
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
L. Zhu, C. Liu, A. Radhakrishnan, and M. Belkin. ICML 2024.
Quadratic models for understanding catapult dynamics of neural networks
L. Zhu, C. Liu, A. Radhakrishnan, and M. Belkin. ICLR 2024.
A note on linear bottleneck networks and their transition to multilinearity
L. Zhu, P. Pandit, and M. Belkin. Preprint.
Kernel regime
Neural tangent kernel at initialization: linear width suffices
A. Banerjee, P. Cisneros-Velarde, L. Zhu, and M. Belkin. UAI 2023.
Restricted strong convexity of deep learning models with smooth activations
A. Banerjee, P. Cisneros-Velarde, L. Zhu, and M. Belkin. ICLR 2023.
Transition to linearity of general neural networks with directed acyclic graph architecture
L. Zhu, C. Liu, and M. Belkin. NeurIPS 2022.
Transition to linearity of wide neural networks is an emerging property of assembling weak models
C. Liu, L. Zhu, and M. Belkin. ICLR 2021 (Spotlight).
On the linearity of large non-linear models: when and why the tangent kernel is constant
C. Liu, L. Zhu, and M. Belkin. NeurIPS 2020 (Spotlight).
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
C. Liu, L. Zhu, and M. Belkin. Applied and Computational Harmonic Analysis (ACHA) 2022.
Random matrix theory
Spectral norm bound for the product of random Fourier-Walsh matrices
L. Zhu, D. Davis, D. Drusvyatskiy, and M. Fazel. Preprint.