Publications
Feature learning
Iteratively reweighted kernel machines efficiently learn sparse functions
Zhu L., Davis D., Drusvyatskiy D. & Fazel M. Preprint.
Emergence in non‑neural models: grokking modular arithmetic via average gradient outer product
Mallinar N., Beaglehole D., Zhu L., Radhakrishnan A., Pandit P. & Belkin M. ICML 2025.
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Zhu L., Liu C., Radhakrishnan A. & Belkin M. ICML 2024.
Quadratic models for understanding catapult dynamics of neural networks
Zhu L., Liu C., Radhakrishnan A. & Belkin M. ICLR 2024.
A note on Linear Bottleneck networks and their Transition to Multilinearity
Zhu L., Pandit P. & Belkin M. Preprint.
Kernel regime
Neural tangent kernel at initialization: linear width suffices
Banerjee A., Cisneros‑Velarde P., Zhu L. & Belkin M. UAI 2023.
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Banerjee A., Cisneros‑Velarde P., Zhu L. & Belkin M. ICLR 2023.
Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Zhu L., Liu C. & Belkin M. NeurIPS 2022.
Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models
Liu C., Zhu L. & Belkin M. ICLR 2021 (Spotlight).
On the linearity of large non‑linear models: when and why the tangent kernel is constant
Liu C., Zhu L. & Belkin M. NeurIPS 2020 (Spotlight).
Loss landscapes and optimization in over‑parameterized non‑linear systems and neural networks
Liu C., Zhu L. & Belkin M. Applied and Computational Harmonic Analysis (ACHA) 2022.
Random matrix theory
Spectral norm bound for the product of random Fourier-Walsh matrices
Zhu L., Davis D., Drusvyatskiy D. & Fazel M. Preprint.