Publications
Feature learning
- N Mallinar, D Beaglehole, L Zhu, A Radhakrishnan, P Pandit, M Belkin, Emergence in non-neural models: grokking modular arithmetic via average gradient outer product. Preprint.
- L Zhu, C Liu, A Radhakrishnan, M Belkin, Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning. ICML 2024.
- L Zhu, C Liu, A Radhakrishnan, M Belkin, Quadratic models for understanding catapult dynamics of neural networks. ICLR 2024.
- L Zhu, P Pandit, M Belkin, A note on Linear Bottleneck networks and their Transition to Multilinearity. Preprint.
Kernel regime
- A Banerjee, P Cisneros-Velarde, L Zhu, M Belkin, Neural tangent kernel at initialization: linear width suffices. UAI 2023.
- A Banerjee, P Cisneros-Velarde, L Zhu, M Belkin, Restricted Strong Convexity of Deep Learning Models with Smooth Activations. ICLR 2023.
- L Zhu, C Liu, M Belkin, Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture. NeurIPS 2022.
- C Liu, L Zhu, M Belkin, Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models. ICLR 2021 (Spotlight).
- C Liu, L Zhu, M Belkin, On the linearity of large non-linear models: when and why the tangent kernel is constant. NeurIPS 2020 (Spotlight).
- C Liu, L Zhu, M Belkin, Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis (ACHA) 2022.