Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning • Paper • 2402.17457 • Published Feb 27
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners • Paper • 2402.04553 • Published Feb 7
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling • Paper • 2405.14578 • Published May 23
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates • Paper • 2206.00832 • Published Jun 2, 2022