math - a u-brixton Collection

math

updated 9 days ago

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

Paper • 2402.17457 • Published Feb 27

Curvature-Informed SGD via General Purpose Lie-Group Preconditioners

Paper • 2402.04553 • Published Feb 7

TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11 • 26

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Paper • 2405.14578 • Published May 23

Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates

Paper • 2206.00832 • Published Jun 2, 2022

Large Language Models as Markov Chains

Paper • 2410.02724 • Published Oct 3 • 31

Old Optimizer, New Norm: An Anthology

Paper • 2409.20325 • Published Sep 30 • 3

Scaling Law with Learning Rate Annealing

Paper • 2408.11029 • Published Aug 20 • 3