Recent papers that I enjoy reading
Theory papers:
- SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
- Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
- Repeat After Me: Transformers are Better than State Space Models at Copying
Empirical papers:
- Physics of Language Models
- Small-scale proxies for large-scale Transformer training instabilities
- The Art of Scaling Reinforcement Learning Compute for LLMs
- Competition Dynamics Shape Algorithmic Phases of In-Context Learning