Recent papers that I enjoy reading
Theory papers:
- SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
- Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
- Repeat After Me: Transformers are Better than State Space Models at Copying
Empirical papers:
- Physics of Language Models
- Small-scale proxies for large-scale Transformer training instabilities
- The Art of Scaling Reinforcement Learning Compute for LLMs
- Competition Dynamics Shape Algorithmic Phases of In-Context Learning