Recent papers that I enjoy reading

Theory papers:

  • SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
  • Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
  • Repeat After Me: Transformers are Better than State Space Models at Copying

Empirical papers:

  • Physics of Language Models
  • Small-scale proxies for large-scale Transformer training instabilities
  • The Art of Scaling Reinforcement Learning Compute for LLMs
  • Competition Dynamics Shape Algorithmic Phases of In-Context Learning