Note
[Ongoing] Theory of Scaling Laws
Understanding how a model's performance changes as model and data size scale up (i.e., the scaling law) is a critical topic in both academia and industry. I therefore believe that understanding scaling laws in a simplified theoretical setup will become very important.
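As a toy illustration (my own sketch, not part of the note): empirical scaling laws are often summarized by fitting a parametric power law such as L(N) = c + a * N^(-alpha) to observed losses. All constants and "observations" below are made up for demonstration.

```python
# Minimal sketch: fitting a parametric scaling law L(N) = c + a * N^(-alpha)
# to synthetic loss-vs-model-size data. All constants are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    """Loss as a function of parameter count n."""
    return c + a * n ** (-alpha)

# Synthetic "observations": losses generated from a ground-truth law plus
# noise, standing in for a sweep of real training runs.
rng = np.random.default_rng(0)
sizes = np.logspace(6, 10, 12)  # 1M .. 10B parameters
losses = power_law(sizes, a=400.0, alpha=0.34, c=1.7)
losses += rng.normal(scale=0.02, size=sizes.shape)

# Recover the exponent and irreducible-loss term from the noisy data;
# p0 is a rough initial guess for the optimizer.
params, _ = curve_fit(power_law, sizes, losses, p0=(100.0, 0.3, 1.0))
a_hat, alpha_hat, c_hat = params
print(f"fitted: a={a_hat:.1f}, alpha={alpha_hat:.3f}, c={c_hat:.2f}")
```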
[Ongoing] Mechanistic Interpretability
In my opinion, mechanistic interpretability tools are one of the most promising routes to demystifying the inner workings of neural networks and LLMs.
I plan to study these deep learning phenomena using both the theoretical and empirical toolsets above during my PhD studies; discussions of and contributions to the notes above are welcome! Here is also an interesting repository.
Course Projects
[Modern NLP Project at EPFL] Finetuning a chatbot for educational use via Direct Preference Optimization
CS552 Modern NLP (Graduate-level)
This was an interesting experience, since it was my first time implementing something LLM-related. Intriguingly, in Section 5.2 we observed that the chosen and rejected rewards decrease at the same time, which echoes broader discussions of the drawbacks of DPO.
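For context, here is a minimal sketch of the standard DPO objective (my own illustration with made-up numbers, not the project code): the "chosen" and "rejected" rewards are the beta-scaled log-probability ratios between the policy and the reference model, and both can fall together as long as their margin widens.

```python
# Minimal sketch of the DPO loss computed from summed per-sequence
# log-probabilities (illustrative, not the project's implementation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a batch of (chosen, rejected) response pairs.

    Each argument is a tensor of shape (batch,) holding the total
    log-probability of a response under the policy or reference model.
    """
    # Implicit rewards: beta-scaled policy/reference log-ratios.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log sigmoid of the reward margin; minimized when chosen >> rejected,
    # regardless of whether both rewards drift downward together.
    loss = -F.logsigmoid(chosen_reward - rejected_reward).mean()
    return loss, chosen_reward.detach(), rejected_reward.detach()

# Toy usage with made-up log-probs for a batch of two pairs.
loss, r_w, r_l = dpo_loss(
    torch.tensor([-12.0, -15.0]), torch.tensor([-14.0, -15.5]),
    torch.tensor([-11.5, -14.0]), torch.tensor([-13.0, -14.8]),
)
print(loss.item(), r_w.tolist(), r_l.tolist())
```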
[OptML project at EPFL] Investigations into Sharpness-aware Minimization
CS439 Optimization in Machine Learning Project (Graduate-level)
We found that much of the theoretical literature on SAM assumes a diminishing perturbation radius; in this project we therefore explored how the radius, the norm defining the perturbation ball, and the gradient normalization affect generalization performance.
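As a reference point, here is a minimal sketch of one standard SAM update with a fixed radius rho and L2 gradient normalization (an illustrative assumption, not the project's exact variant; `loss_fn` is a hypothetical closure that reruns the forward pass and returns the loss).

```python
# Minimal sketch of a single SAM step: ascend to the (approximate) worst
# point in the rho-ball around the weights, then descend from there.
import torch

def sam_step(model, loss_fn, base_optimizer, rho=0.05):
    # First forward/backward: gradient at the current weights.
    loss = loss_fn(model)
    base_optimizer.zero_grad()
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    scale = rho / (grad_norm + 1e-12)  # rho controls the perturbation radius

    # Perturb weights along the normalized gradient direction.
    eps = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            eps.append(e)

    # Second forward/backward: sharpness-aware gradient at the perturbed point.
    loss_perturbed = loss_fn(model)
    base_optimizer.zero_grad()
    loss_perturbed.backward()

    # Restore the original weights, then step with the perturbed gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    return loss_perturbed
```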
[ML project at HKUST] Theory for Understanding Transformers: An Overview of Current Research
COMP5212 Machine Learning Project (Graduate-level)
A brief survey of recent Transformer theory.