Research Note
[Ongoing] Theory of scaling laws
In the modern LLM era, scaling up model size and data is a central topic in industry. While scaling remains poorly understood, it is interesting to explore how a theoretical setup can improve our understanding of it.
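As a concrete reference point, scaling-law studies often fit a parametric form relating pretraining loss to model size and data size; the Chinchilla-style form below is one standard example from that literature, not a result of this note:

% L(N, D): pretraining loss for a model with N parameters trained on D tokens
% E: irreducible loss; A, B, \alpha, \beta: constants fitted to measured losses
\[
  L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]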
[Ongoing] Science of Deep Learning
While many people have focused on using theory to open the black box of deep learning, that approach faces great difficulty due to the complex, nonlinear structure of these models. Instead, an empirically driven approach can be taken to probe the mechanisms of deep learning and LLMs. This is the science of deep learning: understanding and explaining intriguing deep learning phenomena.
[Ongoing] LLM Reasoning
Recently, the development of LLMs has stepped into the test-time scaling regime. While it may be hard for academia to scale up computation, it would be interesting to scientifically investigate how techniques such as chain-of-thought enhance reasoning in language models, which can also be probed with mechanistic approaches.
SOP
I am glad to share my SOP for the PhD application. Honestly, I learnt a lot from the application process and from growing from a student into a junior researcher: preparing application materials, looking up information on different research groups, and chatting with senior researchers.
Course Projects
[Modern NLP Project at EPFL] Finetuning a chatbot for educational use via Direct Preference Optimization
CS552 Modern NLP (Graduate-level)
It was an interesting experience, since it was my first time implementing something related to LLMs. Intriguingly, we discovered in Section 5.2 that both the chosen and rejected rewards decrease at the same time, which echoes discussions of the drawbacks of DPO.
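For context, here is a minimal sketch of how DPO's implicit chosen/rejected rewards are typically computed from policy and reference log-probabilities, assuming PyTorch; the function and variable names are illustrative and not taken from our project code:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss sketch. Each argument is a tensor of per-sequence
    log-probabilities of shape (batch,); beta scales the implicit reward."""
    # Implicit rewards: beta * log(pi_theta / pi_ref) for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # DPO objective: maximize the margin between chosen and rejected rewards.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Note: both reward terms can drop together while their margin still grows,
    # which is the behaviour we observed in Section 5.2.
    return loss, chosen_rewards.detach(), rejected_rewards.detach()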
[OptML project at EPFL] Investigations into Sharpness-Aware Minimization
CS439 Optimization in Machine Learning Project (Graduate-level)
We noticed that much of the theoretical literature on SAM assumes a diminishing perturbation radius; in this project, we therefore explored how the radius, the norm defining the perturbation ball, and the normalization affect generalization performance.
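As context for those knobs, below is a minimal sketch of one SAM-style update step against a generic PyTorch model. The radius rho and the gradient normalization correspond to the quantities we varied; the helper name sam_step is illustrative rather than from the project code:

import torch

def sam_step(model, loss_fn, data, target, optimizer, rho=0.05):
    """One sharpness-aware minimization step, sketched. rho is the radius of
    the perturbation ball; the perturbation is normalized by the global L2
    norm of the gradient."""
    # First pass: gradient at the current weights.
    loss_fn(model(data), target).backward()

    # Perturb weights toward the (approximate) worst point within the rho-ball.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # Second pass: gradient at the perturbed weights, then undo the perturbation.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)

    # Descent step with the sharpness-aware gradient.
    optimizer.step()
    optimizer.zero_grad()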
[ML project at HKUST] Theory for Understanding Transformers: An Overview of Current Research
COMP5212 Machine Learning Project (Graduate-level)
A brief survey on recent Transformer theory