👋 Welcome to Wei Shen’s LLM Blog
Email: [email protected]
Authors: Wei Shen, Chuheng Zhang, Liang Zeng, Chonghan Liu, Yuliang Liu, Renbiao Liu
TL;DR: In-Context Learning (ICL) enables Large Language Models (LLMs) to learn directly from a small set of examples provided in their context and to generalize to new tasks without any explicit gradient updates. Widely regarded as an emergent ability of LLMs, ICL has drawn considerable research interest aimed at uncovering its underlying mechanisms. Meanwhile, the pre-training and alignment processes of models such as ChatGPT and GPT-4 have also received significant attention, with many studies examining how LLMs perform through these phases, particularly when deployed as chatbots for end users. This raises questions about the interplay between these paradigms and about how ICL might enhance the performance of chat models.
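As a quick illustration of ICL, the sketch below builds a few-shot sentiment prompt and lets a small off-the-shelf model complete it. The model choice, task, and prompt are purely illustrative (not from the post), and larger models exhibit the effect far more reliably.

```python
from transformers import pipeline

# A small off-the-shelf causal LM; larger models show ICL far more reliably.
generator = pipeline("text-generation", model="gpt2")

# The task (sentiment classification) is specified only by the examples in the
# prompt; the model's weights are never updated.
few_shot_prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: The acting was superb and the plot was gripping. Sentiment:"
)

output = generator(few_shot_prompt, max_new_tokens=2, do_sample=False)
print(output[0]["generated_text"])
```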
Pre-SFT: Let Models Decide on Supervisory Data for Fine-Tuning
Authors: Wei Shen, Huifeng Sun, Yunhui Xia, Dai Dai
In this blog, we introduce Pre-SFT, a Supervised Fine-Tuning (SFT) method that integrates traditional SFT with Rejection Sampling Fine-Tuning (RFT). Pre-SFT first fine-tunes the model on the original SFT dataset. It then uses the BLEU score between each original response and the corresponding model-generated response to measure how well the model has learned that prompt-response pair. For pairs the model struggles to learn, we generate high-quality responses with the fine-tuned model and replace the original responses in the dataset with them. This approach improves the model's performance by leveraging its own generations, while also reducing the computational overhead of the multiple sampling iterations typically associated with RFT.
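To make the data-revision step concrete, here is a minimal Python sketch of how the BLEU-based selection and replacement could look. The dataset layout, the greedy decoding, and the BLEU threshold are illustrative assumptions rather than details from the post; the post's own pipeline may decode and select differently.

```python
import sacrebleu
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_response(model, tokenizer, prompt, max_new_tokens=256):
    # Greedy decoding keeps the sketch simple; the post does not fix a decoding strategy.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


def revise_sft_dataset(dataset, model, tokenizer, bleu_threshold=30.0):
    """Replace hard-to-learn responses with the fine-tuned model's own outputs.

    `dataset` is assumed to be a list of {"prompt": ..., "response": ...} dicts;
    the threshold value is illustrative, not taken from the original post.
    """
    revised = []
    for example in dataset:
        generated = generate_response(model, tokenizer, example["prompt"])
        # A low BLEU score between the original response and the model's output
        # marks a pair the model struggled to learn in the first SFT pass.
        bleu = sacrebleu.sentence_bleu(generated, [example["response"]]).score
        if bleu < bleu_threshold:
            # Swap in the model-generated response for the next fine-tuning round.
            example = {"prompt": example["prompt"], "response": generated}
        revised.append(example)
    return revised
```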
System, Mathematics, and Code in TRL PPO
Authors: Yunhui Xia, Wei Shen
TL;DR: TRL is a full-stack library that provides a set of tools for training transformer language models with Reinforcement Learning, from the Supervised Fine-Tuning (SFT) step and the Reward Modeling (RM) step to the Proximal Policy Optimization (PPO) step. The library is integrated with 🤗 Transformers. In this blog, we introduce the system architecture, the mathematics, and the code of PPO in TRL. Specifically, we split the blog into three parts: 1) an introduction to the TRL PPO system architecture, 2) the mathematics of the PPO algorithm, and 3) the code of the PPO Trainer.
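As a taste of the PPO Trainer part, the sketch below runs a single PPO step with TRL's classic PPOTrainer loop: a policy with a value head, a frozen reference model for the KL penalty, and a scalar reward per response. The gpt2 checkpoint, the prompt, and the constant reward are placeholders, and the exact signatures depend on the TRL version installed.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import respond_to_batch

# Policy with a value head, plus a frozen reference copy used for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# One rollout: encode a query and sample a response from the current policy.
query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)

# A real setup would score the response with a trained reward model;
# a constant reward stands in for it here.
reward = [torch.tensor(1.0)]

# One PPO optimization step over this single (query, response, reward) triple.
ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, ref_model, tokenizer)
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```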