👋 Welcome to Wei Shen’s LLM Blog
Email: [email protected]
Authors: Wei Shen, Chuheng Zhang, Liang Zeng, Chonghan Liu, Yuliang Liu, Renbiao Liu
TL;DR: In-Context Learning (ICL) empowers Large Language Models (LLMs) to learn directly from a small set of examples provided in their context, enabling them to generalize to new tasks without explicit gradient updates. Recognized as an emergent capability of LLMs, ICL has drawn considerable research effort toward uncovering its underlying mechanisms. Meanwhile, the pre-training and alignment processes of models such as ChatGPT and GPT-4 have also attracted considerable attention, with numerous studies investigating how LLMs perform in these phases, particularly when deployed as chatbots for end users. This raises questions about the interplay between these paradigms and how ICL might enhance chat model performance.
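As a minimal illustration of what ICL means in practice, the sketch below builds a few-shot prompt for a hypothetical sentiment task. The task, examples, and labels are made up for illustration, and the model call itself is left abstract.

```python
# Minimal illustration of in-context learning (ICL): the "training" signal is a
# handful of labeled examples placed directly in the prompt; the model weights
# are never updated. The task and examples here are hypothetical.

few_shot_examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this phone; it broke in a week.", "negative"),
    ("The soundtrack was breathtaking.", "positive"),
]

def build_icl_prompt(query: str) -> str:
    """Concatenate the demonstrations and the new query into one prompt."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in few_shot_examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

# The completed prompt is sent to an LLM, which infers the task from the
# demonstrations alone -- no gradient update is involved.
print(build_icl_prompt("The plot dragged, but the acting was superb."))
```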
Prompting Your Model to Answer Questions as Sharply as Deepseek-R1
Authors: Wei Shen
TL;DR: Deepseek-R1 is known for its strong reasoning ability and its sharp question-answering style. As shown in its paper, Deepseek-R1 obtains its reasoning ability from long chain-of-thought reinforcement learning on reasoning data. However, the paper does not reveal how Deepseek-R1 comes to answer questions so sharply. In this blog, we introduce a simple method inspired by Constitutional AI, which prompts an ordinary model (Doubao in this blog) to answer questions as sharply as Deepseek-R1. By further fine-tuning on the resulting data, a model can readily acquire this ability.
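A hedged sketch of the prompting loop summarized above: a base chat model (Doubao in the post) first answers a question, is then asked to critique and rewrite its own answer in a sharper, R1-like style, and the revised answers are collected as fine-tuning data. The `chat` placeholder and the critique wording are illustrative assumptions, not the post's actual prompts or API.

```python
# Sketch of a Constitutional-AI-style critique-and-rewrite loop for producing
# "sharp", R1-like answers from an ordinary chat model. `chat` stands in for
# any chat-completion API; the instructions below are illustrative only.

def chat(prompt: str) -> str:
    """Placeholder for a call to the base chat model (e.g. Doubao)."""
    raise NotImplementedError

SHARPEN_INSTRUCTION = (
    "Critique the answer above: point out hedging, filler, and vague claims. "
    "Then rewrite it so it is direct, well-structured, and incisive, "
    "in the style of a strong reasoning model."
)

def sharpen(question: str) -> str:
    """Draft an answer, then have the model critique and rewrite it."""
    draft = chat(question)
    return chat(f"Question: {question}\n\nAnswer: {draft}\n\n{SHARPEN_INSTRUCTION}")

def build_sft_dataset(questions: list[str]) -> list[dict]:
    """Collect (question, sharpened answer) pairs as fine-tuning data."""
    return [{"prompt": q, "response": sharpen(q)} for q in questions]
```

Fine-tuning on the collected pairs is what transfers the sharper answering style back into the model itself, rather than relying on the rewrite prompt at inference time.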
Pre-SFT: Let Models Decide on Supervisory Data for Fine-Tuning
Authors: Wei Shen, Huifeng Sun, Yunhui Xia, Dai Dai
TL;DR: In this blog, we introduce an innovative Supervised Fine-Tuning (SFT) method, called Pre-SFT, which integrates traditional SFT with Rejection Sampling Fine-Tuning (RFT). Pre-SFT begins by fine-tuning the model on the original SFT dataset. We then use the BLEU score to evaluate how well the model has learned each prompt-response pair, comparing the original response with the model-generated one. For pairs the model struggles to learn, we generate high-quality responses with the fine-tuned model and replace the original responses in the dataset with these model-generated ones. This approach improves the model's performance by leveraging its own generations, while reducing the computational overhead typically associated with the repeated sampling used in RFT.
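The data-revision step of Pre-SFT can be sketched as follows: after the initial SFT pass, each pair is scored by BLEU between the original response and the fine-tuned model's own generation, and low-scoring ("hard to learn") pairs have their response replaced by the model's generation. The BLEU threshold, the `generate` placeholder, and the omission of any extra quality filter are assumptions for illustration, not the post's exact settings.

```python
# Sketch of the Pre-SFT data-revision step: score each pair by BLEU between the
# original response and the fine-tuned model's regeneration, and replace
# hard-to-learn responses with the model-generated ones.
# The threshold and `generate` are illustrative assumptions.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

BLEU_THRESHOLD = 0.3  # assumed cutoff; the post does not fix a value here
smooth = SmoothingFunction().method1

def generate(model, prompt: str) -> str:
    """Placeholder for sampling a response from the fine-tuned model."""
    raise NotImplementedError

def revise_dataset(model, dataset: list[dict]) -> list[dict]:
    """dataset: list of {"prompt": ..., "response": ...} pairs."""
    revised = []
    for pair in dataset:
        candidate = generate(model, pair["prompt"])
        score = sentence_bleu(
            [pair["response"].split()],  # reference: original SFT response
            candidate.split(),           # hypothesis: model's own generation
            smoothing_function=smooth,
        )
        if score < BLEU_THRESHOLD:
            # Low BLEU: the model did not reproduce the original response,
            # so keep its own generation as the new training target.
            revised.append({"prompt": pair["prompt"], "response": candidate})
        else:
            revised.append(pair)
    return revised
```

Because each prompt is regenerated only once here, rather than sampled many times and filtered as in standard RFT, the sampling cost stays close to a single inference pass over the dataset.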
Preference Modeling: Binary Discrimination Versus Imitation Learning
Authors: Wei Shen, Yunhui Xia