👋 Welcome to Wei Shen’s LLM Blog

Email: [email protected]

Google Scholar: https://scholar.google.com/citations?hl=en&user=M_bRlb8AAAAJ&view_op=list_works&gmla=AOAOcb2h_4hMB8KGhn_LHG3Ziuv9biv3vnCG6dD5CdXp5jOjpLQdAx-3I7jviY-MlmIkCObgD_OSl_qxmSdFw6-p

Foundational Mechanics of Large Language Models

In-Context Learning

Exploring the Potential of In-Context Learning: New Pathways for Enhancing Chat-Based Large Language Model Performance

Authors: Wei Shen, Chuheng Zhang, Liang Zeng

TL;DR: In-Context Learning (ICL) allows Large Language Models (LLMs) to learn directly from a small set of examples provided in their context, generalizing to new tasks without any explicit gradient updates. Recognized as an emergent capability of LLMs, ICL has drawn sustained research interest aimed at uncovering its underlying mechanisms. At the same time, the pre-training and alignment pipelines behind models such as ChatGPT and GPT-4 have attracted considerable attention, with many studies examining how LLMs perform in these phases, particularly when deployed as chatbots for end users. This raises a natural question: how do these paradigms interact, and how can ICL be used to enhance chat model performance?
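
For readers new to the idea, here is a minimal sketch of in-context learning. The task, reviews, and labels below are invented purely for illustration, and any chat- or completion-style LLM is assumed as the backend.

```python
# A minimal sketch of in-context learning (ICL): the model is expected to infer
# the task (sentiment labeling) purely from the examples embedded in the prompt,
# with no gradient updates. The reviews and labels are made up for illustration.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

# Sending `few_shot_prompt` to a chat-based LLM should yield "Positive",
# even though the model was never fine-tuned on this specific labeling task.
print(few_shot_prompt)
```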


Alignment Strategy

Reinforcement Learning From Human Feedback

System, Mathematics, and Code in TRL PPO

Authors: Yunhui Xia, Wei Shen

TL;DR: TRL is a full-stack library that provides tools for training transformer language models with Reinforcement Learning, covering the Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Proximal Policy Optimization (PPO) steps. The library is integrated with 🤗 Transformers. In this blog, we introduce the system architecture, mathematics, and code of PPO in TRL. Specifically, we split the blog into three parts: 1) an introduction to the TRL PPO system architecture, 2) the mathematics of the PPO algorithm, and 3) the code in the PPO Trainer.
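
As a quick orientation before the architecture deep dive, the sketch below follows the pattern of TRL's PPO quickstart: generate a response with the policy, score it, and pass the (query, response, reward) triplet to a single PPO optimization step. The model name, prompt text, and constant reward are placeholders, and the exact API surface may differ across TRL versions.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# 1. Load the policy (with a value head), a frozen reference model for the
#    KL penalty, and the tokenizer.
config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

# 2. Build the PPO trainer.
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# 3. Encode a query and let the policy generate a response.
query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
generation_kwargs = {"do_sample": True, "max_new_tokens": 20,
                     "pad_token_id": tokenizer.eos_token_id}
response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False,
                                       **generation_kwargs)

# 4. Score the response. A real setup would query a trained reward model;
#    a constant reward keeps this sketch self-contained.
reward = [torch.tensor(1.0)]

# 5. Run one PPO optimization step on the (query, response, reward) triplet.
stats = ppo_trainer.step([q for q in query_tensor], response_tensor, reward)
```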


Challenges of Sample Inefficiency (CSI): Practical Limitations of Direct Preference Optimization Algorithm

Authors: Wei Shen

TL;DR: In this blog, we contrast Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) from a reinforcement learning perspective, highlighting a key shortcoming of DPO: sample inefficiency. Because it is trained on relatively few samples that are largely off-policy, DPO suffers from a state distribution shift problem. Furthermore, since DPO is built on the Bradley-Terry model, it tends to overfit simpler pairwise samples while neglecting more difficult ones. The interplay between the state distribution shift and the limitations of the Bradley-Terry formulation can reduce the likelihoods of both the positive and the negative samples.
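
To make the Bradley-Terry connection concrete, here is a sketch of the standard DPO objective computed from pre-aggregated log-probabilities. The function name and the choice of beta are ours for illustration, and the batching and masking details of a real training loop are omitted.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a response under the policy
    or the frozen reference model. The Bradley-Terry structure shows up as a
    sigmoid over the difference of the two implicit rewards.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-11.5, -8.0]), torch.tensor([-13.0, -9.2]))
print(loss)
```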


Advanced Tricks for Training Large Language Models with Proximal Policy Optimization