Shangmin Guo@University of Edinburgh | [email protected]

Wei Xiong@University of Illinois Urbana-Champaign | [email protected]

Authors are listed in alphabetical order.

Thanks to Hanze Dong@Salesforce, Tianqi Liu@Google, Wei Shen@Ernie code team, and Haoxiang Wang@UIUC for insightful feedback on an early draft of this blog.

Date: Mar 26, 2024

To readers:

TL;DR:

Reinforcement learning from human feedback (RLHF) is a leading technique for adapting the outputs of generative models to human preferences, and it has achieved tremendous success in ChatGPT by OpenAI, Claude by Anthropic, and Gemini by Google. Inspired by these successes, preference optimization (a slightly more general term that also covers RL-free algorithms) has attracted significant attention over the past year. In this blog, we aim to present a comprehensive introduction to the frontier research in this exciting field, explore the ongoing challenges, and discuss interesting research problems for the future.

Table of Contents

  1. Prerequisites
    1. Alignment Objective
    2. Pre-training and Instruction-following Fine-tuning
    3. Preference Data Collection, Reward, and Bradley-Terry Model
    4. On/off-policy and On/off-line Learning in the Context of Alignment
  2. RLHF: The Classic Framework to Make ChatGPT
    1. InstructGPT: A Three-stage Approach
    2. Online Iterative RLHF
  3. RL-Free Framework: SLiC, DPO, IPO, and More
    1. Direct Preference Optimization (DPO) and Online Variants
    2. Identity Preference Optimization (IPO)
    3. Sequence Likelihood Calibration (SLiC)
    4. Comparison of DPO, IPO, and SLiC
    5. Rejection Sampling in RLHF
  4. Miscellaneous
    1. Reward Modeling in RLHF
    2. Evaluation in RLHF
    3. Theoretical Understanding of RLHF: Why Should We Choose Online RLHF/DPO?
    4. Alignment without External Preference Signals
  5. Beyond the Bradley-Terry Model
    1. Nash Learning: Dropping the Reward Model
    2. Multi-objective Learning and Human-preference-aware Alignment
    3. Pointwise Feedback: Kahneman-Tversky Optimization
  6. Other Research Directions and End Note