# GPT-2 Text Generation with Reinforcement Learning from Human Feedback (RLHF)

This repository implements Reinforcement Learning from Human Feedback (RLHF) on top of GPT-2, a language model for text generation. The goal is to improve the quality of generated text using simulated human feedback.

The code demonstrates a complete pipeline:

- Loading and using the GPT-2 model
- Generating baseline text outputs
- Simulating human feedback
- Implementing a custom environment for reinforcement learning
- Training the model with Proximal Policy Optimization (PPO)
- Evaluating the model's performance before and after training
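The "simulating human feedback" step above can be sketched as a scalar reward function over generated text. The heuristics below (rewarding longer, less repetitive responses) are illustrative assumptions for this sketch, not the repository's actual scoring logic.

```python
def simulated_feedback(text: str) -> float:
    """Toy reward model standing in for a human rater.

    Illustrative heuristics (assumptions, not the repo's logic):
    longer responses and lower word repetition earn a higher
    reward, normalized to the range [0, 1].
    """
    words = text.split()
    if not words:
        return 0.0
    length_score = min(len(words) / 50.0, 1.0)  # saturate at 50 words
    diversity = len(set(words)) / len(words)    # 1.0 means no repeated words
    return 0.5 * length_score + 0.5 * diversity

reward = simulated_feedback("the quick brown fox jumps over the lazy dog")
```

In a full RLHF loop, this scalar would be fed to the PPO trainer as the reward for each generated completion.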
davidangularme/GPT2-RLHF
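The PPO training step in the pipeline relies on the clipped surrogate objective from the PPO algorithm. A minimal per-sample sketch is shown below; the function name and signature are illustrative, not taken from this repository.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample clipped surrogate objective from PPO (illustrative sketch).

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage of the sampled action
    eps       -- clipping range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Taking the minimum keeps updates conservative: a large policy shift
    # cannot increase the objective beyond what the clipped ratio allows.
    return min(unclipped, clipped)
```

In training, the negative mean of this quantity over a batch is minimized, which discourages the new policy from drifting far from the policy that generated the samples.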