
GPT2-RLHF

GPT-2 Text Generation with Reinforcement Learning from Human Feedback (RLHF). This repository uses GPT-2, a language model for text generation, and implements RLHF to improve generation quality based on simulated human feedback. The code demonstrates a complete pipeline:

- Loading and using the GPT-2 model
- Generating baseline text outputs
- Simulating human feedback
- Implementing a custom environment for reinforcement learning
- Training the model with Proximal Policy Optimization (PPO)
- Evaluating the model's performance before and after training
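The "simulating human feedback" step can be sketched as a heuristic reward function. The function below is a hypothetical stand-in (the name `simulated_feedback` and the scoring heuristic are illustrative, not the repository's actual code); real RLHF pipelines usually replace such a heuristic with a learned reward model trained on human preference data.

```python
def simulated_feedback(text: str) -> float:
    """Toy stand-in for human feedback on a generated text.

    Hypothetical heuristic: reward word diversity (to penalize the
    repetition loops GPT-2 can fall into) and moderate length.
    Returns a score in [0, 1] that a PPO loop could use as the reward.
    """
    words = text.split()
    if not words:
        return 0.0
    # Diversity: fraction of unique words; repetitive text scores lower.
    diversity = len(set(words)) / len(words)
    # Length bonus, capped at 20 words so long outputs are not over-rewarded.
    length_bonus = min(len(words), 20) / 20
    # Equal weighting of the two signals (an arbitrary illustrative choice).
    return 0.5 * diversity + 0.5 * length_bonus


if __name__ == "__main__":
    print(round(simulated_feedback("the cat sat on the mat"), 3))  # → 0.567
```

In the PPO stage, a score like this would be computed for each generated sample and fed back as the episode reward, so the policy is pushed toward outputs the (simulated) annotator prefers.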
