grpo-training

Here are 7 public repositories matching this topic...

vivoCameraResearch / SmartPhotoCrafter

Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"

Updated Apr 27, 2026
Python

winstonsmith1897 / GTPO

Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability

reinforcement-learning reinforcement-learning-algorithms train fine post-training llm rlhf grpo-training

Updated Feb 23, 2026
Jupyter Notebook

DeepGym / deepgym

RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.

python machine-learning reinforcement-learning deep-learning sandbox evaluation rl code-execution ai-agents daytona llm unsloth coding-agents grpo verifiable-rewards openrlhf reward-function grpo-training

Updated Apr 24, 2026
Python

Surya-Hariharan / triagerl-openenv

OpenEnv-based RL environment for training LLM agents in medical triage decision-making (ESI index) under partial observability. Uses GRPO (TRL) + Unsloth to optimize policies with multi-objective reward shaping (safety, accuracy, efficiency) and time-aware reasoning.

esi medical-triage huggingface-transformers huggingface-spaces unsloth openenv grpo-training

Updated May 7, 2026
Python

Vidit-Ostwal / price-negotiation-rl-OpenEnv

An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.

python machine-learning reinforcement-learning rl rl-environment openenv grpo-training price-negotiator openenv-environment

Updated May 9, 2026
Python

injamul3798 / LLM-Fine-tuning-RL-Hands-on-Lab-code-Intro-to-Post-training

This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).

reinforcement-learning post-training ppo finetuning-llms grpo-training

Updated May 1, 2026
Jupyter Notebook

safoura-banihashemi / qwen3-terminal-grpo

A model fine-tuned with reinforcement learning to generate Linux terminal commands from natural-language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.

lora fine-tuning huggingface grpo-training

Updated May 10, 2026
Jupyter Notebook
