Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"
Updated Apr 27, 2026 (Python)
Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability
RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.
OpenEnv-based RL environment for training LLM agents in medical triage decision-making (Emergency Severity Index, ESI) under partial observability. Uses GRPO (via TRL) with Unsloth to optimize policies with multi-objective reward shaping (safety, accuracy, efficiency) and time-aware reasoning.
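Multi-objective reward shaping of this kind is commonly a weighted sum of per-objective scores collapsed into one scalar per episode. The sketch below assumes that form; the `TriageOutcome` fields, the three component formulas, and the weights are all hypothetical illustrations, not the repository's actual reward. ESI levels run from 1 (most urgent) to 5 (least urgent), so "under-triage" here means predicting a higher number than the reference.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TriageOutcome:
    predicted_esi: int  # agent's ESI level: 1 = most urgent, 5 = least urgent
    true_esi: int       # reference ESI level for the patient
    steps_used: int     # hypothetical: information-gathering steps before deciding
    max_steps: int      # hypothetical: episode step budget


# Hypothetical weights; the repository's actual weighting is not stated.
DEFAULT_WEIGHTS = {"safety": 0.5, "accuracy": 0.3, "efficiency": 0.2}


def shaped_reward(o: TriageOutcome, weights: Optional[Mapping[str, float]] = None) -> float:
    weights = DEFAULT_WEIGHTS if weights is None else weights
    # Safety: no credit for under-triage (a less urgent level than the reference)
    safety = 0.0 if o.predicted_esi > o.true_esi else 1.0
    # Accuracy: linear credit that falls off with distance on the 1-5 scale
    accuracy = 1.0 - abs(o.predicted_esi - o.true_esi) / 4
    # Efficiency: more credit for deciding within fewer steps of the budget
    efficiency = 1.0 - o.steps_used / o.max_steps
    components = {"safety": safety, "accuracy": accuracy, "efficiency": efficiency}
    return sum(weights[name] * score for name, score in components.items())


print(round(shaped_reward(TriageOutcome(2, 2, 2, 10)), 3))  # 0.96
```

Weighting safety most heavily makes an under-triaged patient cost more than an over-triaged one at the same distance, which is one way to encode an asymmetric clinical risk in a single scalar that GRPO can then compare within a group.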
An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.
This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).
A reinforcement learning fine-tuned model that generates Linux terminal commands from natural language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.
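The "group relative" part of GRPO replaces a learned value baseline: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group. The sketch below covers only that advantage step, not the clipped policy-gradient update or KL penalty; the binary rewards (1.0 if a generated terminal command solved the task, 0.0 otherwise) and the `eps` value are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Score each completion relative to the others sampled for the same prompt."""
    mu = mean(rewards)       # group mean acts as the baseline
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps keeps the division finite when every completion earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four commands sampled for one natural-language prompt
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Successful commands get positive advantages and failed ones negative, so the policy is pushed toward whatever distinguished the better samples; a group where every command gets the same reward yields all-zero advantages and contributes no gradient.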