RL-Trained LLM for End-to-End Data Recipe Generation #1760

@arhamm1

Description

What:
Add integration with DataChef (arXiv:2602.11089, Feb 2026), a 32B LLM trained via RL to generate complete end-to-end data recipes (synthesis strategy, filter chain, mixing ratios) given a target benchmark and a base model. The integration exposes a DataChefRecipeGenerator that turns a DataChef recipe into a valid NeMo Curator pipeline config YAML.
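The issue does not pin down DataChef's output format. A minimal sketch follows, assuming a recipe is held in memory as the three parts named above (synthesis strategy, filter chain, mixing ratios) before being written out. `DataRecipe`, `FilterStep`, and the config keys are hypothetical and are not NeMo Curator's actual config schema. The sketch also shows how mixing ratios could turn a token budget into per-source allocations.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FilterStep:
    """One stage of the filter chain, e.g. deduplication or a quality filter (hypothetical)."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class DataRecipe:
    """Hypothetical in-memory form of a DataChef recipe (not an official schema)."""
    synthesis_strategy: str
    filter_chain: list[FilterStep]
    mixing_ratios: dict[str, float]  # data source -> fraction of the final training mix

    def validate(self) -> None:
        if not self.filter_chain:
            raise ValueError("filter chain must not be empty")
        if not self.mixing_ratios or any(r < 0 for r in self.mixing_ratios.values()):
            raise ValueError("mixing ratios must be non-empty and non-negative")
        total = sum(self.mixing_ratios.values())
        if not math.isclose(total, 1.0, abs_tol=1e-6):
            raise ValueError(f"mixing ratios must sum to 1, got {total}")

    def token_allocation(self, compute_budget_tokens: int) -> dict[str, int]:
        """Split the token budget across sources in proportion to the mixing ratios."""
        self.validate()
        alloc = {src: round(r * compute_budget_tokens) for src, r in self.mixing_ratios.items()}
        # Give any rounding drift to the largest source so the total matches the budget exactly.
        largest = max(self.mixing_ratios, key=self.mixing_ratios.get)
        alloc[largest] += compute_budget_tokens - sum(alloc.values())
        return alloc

    def to_pipeline_config(self) -> dict:
        """Flatten into a plain dict that a YAML dumper could serialize."""
        self.validate()
        return {
            "synthesis": {"strategy": self.synthesis_strategy},
            "filters": [{"name": step.name, **step.params} for step in self.filter_chain],
            "mixing": dict(self.mixing_ratios),
        }
```

Keeping the recipe as plain data until the last step means validation (ratios summing to 1, a non-empty filter chain) runs before anything is written; the resulting dict could then be serialized with, for example, PyYAML's `yaml.safe_dump`.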

Why:
According to the paper, a Qwen3-1.7B model adapted to math with a DataChef-generated recipe scores 66.7 on AIME'25, surpassing the official Qwen3 post-training checkpoint for the same base model. DataChef also matches human expert curation across 6 held-out tasks. An RL-trained recipe generator removes the manual trial and error of pipeline design, which is often the main bottleneck in practice.

Definition of Done:

  • DataChefRecipeGenerator under nemo_curator/recipe/
  • Interface: accepts target_benchmark: str, base_model_id: str, available_data_sources: List[str], compute_budget_tokens: int
  • Calls the DataChef API (hosted or local) with a structured prompt encoding these inputs
  • Parses the DataChef output into a valid NeMo Curator pipeline config YAML
  • Config validation: dry-runs the generated pipeline on a 1M-token sample before full execution
  • Proxy reward integration: evaluates the generated recipe's quality on a fast proxy before committing to a full run
  • Fallback: if DataChef is unavailable, outputs a best-practice template config for the domain
  • Tutorial: generate and execute a math-specialization recipe through the DataChef → NeMo Curator pipeline
  • Integration test: generated YAML is parseable and passes NeMo Curator config validation
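A minimal sketch of the generator interface from the list above, assuming DataChef is reached through any callable that takes a prompt string and returns text, and that it replies with a JSON object containing `synthesis_strategy`, `filter_chain`, and `mixing_ratios`. `RecipeRequest`, `build_prompt`, `parse_recipe`, `fallback_config`, `FALLBACK_FILTERS`, and the filter names are all hypothetical; only the request fields and the 1M-token dry-run size come from this issue. The proxy-reward step is omitted.

```python
import json
import math
from dataclasses import asdict, dataclass
from typing import Callable, Optional

DRY_RUN_SAMPLE_TOKENS = 1_000_000  # the 1M-token dry-run sample from the Definition of Done

# Hypothetical best-practice filter chains per domain; placeholders, not real NeMo Curator stages.
FALLBACK_FILTERS = {
    "math": ["exact_dedup", "language_id", "math_quality_classifier"],
    "default": ["exact_dedup", "language_id", "quality_classifier"],
}


@dataclass
class RecipeRequest:
    target_benchmark: str
    base_model_id: str
    available_data_sources: list[str]
    compute_budget_tokens: int


def build_prompt(req: RecipeRequest) -> str:
    """Encode the request as a structured JSON prompt (the real prompt format is an assumption)."""
    return json.dumps({"task": "generate_data_recipe", **asdict(req)}, sort_keys=True)


def _with_dry_run(config: dict, req: RecipeRequest) -> dict:
    # Dry-run on at most 1M tokens, or on the whole budget if that is smaller.
    config["dry_run"] = {"sample_tokens": min(DRY_RUN_SAMPLE_TOKENS, req.compute_budget_tokens)}
    return config


def parse_recipe(raw: str, req: RecipeRequest) -> dict:
    """Assumes a JSON object reply; raises ValueError if the recipe is unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("recipe must be a JSON object")
    ratios, filters = data.get("mixing_ratios"), data.get("filter_chain")
    if not isinstance(ratios, dict) or not ratios:
        raise ValueError("missing mixing_ratios")
    if not isinstance(filters, list) or not filters:
        raise ValueError("missing filter_chain")
    unknown = set(ratios) - set(req.available_data_sources)
    if unknown:
        raise ValueError(f"recipe uses unavailable sources: {sorted(unknown)}")
    if not math.isclose(sum(ratios.values()), 1.0, abs_tol=1e-6):
        raise ValueError("mixing ratios must sum to 1")
    config = {
        "source": "datachef",
        "synthesis": {"strategy": str(data.get("synthesis_strategy", "none"))},
        "filters": [{"name": str(f)} for f in filters],
        "mixing": {src: float(r) for src, r in ratios.items()},
    }
    return _with_dry_run(config, req)


def fallback_config(domain: str, req: RecipeRequest) -> dict:
    """Best-practice template: domain filter chain plus a uniform mix over available sources."""
    sources = req.available_data_sources
    if not sources:
        raise ValueError("no data sources available")
    config = {
        "source": "fallback",
        "synthesis": {"strategy": "none"},
        "filters": [{"name": n} for n in FALLBACK_FILTERS.get(domain, FALLBACK_FILTERS["default"])],
        "mixing": {src: 1.0 / len(sources) for src in sources},
    }
    return _with_dry_run(config, req)


class DataChefRecipeGenerator:
    def __init__(self, client: Optional[Callable[[str], str]] = None, domain: str = "math"):
        # client: any callable that sends a prompt to DataChef (hosted or local) and returns text.
        self.client = client
        self.domain = domain

    def generate(self, req: RecipeRequest) -> dict:
        if self.client is not None:
            try:
                return parse_recipe(self.client(build_prompt(req)), req)
            except (OSError, ValueError, TypeError):
                pass  # DataChef unreachable or its output unusable: use the template
        return fallback_config(self.domain, req)
```

Injecting the client as a plain callable keeps the generator agnostic to hosted versus local DataChef deployments, and makes the fallback path easy to exercise in the integration test by passing a client that raises `ConnectionError`.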

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
