Skip to content

ElkinStas/graph-generation-rl-task

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Graph Generation with Network Topology Constraints

RL training task: Teach LLMs to generate graphs matching precise network topology metrics (clustering coefficient, average path length, connectivity).

Overview

This task teaches the model to work with network science and graph theory by generating graphs that satisfy strict topological constraints. The model must:

  1. Choose appropriate graph generation algorithms (Erdős-Rényi, Barabási-Albert, Holme-Kim triadic closure, etc.)
  2. Understand how different parameters affect network properties
  3. Use iterative refinement techniques (edge rewiring, pruning) to hit precise targets
  4. Balance multiple competing objectives under resource constraints

Why this matters for ML engineers: Graph generation with constraints is fundamental in many domains:

  • Synthetic network datasets for GNN training
  • Network topology optimization for distributed systems
  • Social network simulation and analysis
  • Biological network modeling (protein interactions, neural connectivity)

Task Description

Given a target profile for three key graph metrics:

  • C (Clustering Coefficient): Measures local triangle density, typically 0.08-0.12
  • L (Average Path Length): Average shortest path in the giant connected component, typically 4.2-6.0
  • CC (Connected Components): Must be exactly 1 (fully connected graph)

The model must generate a graph of 400 nodes that achieves a score ≥ 80-85% (randomly chosen per run).

Scoring Function

Score = 100 × average(closeness(C), closeness(L), closeness(CC))

where closeness(metric) = 1 - |actual - target| / tolerance

Hard constraint: If ANY metric achieves < 30% closeness, the total score is zero. This prevents the model from ignoring difficult metrics.

What the Model Learns

1. Graph Generation Algorithms

  • Triadic closure models (Holme-Kim): Best for achieving realistic C values through triangle formation
  • Barabási-Albert: Power-law degree distribution, but lower clustering
  • Erdős-Rényi: Uniform random, poor for this task (C too low)
  • Configuration model: Custom degree sequences

2. Network Topology Intuitions

  • m parameter: Higher m → denser graph → shorter L, higher C
  • Triadic probability p: Higher p → more triangles → higher C
  • Trade-offs: C and L are often anti-correlated (densification reduces distances)

3. Iterative Refinement

  • Standard rewiring: Preserves degree distribution, adjusts C/L
  • Triangle-boosting rewiring: Smart swaps that increase clustering
  • Edge pruning: Carefully remove edges to increase L while maintaining connectivity

4. Constraint Satisfaction Under Budgets

  • 120 rewire operations max
  • 60 metric computation calls max
  • 25 agent steps max
  • Penalty for excessive metric calls (>25)

5. Early Stopping Strategy

The task includes a "patience" mechanism: if the model shows no improvement for 6 consecutive checks after 10 attempts, it should abandon the current approach and try a different strategy.

Setup Instructions

1. Install uv (if not already installed)

Linux/macOS:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone and Navigate

git clone https://github.com/ElkinStas/graph-generation-rl-task
cd graph-generation-rl-task

3. Set API Key

export ANTHROPIC_API_KEY='your_api_key_here'

4. Run the Task

uv run main.py

uv will automatically:

  • Create a virtual environment
  • Install dependencies (anthropic>=0.67.0, networkx>=3.0)
  • Execute the test suite

Running the Task

The script runs 10 independent trials by default. Each trial:

  1. Randomizes target metrics (C, L) and pass threshold
  2. Gives the model a unique seed for reproducibility
  3. Runs the agent loop with tool access
  4. Grades the final submission
  5. Logs detailed execution to logs/

To change number of runs, edit the last line in main.py:

asyncio.run(main(concurrent=False, num_runs=20))  # Run 20 trials

Execution modes:

  • concurrent=False: Sequential (easier to debug, recommended)
  • concurrent=True: Parallel execution (faster)

Grading Criteria

Correct answers are accepted if:

  • Score ≥ randomly chosen threshold (80-85%)
  • Graph is connected (CC = 1)
  • All metrics within tolerance bands

Common failure modes:

  • Using Erdős-Rényi (C too low, always fails)
  • Generating disconnected graphs (CC > 1)
  • Getting stuck in local optima (not exploring enough)
  • Excessive metric calls (penalty applied after 25 calls)
  • Early termination due to patience threshold

Task Difficulty & Pass Rate

Expected pass rate: 10-40% (per XOR requirements)

This is challenging because:

  1. Tight constraints: Each metric must be ≥30% close to target
  2. Multi-objective optimization: C and L trade off (harder to satisfy both)
  3. No direct analytical solution: Model must learn heuristics through trial-and-error
  4. Resource constraints: Limited rewire/metric budgets force efficiency
  5. Randomized targets: Each run has different C/L goals (generalization required)

Why models fail:

  • 40%: Wrong graph generator choice (e.g., ER instead of triadic)
  • 30%: Poor parameter selection (m/p values)
  • 20%: Insufficient refinement iterations
  • 10%: Budget exhaustion or early stopping

Why models succeed:

  • Use triadic closure model (Holme-Kim) with appropriate m/p
  • Compute metrics immediately after generation
  • Apply targeted refinement (rewire_triangles for C, prune_edges for L)
  • Submit as soon as threshold reached (avoid over-optimization)

Tools Available

The model has access to 6 tools:

Tool Purpose Budget
generate_graph(kind, params, seed) Create initial graph Unlimited
rewire(k) Random edge swaps (1-5) 120 total
rewire_triangles(k) Clustering-aware swaps (1-5) 120 total
prune_edges(k) Remove edges safely (1-3) 120 total
compute_metrics() Get C, L, CC + score 60 calls
submit_graph() Final submission Once

Key design decisions:

  • Shared rewire budget: All rewire operations (rewire + rewire_triangles + prune_edges) share the 120-call budget
  • Auto-submission: When score ≥ threshold, graph is auto-submitted (no need to call submit explicitly)
  • Deterministic connectivity: When CC=1 is required, rewire operations that break connectivity are automatically rejected
  • Efficient L computation: Uses 22 sampled BFS sources (exact APSP would be O(n³) on 400 nodes)

Logs & Reproducibility

Every run generates:

  • Detailed log: logs/run_<id>_<timestamp>.log - step-by-step execution
  • Conversation history: logs/run_<id>_<timestamp>.conversation.json - full transcript (on failure)
  • Summary files:
    • logs/summary_<timestamp>.jsonl - Machine-readable results
    • logs/summary_<timestamp>.csv - Excel-friendly format

Seed logging: Each run logs its seed prominently for exact reproducibility.

Example Output

============================================================
Running 10 test iterations sequentially...
Target ranges: C=(0.08, 0.12), L=(4.2, 6.0)
Pass threshold range: (80, 85)
============================================================

Run 1: SUCCESS - score=82.3 (threshold=81.2), best=82.3
   Metrics: C=0.0987, L=5.124, CC=1, GCC_size=400
   Duration: 45.2s
   Log: logs/run_1_20250103_143022.log

Run 2: FAILURE - score=68.4 < 83.7, best=68.4
   Metrics: C=0.0654, L=5.892, CC=1, GCC_size=400
   C<30% (C=0.0654, target≈0.095±0.035, closeness=26.3%)
   Duration: 38.1s
   Log: logs/run_2_20250103_143101.log

[... 8 more runs ...]

============================================================
TEST RESULTS
============================================================
  Passed: 3/10
  Failed: 7/10
  Pass Rate: 30.0%
  Avg Score: 64.2
  Score Range: [0.0, 86.1]

Pass rate in target range (10-40%)

Task Statistics

  • Lines of code: ~850 (main task logic)
  • Dependencies: 2 (anthropic, networkx)
  • Avg inference cost: ~$0.50-1.00 per 10 runs (with Claude Sonnet 4)
  • Avg runtime: 40-60 seconds per run (sequential)

About

LLM agent training/evaluation task for graph generation with topology constraints, reward-style scoring, tool-use budgets, and iterative refinement.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages