jwentong/WirelessAgent-R2

WirelessAgent++

Automated Agentic Workflow Design and Benchmarking for Wireless Networks

WirelessAgent++ automates the design of LLM-based autonomous agents for wireless network tasks.
It casts agent design as a program search problem and solves it with domain-adapted Monte Carlo Tree Search (MCTS),
autonomously discovering workflows that outperform hand-crafted baselines by up to 31%.


✨ Highlights

| Highlight | Description |
|---|---|
| 🔄 Automated Agent Design | No manual workflow engineering: MCTS automatically discovers optimal operator compositions, prompt strategies, and tool-calling patterns |
| 🛠️ Domain-Specific Tool Integration | ReAct-based ToolAgent and deterministic CodeLevel operators seamlessly integrate ray-tracing predictors, Kalman filters, and telecom calculators |
| 📊 WirelessBench Suite | 3,392 problems across 3 dimensions: knowledge reasoning (WCHW), network slicing (WCNS), and mobile service assurance (WCMSA) |
| 💰 Ultra-Low Cost | Full optimization search costs < $5 per task; per-problem inference costs < $0.001 |
| 🏆 State-of-the-Art | Outperforms prompting baselines by up to 31 pp and general-purpose workflow optimizers by 11.9 pp |

🏗️ System Overview

Figure: From WirelessAgent (manual) to WirelessAgent++ (automated)

Left: In WirelessAgent, a human expert iteratively designs a fixed agentic workflow through multi-round dialogue with an LLM.
Right: In WirelessAgent++, an Optimizer LLM (Claude-Opus-4.5) jointly searches over workflow structures and tool-calling strategies via MCTS; the resulting workflow is executed by an Executor LLM (Qwen-turbo-latest) on WirelessBench, automatically producing distinct, task-adaptive workflows without manual engineering.

Key Operators

| Operator | Description |
|---|---|
| Custom(x, p) | Invokes an LLM with input x and instruction prompt p |
| ToolAgent(x, s) | ReAct-based agent that interleaves reasoning and tool calls for up to s iterations |
| CodeLevel(x, f) | LLM-free deterministic tool execution: zero variance, near-zero cost |
| ScEnsemble({yᵢ}, x) | Self-consistency voting across multiple candidate answers |
| AnswerGenerate(x) | Structured final answer production |
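
As a rough illustration of the self-consistency idea behind ScEnsemble, the sketch below picks the most frequent candidate after light normalization. It is a plain majority vote written for illustration, not the repository's operator: the name `majority_vote` and the normalization rule are assumptions, and the real ScEnsemble also receives the problem x and may rely on an LLM to judge consistency.

```python
from collections import Counter


def majority_vote(candidates):
    """Return the most frequent candidate answer (hypothetical helper).

    Answers are compared after stripping whitespace and lowercasing;
    ties go to the answer that appeared first.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    normalized = [c.strip().lower() for c in candidates]
    counts = Counter(normalized)
    best_count = max(counts.values())
    # Return the original text of the first candidate whose normalized form wins.
    for original, norm in zip(candidates, normalized):
        if counts[norm] == best_count:
            return original


print(majority_vote(["12.5 dB", "12.5 dB ", "13 dB"]))
```

Numeric answers would likely need a tolerance-based comparison instead of string equality.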

💡 Key Finding: The MCTS optimizer autonomously discovers that ToolAgent-based workflows can be compiled into more efficient CodeLevel pipelines, effectively removing the LLM from the tool-calling path while maintaining accuracy.


🌲 MCTS Optimization

MCTS Architecture

WirelessAgent++ employs three domain-aware enhancements to the standard MCTS algorithm:

  1. Penalized Boltzmann Selection: Prevents the optimizer from being trapped in poorly performing subtrees by applying a temperature-controlled penalty to already-visited nodes.

  2. Maturity-Aware Heuristic Critic: A lightweight LLM pre-screens proposed mutations before expensive evaluation, filtering out obviously poor candidates.

  3. 3-Class Experience Replay: Classifies mutation outcomes as Success, Neutral, or Failure (rather than binary), preventing noise-induced fluctuations from corrupting the experience buffer.
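
To make the first and third enhancements more concrete, here is a minimal sketch under stated assumptions. The README does not give the selection formula or the classification threshold, so the linear visit penalty, the `temperature` and `penalty` defaults, the `tolerance` band, and both function names are hypothetical choices rather than the repository's implementation.

```python
import math


def penalized_boltzmann_probs(scores, visits, temperature=0.5, penalty=0.1):
    """Softmax over (score - penalty * visits), scaled by temperature.

    Assumed form: the README only states that visited nodes are penalized
    and that a temperature is involved; this exact formula is illustrative.
    """
    adjusted = [s - penalty * v for s, v in zip(scores, visits)]
    top = max(adjusted)  # subtract the max for numerical stability
    weights = [math.exp((a - top) / temperature) for a in adjusted]
    total = sum(weights)
    return [w / total for w in weights]


def classify_outcome(parent_score, child_score, tolerance=0.01):
    """Label a mutation Success, Neutral, or Failure (assumed threshold)."""
    delta = child_score - parent_score
    if delta > tolerance:
        return "Success"
    if delta < -tolerance:
        return "Failure"
    return "Neutral"  # change is within the noise band


probs = penalized_boltzmann_probs(scores=[0.80, 0.80, 0.60], visits=[5, 0, 0])
print([round(p, 3) for p in probs])
print(classify_outcome(0.80, 0.805))
```

With equal scores, the unvisited node receives a higher probability than the heavily visited one. A parent could then be drawn with `random.choices(nodes, weights=probs)`.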

Optimization Results

  • 19 rounds of optimization, $4.95 total search cost, ~63 min wall-clock time
  • Score improved from 62.44% (Round 1, seed) to 81.78% (Round 14, best), a relative improvement of +30.97% (+19.34 pp absolute)
  • The optimizer discovers tool integration (Round 2, +18.42 pp) as the single largest gain
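
Because the results mix relative percentages and percentage points (pp), a short check of the WCHW gain may help. The sketch uses only the seed and best scores reported above; the variable names are illustrative.

```python
seed_score = 62.44  # Round 1 (seed), in percent
best_score = 81.78  # Round 14 (best), in percent

absolute_pp = best_score - seed_score              # difference in percentage points
relative_pct = absolute_pp / seed_score * 100.0    # gain relative to the seed score

print(f"absolute gain: {absolute_pp:.2f} pp")
print(f"relative gain: {relative_pct:.2f} %")
```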

🏆 WirelessBench

WirelessBench is a standardized, multi-dimensional benchmark suite for evaluating LLM agents on wireless communication tasks:

| Benchmark | Problems | Val / Test | Task Type | Key Challenge |
|---|---|---|---|---|
| WCHW | 1,392 | 348 / 1,044 | Knowledge Reasoning | Multi-step formula application, unit conversion |
| WCNS | 1,000 | 250 / 750 | Code + Tool Use | Ray-tracing CQI prediction → bandwidth allocation |
| WCMSA | 1,000 | 250 / 750 | Multi-Step Decision | Kalman prediction → CQI estimation → QoS assurance |

Data Construction Pipeline

  1. Data Collection: Seed problems from wireless textbooks (Goldsmith, Molisch) and 3GPP/IEEE standards
  2. Psychometric Data Cleaning: 10-LLM funnel pipeline with item-total correlation, Mokken scale analysis, and inter-item consistency
  3. LLM-Based Augmentation: Parameter variation, bidirectional conversion, cross-topic integration (LLMs generate problem text only; all ground truths computed by deterministic solvers)
  4. Human Validation: Graduate-student verification of every problem
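
As one example of the statistics named in step 2, the sketch below computes the textbook corrected item-total correlation: each problem's score vector is correlated with the total score over the remaining problems. The README does not describe how the pipeline computes it, so the score-matrix layout, the function names, and the toy data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def corrected_item_total(scores):
    """scores[m][j] is 1 if model m solved problem j, else 0 (assumed layout)."""
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total without problem j
        result.append(pearson(item, rest))
    return result


# Toy data: 4 models (rows) answering 3 problems (columns).
toy_scores = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
correlations = corrected_item_total(toy_scores)
print([round(r, 3) for r in correlations])
```

In this toy matrix the third problem does not track the other two, so its correlation is zero; a cleaning pipeline might flag such items for review.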

Workflow Evolution Examples

WCNS (Network Slicing), 3-Phase Trajectory:

  • Phase 1, Seed (61.3%): Bare LLM call, CQI prediction essentially random
  • Phase 2, Tool Discovery (90.5%, +29.2 pp): ToolAgent discovers ray-tracing tool
  • Phase 3, Tool Compilation (92.18%, +1.7 pp): ToolAgent → CodeLevelRayTracing (deterministic, LLM-free)

WCMSA (Mobile Service Assurance), 3-Phase Trajectory:

  • Phase 1, Seed (65.76%): No position prediction or channel estimation
  • Phase 2, Multi-Tool Discovery (93.59%, +27.83 pp): ToolAgent chains Kalman filter → ray-tracing
  • Phase 3, Tool Compilation (96.89%, +1.35 pp): Compiled into CodeLevelKalmanPredictor → CodeLevelRayTracing
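
To show what a deterministic position-prediction step can look like, here is a generic constant-velocity Kalman filter in one dimension. It is a textbook filter offered as a sketch, not the repository's CodeLevelKalmanPredictor: the function names, the 1-D state, and the noise parameters `q` and `r` are all assumptions.

```python
def kalman_predict(x, P, dt, q):
    """Constant-velocity prediction for state x = [position, velocity]."""
    pos, vel = x
    (a, b), (c, d) = P
    x_pred = [pos + dt * vel, vel]
    # P_pred = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
    P_pred = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    return x_pred, P_pred


def kalman_update(x, P, z, r):
    """Fuse a position measurement z with variance r (H = [1, 0])."""
    (a, b), (c, d) = P
    innovation = z - x[0]
    s = a + r                # innovation variance
    k0, k1 = a / s, c / s    # Kalman gain
    x_new = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P_new = (I - K H) P
    P_new = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
    return x_new, P_new


def predict_next_position(measurements, dt=1.0, q=1e-3, r=1.0):
    """Filter the track, then extrapolate one step ahead."""
    x = [measurements[0], 0.0]          # start at first fix, unknown velocity
    P = [[100.0, 0.0], [0.0, 100.0]]    # large initial uncertainty
    for z in measurements[1:]:
        x, P = kalman_predict(x, P, dt, q)
        x, P = kalman_update(x, P, z, r)
    x, _ = kalman_predict(x, P, dt, q)
    return x[0]


track = [2.0 * t for t in range(10)]  # user moving at 2 m/s, one fix per second
next_position = predict_next_position(track)
print(round(next_position, 2))
```

A workflow could pass the predicted position to a channel-estimation tool; a 2-D tracker would apply the same equations per axis or use a four-element state.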

📈 Results

WCHW: Wireless Communication Homework

Figure: WCHW Method Comparison

WCNS: Wireless Communication Network Slicing

Figure: WCNS Full Comparison

WCMSA: Mobile Service Assurance

Figure: WCMSA Overall Score

Main Results

| Method | HotpotQA (F1) | DROP (F1) | MATH (Acc) | WirelessBench |
|---|---|---|---|---|
| Qwen-turbo (Zero-shot) | 0.3754 | 0.5764 | 0.7550 | 0.5244 |
| CoT | 0.5261 | 0.5893 | 0.7737 | 0.5244 |
| MedPrompt | 0.5099 | 0.6031 | 0.6833 | 0.5244 |
| ADAS | 0.6108 | 0.6102 | 0.7697 | 0.5244 |
| AFlow | 0.6818 | 0.7788 | 0.8103 | 0.6992 |
| WirelessAgent++ | 0.7273 | 0.8021 | 0.8210 | 0.8102 |

Search Cost

| Metric | WCHW | WCNS | WCMSA |
|---|---|---|---|
| Search Rounds | 19 | 11 | 11 |
| Wall-Clock Time | 63 min | 13 min | 14 min |
| Total Search Cost | $4.95 | $0.99 | $1.05 |
| Per-Problem Inference | $0.00083 | $0.00056 | $0.00068 |

🚀 Quick Start

1. Environment Setup

# Clone the repository
git clone https://github.com/jwentong/WirelessAgent-R2.git
cd WirelessAgent-R2

# Create conda environment
conda create -n wirelessagent python=3.9
conda activate wirelessagent

# Install dependencies
pip install -r requirements.txt

2. Configure API Keys

Copy the example config and fill in your API keys:

cp config/config2.example.yaml config/config2.yaml

Edit config/config2.yaml with your LLM API credentials:

models:
  "Claude-Opus-4.5":
    api_type: "openai"
    base_url: "<your_base_url>"
    api_key: "<your_api_key>"
    temperature: 0
  "qwen-turbo-latest":
    api_type: "openai"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    api_key: "<your_api_key>"
    temperature: 0
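
Before launching a search, it can be handy to confirm that no template placeholder was left in the config. The helper below is a hypothetical convenience, not part of the repository; it assumes placeholders follow the `<your_...>` pattern shown above and scans the file as plain text, so it needs no YAML library.

```python
import re

# Matches template placeholders such as <your_api_key> or <your_base_url>.
PLACEHOLDER = re.compile(r"<your_[a-z_]+>")


def find_unfilled_placeholders(config_text):
    """Return (line_number, placeholder) pairs still present in the text."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits


example = '''models:
  "qwen-turbo-latest":
    api_type: "openai"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    api_key: "<your_api_key>"
    temperature: 0
'''
print(find_unfilled_placeholders(example))
```

In practice the text would come from something like `pathlib.Path("config/config2.yaml").read_text()`.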

3. Download Datasets

python data/download_data.py

4. Run Optimization

# Run on WirelessBench benchmarks
python run.py --dataset WCHW --max_rounds 20
python run.py --dataset WCNS --max_rounds 15
python run.py --dataset WCMSA --max_rounds 15

# Run on general NLP benchmarks
python run.py --dataset MATH --max_rounds 20
python run.py --dataset HotpotQA --max_rounds 20

Command-Line Arguments

| Argument | Default | Description |
|---|---|---|
| --dataset | (required) | Benchmark name (WCHW / WCNS / WCMSA / MATH / HotpotQA / DROP / ...) |
| --sample | 4 | Number of workflows to resample per round |
| --max_rounds | 20 | Maximum MCTS optimization rounds |
| --initial_round | 1 | Starting round number |
| --check_convergence | False | Enable early stopping |
| --validation_rounds | 5 | Validation runs per candidate evaluation |
| --optimized_path | auto | Path to save optimized workflows |
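
For readers who want to script around these options, the sketch below mirrors the documented arguments and defaults with `argparse`. It is an illustrative reconstruction from the table, not the actual parser in run.py: the boolean parsing style for `--check_convergence` and the literal `"auto"` default for `--optimized_path` are assumptions.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False'-style strings (assumed flag style)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser that mirrors the documented arguments (hypothetical)."""
    parser = argparse.ArgumentParser(description="Illustrative mirror of the documented CLI")
    parser.add_argument("--dataset", required=True, help="Benchmark name, e.g. WCHW")
    parser.add_argument("--sample", type=int, default=4, help="Workflows to resample per round")
    parser.add_argument("--max_rounds", type=int, default=20, help="Maximum MCTS rounds")
    parser.add_argument("--initial_round", type=int, default=1, help="Starting round number")
    parser.add_argument("--check_convergence", type=str_to_bool, default=False, help="Enable early stopping")
    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation runs per candidate")
    # "auto" is kept as a literal string here; how run.py resolves it is not documented.
    parser.add_argument("--optimized_path", default="auto", help="Where to save optimized workflows")
    return parser


args = build_parser().parse_args(["--dataset", "WCNS", "--max_rounds", "15"])
print(args.dataset, args.max_rounds, args.sample, args.check_convergence)
```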

📁 Project Structure

WirelessAgent-R2/
├── run.py                          # Main entry point
├── requirements.txt                # Python dependencies
├── config/
│   ├── config2.example.yaml        # API config template
│   └── config2.yaml                # Your API keys (gitignored)
├── benchmarks/
│   ├── benchmark.py                # Base benchmark class
│   ├── wchw.py                     # WCHW benchmark
│   ├── wchw_enhanced.py            # Enhanced WCHW with psychometric cleaning
│   ├── wcns.py                     # WCNS benchmark (network slicing)
│   ├── wcmsa.py                    # WCMSA benchmark (mobile service)
│   ├── hotpotqa.py / drop.py / ... # General NLP benchmarks
│   └── utils.py                    # Benchmark utilities
├── scripts/
│   ├── optimizer.py                # MCTS optimizer core
│   ├── operators.py                # Operator definitions (Custom, ToolAgent, CodeLevel, ...)
│   ├── workflow.py                 # Workflow execution engine
│   ├── evaluator.py                # Evaluation pipeline
│   ├── wireless_tools.py           # Domain tools (ray-tracing, Kalman filter)
│   ├── enhanced_tools.py           # Extended tool library
│   ├── tools.py                    # General tool utilities
│   ├── async_llm.py                # Async LLM API wrapper
│   ├── optimizer_utils/            # MCTS utilities (selection, critic, experience)
│   ├── prompts/                    # Prompt templates
│   ├── rag/                        # RAG retriever for telecom knowledge
│   ├── telecom_tools/              # Telecom-specific tools
│   └── utils/                      # General utilities
├── data/
│   ├── download_data.py            # Dataset downloader
│   ├── maps/                       # HKUST campus ray-tracing maps (.osm)
│   ├── datasets/                   # Downloaded benchmark data (gitignored)
│   └── Textbooks/                  # Reference textbooks (gitignored)
├── assets/                         # Images for README
├── figures/                        # Generated analysis figures
└── workspace/                      # Optimization outputs (gitignored)

📖 Citation

If you find WirelessAgent++ useful in your research, please cite our papers:

@article{tong2026wirelessagentplus,
  title     = {WirelessAgent++: Automated Agentic Workflow Design and Benchmarking for Wireless Networks},
  author    = {Tong, Jingwen and Li, Zijian and Liu, Fang and Guo, Wei and Zhang, Jun},
  journal   = {arXiv preprint arXiv:2603.00501v1},
  year      = {2026},
}

@article{tong2025wirelessagent,
  title={WirelessAgent: Large language model agents for intelligent wireless networks},
  author={Tong, Jingwen and Guo, Wei and Shao, Jiawei and Wu, Qiong and Li, Zijian and Lin, Zehong and Zhang, Jun},
  journal={arXiv preprint arXiv:2505.01074},
  year={2025}
}

🙏 Acknowledgments

WirelessAgent++ builds upon the excellent AFlow framework. We thank the AFlow team for their pioneering work on MCTS-based workflow optimization.

This work was supported by the Hong Kong University of Science and Technology (HKUST).


From "building agents" to "building agent builders" for next-generation wireless networks.

About

WirelessAgent-R2 is an automated workflow optimization system that uses Monte Carlo Tree Search (MCTS) to iteratively discover and refine LLM-based problem-solving strategies for wireless communication tasks, achieving a 20% performance improvement from 0.583 to 0.78 accuracy through intelligent exploration of prompt designs and operator
