WirelessAgent++ automates the design of LLM-based autonomous agents for wireless network tasks.
It casts agent design as a program search problem and solves it with domain-adapted Monte Carlo Tree Search (MCTS),
autonomously discovering workflows that outperform hand-crafted baselines by up to 31%.
- Highlights
- System Overview
- MCTS Optimization
- WirelessBench
- Results
- Quick Start
- Project Structure
- Citation
- Acknowledgments
| Highlight | Description |
|---|---|
| Automated Agent Design | No manual workflow engineering: MCTS automatically discovers optimal operator compositions, prompt strategies, and tool-calling patterns |
| Domain-Specific Tool Integration | ReAct-based ToolAgent and deterministic CodeLevel operators seamlessly integrate ray-tracing predictors, Kalman filters, and telecom calculators |
| WirelessBench Suite | 3,392 problems across 3 dimensions: knowledge reasoning (WCHW), network slicing (WCNS), and mobile service assurance (WCMSA) |
| Ultra-Low Cost | Full optimization search costs < $5 per task; per-problem inference costs < $0.001 |
| State-of-the-Art | Outperforms prompting baselines by up to 31 pp and general-purpose workflow optimizers by 11.9 pp |
Left: In WirelessAgent, a human expert iteratively designs a fixed agentic workflow through multi-round dialogue with an LLM.
Right: In WirelessAgent++, an Optimizer LLM (Claude-Opus-4.5) jointly searches over workflow structures and tool-calling strategies via MCTS; the resulting workflow is executed by an Executor LLM (Qwen-turbo-latest) on WirelessBench, automatically producing distinct, task-adaptive workflows without manual engineering.
| Operator | Description |
|---|---|
| `Custom(x, p)` | Invokes an LLM with input x and instruction prompt p |
| `ToolAgent(x, s)` | ReAct-based agent that interleaves reasoning and tool calls for up to s iterations |
| `CodeLevel(x, f)` | LLM-free deterministic tool execution: zero variance, near-zero cost |
| `ScEnsemble({yᵢ}, x)` | Self-consistency voting across multiple candidate answers |
| `AnswerGenerate(x)` | Structured final answer production |
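As an illustration of the self-consistency idea behind `ScEnsemble`, here is a minimal sketch that picks the most frequent candidate answer. It is an assumption-laden stand-in, not the repository's operator: the function names `normalize` and `self_consistency_vote` are hypothetical, and plain majority voting over normalized strings is only one simple way to realize self-consistency voting.

```python
from collections import Counter
from typing import Sequence


def normalize(answer: str) -> str:
    """Canonicalize an answer so trivially different strings vote together."""
    return " ".join(answer.strip().lower().split())


def self_consistency_vote(candidates: Sequence[str]) -> str:
    """Return the most frequent candidate after normalization.

    Ties are broken in favor of the answer that appeared first.
    (Hypothetical helper: plain majority voting, assumed for illustration.)
    """
    normalized = [normalize(c) for c in candidates]
    if not normalized:
        raise ValueError("need at least one candidate answer")
    counts = Counter(normalized)
    best = max(counts.values())
    # Walk the candidates in order so the earliest top-voted answer wins ties.
    return next(ans for ans in normalized if counts[ans] == best)


winner = self_consistency_vote(["12 dB", "12 db ", "10 dB"])
```

Here two of the three candidates normalize to the same string, so that string wins the vote.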
Key Finding: The MCTS optimizer autonomously discovers that `ToolAgent`-based workflows can be compiled into more efficient `CodeLevel` pipelines, effectively removing the LLM from the tool-calling path while maintaining accuracy.
WirelessAgent++ employs three domain-aware enhancements to the standard MCTS algorithm:
- Penalized Boltzmann Selection: Prevents the optimizer from being trapped in poorly performing subtrees by applying a temperature-controlled penalty to already-visited nodes.
- Maturity-Aware Heuristic Critic: A lightweight LLM pre-screens proposed mutations before expensive evaluation, filtering out obviously poor candidates.
- 3-Class Experience Replay: Classifies mutation outcomes as Success, Neutral, or Failure (rather than binary), preventing noise-induced fluctuations from corrupting the experience buffer.
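The exact formulas are not spelled out here, so the following is only a sketch of one plausible reading of two of these enhancements: a softmax (Boltzmann) selection whose logits are reduced by a visit-count penalty, and a three-way outcome label with a noise margin. The function names and the `temperature`, `penalty`, and `margin` values are assumptions chosen for illustration, not values taken from the repository.

```python
import math
import random
from typing import List, Sequence


def selection_probs(scores: Sequence[float], visits: Sequence[int],
                    temperature: float = 0.5, penalty: float = 0.1) -> List[float]:
    """Softmax over (score - penalty * visits) / temperature.

    Assumed form: the penalty lowers the logit of frequently visited nodes.
    """
    logits = [(s - penalty * v) / temperature for s, v in zip(scores, visits)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def select_node(scores: Sequence[float], visits: Sequence[int],
                rng: random.Random) -> int:
    """Sample a node index according to the penalized Boltzmann distribution."""
    probs = selection_probs(scores, visits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def classify_outcome(parent_score: float, child_score: float,
                     margin: float = 0.01) -> str:
    """Label a mutation; changes within +/- margin are treated as noise."""
    delta = child_score - parent_score
    if delta > margin:
        return "success"
    if delta < -margin:
        return "failure"
    return "neutral"


probs = selection_probs([0.80, 0.80], visits=[0, 5])
choice = select_node([0.80, 0.80], [0, 5], random.Random(0))
```

With equal scores, the node visited five times receives a lower selection probability than the unvisited one, which is the behavior the penalty is meant to produce.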
- 19 rounds of optimization, $4.95 total search cost, ~63 min wall-clock time
- Score improved from 62.44% (Round 1, seed) to 81.78% (Round 14, best), a +30.97% relative improvement
- The optimizer discovers tool integration (Round 2, +18.42 pp) as the single largest gain
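For reference, the absolute and relative gains between the seed and best rounds follow directly from the two scores reported above:

$$
\Delta_{\text{abs}} = 81.78 - 62.44 = 19.34\ \text{pp}, \qquad
\Delta_{\text{rel}} = \frac{81.78 - 62.44}{62.44} \approx 0.3097 = 30.97\%
$$

Percentage points (pp) measure the absolute difference between two scores, while the percentage figure is relative to the seed score.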
WirelessBench is a standardized, multi-dimensional benchmark suite for evaluating LLM agents on wireless communication tasks:
| Benchmark | Problems | Val / Test | Task Type | Key Challenge |
|---|---|---|---|---|
| WCHW | 1,392 | 348 / 1,044 | Knowledge Reasoning | Multi-step formula application, unit conversion |
| WCNS | 1,000 | 250 / 750 | Code + Tool Use | Ray-tracing CQI prediction → bandwidth allocation |
| WCMSA | 1,000 | 250 / 750 | Multi-Step Decision | Kalman prediction → CQI estimation → QoS assurance |
- Data Collection: Seed problems from wireless textbooks (Goldsmith, Molisch) and 3GPP/IEEE standards
- Psychometric Data Cleaning: 10-LLM funnel pipeline with item-total correlation, Mokken scale analysis, and inter-item consistency
- LLM-Based Augmentation: Parameter variation, bidirectional conversion, cross-topic integration (LLMs generate problem text only; all ground truths computed by deterministic solvers)
- Human Validation: Graduate-student verification of every problem
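To make the item-total correlation step more concrete, the sketch below computes a corrected item-total correlation for one problem, treating each LLM as a "respondent" and each problem as an "item". This illustrates the statistic only: the data layout, the function names, and the use of the corrected variant (which excludes the item from its own total) are assumptions, and the thresholds used in the actual cleaning pipeline are not stated here.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses: List[List[int]], item: int) -> Optional[float]:
    """Correlate one item's scores with each model's total on the other items.

    responses[m][i] is 1 if model m solved problem i, else 0 (assumed layout).
    """
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_totals)


# Toy data: 4 models (rows) x 3 problems (columns).
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
r_first = corrected_item_total(responses, 0)
r_last = corrected_item_total(responses, 2)
```

In this toy matrix the first problem tracks overall model ability (positive correlation), while the last one is uncorrelated with the rest; a cleaning pipeline would typically flag items like the latter for review.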
WCNS (Network Slicing), 3-Phase Trajectory:
- Phase 1, Seed (61.3%): Bare LLM call, CQI prediction essentially random
- Phase 2, Tool Discovery (90.5%, +29.2 pp): `ToolAgent` discovers ray-tracing tool
- Phase 3, Tool Compilation (92.18%, +1.7 pp): `ToolAgent` → `CodeLevelRayTracing` (deterministic, LLM-free)
WCMSA (Mobile Service Assurance), 3-Phase Trajectory:
- Phase 1, Seed (65.76%): No position prediction or channel estimation
- Phase 2, Multi-Tool Discovery (93.59%, +27.83 pp): `ToolAgent` chains Kalman filter → ray-tracing
- Phase 3, Tool Compilation (96.89%, +1.35 pp): Compiled into `CodeLevelKalmanPredictor` → `CodeLevelRayTracing`
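The "tool compilation" phase can be pictured as replacing an LLM-driven tool loop with plain function composition. The sketch below shows that shape with toy stand-ins only: constant-velocity extrapolation takes the place of a Kalman predictor, and a log-distance path-loss model with made-up constants takes the place of ray tracing. Every function name and constant here is hypothetical and unrelated to the repository's actual tools.

```python
import math
from typing import Callable, List, Tuple

Position = Tuple[float, float]


def predict_next_position(track: List[Position]) -> Position:
    """Toy stand-in for a Kalman predictor: constant-velocity extrapolation."""
    if len(track) < 2:
        raise ValueError("need at least two past positions")
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def estimate_snr_db(pos: Position, base_station: Position = (0.0, 0.0)) -> float:
    """Toy stand-in for ray tracing: log-distance path loss, made-up constants."""
    distance_m = max(math.dist(pos, base_station), 1.0)
    tx_power_dbm, noise_dbm = 30.0, -90.0        # hypothetical link budget
    ref_loss_db, path_loss_exponent = 40.0, 3.0  # hypothetical channel model
    path_loss_db = ref_loss_db + 10.0 * path_loss_exponent * math.log10(distance_m)
    return tx_power_dbm - path_loss_db - noise_dbm


def compose(*steps: Callable) -> Callable:
    """Chain deterministic steps into one LLM-free pipeline."""
    def pipeline(value):
        for step in steps:
            value = step(value)
        return value
    return pipeline


pipeline = compose(predict_next_position, estimate_snr_db)
snr_db = pipeline([(10.0, 0.0), (20.0, 0.0)])
```

Because every step is an ordinary function, repeated calls with the same input return the same value, which is the zero-variance property attributed to `CodeLevel` operators.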
| Method | HotpotQA (F1) | DROP (F1) | MATH (Acc) | WirelessBench |
|---|---|---|---|---|
| Qwen-turbo (Zero-shot) | 0.3754 | 0.5764 | 0.7550 | 0.5244 |
| CoT | 0.5261 | 0.5893 | 0.7737 | 0.5244 |
| MedPrompt | 0.5099 | 0.6031 | 0.6833 | 0.5244 |
| ADAS | 0.6108 | 0.6102 | 0.7697 | 0.5244 |
| AFlow | 0.6818 | 0.7788 | 0.8103 | 0.6992 |
| WirelessAgent++ | 0.7273 | 0.8021 | 0.8210 | 0.8102 |
| | WCHW | WCNS | WCMSA |
|---|---|---|---|
| Search Rounds | 19 | 11 | 11 |
| Wall-Clock Time | 63 min | 13 min | 14 min |
| Total Search Cost | $4.95 | $0.99 | $1.05 |
| Per-Problem Inference | $0.00083 | $0.00056 | $0.00068 |
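As a rough worked example, the per-problem inference costs can be combined with the test-split sizes from the WirelessBench table to estimate the cost of one full pass over each test split. This assumes the per-problem cost applies uniformly to every test problem, which is a simplification; the helper name `split_cost` is hypothetical.

```python
# Per-problem inference cost (USD) and test-split sizes, copied from the tables above.
PER_PROBLEM_COST = {"WCHW": 0.00083, "WCNS": 0.00056, "WCMSA": 0.00068}
TEST_PROBLEMS = {"WCHW": 1044, "WCNS": 750, "WCMSA": 750}


def split_cost(benchmark: str) -> float:
    """Estimated cost of one pass over the test split (uniform cost assumed)."""
    return PER_PROBLEM_COST[benchmark] * TEST_PROBLEMS[benchmark]


for name in PER_PROBLEM_COST:
    print(f"{name}: ${split_cost(name):.2f}")
```

Under that assumption, a full test pass comes to roughly $0.87 for WCHW, $0.42 for WCNS, and $0.51 for WCMSA.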
```bash
# Clone the repository
git clone https://github.com/jwentong/WirelessAgent-R2.git
cd WirelessAgent-R2

# Create conda environment
conda create -n wirelessagent python=3.9
conda activate wirelessagent

# Install dependencies
pip install -r requirements.txt
```

Copy the example config and fill in your API keys:
```bash
cp config/config2.example.yaml config/config2.yaml
```

Edit config/config2.yaml with your LLM API credentials:
```yaml
models:
  "Claude-Opus-4.5":
    api_type: "openai"
    base_url: "<your_base_url>"
    api_key: "<your_api_key>"
    temperature: 0
  "qwen-turbo-latest":
    api_type: "openai"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    api_key: "<your_api_key>"
    temperature: 0
```

```bash
python data/download_data.py
```

```bash
# Run on WirelessBench benchmarks
python run.py --dataset WCHW --max_rounds 20
python run.py --dataset WCNS --max_rounds 15
python run.py --dataset WCMSA --max_rounds 15

# Run on general NLP benchmarks
python run.py --dataset MATH --max_rounds 20
python run.py --dataset HotpotQA --max_rounds 20
```

| Argument | Default | Description |
|---|---|---|
| `--dataset` | (required) | Benchmark name (WCHW / WCNS / WCMSA / MATH / HotpotQA / DROP / ...) |
| `--sample` | 4 | Number of workflows to resample per round |
| `--max_rounds` | 20 | Maximum MCTS optimization rounds |
| `--initial_round` | 1 | Starting round number |
| `--check_convergence` | False | Enable early stopping |
| `--validation_rounds` | 5 | Validation runs per candidate evaluation |
| `--optimized_path` | auto | Path to save optimized workflows |
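For readers who want to see how the arguments in the table fit together, here is a sketch of an `argparse` parser that mirrors the listed names and defaults. It is an illustration only: the real `run.py` may define these options differently, the `str2bool` helper is hypothetical, and the string `"auto"` is used merely as a placeholder for the automatically chosen output path.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "y"}:
        return True
    if lowered in {"0", "false", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the argument table."""
    parser = argparse.ArgumentParser(description="Workflow search (sketch)")
    parser.add_argument("--dataset", required=True,
                        help="Benchmark name, e.g. WCHW, WCNS, WCMSA, MATH")
    parser.add_argument("--sample", type=int, default=4,
                        help="Number of workflows to resample per round")
    parser.add_argument("--max_rounds", type=int, default=20,
                        help="Maximum MCTS optimization rounds")
    parser.add_argument("--initial_round", type=int, default=1,
                        help="Starting round number")
    parser.add_argument("--check_convergence", type=str2bool, default=False,
                        help="Enable early stopping")
    parser.add_argument("--validation_rounds", type=int, default=5,
                        help="Validation runs per candidate evaluation")
    parser.add_argument("--optimized_path", default="auto",
                        help="Path to save optimized workflows (placeholder default)")
    return parser


# Equivalent to: python run.py --dataset WCHW --max_rounds 20
args = build_parser().parse_args(["--dataset", "WCHW", "--max_rounds", "20"])
```

Options that are not passed on the command line keep the defaults from the table, so the example above runs with `sample=4` and `validation_rounds=5`.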
```
WirelessAgent-R2/
├── run.py                          # Main entry point
├── requirements.txt                # Python dependencies
├── config/
│   ├── config2.example.yaml        # API config template
│   └── config2.yaml                # Your API keys (gitignored)
├── benchmarks/
│   ├── benchmark.py                # Base benchmark class
│   ├── wchw.py                     # WCHW benchmark
│   ├── wchw_enhanced.py            # Enhanced WCHW with psychometric cleaning
│   ├── wcns.py                     # WCNS benchmark (network slicing)
│   ├── wcmsa.py                    # WCMSA benchmark (mobile service)
│   ├── hotpotqa.py / drop.py / ... # General NLP benchmarks
│   └── utils.py                    # Benchmark utilities
├── scripts/
│   ├── optimizer.py                # MCTS optimizer core
│   ├── operators.py                # Operator definitions (Custom, ToolAgent, CodeLevel, ...)
│   ├── workflow.py                 # Workflow execution engine
│   ├── evaluator.py                # Evaluation pipeline
│   ├── wireless_tools.py           # Domain tools (ray-tracing, Kalman filter)
│   ├── enhanced_tools.py           # Extended tool library
│   ├── tools.py                    # General tool utilities
│   ├── async_llm.py                # Async LLM API wrapper
│   ├── optimizer_utils/            # MCTS utilities (selection, critic, experience)
│   ├── prompts/                    # Prompt templates
│   ├── rag/                        # RAG retriever for telecom knowledge
│   ├── telecom_tools/              # Telecom-specific tools
│   └── utils/                      # General utilities
├── data/
│   ├── download_data.py            # Dataset downloader
│   ├── maps/                       # HKUST campus ray-tracing maps (.osm)
│   ├── datasets/                   # Downloaded benchmark data (gitignored)
│   └── Textbooks/                  # Reference textbooks (gitignored)
├── assets/                         # Images for README
├── figures/                        # Generated analysis figures
└── workspace/                      # Optimization outputs (gitignored)
```
If you find WirelessAgent++ useful in your research, please cite our papers:
```bibtex
@article{tong2026wirelessagentplus,
  title   = {WirelessAgent++: Automated Agentic Workflow Design and Benchmarking for Wireless Networks},
  author  = {Tong, Jingwen and Li, Zijian and Liu, Fang and Guo, Wei and Zhang, Jun},
  journal = {arXiv preprint arXiv:2603.00501v1},
  year    = {2026}
}

@article{tong2025wirelessagent,
  title   = {WirelessAgent: Large language model agents for intelligent wireless networks},
  author  = {Tong, Jingwen and Guo, Wei and Shao, Jiawei and Wu, Qiong and Li, Zijian and Lin, Zehong and Zhang, Jun},
  journal = {arXiv preprint arXiv:2505.01074},
  year    = {2025}
}
```

WirelessAgent++ builds upon the excellent AFlow framework. We thank the AFlow team for their pioneering work on MCTS-based workflow optimization.
This work was supported by the Hong Kong University of Science and Technology (HKUST).
From "building agents" to "building agent builders" for next-generation wireless networks.




