This repository contains the source code accompanying our paper,
"Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools".
The code implements a reinforcement learning framework for optimizing wafer transfer scheduling in semiconductor manufacturing equipment, specifically cluster tools with multiple robots. It combines an autoregressive policy structure with action masking so that learning scales to large, complex discrete action spaces. The framework is designed to be extensible and includes a configurable simulator that supports various tool configurations.
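The idea of autoregressive action selection with masking can be illustrated with a minimal, framework-free sketch. This is not the repository's implementation. The toy action set (0 = wait, 1 and 2 = serve hypothetical modules A and B), the `policy_logits` and `is_valid` callables, and the one-robot-per-module rule are all assumptions made for illustration. Robots choose one after another, and each robot's logits and mask may depend on the actions already chosen for the robots before it, so invalid joint actions are never sampled.

```python
import math
import random
from typing import Callable, List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits; actions whose mask entry is False get probability 0."""
    valid = [x for x, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one action must be valid")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(x - top) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_joint_action(
    num_robots: int,
    policy_logits: Callable[[List[int], int], List[float]],
    is_valid: Callable[[List[int], int, int], bool],
    rng: random.Random,
) -> List[int]:
    """Sample one action per robot in sequence (autoregressively).

    Both the logits and the mask for robot i are conditioned on the actions
    already chosen for robots 0..i-1.
    """
    chosen: List[int] = []
    for i in range(num_robots):
        logits = policy_logits(chosen, i)
        mask = [is_valid(chosen, i, a) for a in range(len(logits))]
        probs = masked_softmax(logits, mask)
        chosen.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return chosen


# Hypothetical toy setup: 2 robots; actions 0 = wait, 1 = serve module A, 2 = serve module B.
def toy_logits(chosen: List[int], i: int) -> List[float]:
    # A learned policy would compute these from the state and `chosen`.
    return [0.0, 1.0, 0.5]


def toy_valid(chosen: List[int], i: int, a: int) -> bool:
    # Waiting is always allowed; a module can be served by at most one robot per step.
    return a == 0 or a not in chosen


print(sample_joint_action(2, toy_logits, toy_valid, random.Random(0)))
```

Factoring the joint decision this way keeps each per-robot distribution small, instead of enumerating every combination of robot actions at once.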
The implementation supports two representative environments: a radial-type and a track-type cluster tool. Simulation configurations are defined in JSON format, allowing users to easily specify tool layout, robot parameters, action sets, reward weights, and deadlock conditions.
This code is intended for researchers, engineers, and students interested in reinforcement learning, scheduling, and industrial automation.
The licensing and usage of this code follow the terms described in the accompanying LICENSE file.
Simulation environments are defined via JSON files. Refer to the sample files under ./config/env/ for structure and examples.
The "args" section contains general simulation settings such as episode length, time resolution, wafer counts, and reward weights:
"args": {
    "episode_done_time_sec": 1000,
    "timestep_interval_sec": 1.0,
    "init_full_wafer": 1,
    "wafer_count": 1000,
    "reward_functions": {
        "reward_when_each_wafer_done": 1,
        "reward_wafer_progressing": 0.01,
        "penalty_idle_move": 0.0000
    }
}

The module definitions specify the tool modules and their properties, e.g., type, position, and processing time.
The robot definitions specify robot properties such as arm count, speed, and initial position.
The waypoint mapping assigns numeric waypoints to module groups. Wildcards (*) are supported for automatic matching.
The action definitions list the available actions for each robot. Wildcards are supported for pattern-based matching.
The deadlock definitions specify the conditions under which a deadlock occurs, based on robot state and module availability.
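To make the wildcard behavior concrete, here is a minimal sketch of how patterns could be expanded against a list of module names. It assumes shell-style glob semantics (as implemented by Python's `fnmatch`); the module names (`LL1`, `PM1`, ...), the `verb:target` action format, and the helper functions are hypothetical and do not reflect the actual JSON schema, so check the sample files under ./config/env/ for the real format.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def match_modules(pattern: str, module_names: List[str]) -> List[str]:
    """Return the module names matching a shell-style pattern such as "PM*"."""
    return [m for m in module_names if fnmatchcase(m, pattern)]


def resolve_waypoints(waypoints: Dict[int, str], module_names: List[str]) -> Dict[int, List[str]]:
    """Map each numeric waypoint to the concrete modules its pattern matches."""
    return {wp: match_modules(pat, module_names) for wp, pat in waypoints.items()}


def expand_actions(patterns: List[str], module_names: List[str]) -> List[str]:
    """Expand hypothetical action patterns like "pick:PM*" into one action per module.

    Patterns without a ":target" part (e.g. "wait") are kept as-is.
    """
    actions: List[str] = []
    for pattern in patterns:
        verb, sep, target = pattern.partition(":")
        if not sep:
            candidates = [verb]
        else:
            candidates = [f"{verb}:{m}" for m in match_modules(target, module_names)]
        for action in candidates:
            if action not in actions:  # keep first-seen order, drop duplicates
                actions.append(action)
    return actions


modules = ["LL1", "LL2", "PM1", "PM2"]
print(resolve_waypoints({0: "LL*", 1: "PM*"}, modules))
# {0: ['LL1', 'LL2'], 1: ['PM1', 'PM2']}
print(expand_actions(["wait", "pick:LL*", "place:PM*"], modules))
# ['wait', 'pick:LL1', 'pick:LL2', 'place:PM1', 'place:PM2']
```

Expanding patterns once, up front, yields a fixed and ordered list of concrete actions per robot, which is the kind of stable index space an action mask can be defined over.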
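Before training, it can help to sanity-check the "args" section of a simulator file. The sketch below assumes that "args" is a top-level key, as the fragment above suggests. The `step_reward` helper is hypothetical: it assumes the three reward weights are combined as a simple weighted sum with the idle-move penalty subtracted, which may not match how the environment definition actually computes rewards.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_args(path: Path) -> Dict[str, Any]:
    """Read the "args" section of a simulator JSON file (assumed top-level key)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)["args"]


def steps_per_episode(args: Dict[str, Any]) -> int:
    """Number of simulation steps implied by the episode length and time resolution."""
    return round(args["episode_done_time_sec"] / args["timestep_interval_sec"])


def step_reward(args: Dict[str, Any], wafers_done: int, progress_events: int, idle_moves: int) -> float:
    """Hypothetical weighted sum of the configured reward terms for one step.

    Assumes the penalty weight is subtracted; the repository's reward logic
    may combine these weights differently.
    """
    w = args["reward_functions"]
    return (w["reward_when_each_wafer_done"] * wafers_done
            + w["reward_wafer_progressing"] * progress_events
            - w["penalty_idle_move"] * idle_moves)


# Values copied from the sample "args" section above.
sample = json.loads("""{"args": {
    "episode_done_time_sec": 1000, "timestep_interval_sec": 1.0,
    "init_full_wafer": 1, "wafer_count": 1000,
    "reward_functions": {"reward_when_each_wafer_done": 1,
                         "reward_wafer_progressing": 0.01,
                         "penalty_idle_move": 0.0000}}}""")["args"]
print(steps_per_episode(sample))  # 1000
```

With a real file, `load_args(Path("./config/env/radial_type/simulator.json"))` returns the same kind of dictionary, provided the assumed layout holds.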
.
├── src/
│ ├── run.sbx.py # Main training entry point
│ └── sim_env.py # Environment definition
│
├── config/
│ ├── env/
│ │ ├── radial_type/
│ │ │ └── simulator.json
│ │ └── track_type/
│ │ └── simulator.json
│ └── model/
│ └── config.yaml # PPO hyperparameters
│
├── StepLogViewer/ # C# Gantt Chart UI (Windows executable)
│ # Visualizes step logs as shown in the paper
│
├── output_data/ # Logged results after training
│ └── [wandb_project_name]/[run_name]/episode_log/
│ # Contains Gantt chart step logs recorded based on best reward
│ # These can be viewed with StepLogViewer for training analysis
│
├── wandb_api.key.txt # Your wandb API key (user-provided)
└── LICENSE
- Set up your Weights & Biases API key by saving it in a file at the repository root:
  ./wandb_api.key.txt
- Run training with wandb logging:
python ./src/run.sbx.py ./config/model/config.yaml \
--simulator_path ./config/env/[radial_type|track_type]/simulator.json \
--wandb_in_model \
--wandb_project_name [your_wandb_project] \
--wandb_graph_name [run_name]

After execution, output logs are saved under:
./output_data/[wandb_project_name]/[run_name]/episode_log/
These contain step-by-step logs used for the Gantt chart visualization in the paper. You can view these files using the StepLogViewer tool.
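The documented command can also be scripted, for example to train on both tool layouts in turn and then list the resulting step-log files. This is a hypothetical helper, not part of the repository. The project and run names are placeholders, it uses the current interpreter (`sys.executable`) in place of `python`, and it assumes that the `[run_name]` directory under ./output_data/ matches the value passed to `--wandb_graph_name`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

TOOL_TYPES = ("radial_type", "track_type")


def build_train_cmd(tool_type: str, project: str, run_name: str) -> List[str]:
    """Assemble the documented training command for one tool layout."""
    if tool_type not in TOOL_TYPES:
        raise ValueError(f"unknown tool type: {tool_type}")
    return [
        sys.executable, "./src/run.sbx.py", "./config/model/config.yaml",
        "--simulator_path", f"./config/env/{tool_type}/simulator.json",
        "--wandb_in_model",
        "--wandb_project_name", project,
        "--wandb_graph_name", run_name,
    ]


def episode_log_dir(project: str, run_name: str) -> Path:
    """Directory where the step logs for a run are expected to appear."""
    return Path("./output_data") / project / run_name / "episode_log"


def train_and_list_logs(project: str) -> None:
    """Train on both layouts in turn, then print each run's step-log files."""
    for tool_type in TOOL_TYPES:
        run_name = f"ppo_{tool_type}"  # hypothetical naming scheme
        # check=True raises CalledProcessError if a training run fails.
        subprocess.run(build_train_cmd(tool_type, project, run_name), check=True)
        for log_file in sorted(episode_log_dir(project, run_name).glob("*")):
            print(log_file)
```

Calling `train_and_list_logs("my_project")` from the repository root runs the two trainings sequentially and stops at the first failure; the printed files can then be opened in StepLogViewer.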
If you use this code in your work, please cite our paper:
S. -H. Cho, J. S. B. Choe and J. -K. Kim, "Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools," 2025 IEEE 34th International Symposium on Industrial Electronics (ISIE), Toronto, ON, Canada, 2025, pp. 1-8, doi: 10.1109/ISIE62713.2025.11124688.
@INPROCEEDINGS{11124688,
author={Cho, Soo-Hwan and Choe, Jean Seong Bjorn and Kim, Jong-Kook},
booktitle={2025 IEEE 34th International Symposium on Industrial Electronics (ISIE)},
title={Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools},
year={2025},
volume={},
number={},
pages={1-8},
keywords={Industrial electronics;Job shop scheduling;Robot kinematics;Decision making;Throughput;Deep reinforcement learning;Space exploration},
doi={10.1109/ISIE62713.2025.11124688}}
This code was developed as part of the research presented in the paper
"Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools".
For questions or collaboration inquiries, please contact:
Soo-Hwan Cho – soohwancho@korea.ac.kr
Affiliation: Korea University, High Performance Intelligence Computing (HPIC) Lab
This project is licensed under the terms described in the LICENSE file.