Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools

This repository contains the source code accompanying our paper,
"Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools".

The code implements a reinforcement learning framework for optimizing wafer transfer scheduling in semiconductor manufacturing equipment, specifically in cluster tools involving multiple robots. It leverages an autoregressive policy structure combined with action masking to scale learning across complex, discrete action spaces. The framework is designed to be extensible and includes a configurable simulator supporting various tool configurations.
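As a rough illustration of how an autoregressive policy and action masking fit together, the sketch below picks one action per robot in sequence and zeroes out the probability of invalid actions before sampling. It is a minimal toy under stated assumptions, not the repository's implementation: `masked_softmax`, `sample_joint_action`, `toy_logits`, `toy_mask`, and the rule that two robots may not target the same module are all hypothetical names and rules introduced here.

```python
import math
import random

NUM_ACTIONS = 3  # hypothetical: action 0 = wait, actions 1..2 = target a module


def masked_softmax(logits, mask):
    """Softmax over logits; entries whose mask is False get probability 0.

    Assumes at least one action is valid (mask contains a True).
    """
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_joint_action(logits_fn, mask_fn, num_robots, rng):
    """Choose one action per robot, each conditioned on the earlier choices."""
    chosen = []
    for robot in range(num_robots):
        logits = logits_fn(robot, chosen)   # would come from the policy network
        mask = mask_fn(robot, chosen)       # would come from the simulator state
        probs = masked_softmax(logits, mask)
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        chosen.append(action)
    return chosen


def toy_logits(robot, chosen):
    # Stand-in for a learned policy: uniform preferences.
    return [0.0] * NUM_ACTIONS


def toy_mask(robot, chosen):
    # Assumed rule: a module already targeted by an earlier robot is masked out;
    # waiting (action 0) always stays valid.
    mask = [True] * NUM_ACTIONS
    for action in chosen:
        if action != 0:
            mask[action] = False
    return mask


joint = sample_joint_action(toy_logits, toy_mask, num_robots=2, rng=random.Random(0))
print(joint)
```

Because each robot's choice depends only on the choices before it, the policy outputs one small distribution per robot instead of a single distribution over every joint action combination.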

The implementation supports two representative environments: a radial-type and a track-type cluster tool. Simulation configurations are defined in JSON format, allowing users to easily specify tool layout, robot parameters, action sets, reward weights, and deadlock conditions.

This code is intended for researchers, engineers, and students interested in reinforcement learning, scheduling, and industrial automation.
The licensing and usage of this code follow the terms described in the accompanying LICENSE file.

Simulator Configuration Format

Simulation environments are defined via JSON files. Refer to the sample files under ./config/env/ for structure and examples.

(1) `args` – Simulator parameters

Contains general simulation settings such as episode length, reward weights, and time resolution.

"args": {
  "episode_done_time_sec": 1000,
  "timestep_interval_sec": 1.0,
  "init_full_wafer": 1,
  "wafer_count": 1000,
  "reward_functions": {
    "reward_when_each_wafer_done": 1,
    "reward_wafer_progressing": 0.01,
    "penalty_idle_move": 0.0000
  }
}
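The fragment above can be read with the standard `json` module. The sketch below wraps it in braces so that it parses as a standalone document, then derives two quantities. Both helpers are assumptions made for illustration: `steps_per_episode` assumes an episode ends after `episode_done_time_sec` of simulated time advanced in `timestep_interval_sec` increments, and `step_reward` assumes the weights combine as a simple weighted sum with the penalty subtracted. The simulator may define either differently.

```python
import json

# The "args" block from above, wrapped in braces to form valid standalone JSON.
ARGS_JSON = """
{
  "args": {
    "episode_done_time_sec": 1000,
    "timestep_interval_sec": 1.0,
    "init_full_wafer": 1,
    "wafer_count": 1000,
    "reward_functions": {
      "reward_when_each_wafer_done": 1,
      "reward_wafer_progressing": 0.01,
      "penalty_idle_move": 0.0000
    }
  }
}
"""


def steps_per_episode(args):
    """Assumed relation: episode length divided by the timestep interval."""
    return round(args["episode_done_time_sec"] / args["timestep_interval_sec"])


def step_reward(weights, wafers_done, wafers_progressed, idle_moves):
    """Hypothetical weighted-sum reward; the sign convention is an assumption."""
    return (
        weights["reward_when_each_wafer_done"] * wafers_done
        + weights["reward_wafer_progressing"] * wafers_progressed
        - weights["penalty_idle_move"] * idle_moves
    )


args = json.loads(ARGS_JSON)["args"]
print(steps_per_episode(args))  # 1000 with the values above

reward = step_reward(
    args["reward_functions"], wafers_done=1, wafers_progressed=2, idle_moves=3
)
print(reward)
```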

(2) `unit_list` – Processing modules

Defines tool modules and their properties, e.g., type, position, and processing time.

(3) `transport_robot_list` – Robots

Defines robot properties such as arm count, speed, and initial position.

(4) `waypoint_list` – Waypoint mapping

Maps numeric waypoints to module groups. Wildcards (*) are supported for auto-matching.
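To make the wildcard idea concrete, the sketch below expands `*` patterns against a list of module names using Python's `fnmatch` module. It only illustrates the concept: the module names, the `waypoints` dictionary layout, and the choice of shell-style matching are assumptions, so check the sample files under ./config/env/ for the actual schema and matching rules.

```python
from fnmatch import fnmatchcase


def expand_pattern(pattern, module_names):
    """Return the module names that match a shell-style wildcard pattern."""
    return [name for name in module_names if fnmatchcase(name, pattern)]


# Hypothetical module names and waypoint patterns, for illustration only.
modules = ["PM1", "PM2", "PM3", "LL1", "LL2"]
waypoints = {0: "LL*", 1: "PM*"}

resolved = {wp: expand_pattern(pat, modules) for wp, pat in waypoints.items()}
print(resolved)  # {0: ['LL1', 'LL2'], 1: ['PM1', 'PM2', 'PM3']}
```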

(5) `action_list` – Robot action sets

Defines the available actions for each robot. Wildcards are supported for pattern-based matching.

(6) `deadlock` – Deadlock conditions

Defines conditions under which deadlock occurs based on robot state and module availability.

Folder Structure

.
├── src/
│   ├── run.sbx.py         # Main training entry point
│   └── sim_env.py         # Environment definition
│
├── config/
│   ├── env/
│   │   ├── radial_type/
│   │   │   └── simulator.json
│   │   └── track_type/
│   │       └── simulator.json
│   └── model/
│       └── config.yaml    # PPO hyperparameters
│
├── StepLogViewer/         # C# Gantt Chart UI (Windows executable)
│                           # Visualizes step logs as shown in the paper
│
├── output_data/           # Logged results after training
│   └── [wandb_project_name]/[run_name]/episode_log/
│                           # Contains Gantt chart step logs recorded based on best reward
│                           # These can be viewed with StepLogViewer for training analysis
│
├── wandb_api.key.txt      # Your wandb API key (user-provided)
└── LICENSE

Running the Code

Set up your Weights & Biases API key by saving it to a file in the repository root:

./wandb_api.key.txt

Run training with wandb logging:

python ./src/run.sbx.py ./config/model/config.yaml \
  --simulator_path ./config/env/[radial_type|track_type]/simulator.json \
  --wandb_in_model \
  --wandb_project_name [your_wandb_project] \
  --wandb_graph_name [run_name]

After execution, output logs are saved under:

./output_data/[wandb_project_name]/[run_name]/episode_log/

These contain step-by-step logs used for the Gantt chart visualization in the paper. You can view these files using the StepLogViewer tool.
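If you want to locate the logs of a given run from a script, the output path pattern above can be assembled with `pathlib`. This is a small convenience sketch: the helper names `episode_log_dir` and `list_step_logs` are hypothetical, and nothing is assumed about the log file names or format beyond their location.

```python
from pathlib import Path


def episode_log_dir(project, run, root="./output_data"):
    """Build ./output_data/[wandb_project_name]/[run_name]/episode_log/."""
    return Path(root) / project / run / "episode_log"


def list_step_logs(project, run, root="./output_data"):
    """Return the files in a run's episode_log folder, sorted by name."""
    log_dir = episode_log_dir(project, run, root)
    if not log_dir.is_dir():
        return []  # the run has not produced logs yet, or the names are wrong
    return sorted(p for p in log_dir.iterdir() if p.is_file())


# Replace the placeholders with your own project and run names.
for path in list_step_logs("your_wandb_project", "run_name"):
    print(path)
```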

Citation

If you use this code in your work, please cite our paper:

S.-H. Cho, J. S. B. Choe and J.-K. Kim, "Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools," 2025 IEEE 34th International Symposium on Industrial Electronics (ISIE), Toronto, ON, Canada, 2025, pp. 1-8, doi: 10.1109/ISIE62713.2025.11124688.

@INPROCEEDINGS{11124688,
  author={Cho, Soo-Hwan and Choe, Jean Seong Bjorn and Kim, Jong-Kook},
  booktitle={2025 IEEE 34th International Symposium on Industrial Electronics (ISIE)},
  title={Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools},
  year={2025},
  pages={1-8},
  keywords={Industrial electronics;Job shop scheduling;Robot kinematics;Decision making;Throughput;Deep reinforcement learning;Space exploration},
  doi={10.1109/ISIE62713.2025.11124688}
}

Author

This code was developed as part of the research presented in the paper
"Autoregressive DRL for Multi-Robot Scheduling in Semiconductor Cluster Tools".

For questions or collaboration inquiries, please contact:
Soo-Hwan Cho – soohwancho@korea.ac.kr
Affiliation: Korea University, High Performance Intelligence Computing (HPIC) Lab

License

This project is licensed under the terms described in the LICENSE file.
