A collection of Crazyflie reinforcement learning environments built on Isaac Lab, using a self-contained cascaded PID inner-loop controller.
CrazyPlayGround provides Isaac Lab environments for training RL agents on a simulated Crazyflie 2.1. Instead of letting the RL agent directly command motor thrusts, a cascaded firmware-style PID controller (position → velocity → attitude → rate) runs at 500 Hz as the inner loop. The RL agent operates at a higher level (100 Hz), commanding position deltas, velocity references, or attitude setpoints depending on the environment.
```
RL agent (100 Hz)
 └─ sets target (pos delta / vel ref / attitude)
     └─ _apply_action() × 5 (500 Hz, each physics step)
         └─ Cascade PID → thrust [N] + moment [N·m]
             └─ applied to Crazyflie rigid body via Isaac Lab
```
| Loop | Rate | Input → Output |
|---|---|---|
| Position | 100 Hz | position error [m] → velocity setpoint [m/s] |
| Velocity | 100 Hz | velocity error [m/s] → roll/pitch command [rad] + thrust Δ |
| Attitude | 500 Hz | attitude error [rad] → body-rate setpoint [rad/s] |
| Rate | 500 Hz | rate error [rad/s] → moment [N·m] |
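The cascade in the table above can be sketched as proportional-only loops. This is a simplification for illustration: the actual controller in `cascade_pid.py` also includes integral/derivative terms, saturation, and filtering, and the output scalings below are toy values. The gain vectors mirror the `crazyflie.yaml` config shown later in this README.

```python
import numpy as np

# Gains mirror controllers.cascade_pid in crazyflie.yaml; everything else is illustrative.
POS_KP = np.array([2.0, 2.0, 2.0])        # position error [m]   -> velocity setpoint [m/s]
VEL_KP = np.array([25.0, 25.0, 25.0])     # velocity error [m/s] -> attitude cmd + thrust delta
ATT_KP = np.array([6.0, 6.0, 6.0])        # attitude error [rad] -> body-rate setpoint [rad/s]
RATE_KP = np.array([250.0, 250.0, 120.0]) # rate error [rad/s]   -> moment (toy scaling to N*m)

def cascade_step(step, state, goal_pos, slow_cache):
    """One 500 Hz inner-loop step; the two outer loops run at 100 Hz (every 5th step)."""
    if step % 5 == 0:
        # 100 Hz position + velocity loops
        vel_sp = POS_KP * (goal_pos - state["pos"])
        acc_cmd = VEL_KP * (vel_sp - state["vel"])
        # Toy small-angle mapping: lateral acceleration command -> roll/pitch setpoint
        slow_cache["att_sp"] = np.array([-acc_cmd[1], acc_cmd[0], 0.0]) * 1e-2
        slow_cache["thrust_delta"] = acc_cmd[2]
    # 500 Hz attitude + rate loops
    rate_sp = ATT_KP * (slow_cache["att_sp"] - state["att"])
    moment = RATE_KP * (rate_sp - state["ang_vel"]) * 1e-6  # toy scale factor
    return slow_cache["thrust_delta"], moment
```

The `slow_cache` dict holds the 100 Hz loop outputs between outer-loop updates, which is one simple way to realize the two-rate structure within a single 500 Hz step function.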
| Parameter | Value |
|---|---|
| Physics timestep (dt) | 1/500 s = 2 ms |
| Decimation | 5 |
| Policy rate | 100 Hz |
| Gyro LPF cutoff | 20 Hz |
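The 20 Hz gyro low-pass can be sketched as a first-order IIR filter discretized at the 500 Hz physics rate. The filter structure here is an assumption for illustration; the exact implementation lives in the controller code.

```python
import math

def lpf_alpha(cutoff_hz: float, sample_hz: float) -> float:
    """Smoothing factor for a first-order low-pass of the form y += alpha * (x - y)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant
    dt = 1.0 / sample_hz                     # sample period
    return dt / (rc + dt)

class LowPass:
    """Scalar first-order low-pass filter; apply per gyro axis."""
    def __init__(self, cutoff_hz: float, sample_hz: float):
        self.alpha = lpf_alpha(cutoff_hz, sample_hz)
        self.y = 0.0

    def update(self, x: float) -> float:
        self.y += self.alpha * (x - self.y)
        return self.y
```

At a 20 Hz cutoff sampled at 500 Hz, `alpha` comes out to roughly 0.20, i.e. each new gyro sample contributes about a fifth of the filtered value.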
Three variants differing only in the abstraction level of the RL action space:
| Task ID | Action | Action space |
|---|---|---|
| Pos-Hovering | Position delta [dx, dy, dz] (m), clamped to ±0.1 | 3 |
| Vel-Hovering | Velocity reference [vx, vy, vz] (m/s), scaled by max_velocity=1.0 | 3 |
| Att-Hovering | [roll, pitch, yaw_rate, thrust_normalized] | 4 |
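A hedged sketch of how the Pos-Hovering and Vel-Hovering actions might map raw policy outputs to controller setpoints. The clamp and `max_velocity` values mirror the table above; the function names and the `[-1, 1]` pre-clip on the velocity action are assumptions for illustration.

```python
import numpy as np

MAX_POS_DELTA = 0.1  # m, Pos-Hovering clamp from the table above
MAX_VELOCITY = 1.0   # m/s, Vel-Hovering scale from the table above

def pos_action_to_setpoint(action: np.ndarray, current_pos: np.ndarray) -> np.ndarray:
    """Pos-Hovering: clamp the raw delta, then offset the current position."""
    delta = np.clip(action, -MAX_POS_DELTA, MAX_POS_DELTA)
    return current_pos + delta

def vel_action_to_setpoint(action: np.ndarray) -> np.ndarray:
    """Vel-Hovering: clip a raw action to [-1, 1], then scale to a velocity reference."""
    return np.clip(action, -1.0, 1.0) * MAX_VELOCITY
```

Keeping the per-step position delta small (±0.1 m) bounds how far the commanded setpoint can move between 100 Hz policy steps, which keeps the inner PID loop within its stable operating range.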
All three share the same observation space (dim=6):
[lin_vel_b (3), desired_pos_b (3)]
- lin_vel_b: linear velocity in body frame [m/s]
- desired_pos_b: goal position expressed in body frame [m]
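Assembling this observation means rotating world-frame quantities into the body frame via the inverse of the body orientation quaternion. A self-contained sketch (the helper name and `(w, x, y, z)` ordering are assumptions; Isaac Lab provides its own quaternion utilities):

```python
import numpy as np

def quat_rotate_inverse(q: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rotate world-frame vector v into the body frame of unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # Expansion of R(q)^T v without building the rotation matrix
    return v * (2.0 * w * w - 1.0) - 2.0 * w * np.cross(u, v) + 2.0 * np.dot(u, v) * u

def observation(q_wb, lin_vel_w, pos_w, goal_w):
    """Assemble the 6-dim observation: [lin_vel_b (3), desired_pos_b (3)]."""
    lin_vel_b = quat_rotate_inverse(q_wb, lin_vel_w)
    desired_pos_b = quat_rotate_inverse(q_wb, goal_w - pos_w)
    return np.concatenate([lin_vel_b, desired_pos_b])
```

Expressing both velocity and goal in the body frame makes the policy yaw-invariant: the same observation produces the same action regardless of the drone's heading.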
And the same reward:
r = - lin_vel_scale × ||v_lin||²
    - ang_vel_scale × ||ω_ang||²
    + distance_scale × (1 - tanh(||pos - goal|| / 0.8))
The episode terminates when the drone's altitude drops below 0.1 m or rises above 2.0 m.
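The reward and termination logic above can be sketched directly. The scale values below are illustrative placeholders (the real weights live in the env configs); only the formula structure and the 0.1 m / 2.0 m altitude band come from this README.

```python
import numpy as np

# Illustrative weights; the actual env config defines the real scale values.
LIN_VEL_SCALE = 0.05
ANG_VEL_SCALE = 0.01
DISTANCE_SCALE = 15.0

def reward(lin_vel, ang_vel, pos, goal):
    """Penalize linear/angular speed, reward proximity to the goal via a tanh kernel."""
    dist = np.linalg.norm(pos - goal)
    return (-LIN_VEL_SCALE * np.dot(lin_vel, lin_vel)
            - ANG_VEL_SCALE * np.dot(ang_vel, ang_vel)
            + DISTANCE_SCALE * (1.0 - np.tanh(dist / 0.8)))

def terminated(pos):
    """Episode ends outside the 0.1 m to 2.0 m altitude band."""
    return pos[2] < 0.1 or pos[2] > 2.0
```

The 0.8 m divisor inside the tanh sets the length scale of the distance reward: within roughly a meter of the goal the gradient is strong, while far away the term saturates near zero.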
| Task ID | Description |
|---|---|
| Template-Crazyplayground-Marl-Direct-v0 | Multi-agent collaborative task |
- Isaac Lab (Isaac Sim 4.5+)
1. Install Isaac Lab following the official guide.
2. Install CrazyPlayGround (editable):

   ```
   pip install -e source/CrazyPlayGround
   ```

3. Verify by listing the available environments:

   ```
   python scripts/list_envs.py
   ```

   Expected output includes Vel-Hovering, Pos-Hovering, Att-Hovering, and Template-Crazyplayground-Marl-Direct-v0.
```
# SKRL (PPO)
python scripts/skrl/train.py --task=Vel-Hovering --num_envs=4096

# RSL-RL
python scripts/rsl_rl/train.py --task=Pos-Hovering --num_envs=4096

# Stable Baselines 3
python scripts/sb3/train.py --task=Att-Hovering --num_envs=512
```

Play a trained policy:

```
python scripts/skrl/play.py --task=Vel-Hovering --num_envs=16
```

Dummy agents for sanity checks:

```
python scripts/zero_agent.py --task=Vel-Hovering
python scripts/random_agent.py --task=Vel-Hovering
```

Project layout:

```
CrazyPlayGround/
├── source/CrazyPlayGround/CrazyPlayGround/
│   ├── controllers/              # Self-contained cascade PID controller
│   │   ├── cascade_pid.py
│   │   ├── pid.py
│   │   ├── config.py
│   │   ├── crazyflie.yaml        # PID gains & physics params
│   │   └── utils/math_utils.py
│   └── tasks/direct/
│       ├── hovering/             # Single-drone envs
│       │   ├── pos_hovering.py
│       │   ├── vel_hovering.py
│       │   ├── att_hovering.py
│       │   └── agents/
│       ├── track/
│       ├── drone_racing/
│       ├── drone_racing_marl/
│       ├── formation/
│       ├── fly_through/
│       └── teleoperation/
└── scripts/
    ├── skrl/
    ├── rsl_rl/
    ├── sb3/
    └── random_agent.py
```
PID gains and simulation parameters are in source/CrazyPlayGround/CrazyPlayGround/controllers/crazyflie.yaml under the controllers.cascade_pid section. Key parameters:
```yaml
controllers:
  cascade_pid:
    sim_rate_hz: 500.0
    pid_posvel_loop_rate_hz: 100.0
    pid_loop_rate_hz: 500.0
    gyro_lpf_cutoff_hz: 20.0
    pos_kp: [2.0, 2.0, 2.0]
    vel_kp: [25.0, 25.0, 25.0]
    att_kp: [6.0, 6.0, 6.0]
    rate_kp: [250.0, 250.0, 120.0]
```
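Reading these gains back out of the YAML might look like the sketch below (a hedged example using PyYAML's `safe_load`; parsing from an inline string here so the snippet is self-contained, rather than from the repo path):

```python
import yaml

# Inline copy of the controllers.cascade_pid section for a self-contained demo;
# in practice you would open controllers/crazyflie.yaml instead.
CFG_TEXT = """
controllers:
  cascade_pid:
    sim_rate_hz: 500.0
    pid_posvel_loop_rate_hz: 100.0
    pid_loop_rate_hz: 500.0
    gyro_lpf_cutoff_hz: 20.0
    pos_kp: [2.0, 2.0, 2.0]
    vel_kp: [25.0, 25.0, 25.0]
    att_kp: [6.0, 6.0, 6.0]
    rate_kp: [250.0, 250.0, 120.0]
"""

def load_gains(text: str) -> dict:
    """Parse YAML text and return the cascade_pid section as a plain dict."""
    return yaml.safe_load(text)["controllers"]["cascade_pid"]

gains = load_gains(CFG_TEXT)
```

Note the two loop-rate keys: `pid_posvel_loop_rate_hz` (100 Hz outer loops) and `pid_loop_rate_hz` (500 Hz inner loops) match the decimation-of-5 structure described earlier.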