2D inverted pendulum balanced by 2 reaction wheels. The controller learns via Reinforcement Learning in simulation and runs the trained policy on-device. ROS2 and ZephyrOS running on Arduino UNO Q.
The hypothesis is that a machine-learning trained in simulation with randomized parameters will generalize and adapt itself to the real-world system despite differences in motors and mass distribution, whereas a PID controller would require manual retuning.
| Component | Part |
|---|---|
| MCU board | Arduino UNO Q (STM32U585 @ 160 MHz + QRB2210 Linux SBC) |
| IMU | MPU6050 |
| Motor | NEMA 17 stepper (17PM-K502-G6WM) |
| Driver | A4988 |
| Power | 12β24 V for motor, 3.3 V logic from board |
Steppers are bad for this but this is what I had on hand. Use BLCD with encoders if you can.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Zephyr RTOS (STM32U585) β
β β
β thread_sensors βββΊ k_msgq βββΊ thread_control β
β β β β
β MPU6050 (I2C) STEP/DIR GPIO β
β Complementary filter (A4988 microstepping) β
β β β
β thread_comms βββββ k_msgq βββββββββββ β
β β β
β simple binary UART frames β
βββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β LPUART1 β /dev/ttyHS1
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββ
β QRB2210 Linux SBC (on-board) β ROS 2 Jazzy β
β serial_bridge_node reads /dev/ttyHS1 β
β βββΊ /pendulum/state (local DDS) β
β webots_relay.py βββΊ TCP client β 127.0.0.1:9001 β
βββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β TCP over USB (adb reverse tcp:9001)
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββ
β Windows β Webots R2025a + Python β
β TCP server :9001 βββΊ Digital Twin β
β RL Training (Stable Baselines3) βββΊ policy.h β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Transport. The QRB2210 runs native ROS 2, so
webots_relay.pyruns on the board, subscribes to/pendulum/stateover local DDS, and pushes the twin packet to Webots over a TCP-over-USB tunnel (adb reverse). No network IP, no WSL2. The earlier WiFi + WSL2 relay path is deprecated β see docs/deprecated_wifi_wsl2.md. WSL2 + the FastDDS TCP profiles remain only as an optional RViz2/visualisation path.
ROS2 runs natively on the QRB2210. The STM32 sends lightweight binary frames over UART; a Python node on the QRB2210 parses them and publishes topics. Details: docs/serial_communication.md
| Thread | Priority | Stack | Role |
|---|---|---|---|
thread_sensors |
2 | 2048 B | Read MPU6050, apply complementary filter |
thread_control |
1 | 2048 B | PID / RL policy β compute step frequency |
thread_comms |
3 | 2048 B | Send state as binary UART frames to QRB2210 |
firmware/
βββ zephyr/
βββ blinky/ # hardware verification sample
βββ reaction_wheel/ # main application
βββ CMakeLists.txt
βββ prj.conf
βββ boards/
β βββ arduino_uno_q.conf
βββ src/
βββ app.h # shared types, message queue externs
βββ main.c # queue definitions, main()
βββ sensors.c/h # IMU read + filter
βββ control.c/h # PID / policy inference
βββ comms.c/h # binary UART frames to QRB2210
docs/
βββ arduino_uno_q_zephyr.md # board setup, flashing, peripherals
βββ qrb2210_ros2_setup.md # ROS2 install + startup procedure
βββ serial_communication.md # arduino-router investigation, RTT solution
βββ deprecated_wifi_wsl2.md # old WiFi + WSL2 relay architecture (reference)
ros2_ws/
βββ serial_bridge_node.py # reads /dev/ttyHS1, publishes /pendulum/state
βββ webots_relay.py # (on board) /pendulum/state β TCP-over-USB to Webots
βββ wsl2_relay.py # DEPRECATED (WiFi + WSL2 + UDP)
βββ fastdds_tcp_server.xml # DEPRECATED (FastDDS TCP, WiFi era)
βββ fastdds_tcp_client.xml # DEPRECATED (FastDDS TCP, WiFi era)
scripts/
βββ flash.ps1 # Ctrl+Shift+B flash via ADB + GDB
βββ rtt_monitor.ps1 # RTT console (printk output over SWD)
simulation/
βββ webots/
βββ worlds/pendulum.wbt # 2-wheel free-standing pendulum on a table
βββ controllers/pendulum_controller/ # SIMULATION (physics) + DIGITAL_TWIN (display)
training/
βββ rl/ # Stable Baselines3 PPO training
| Tool | Version | Notes |
|---|---|---|
| nRF Connect for VS Code | v3.3.0 | includes Zephyr SDK, west |
| ADB Platform Tools | latest | C:\tools\platform-tools\ in PATH |
| WSL2 Ubuntu 24.04 | β | ROS2 Jazzy + micro_ros_agent |
| Webots | R2025a | Windows native |
| Python | 3.11 | stable-baselines3, gymnasium |
Flashing uses QRB2210 (Linux SoC on the board) as a debug bridge β no ST-LINK needed.
# Ctrl+Shift+B in VS Code, or manually:
.\scripts\flash.ps1See docs/arduino_uno_q_zephyr.md for full setup details.
Arduino UNO Q routes the STM32 UART through arduino-router, a Linux daemon that speaks MessagePack RPC β not raw serial. PuTTY, PowerShell SerialPort, and Arduino IDE Serial Monitor all show nothing because they expect raw bytes.
The correct solution is Segger RTT over the SWD debug connection (same path used for flashing):
# Runs in VS Code Terminal β "RTT Monitor" task (Ctrl+Shift+B β pick task)
.\scripts\rtt_monitor.ps1The script kills any stale openocd, starts arduino-debug on the board, configures RTT via OpenOCD telnet (port 4444), and streams printk output from port 9090.
Required prj.conf entries:
CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
CONFIG_UART_CONSOLE=n # must disable UART, otherwise it takes priority and RTT gets 0 bytesSee docs/serial_communication.md for the full investigation.
Open firmware/zephyr/reaction_wheel in nRF Connect for VS Code, add build configuration for arduino_uno_q/stm32u585xx, then Build + Flash.
- nRF Connect SDK v3.3.0 + Zephyr 4.3.99 installed
- hal_stm32 module cloned (
C:\ncs\v3.3.0\modules\hal\stm32, reve05bb47) - ADB flash procedure working (QRB2210 as debug bridge)
- VS Code Ctrl+Shift+B flash task configured
- RTT console working (
scripts/rtt_monitor.ps1) - WSL2 Ubuntu 24.04 installed, ROS2 Jazzy installed
-
reaction_wheelZephyr project scaffolding - Three-thread architecture with message queues
- PID controller skeleton (
control.c) - Mock sinusoidal sensor data (
sensors.c) - printk telemetry (
comms.c) - Build passes on
arduino_uno_q/stm32u585xx - Flash and verify serial output on real hardware (RTT:
angle=X.XX vel=Y.YYat 100 Hz confirmed)
- Define binary UART frame format (packed floats, magic header, CRC)
-
thread_commssends binary frames over LPUART1 β/dev/ttyHS1(replace printk) -
serial_bridge_node.pyon QRB2210: reads/dev/ttyHS1, publishessensor_msgs/Imu - Webots world: 2-wheel free-standing pendulum on a silicone tip (genuine physics)
- Webots controller:
DIGITAL_TWIN(kinematic display) +SIMULATION(free physics) modes -
WiFi + WSL2 relay + FastDDS TCPβ migrated to TCP-over-USB, relay on board (webots_relay.py) - End-to-end test: STM32 fake sine β
serial_bridge_nodeβwebots_relayβ USB β Webots twin tilts - Extend frame/relay to carry roll + both wheel speeds (full 2-axis twin)
- Custom
gymnasium.Envwrapping a Webots extern controller (training/rl/pendulum_env.py) - PPO training script with curriculum AβBβC (
training/rl/train.py)- A: nominal domain, low IMU noise β learn sensorβcontrol mapping
- B: full MPU6050 noise (
imu_noise.py) - C: domain randomization (CoM offset, mass, inertia, torque, friction, IMU, latency)
- Tiny MLP actor (6β64β64β2 tanh), fixed obs scales, no BatchNorm/running-norm
- Both tilt axes trained together (no single-axis stage)
- Policy export: actor β TFLite int8 β
policy.h(CMSIS-NN on STM32 M33)
- MPU6050 wired + DTS overlay for i2c4
-
sensors.cβ replace mock with real MPU6050 reads - A4988 wired (STEP/DIR GPIO, motor power supply)
-
control.cβ Zephyr timer callback drives STEP pin - Mechanical assembly: arm, wheel, pivot
-
policy.hlinked into Zephyr build -
thread_controlusespolicy_infer()(CMSIS-NN int8) instead ofpid() - Inference runs on STM32U585 M33 inside the control loop β no bridge latency
- Pendulum balances on real hardware with the sim-trained PPO policy
- STM32 runs the deployed PPO actor at full rate and logs transitions (obs, action, reward, next-obs) up to the QRB2210
- Collect the logs into a dataset (real motors / real mass distribution)
- Train SAC offline / batch against the dataset β off the robot, no online trainer in the loop
- Keep a conservative / behavior-cloning term (offline RL β avoid actions the data never covers)
- Re-quantize and re-flash the fine-tuned
policy.h; closes the sim-to-real gap without manual PID retuning