PROJECT IN PROGRESS!

Reaction Wheel Pendulum Balancer

A two-axis (2D) inverted pendulum balanced by two reaction wheels. The controller is trained with Reinforcement Learning in simulation, and the trained policy runs on-device. ROS2 and Zephyr RTOS run on the Arduino UNO Q.

The hypothesis is that a machine-learning policy trained in simulation with randomized parameters will generalize to the real-world system despite differences in motors and mass distribution, whereas a PID controller would require manual retuning.

Hardware

Component	Part
MCU board	Arduino UNO Q (STM32U585 @ 160 MHz + QRB2210 Linux SBC)
IMU	MPU6050
Motor	NEMA 17 stepper (17PM-K502-G6WM)
Driver	A4988
Power	12–24 V for motor, 3.3 V logic from board

Steppers are a poor fit for this, but they are what I had on hand. Use BLDC motors with encoders if you can.

Software Architecture

┌─────────────────────────────────────────────────────┐
│                 Zephyr RTOS (STM32U585)             │
│                                                     │
│  thread_sensors ──► k_msgq ──► thread_control       │
│       │                              │              │
│  MPU6050 (I2C)               STEP/DIR GPIO          │
│  Complementary filter        (A4988 microstepping)  │
│                                      │              │
│  thread_comms ◄──── k_msgq ◄─────────┘              │
│       │                                             │
│  simple binary UART frames                          │
└───────────────┬─────────────────────────────────────┘
                │ LPUART1 → /dev/ttyHS1
┌───────────────▼──────────────────────────────────────┐
│         QRB2210 Linux SBC (on-board) — ROS 2 Jazzy   │
│  serial_bridge_node  reads /dev/ttyHS1               │
│       ──► /pendulum/state  (local DDS)               │
│  webots_relay.py ──► TCP client → 127.0.0.1:9001     │
└───────────────┬──────────────────────────────────────┘
                │ TCP over USB  (adb reverse tcp:9001)
┌───────────────▼──────────────────────────────────────┐
│         Windows — Webots R2025a + Python             │
│  TCP server :9001 ──► Digital Twin                   │
│  RL Training (Stable Baselines3) ──► policy.h        │
└──────────────────────────────────────────────────────┘

Transport. The QRB2210 runs native ROS 2, so webots_relay.py runs on the board, subscribes to /pendulum/state over local DDS, and pushes the twin packet to Webots over a TCP-over-USB tunnel (adb reverse). No network IP, no WSL2. The earlier WiFi + WSL2 relay path is deprecated — see docs/deprecated_wifi_wsl2.md. WSL2 + the FastDDS TCP profiles remain only as an optional RViz2/visualisation path.
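
The relay's TCP leg can be illustrated with a minimal sketch. The packet layout here (pitch, roll and two wheel speeds as little-endian float32) and all function names are assumptions for illustration, not the actual webots_relay.py implementation. Only the 127.0.0.1:9001 endpoint comes from the diagram above.

```python
import socket
import struct

# Hypothetical twin packet: pitch, roll (rad) and two wheel speeds (rad/s),
# packed as little-endian float32. The real layout may differ.
TWIN_PACKET = struct.Struct("<4f")


def connect_to_webots(host="127.0.0.1", port=9001):
    """Open the TCP client connection that adb reverse tunnels to the PC."""
    return socket.create_connection((host, port))


def send_twin_packet(sock, pitch, roll, wheel_a, wheel_b):
    """Send one fixed-size twin packet over an already connected socket."""
    sock.sendall(TWIN_PACKET.pack(pitch, roll, wheel_a, wheel_b))


def recv_twin_packet(sock):
    """Read exactly one packet; a stream socket may deliver it in chunks."""
    buf = b""
    while len(buf) < TWIN_PACKET.size:
        chunk = sock.recv(TWIN_PACKET.size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before a full packet arrived")
        buf += chunk
    return TWIN_PACKET.unpack(buf)


def loopback_demo():
    """Exercise send/receive over a local socket pair instead of a real tunnel."""
    relay_side, webots_side = socket.socketpair()
    with relay_side, webots_side:
        send_twin_packet(relay_side, 0.05, -0.02, 12.0, -8.5)
        return recv_twin_packet(webots_side)
```

A fixed-size packet keeps the receiver simple: it only has to loop until the expected number of bytes has arrived, with no delimiter or length prefix.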

ROS2 runs natively on the QRB2210. The STM32 sends lightweight binary frames over UART; a Python node on the QRB2210 parses them and publishes topics. Details: docs/serial_communication.md
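
As a rough sketch of what parsing such frames might look like on the QRB2210 side, the example below assumes a hypothetical layout: two sync bytes, two float32 values (angle and angular velocity), and a CRC-16/CCITT trailer. The real format is documented in docs/serial_communication.md and may differ. All names and constants here are illustrative.

```python
import binascii
import struct

MAGIC = b"\xAA\x55"               # hypothetical sync bytes
PAYLOAD = struct.Struct("<2f")    # assumed payload: angle (rad), velocity (rad/s)
CRC = struct.Struct("<H")         # CRC-16 over magic + payload
FRAME_SIZE = len(MAGIC) + PAYLOAD.size + CRC.size


def encode_frame(angle, velocity):
    """Build one frame the way the MCU side might."""
    body = MAGIC + PAYLOAD.pack(angle, velocity)
    return body + CRC.pack(binascii.crc_hqx(body, 0xFFFF))


def decode_stream(buffer):
    """Extract all valid frames from a bytearray, consuming processed bytes."""
    frames = []
    while True:
        start = buffer.find(MAGIC)
        if start < 0:
            # Keep the last byte: it may be the first half of a split MAGIC.
            del buffer[:max(0, len(buffer) - 1)]
            return frames
        del buffer[:start]              # drop noise before the sync bytes
        if len(buffer) < FRAME_SIZE:
            return frames               # wait for the rest of the frame
        body = bytes(buffer[:FRAME_SIZE - CRC.size])
        (crc,) = CRC.unpack_from(buffer, FRAME_SIZE - CRC.size)
        if crc == binascii.crc_hqx(body, 0xFFFF):
            frames.append(PAYLOAD.unpack_from(body, len(MAGIC)))
            del buffer[:FRAME_SIZE]
        else:
            del buffer[:1]              # false sync: skip one byte and resync
```

In a bridge node, bytes read from the serial port would be appended to a persistent bytearray and passed to decode_stream() on each read, so frames split across reads are completed on the next call.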

Zephyr threads

Thread	Priority	Stack	Role
`thread_sensors`	2	2048 B	Read MPU6050, apply complementary filter
`thread_control`	1	2048 B	PID / RL policy → compute step frequency
`thread_comms`	3	2048 B	Send state as binary UART frames to QRB2210
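
For readers unfamiliar with the complementary filter used in thread_sensors, here is a minimal single-axis sketch in Python (the firmware itself is C). The axis assignment, the blend factor alpha = 0.98 and the function names are assumptions for illustration, not values taken from sensors.c.

```python
import math


def complementary_filter(angle_prev, gyro_rate, accel_y, accel_z, dt, alpha=0.98):
    """Blend the integrated gyro rate (smooth, but drifts) with the
    accelerometer tilt estimate (noisy, but drift-free)."""
    accel_angle = math.atan2(accel_y, accel_z)   # tilt from the gravity direction
    gyro_angle = angle_prev + gyro_rate * dt     # short-term angle from the gyro
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle


def settle(tilt, steps=500, dt=0.01):
    """Run the filter on a stationary sensor tilted by `tilt` radians."""
    angle = 0.0
    for _ in range(steps):
        angle = complementary_filter(
            angle, 0.0, math.sin(tilt), math.cos(tilt), dt
        )
    return angle
```

With the sensor held still, the estimate converges to the accelerometer angle. A larger alpha trusts the gyro more and converges more slowly.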

Repository Layout

firmware/
└── zephyr/
    ├── blinky/              # hardware verification sample
    └── reaction_wheel/      # main application
        ├── CMakeLists.txt
        ├── prj.conf
        ├── boards/
        │   └── arduino_uno_q.conf
        └── src/
            ├── app.h        # shared types, message queue externs
            ├── main.c       # queue definitions, main()
            ├── sensors.c/h  # IMU read + filter
            ├── control.c/h  # PID / policy inference
            └── comms.c/h    # binary UART frames to QRB2210
docs/
├── arduino_uno_q_zephyr.md  # board setup, flashing, peripherals
├── qrb2210_ros2_setup.md    # ROS2 install + startup procedure
├── serial_communication.md  # arduino-router investigation, RTT solution
└── deprecated_wifi_wsl2.md  # old WiFi + WSL2 relay architecture (reference)
ros2_ws/
├── serial_bridge_node.py    # reads /dev/ttyHS1, publishes /pendulum/state
├── webots_relay.py          # (on board) /pendulum/state → TCP-over-USB to Webots
├── wsl2_relay.py            # DEPRECATED (WiFi + WSL2 + UDP)
├── fastdds_tcp_server.xml   # DEPRECATED (FastDDS TCP, WiFi era)
└── fastdds_tcp_client.xml   # DEPRECATED (FastDDS TCP, WiFi era)
scripts/
├── flash.ps1                # Ctrl+Shift+B flash via ADB + GDB
└── rtt_monitor.ps1          # RTT console (printk output over SWD)
simulation/
└── webots/
    ├── worlds/pendulum.wbt                  # 2-wheel free-standing pendulum on a table
    └── controllers/pendulum_controller/     # SIMULATION (physics) + DIGITAL_TWIN (display)
training/
└── rl/                      # Stable Baselines3 PPO training

Environment Setup

Prerequisites

Tool	Version	Notes
nRF Connect for VS Code	v3.3.0	includes Zephyr SDK, west
ADB Platform Tools	latest	`C:\tools\platform-tools\` in PATH
WSL2 Ubuntu 24.04	—	optional (RViz2/visualisation path only): ROS2 Jazzy + micro_ros_agent
Webots	R2025a	Windows native
Python	3.11	stable-baselines3, gymnasium

Flashing

Flashing uses the QRB2210 (the Linux SoC on the board) as a debug bridge, so no ST-LINK is needed.

# Ctrl+Shift+B in VS Code, or manually:
.\scripts\flash.ps1

See docs/arduino_uno_q_zephyr.md for full setup details.

RTT Console (printk output)

The Arduino UNO Q routes the STM32 UART through arduino-router, a Linux daemon that speaks MessagePack RPC rather than raw serial. PuTTY, PowerShell SerialPort, and the Arduino IDE Serial Monitor all show nothing because they expect raw bytes.

The correct solution is Segger RTT over the SWD debug connection (same path used for flashing):

# Runs in VS Code Terminal — "RTT Monitor" task (Ctrl+Shift+B → pick task)
.\scripts\rtt_monitor.ps1

The script kills any stale openocd, starts arduino-debug on the board, configures RTT via OpenOCD telnet (port 4444), and streams printk output from port 9090.

Required prj.conf entries:

CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
# The UART console must be disabled, otherwise it takes priority and RTT gets 0 bytes
CONFIG_UART_CONSOLE=n

See docs/serial_communication.md for the full investigation.

Build

Open firmware/zephyr/reaction_wheel in nRF Connect for VS Code, add a build configuration for arduino_uno_q/stm32u585xx, then run Build and Flash.

Progress

Environment

nRF Connect SDK v3.3.0 + Zephyr 4.3.99 installed
hal_stm32 module cloned (C:\ncs\v3.3.0\modules\hal\stm32, rev e05bb47)
ADB flash procedure working (QRB2210 as debug bridge)
VS Code Ctrl+Shift+B flash task configured
RTT console working (scripts/rtt_monitor.ps1)
WSL2 Ubuntu 24.04 installed, ROS2 Jazzy installed

Phase 1 — MCU Architecture

reaction_wheel Zephyr project scaffolding
Three-thread architecture with message queues
PID controller skeleton (control.c)
Mock sinusoidal sensor data (sensors.c)
printk telemetry (comms.c)
Build passes on arduino_uno_q/stm32u585xx
Flash and verify serial output on real hardware (RTT: angle=X.XX vel=Y.YY at 100 Hz confirmed)
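
The PID-to-stepper path in thread_control can be sketched as follows. This is a Python illustration, not the contents of control.c. It assumes the PID output is read directly as a signed step rate, and the gains, the limit and the direction convention are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    out_limit: float          # maximum |step rate| in steps/s (assumed)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error, dt):
        """Return a clamped signed step rate for the given angle error."""
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        out = self.kp * error + self.ki * candidate + self.kd * derivative
        if abs(out) <= self.out_limit:
            # Anti-windup: only accumulate while the output is unsaturated.
            self.integral = candidate
        return max(-self.out_limit, min(self.out_limit, out))


def to_step_command(signed_rate):
    """Split a signed rate into (DIR pin level, STEP frequency in Hz)."""
    return (1 if signed_rate >= 0 else 0, abs(signed_rate))
```

For example, PID(kp=100.0, ki=0.0, kd=0.0, out_limit=2000.0).update(0.1, 0.01) yields a rate of about 10 steps/s, which to_step_command() turns into a direction level and a step frequency.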

Phase 2 — ROS2 Bridge + Digital Twin

Define binary UART frame format (packed floats, magic header, CRC)
thread_comms sends binary frames over LPUART1 → /dev/ttyHS1 (replace printk)
serial_bridge_node.py on QRB2210: reads /dev/ttyHS1, publishes sensor_msgs/Imu
Webots world: 2-wheel free-standing pendulum on a silicone tip (genuine physics)
Webots controller: DIGITAL_TWIN (kinematic display) + SIMULATION (free physics) modes
~~WiFi + WSL2 relay + FastDDS TCP~~ → migrated to TCP-over-USB, relay on board (webots_relay.py)
End-to-end test: STM32 fake sine → serial_bridge_node → webots_relay → USB → Webots twin tilts
Extend frame/relay to carry roll + both wheel speeds (full 2-axis twin)

Phase 3 — Reinforcement Learning (PPO in sim)

Custom gymnasium.Env wrapping a Webots extern controller (training/rl/pendulum_env.py)
PPO training script with curriculum A→B→C (training/rl/train.py)
- A: nominal domain, low IMU noise — learn sensor↔control mapping
- B: full MPU6050 noise (imu_noise.py)
- C: domain randomization (CoM offset, mass, inertia, torque, friction, IMU, latency)
Tiny MLP actor (6→64→64→2 tanh), fixed obs scales, no BatchNorm/running-norm
Both tilt axes trained together (no single-axis stage)
Policy export: actor → TFLite int8 → policy.h (CMSIS-NN on STM32 M33)
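
The A→B→C curriculum could be expressed as a per-episode parameter sampler along these lines. The parameter names, nominal values and ranges are placeholders chosen for illustration. They are not the values used in training/rl/train.py.

```python
import random

# Hypothetical nominal domain (stage A): no perturbation, low IMU noise.
NOMINAL = {
    "com_offset_m": 0.0,
    "mass_scale": 1.0,
    "inertia_scale": 1.0,
    "torque_scale": 1.0,
    "friction_scale": 1.0,
    "imu_noise_scale": 0.1,
    "latency_ms": 0.0,
}

# Hypothetical uniform ranges applied in stage C.
RANDOMIZED = {
    "com_offset_m": (-0.005, 0.005),
    "mass_scale": (0.8, 1.2),
    "inertia_scale": (0.8, 1.2),
    "torque_scale": (0.7, 1.3),
    "friction_scale": (0.5, 1.5),
    "imu_noise_scale": (0.5, 1.5),
    "latency_ms": (0.0, 10.0),
}


def sample_domain(stage, rng):
    """Return the physics/sensor parameters for one training episode."""
    if stage not in ("A", "B", "C"):
        raise ValueError(f"unknown curriculum stage: {stage!r}")
    params = dict(NOMINAL)
    if stage == "B":
        params["imu_noise_scale"] = 1.0      # full IMU noise, nominal physics
    elif stage == "C":
        for name, (low, high) in RANDOMIZED.items():
            params[name] = rng.uniform(low, high)
    return params
```

Calling sample_domain("C", random.Random(0)) at every environment reset gives each episode a different plant, which is what is expected to push the policy toward robustness rather than fitting one set of dynamics.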

Phase 4 — Real Hardware

MPU6050 wired + DTS overlay for i2c4
sensors.c — replace mock with real MPU6050 reads
A4988 wired (STEP/DIR GPIO, motor power supply)
control.c — Zephyr timer callback drives STEP pin
Mechanical assembly: arm, wheel, pivot

Phase 5 — Edge AI (policy on STM32, hard-realtime)

policy.h linked into Zephyr build
thread_control uses policy_infer() (CMSIS-NN int8) instead of pid()
Inference runs on STM32U585 M33 inside the control loop — no bridge latency
Pendulum balances on real hardware with the sim-trained PPO policy
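
To give a feel for what int8 export does to the actor weights, here is a minimal sketch of symmetric per-tensor quantization. It illustrates the general idea only. The TFLite converter chooses its own scales and zero points, so this is not a reproduction of the actual export pipeline, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


def max_quantization_error(weights):
    """Largest absolute round-trip error for a weight list."""
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))
```

The round-trip error per weight is bounded by half the scale, so a layer whose weights span a narrow range loses less precision than one dominated by a few large outliers.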

Phase 6 — Offline fine-tuning (SAC on logged real data)

STM32 runs the deployed PPO actor at full rate and logs transitions (obs, action, reward, next-obs) up to the QRB2210
Collect the logs into a dataset (real motors / real mass distribution)
Train SAC offline / batch against the dataset — off the robot, no online trainer in the loop
Keep a conservative / behavior-cloning term (offline RL — avoid actions the data never covers)
Re-quantize and re-flash the fine-tuned policy.h; closes the sim-to-real gap without manual PID retuning
