Skip to content

Siedmiu/Flywheel-Balancer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

PROJECT IN PROGRESS!

Reaction Wheel Pendulum Balancer

2D inverted pendulum balanced by 2 reaction wheels. The controller learns via Reinforcement Learning in simulation and runs the trained policy on-device. ROS2 and ZephyrOS running on Arduino UNO Q.

The hypothesis is that a machine-learning trained in simulation with randomized parameters will generalize and adapt itself to the real-world system despite differences in motors and mass distribution, whereas a PID controller would require manual retuning.

Hardware

Component Part
MCU board Arduino UNO Q (STM32U585 @ 160 MHz + QRB2210 Linux SBC)
IMU MPU6050
Motor NEMA 17 stepper (17PM-K502-G6WM)
Driver A4988
Power 12–24 V for motor, 3.3 V logic from board

Steppers are bad for this but this is what I had on hand. Use BLCD with encoders if you can.

Software Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Zephyr RTOS (STM32U585)             β”‚
β”‚                                                     β”‚
β”‚  thread_sensors ──► k_msgq ──► thread_control       β”‚
β”‚       β”‚                              β”‚              β”‚
β”‚  MPU6050 (I2C)               STEP/DIR GPIO          β”‚
β”‚  Complementary filter        (A4988 microstepping)  β”‚
β”‚                                      β”‚              β”‚
β”‚  thread_comms ◄──── k_msgq β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚
β”‚       β”‚                                             β”‚
β”‚  simple binary UART frames                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚ LPUART1 β†’ /dev/ttyHS1
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         QRB2210 Linux SBC (on-board) β€” ROS 2 Jazzy   β”‚
β”‚  serial_bridge_node  reads /dev/ttyHS1               β”‚
β”‚       ──► /pendulum/state  (local DDS)               β”‚
β”‚  webots_relay.py ──► TCP client β†’ 127.0.0.1:9001     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚ TCP over USB  (adb reverse tcp:9001)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Windows β€” Webots R2025a + Python             β”‚
β”‚  TCP server :9001 ──► Digital Twin                   β”‚
β”‚  RL Training (Stable Baselines3) ──► policy.h        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Transport. The QRB2210 runs native ROS 2, so webots_relay.py runs on the board, subscribes to /pendulum/state over local DDS, and pushes the twin packet to Webots over a TCP-over-USB tunnel (adb reverse). No network IP, no WSL2. The earlier WiFi + WSL2 relay path is deprecated β€” see docs/deprecated_wifi_wsl2.md. WSL2 + the FastDDS TCP profiles remain only as an optional RViz2/visualisation path.

ROS2 runs natively on the QRB2210. The STM32 sends lightweight binary frames over UART; a Python node on the QRB2210 parses them and publishes topics. Details: docs/serial_communication.md

Zephyr threads

Thread Priority Stack Role
thread_sensors 2 2048 B Read MPU6050, apply complementary filter
thread_control 1 2048 B PID / RL policy β†’ compute step frequency
thread_comms 3 2048 B Send state as binary UART frames to QRB2210

Repository Layout

firmware/
└── zephyr/
    β”œβ”€β”€ blinky/              # hardware verification sample
    └── reaction_wheel/      # main application
        β”œβ”€β”€ CMakeLists.txt
        β”œβ”€β”€ prj.conf
        β”œβ”€β”€ boards/
        β”‚   └── arduino_uno_q.conf
        └── src/
            β”œβ”€β”€ app.h        # shared types, message queue externs
            β”œβ”€β”€ main.c       # queue definitions, main()
            β”œβ”€β”€ sensors.c/h  # IMU read + filter
            β”œβ”€β”€ control.c/h  # PID / policy inference
            └── comms.c/h    # binary UART frames to QRB2210
docs/
β”œβ”€β”€ arduino_uno_q_zephyr.md  # board setup, flashing, peripherals
β”œβ”€β”€ qrb2210_ros2_setup.md    # ROS2 install + startup procedure
β”œβ”€β”€ serial_communication.md  # arduino-router investigation, RTT solution
└── deprecated_wifi_wsl2.md  # old WiFi + WSL2 relay architecture (reference)
ros2_ws/
β”œβ”€β”€ serial_bridge_node.py    # reads /dev/ttyHS1, publishes /pendulum/state
β”œβ”€β”€ webots_relay.py          # (on board) /pendulum/state β†’ TCP-over-USB to Webots
β”œβ”€β”€ wsl2_relay.py            # DEPRECATED (WiFi + WSL2 + UDP)
β”œβ”€β”€ fastdds_tcp_server.xml   # DEPRECATED (FastDDS TCP, WiFi era)
└── fastdds_tcp_client.xml   # DEPRECATED (FastDDS TCP, WiFi era)
scripts/
β”œβ”€β”€ flash.ps1                # Ctrl+Shift+B flash via ADB + GDB
└── rtt_monitor.ps1          # RTT console (printk output over SWD)
simulation/
└── webots/
    β”œβ”€β”€ worlds/pendulum.wbt                  # 2-wheel free-standing pendulum on a table
    └── controllers/pendulum_controller/     # SIMULATION (physics) + DIGITAL_TWIN (display)
training/
└── rl/                      # Stable Baselines3 PPO training

Environment Setup

Prerequisites

Tool Version Notes
nRF Connect for VS Code v3.3.0 includes Zephyr SDK, west
ADB Platform Tools latest C:\tools\platform-tools\ in PATH
WSL2 Ubuntu 24.04 β€” ROS2 Jazzy + micro_ros_agent
Webots R2025a Windows native
Python 3.11 stable-baselines3, gymnasium

Flashing

Flashing uses QRB2210 (Linux SoC on the board) as a debug bridge β€” no ST-LINK needed.

# Ctrl+Shift+B in VS Code, or manually:
.\scripts\flash.ps1

See docs/arduino_uno_q_zephyr.md for full setup details.

RTT Console (printk output)

Arduino UNO Q routes the STM32 UART through arduino-router, a Linux daemon that speaks MessagePack RPC β€” not raw serial. PuTTY, PowerShell SerialPort, and Arduino IDE Serial Monitor all show nothing because they expect raw bytes.

The correct solution is Segger RTT over the SWD debug connection (same path used for flashing):

# Runs in VS Code Terminal β€” "RTT Monitor" task (Ctrl+Shift+B β†’ pick task)
.\scripts\rtt_monitor.ps1

The script kills any stale openocd, starts arduino-debug on the board, configures RTT via OpenOCD telnet (port 4444), and streams printk output from port 9090.

Required prj.conf entries:

CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
CONFIG_UART_CONSOLE=n   # must disable UART, otherwise it takes priority and RTT gets 0 bytes

See docs/serial_communication.md for the full investigation.

Build

Open firmware/zephyr/reaction_wheel in nRF Connect for VS Code, add build configuration for arduino_uno_q/stm32u585xx, then Build + Flash.

Progress

Environment

  • nRF Connect SDK v3.3.0 + Zephyr 4.3.99 installed
  • hal_stm32 module cloned (C:\ncs\v3.3.0\modules\hal\stm32, rev e05bb47)
  • ADB flash procedure working (QRB2210 as debug bridge)
  • VS Code Ctrl+Shift+B flash task configured
  • RTT console working (scripts/rtt_monitor.ps1)
  • WSL2 Ubuntu 24.04 installed, ROS2 Jazzy installed

Phase 1 β€” MCU Architecture

  • reaction_wheel Zephyr project scaffolding
  • Three-thread architecture with message queues
  • PID controller skeleton (control.c)
  • Mock sinusoidal sensor data (sensors.c)
  • printk telemetry (comms.c)
  • Build passes on arduino_uno_q/stm32u585xx
  • Flash and verify serial output on real hardware (RTT: angle=X.XX vel=Y.YY at 100 Hz confirmed)

Phase 2 β€” ROS2 Bridge + Digital Twin

  • Define binary UART frame format (packed floats, magic header, CRC)
  • thread_comms sends binary frames over LPUART1 β†’ /dev/ttyHS1 (replace printk)
  • serial_bridge_node.py on QRB2210: reads /dev/ttyHS1, publishes sensor_msgs/Imu
  • Webots world: 2-wheel free-standing pendulum on a silicone tip (genuine physics)
  • Webots controller: DIGITAL_TWIN (kinematic display) + SIMULATION (free physics) modes
  • WiFi + WSL2 relay + FastDDS TCP β†’ migrated to TCP-over-USB, relay on board (webots_relay.py)
  • End-to-end test: STM32 fake sine β†’ serial_bridge_node β†’ webots_relay β†’ USB β†’ Webots twin tilts
  • Extend frame/relay to carry roll + both wheel speeds (full 2-axis twin)

Phase 3 β€” Reinforcement Learning (PPO in sim)

  • Custom gymnasium.Env wrapping a Webots extern controller (training/rl/pendulum_env.py)
  • PPO training script with curriculum Aβ†’Bβ†’C (training/rl/train.py)
    • A: nominal domain, low IMU noise β€” learn sensor↔control mapping
    • B: full MPU6050 noise (imu_noise.py)
    • C: domain randomization (CoM offset, mass, inertia, torque, friction, IMU, latency)
  • Tiny MLP actor (6β†’64β†’64β†’2 tanh), fixed obs scales, no BatchNorm/running-norm
  • Both tilt axes trained together (no single-axis stage)
  • Policy export: actor β†’ TFLite int8 β†’ policy.h (CMSIS-NN on STM32 M33)

Phase 4 β€” Real Hardware

  • MPU6050 wired + DTS overlay for i2c4
  • sensors.c β€” replace mock with real MPU6050 reads
  • A4988 wired (STEP/DIR GPIO, motor power supply)
  • control.c β€” Zephyr timer callback drives STEP pin
  • Mechanical assembly: arm, wheel, pivot

Phase 5 β€” Edge AI (policy on STM32, hard-realtime)

  • policy.h linked into Zephyr build
  • thread_control uses policy_infer() (CMSIS-NN int8) instead of pid()
  • Inference runs on STM32U585 M33 inside the control loop β€” no bridge latency
  • Pendulum balances on real hardware with the sim-trained PPO policy

Phase 6 β€” Offline fine-tuning (SAC on logged real data)

  • STM32 runs the deployed PPO actor at full rate and logs transitions (obs, action, reward, next-obs) up to the QRB2210
  • Collect the logs into a dataset (real motors / real mass distribution)
  • Train SAC offline / batch against the dataset β€” off the robot, no online trainer in the loop
  • Keep a conservative / behavior-cloning term (offline RL β€” avoid actions the data never covers)
  • Re-quantize and re-flash the fine-tuned policy.h; closes the sim-to-real gap without manual PID retuning

Links

About

2-axis inverted pendulum balanced by two reaction wheels. RL policy trained in Webots, runs on-device. Zephyr RTOS on STM32 + native ROS 2 on the QRB2210 SBC (Arduino UNO Q), digital twin over USB.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors