Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -347,6 +347,10 @@ add_executable(gpu_megaparticles_6dof src/gpu_megaparticles_6dof.cu)
target_link_libraries(gpu_megaparticles_6dof ${OpenCV_LIBS})
target_compile_options(gpu_megaparticles_6dof PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>)

add_executable(gpu_megaparticles_gicp_mcl src/gpu_megaparticles_gicp_mcl.cu)
target_link_libraries(gpu_megaparticles_gicp_mcl ${OpenCV_LIBS})
target_compile_options(gpu_megaparticles_gicp_mcl PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>)

add_executable(gpu_kld_amcl src/gpu_kld_amcl.cu)
target_link_libraries(gpu_kld_amcl ${OpenCV_LIBS})
target_compile_options(gpu_kld_amcl PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>)
Expand Down
48 changes: 46 additions & 2 deletions plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,52 @@ known sharp edges and lessons learned from the last few attempts, and (5)
a prioritised menu of candidate next tasks with enough specificity that a
fresh agent can pick one and start.

Branch `chore/shared-se3-helpers` (-> `master`) is **in flight** (2026-05-26).
This is the Open Threads A shared-header cleanup recommended after #105: it
PR for `feat/gpu-megaparticles-gicp-d2d` -> `master` is **IN FLIGHT** (opened
2026-05-26): a GICP-style distribution-to-distribution (D2D) scan likelihood
for the MegaParticles line -- the "GICP-like point-cloud likelihood" follow-up
flagged after #86/#101/#104. New single file `src/gpu_megaparticles_gicp_mcl.cu`
runs a controlled head-to-head: two 1,048,576-particle filters with IDENTICAL
MegaParticles machinery (global uniform init, Gauss-Newton particle motion,
sparse bucket-neighbor Stein attraction/repulsion, posterior smoothing,
representative-state gate, hidden kidnap + 15-frame scan blackout), differing
ONLY in the per-particle scoring kernel. Arm A is the #86 distance-field
endpoint proxy (control; it reproduces #86's ~0.097 m post-kidnap RMSE). Arm B
is the new GICP D2D likelihood: the map is a point cloud (2,396 points, boundary
cells thinned to ~0.15 m) with per-point disk covariances (small variance along
the surface normal, large along the tangent), indexed by a uniform NN grid; each
particle matches every scan endpoint to the nearest map point and scores the
surface-aware Gaussian log-likelihood under the combined covariance
M = (C_map + R C_scan R^T)^{-1} (Segal et al. RSS 2009), summed over the scan,
with a per-particle full 3x3 Gauss-Newton step driving the Stein motion. Key
robustness lesson: a PURE D2D likelihood (flat penalty + zero gradient outside
the match radius) re-localized the hidden kidnap only intermittently -- the
sharp D2D contracts harder pre-kidnap, leaving thin global support, and a lost
particle gets no gradient pull. Fix is a coarse-to-fine design: an UNMATCHED ray
falls back to the distance-field endpoint log-likelihood (smooth long-range
pull), so the worst case == the field filter and global recovery is robust,
while MATCHED rays use the sharp surface-aware GICP term for accuracy. Verified
over 4 runs (GPU atomicAdd noise floor): both arms recover the kidnap in 0
frames; post-kidnap RMSE field 0.099 m -> GICP D2D ~0.064 m (~35% lower), final
error 0.040 m -> 0.021 m (~halved), at ~2.4x per-step cost (field ~4.9 ms ->
D2D ~12.1 ms). Touched files: CMakeLists.txt, readme.md, plan.md,
src/gpu_megaparticles_gicp_mcl.cu, plus the gh-pages GIF. With this the
MegaParticles localization line now covers Stein (#86), explicit LSH (#101),
6-DoF SE(3) (#104), and the GICP D2D likelihood.

PR #112 (`chore/shared-se3-helpers` -> `master`) was **MERGED** (squash) on
2026-05-26 at `3eb4b4d`; CI Build passed (12m1s; Build + Python tests + CPU
tests all green; only the Node.js 20 deprecation annotation), the draft was
marked ready, and the remote branch was deleted. The concurrent agent session
landed six GPU graph-neural / game-theoretic MPPI demos
(`gpu_best_response_graph_mppi`, `gpu_belief_risk_graph_mppi`,
`gpu_intent_graph_neural_mppi`, `gpu_priority_graph_neural_mppi`,
`gpu_interaction_graph_neural_mppi`, `gpu_multiagent_graph_neural_mppi`) on
master just before it, so the squash fast-forward landed all of them together;
#112's own diff is exactly its five files (include/se3_helpers.cuh, plan.md,
and the three migrated `.cu`). Local `master` is at `3eb4b4d` and in sync with
origin. There is **no active feature branch** now -- the next agent starts
fresh from `master`.
This was the Open Threads A shared-header cleanup recommended after #105: it
lifts the SE(3) / SO(3) math kernels + the 6x6 SPD Cholesky solve that were
copied verbatim across the three rotation-matrix pose-graph SLAM back-ends
(`gpu_pose_graph_slam_3d.cu`, `gpu_pose_graph_slam_3d_switchable.cu`,
Expand Down
7 changes: 4 additions & 3 deletions readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@ Same algorithm on CPU and GPU — GPU enables orders of magnitude more particles
| <img src="https://rsasaki0109.github.io/CudaRobotics/comparison_reeds_shepp_fan.gif" width="400"/> | <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_kld_amcl.gif" width="400"/> |
| **MegaParticles + explicit p-stable LSH neighbor index: fixed grid 58% vs LSH 88% neighbor recall (1M particles)** | **MegaParticles 6-DoF: 1M SE(3) particles, hidden-kidnap relocalization (LSH neighbor consensus)** |
| <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_megaparticles_lsh.gif" width="400"/> | <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_megaparticles_6dof.gif" width="400"/> |
| **Online 3D SLAM with switchable loop constraints: false loops rejected live (plain 9.10 m vs switchable 0.29 m, 21/21 rejected)** | |
| <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_online_slam_3d_switchable.gif" width="400"/> | |
| **Online 3D SLAM with switchable loop constraints: false loops rejected live (plain 9.10 m vs switchable 0.29 m, 21/21 rejected)** | **MegaParticles GICP D2D likelihood: surface-aware scoring halves error vs field proxy (1M particles)** |
| <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_online_slam_3d_switchable.gif" width="400"/> | <img src="https://rsasaki0109.github.io/CudaRobotics/gpu_megaparticles_gicp_mcl.gif" width="400"/> |

## Capability matrix

Expand All @@ -31,7 +31,7 @@ Same algorithm on CPU and GPU — GPU enables orders of magnitude more particles
| Collision check | `comparison_collision_check` | 1M segments/scan | 1,277x per candidate |
| Scan matching | `comparison_icp`, `comparison_ndt`, `gpu_ndt_3d_multires`, `gicp` | 10K+ points | parallel correspondences |
| Pose-graph SLAM | `gpu_pose_graph_slam`, `gpu_pose_graph_slam_3d`, `gpu_pose_graph_slam_3d_robust`, `gpu_pose_graph_slam_3d_switchable`, `gpu_online_slam`, `gpu_online_slam_3d_switchable` | 2D 200 poses / 3D 384-420 poses | robust 3D rejects 36/36 false loops, 6.95→0.28 m; switchable constraints learn per-loop switches jointly with poses, 6.95→0.29 m; online 3D switchable rejects false loops live in a sliding window, plain 9.10 m → switchable 0.29 m (21/21 rejected) |
| Particle filter | `comparison_pf`, `gpu_global_localization_mcl`, `gpu_megaparticles_stein_mcl`, `gpu_megaparticles_lsh`, `gpu_megaparticles_6dof`, `gpu_kld_amcl`, `diff_pf`, `diff_pf_mlp` | 10K-1M particles | MegaParticles-style range SPF: 14.61 m bootstrap vs 0.097 m recovery; explicit p-stable LSH neighbor index lifts neighbor recall 58%→88%; 6-DoF SE(3) relocalization recovers a hidden kidnap to 0.22 m / 1.9 deg (LSH neighbor consensus); KLD-AMCL adapts 400→65,536 particles, 15.2x vs CPU |
| Particle filter | `comparison_pf`, `gpu_global_localization_mcl`, `gpu_megaparticles_stein_mcl`, `gpu_megaparticles_lsh`, `gpu_megaparticles_6dof`, `gpu_megaparticles_gicp_mcl`, `gpu_kld_amcl`, `diff_pf`, `diff_pf_mlp` | 10K-1M particles | MegaParticles-style range SPF: 14.61 m bootstrap vs 0.097 m recovery; explicit p-stable LSH neighbor index lifts neighbor recall 58%→88%; 6-DoF SE(3) relocalization recovers a hidden kidnap to 0.22 m / 1.9 deg (LSH neighbor consensus); surface-aware GICP D2D likelihood halves post-kidnap error vs the field proxy (0.099→0.064 m); KLD-AMCL adapts 400→65,536 particles, 15.2x vs CPU |
| RRT family | `comparison_rrt*`, `comparison_rrtstar_rewire` | 1M paths / 200K nodes | 5,000x per-path; 62x rewire |
| Crowd / swarm | `gpu_crowd_swarm` | 10,000 boids with uniform-grid neighbours | 105x vs CPU |
| Graph policy control | `gpu_gnn_swarm_controller`, `gpu_gat_traversability_policy` | 2048 agents / 3072 terrain nodes x 3 heads | 2.88 ms/control; 99.4x GAT policy |
Expand Down Expand Up @@ -191,6 +191,7 @@ cd ros2_ws && colcon build --packages-select cuda_robotics
| MegaParticles-style Stein MCL | 1,048,576 range particles; local bootstrap post RMSE 14.61 m → Stein/bucket posterior recovery 0.097 m |
| MegaParticles LSH neighbor index | 2 × 1,048,576 particles; explicit p-stable LSH (8 tables × 3 projections) vs fixed grid; neighbor recall vs brute-force kNN 58.2% → 87.8%, post-kidnap RMSE 0.099 → 0.088 m |
| MegaParticles 6-DoF SE(3) | 1,048,576 SE(3) particles in a 3D voxel world; 3D-ESDF range likelihood, quaternion GN steps, 6-D p-stable LSH neighbor consensus; hidden kidnap: local bootstrap post RMSE 5.97 m → 6-DoF MegaParticles 0.22 m / 1.9 deg, reacquires in 0 frames |
| MegaParticles GICP D2D likelihood | 2 × 1,048,576 particles, identical Stein machinery; surface-aware GICP distribution-to-distribution scoring (per-point disk covariances, grid-indexed map cloud) vs the distance-field proxy; both recover the hidden kidnap in 0 frames, post-kidnap RMSE 0.099 → 0.064 m and final error 0.040 → 0.021 m, at ~2.4x per-step cost (4.9 → 12.1 ms) |
| Augmented KLD-AMCL | KLD-sampling adapts 400→65,536 particles, augmented injection reacquires hidden kidnap in 13 steps, settled RMSE 0.014 m, **15.2x** vs CPU |
| 2D ESDF (640K cells) | **53,404x** per cell (JFA) |
| 3D ESDF (1M voxels) | **86,613x** per voxel (JFA-3D) |
Expand Down
Loading
Loading