Add GPU MegaParticles GICP distribution-to-distribution likelihood #115

Merged
rsasaki0109 merged 1 commit into master from feat/gpu-megaparticles-gicp-d2d
May 26, 2026
@rsasaki0109
Owner

Summary

Adds a GICP-style distribution-to-distribution (D2D) scan likelihood to the
MegaParticles localization line — the "GICP-like point-cloud likelihood"
follow-up to the Stein (#86), explicit-LSH (#101) and 6-DoF SE(3) (#104) demos.
The earlier demos score a range scan against a precomputed distance field:
every endpoint is penalised by its isotropic distance to the nearest wall. That
is cheap but blurs surface structure — a point sliding along a wall is
penalised as hard as a point moving into it.

Here the map is instead a point cloud with per-point surface-aware covariances
(the GICP "disk": small variance along the surface normal, large
along the tangent). Each particle transforms its scan into the world, matches
every endpoint to the nearest map point through a uniform grid index, and scores
the Mahalanobis residual under the combined covariance
M = (C_map + R C_scan Rᵀ)⁻¹ (Segal et al., Generalized-ICP, RSS 2009),
summed over the scan. This is point-to-line / distribution-to-distribution: the
cost barely grows for tangential slip but rises sharply for normal-direction
error — the correct probabilistic weight near walls.
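
As a rough illustration of the combined-covariance scoring above, the following host-side C++ sketch evaluates the squared Mahalanobis residual d^T (C_map + R C_scan R^T)^{-1} d for one endpoint in 2D. The names `Sym2`, `disk_cov`, `rotate_cov` and `d2d_cost`, and the variances used in the note below, are illustrative assumptions and are not taken from `gpu_megaparticles_gicp_mcl.cu`.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [a b; b c].
struct Sym2 { double a, b, c; };

// Disk covariance: var_n along the unit normal (nx, ny), var_t along the tangent.
Sym2 disk_cov(double nx, double ny, double var_n, double var_t) {
    assert(std::fabs(nx * nx + ny * ny - 1.0) < 1e-9);  // normal must be unit length
    const double tx = -ny, ty = nx;                     // tangent = normal rotated by 90 degrees
    return {var_n * nx * nx + var_t * tx * tx,
            var_n * nx * ny + var_t * tx * ty,
            var_n * ny * ny + var_t * ty * ty};
}

// R C R^T for a rotation by theta (scan covariance moved into the world frame).
Sym2 rotate_cov(const Sym2& C, double theta) {
    const double c = std::cos(theta), s = std::sin(theta);
    return {c * c * C.a - 2.0 * c * s * C.b + s * s * C.c,
            c * s * C.a + (c * c - s * s) * C.b - c * s * C.c,
            s * s * C.a + 2.0 * c * s * C.b + c * c * C.c};
}

// Squared Mahalanobis residual of d = (dx, dy) under C_map + R C_scan R^T.
double d2d_cost(double dx, double dy, const Sym2& C_map, const Sym2& C_scan, double theta) {
    const Sym2 Cs = rotate_cov(C_scan, theta);
    const double a = C_map.a + Cs.a, b = C_map.b + Cs.b, c = C_map.c + Cs.c;
    const double det = a * c - b * b;
    assert(det > 0.0);  // combined covariance must be positive definite
    // d^T S^{-1} d with S^{-1} = (1/det) [c -b; -b a]
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det;
}
```

For a wall along the x axis (normal (0, 1), var_n = 0.01, var_t = 1.0) and an isotropic scan covariance of 0.01, a 0.1 m tangential slip costs about 0.0099 while a 0.1 m normal offset costs 0.5, which is the anisotropy the paragraph describes.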

To isolate the likelihood this is a controlled head-to-head: two filters,
each 1,048,576 particles, sharing the identical MegaParticles machinery
(global uniform init, Gauss-Newton particle motion, sparse bucket-neighbor Stein
attraction/repulsion, posterior smoothing, representative-state gate, hidden
kidnap + 15-frame scan blackout recovery). Only the per-particle scoring kernel
differs:

  • Arm A — field proxy (control): the distance-field endpoint likelihood from the MegaParticles-style Stein MCL demo (#86).
  • Arm B — GICP D2D (new): surface-aware Mahalanobis scoring against the map
    cloud, with a per-particle full 3×3 Gauss-Newton step driving the Stein motion.
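
A per-particle 3×3 Gauss-Newton step over an SE(2) pose (x, y, θ) could look roughly like the sketch below. It is a CPU reference under stated assumptions: the `Match` layout, the fixed per-match information matrix, and the Cramer's-rule solver are illustrative choices, not the kernel in this PR.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

struct Pose { double x, y, th; };
// One matched pair: scan point p (body frame), map point q (world frame),
// and a symmetric 2x2 information matrix M = [m00 m01; m01 m11], held fixed for the step.
struct Match { double px, py, qx, qy, m00, m01, m11; };

double det3(const Mat3& A) {
    return A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
         - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
         + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]);
}

// Solve H x = b by Cramer's rule; returns false if H is near-singular.
bool solve3(const Mat3& H, const Vec3& b, Vec3& x) {
    const double d = det3(H);
    if (std::fabs(d) < 1e-12) return false;
    for (int col = 0; col < 3; ++col) {
        Mat3 T = H;
        for (int i = 0; i < 3; ++i) T[i][col] = b[i];
        x[col] = det3(T) / d;
    }
    return true;
}

// One Gauss-Newton update minimising sum_i r_i^T M_i r_i with r_i = R p_i + t - q_i.
Pose gauss_newton_step(const Pose& T, const std::vector<Match>& ms) {
    assert(!ms.empty());
    const double c = std::cos(T.th), s = std::sin(T.th);
    Mat3 H{};  // J^T M J accumulated over matches
    Vec3 g{};  // J^T M r accumulated over matches
    for (const Match& m : ms) {
        const double rx = c * m.px - s * m.py + T.x - m.qx;
        const double ry = s * m.px + c * m.py + T.y - m.qy;
        // Jacobian rows of r with respect to (x, y, theta).
        const double J[2][3] = {{1.0, 0.0, -s * m.px - c * m.py},
                                {0.0, 1.0,  c * m.px - s * m.py}};
        const double Mr0 = m.m00 * rx + m.m01 * ry;
        const double Mr1 = m.m01 * rx + m.m11 * ry;
        for (int a = 0; a < 3; ++a) {
            g[a] += J[0][a] * Mr0 + J[1][a] * Mr1;
            const double MJ0 = m.m00 * J[0][a] + m.m01 * J[1][a];
            const double MJ1 = m.m01 * J[0][a] + m.m11 * J[1][a];
            for (int b = 0; b < 3; ++b) H[b][a] += J[0][b] * MJ0 + J[1][b] * MJ1;
        }
    }
    Vec3 delta{};
    const Vec3 neg_g = {-g[0], -g[1], -g[2]};
    if (!solve3(H, neg_g, delta)) return T;  // degenerate geometry: keep the pose
    return {T.x + delta[0], T.y + delta[1], T.th + delta[2]};
}
```

In a particle filter this step would typically be scaled or damped before being fed into the motion update; that policy is left out here.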

The D2D likelihood is evaluated for one million particles by indexing the map
cloud (2,396 points) with a uniform grid, so each endpoint's nearest-neighbour
lookup only touches a 3×3 cell neighborhood.
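
The grid-indexed lookup can be sketched on the host as follows. This is a minimal sketch, assuming a CSR-style cell layout and a match radius no larger than the cell size (so that a 3×3 neighborhood is sufficient); `GridIndex`, `build_grid` and `nearest` are hypothetical names, not the PR's identifiers.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Uniform grid in CSR form: points of cell c are point_ids[cell_start[c] .. cell_start[c+1]).
struct GridIndex {
    double cell;
    int nx, ny;
    std::vector<int> cell_start;
    std::vector<int> point_ids;
};

// Flat cell id for (x, y), or -1 when the point lies outside the grid.
int cell_of(const GridIndex& g, double x, double y) {
    const int cx = static_cast<int>(std::floor(x / g.cell));
    const int cy = static_cast<int>(std::floor(y / g.cell));
    if (cx < 0 || cy < 0 || cx >= g.nx || cy >= g.ny) return -1;
    return cy * g.nx + cx;
}

// Counting sort of the map points into cells.
GridIndex build_grid(const std::vector<Pt>& pts, double cell, int nx, int ny) {
    assert(cell > 0.0 && nx > 0 && ny > 0);
    GridIndex g{cell, nx, ny, std::vector<int>(nx * ny + 1, 0), {}};
    for (const Pt& p : pts) {
        const int c = cell_of(g, p.x, p.y);
        if (c >= 0) ++g.cell_start[c + 1];
    }
    for (int i = 0; i < nx * ny; ++i) g.cell_start[i + 1] += g.cell_start[i];
    g.point_ids.resize(g.cell_start.back());
    std::vector<int> cursor(g.cell_start.begin(), g.cell_start.end() - 1);
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        const int c = cell_of(g, pts[i].x, pts[i].y);
        if (c >= 0) g.point_ids[cursor[c]++] = i;
    }
    return g;
}

// Index of the nearest map point within max_dist, or -1 if the ray is unmatched.
int nearest(const GridIndex& g, const std::vector<Pt>& pts, double qx, double qy, double max_dist) {
    assert(max_dist <= g.cell);  // otherwise 3x3 cells could miss the true neighbour
    const int cx = static_cast<int>(std::floor(qx / g.cell));
    const int cy = static_cast<int>(std::floor(qy / g.cell));
    int best = -1;
    double best_d2 = max_dist * max_dist;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = cx + dx, y = cy + dy;
            if (x < 0 || y < 0 || x >= g.nx || y >= g.ny) continue;
            const int c = y * g.nx + x;
            for (int k = g.cell_start[c]; k < g.cell_start[c + 1]; ++k) {
                const Pt& p = pts[g.point_ids[k]];
                const double d2 = (p.x - qx) * (p.x - qx) + (p.y - qy) * (p.y - qy);
                if (d2 <= best_d2) { best_d2 = d2; best = g.point_ids[k]; }
            }
        }
    }
    return best;
}
```

The per-query work is bounded by the occupancy of nine cells rather than the full cloud, which is what makes a million-particle evaluation plausible; the actual cell size and match radius used in the demo are not stated here.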

Coarse-to-fine for robust recovery. A pure D2D likelihood (flat penalty +
zero gradient outside the match radius) re-localized the hidden kidnap only
intermittently: the sharper D2D contracts harder before the kidnap, leaving thin
global support, and a lost particle gets no gradient pull. So an unmatched ray
falls back to the distance-field endpoint log-likelihood (smooth long-range
pull). The worst case is then exactly the field filter, which keeps global
recovery robust, while matched rays use the sharp surface-aware GICP term, which
is where the accuracy gain comes from.
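
The difference between the pure D2D score and the fallback variant can be sketched as below. The `Ray` struct, the flat penalty value, and the Gaussian form of the field term are assumptions made for illustration; the PR does not spell out the exact expressions.

```cpp
#include <cassert>
#include <vector>

// Per-ray inputs after matching: whether a map point was found inside the match
// radius, the squared Mahalanobis residual if so, and the distance-field value
// at the endpoint (distance to the nearest wall).
struct Ray { bool matched; double maha_sq; double field_dist; };

// Pure D2D: unmatched rays get a constant penalty, so their score carries
// no information about how far the endpoint is from structure.
double scan_loglik_pure(const std::vector<Ray>& rays, double flat_penalty) {
    double sum = 0.0;
    for (const Ray& r : rays) sum += r.matched ? -0.5 * r.maha_sq : -flat_penalty;
    return sum;
}

// Coarse-to-fine: unmatched rays fall back to a distance-field Gaussian term,
// which keeps decreasing with distance and so still ranks lost particles.
double scan_loglik_fallback(const std::vector<Ray>& rays, double field_sigma) {
    assert(field_sigma > 0.0);
    double sum = 0.0;
    for (const Ray& r : rays) {
        const double z = r.field_dist / field_sigma;
        sum += r.matched ? -0.5 * r.maha_sq : -0.5 * z * z;
    }
    return sum;
}
```

With two candidate poses whose unmatched ray sits 2 m and 4 m from the nearest wall, the pure variant scores them identically, whereas the fallback variant prefers the closer one.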

Results (SE(2), 2 × 1,048,576 particles, identical machinery, hidden kidnap)

| metric | field proxy (#86 likelihood) | GICP D2D (new) |
| --- | --- | --- |
| post-kidnap RMSE | 0.099 m | 0.064 m |
| final pose error | 0.040 m | 0.021 m |
| reacquisition after blackout | 0 frames | 0 frames |
| avg GPU step | 4.9 ms | 12.1 ms |
| map representation | 340×240 distance field | 2,396-point cloud + per-point disk cov |

The field-proxy arm reproduces the original Stein MCL demo's ~0.097 m post-kidnap
RMSE (control validated). Numbers are stable across 4 runs to the GPU
atomicAdd-order noise floor (D2D post-kidnap RMSE 0.0639–0.0645 m). The
surface-aware likelihood cuts post-kidnap RMSE by about 35% (0.099 m to 0.064 m)
and roughly halves the final pose error (0.040 m to 0.021 m) while keeping
the same robust 0-frame kidnap recovery, at ~2.4× per-step cost for the
grid-indexed nearest-neighbour search.

[Image: demo]

Test plan

  • cmake .. && make gpu_megaparticles_gicp_mcl -j builds clean (CUDA C++14, --expt-relaxed-constexpr).
  • Runs end to end, writes gif/gpu_megaparticles_gicp_mcl.gif (760×215, 1.6 MB).
  • Field-proxy arm reproduces the distance-field result of the MegaParticles-style Stein MCL demo (#86) (~0.097 m post-kidnap RMSE).
  • GICP D2D arm recovers the hidden kidnap in 0 frames and beats the field proxy on post-kidnap RMSE and final error, reproducibly across 4 runs.
  • GIF deployed to gh-pages root (HTTP 200).

Notes

  • One demo = one .cu file; reuses include/cuda_check.cuh and include/cuda_video.h.
  • SE(2) demo, deliberately scoped: the map cloud, per-point disk covariances,
    combined-covariance Mahalanobis scoring and per-particle Gauss-Newton are the
    GICP D2D substance; the distance-field fallback is what makes it robust enough
    for global kidnap recovery at this particle count.

New src/gpu_megaparticles_gicp_mcl.cu runs a controlled head-to-head of two
1,048,576-particle MegaParticles filters that share identical machinery
(global uniform init, Gauss-Newton particle motion, sparse bucket-neighbor
Stein attraction/repulsion, posterior smoothing, representative-state gate,
hidden kidnap + scan blackout recovery) and differ only in the per-particle
scan-scoring kernel.

Arm A is the distance-field endpoint proxy of the original Stein MCL demo
(control). Arm B is a GICP-style distribution-to-distribution likelihood: the
map is a point cloud with per-point disk covariances (small variance along the
surface normal, large along the tangent), indexed by a uniform NN grid; each
particle matches every scan endpoint to the nearest map point and scores the
surface-aware Gaussian log-likelihood under the combined covariance
M = (C_map + R C_scan R^T)^{-1} (Segal et al., RSS 2009), with a per-particle
3x3 Gauss-Newton step driving the Stein motion. Unmatched rays fall back to the
distance-field log-likelihood, giving a smooth long-range gradient so a
globally lost particle is still pulled toward structure (robust kidnap
recovery); matched rays use the sharp surface-aware term for accuracy.

Both arms recover the hidden kidnap in 0 frames; the surface-aware GICP D2D
likelihood lowers post-kidnap RMSE 0.099 m -> 0.064 m and final error
0.040 m -> 0.021 m versus the field proxy, at ~2.4x per-step cost
(4.9 ms -> 12.1 ms) for the grid-indexed nearest-neighbour search.
@rsasaki0109 rsasaki0109 marked this pull request as ready for review May 26, 2026 03:12
@rsasaki0109 rsasaki0109 merged commit 0318b9f into master May 26, 2026
1 check passed
@rsasaki0109 rsasaki0109 deleted the feat/gpu-megaparticles-gicp-d2d branch May 26, 2026 03:12
rsasaki0109 added a commit that referenced this pull request May 26, 2026
Replace the hand-tuned representative-state continuity gate carried by the
MegaParticles line (#86/#101/#104/#115) with a principled robust fixed-lag
smoother, reporting raw max-posterior vs smoothed pose error separately.

The GPU runs the expensive part exactly as #86 (1,048,576 particles,
distance-field likelihood, bucket-neighbor Stein motion, posterior smoothing)
and emits one raw max-posterior representative pose per frame.  A lightweight
host backend keeps a sliding window of the last 10 frames and jointly optimises
a smoothed pose chain by IRLS Gauss-Newton with switchable CV-motion factors
(a genuine kidnap discontinuity breaks the link instead of being smeared) and
Huber-robust measurement factors (one-frame spurious max-posterior spikes are
rejected).  A frame is finalized once it falls off the window head.
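
The Huber-robust part of the IRLS idea can be illustrated on a scalar toy problem: estimating one value from measurements that contain a single spike. This is only a sketch of the reweighting principle; the commit's actual backend optimises a pose chain with motion factors, and the names and the threshold `delta` below are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Huber IRLS weight: 1 inside the threshold, delta / |r| outside it.
double huber_weight(double residual, double delta) {
    assert(delta > 0.0);
    const double a = std::fabs(residual);
    return a <= delta ? 1.0 : delta / a;
}

// Iteratively reweighted least squares for a single location parameter.
double irls_location(const std::vector<double>& z, double delta, int iters) {
    assert(!z.empty());
    double est = 0.0;
    for (double v : z) est += v;
    est /= static_cast<double>(z.size());  // start from the plain mean
    for (int it = 0; it < iters; ++it) {
        double num = 0.0, den = 0.0;
        for (double v : z) {
            const double w = huber_weight(v - est, delta);
            num += w * v;
            den += w;
        }
        est = num / den;  // weighted least-squares solution for this iteration
    }
    return est;
}
```

For the measurements {1.0, 1.1, 0.9, 1.0, 10.0} with delta = 0.5, the plain mean is 2.8 while the IRLS estimate settles at 1.125, so the one-off spike is strongly down-weighted rather than averaged in.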

A robust smoother alone cannot distinguish a sustained new-location measurement
(kidnap) from an outlier, so it stuck to the coasted old trajectory after the
kidnap.  Fix: a data-driven reset that fires only when measurements resume far
from the coast after a measurement dropout, distinguishing a genuine
relocalization from the high-confidence spurious-mode flips during tracking
(those stay rejected as outliers).
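
One way to read the reset rule is as a small state machine that counts consecutive frames without a measurement and fires only when a measurement resumes far from the coasted prediction. The sketch below encodes that reading; the `ResetDetector` name and both thresholds are hypothetical, and the commit may use different criteria.

```cpp
#include <cassert>

// Fires a reset only when measurements resume after a dropout of at least
// min_dropout frames AND the resumed measurement is farther than reset_dist
// from the coasted pose. A far measurement without a preceding dropout is
// left to the robust factors to reject as an outlier.
class ResetDetector {
public:
    ResetDetector(int min_dropout, double reset_dist)
        : min_dropout_(min_dropout), reset_dist_(reset_dist), missing_(0) {
        assert(min_dropout > 0 && reset_dist > 0.0);
    }

    // Call once per frame. dist_to_coast is ignored when has_measurement is false.
    bool update(bool has_measurement, double dist_to_coast) {
        if (!has_measurement) {
            ++missing_;
            return false;
        }
        const bool resumed_after_dropout = missing_ >= min_dropout_;
        missing_ = 0;
        return resumed_after_dropout && dist_to_coast > reset_dist_;
    }

private:
    int min_dropout_;
    double reset_dist_;
    int missing_;
};
```

Under this reading, a spurious far-away mode during normal tracking never triggers a reset, while the first far measurement after a blackout does.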

Controlled comparison, 4 runs (GPU atomicAdd noise floor): in-track jitter
(mean |d2 pos|) raw 4.31 -> smoothed ~0.06 (~70x, truth 0.0055), in-track RMSE
raw 5.4 -> smoothed ~0.25 m, post-kidnap RMSE raw ~1.2-1.9 -> smoothed ~0.09 m,
recovers the hidden kidnap in 0 frames; host backend adds negligible cost
(GPU step ~5 ms).
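
The jitter figure "mean |d2 pos|" is read here as the mean magnitude of the second finite difference of position along the trajectory; that interpretation is an assumption, as is the helper name. Under it, the metric could be computed as:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// Mean magnitude of the second finite difference p[i+1] - 2 p[i] + p[i-1].
// Zero for constant-velocity motion; grows with frame-to-frame jitter.
double mean_second_difference(const std::vector<P2>& traj) {
    assert(traj.size() >= 3);
    double sum = 0.0;
    for (std::size_t i = 1; i + 1 < traj.size(); ++i) {
        const double ddx = traj[i + 1].x - 2.0 * traj[i].x + traj[i - 1].x;
        const double ddy = traj[i + 1].y - 2.0 * traj[i].y + traj[i - 1].y;
        sum += std::hypot(ddx, ddy);
    }
    return sum / static_cast<double>(traj.size() - 2);
}
```

A straight constant-velocity path scores 0, which is consistent with the small but nonzero value quoted for the ground-truth trajectory if that trajectory turns gently.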