Add GPU MegaParticles GICP distribution-to-distribution likelihood #115
Merged
New src/gpu_megaparticles_gicp_mcl.cu runs a controlled head-to-head of two
1,048,576-particle MegaParticles filters that share identical machinery
(global uniform init, Gauss-Newton particle motion, sparse bucket-neighbor
Stein attraction/repulsion, posterior smoothing, representative-state gate,
hidden kidnap + scan blackout recovery) and differ only in the per-particle
scan-scoring kernel.
Arm A is the distance-field endpoint proxy of the original Stein MCL demo
(control). Arm B is a GICP-style distribution-to-distribution likelihood: the
map is a point cloud with per-point disk covariances (small variance along the
surface normal, large along the tangent), indexed by a uniform NN grid; each
particle matches every scan endpoint to the nearest map point and scores the
surface-aware Gaussian log-likelihood under the combined covariance
M = (C_map + R C_scan R^T)^{-1} (Segal et al., RSS 2009), with a per-particle
3x3 Gauss-Newton step driving the Stein motion. Unmatched rays fall back to the
distance-field log-likelihood, giving a smooth long-range gradient so a
globally lost particle is still pulled toward structure (robust kidnap
recovery); matched rays use the sharp surface-aware term for accuracy.
Both arms recover the hidden kidnap in 0 frames; the surface-aware GICP D2D
likelihood lowers post-kidnap RMSE 0.099 m -> 0.064 m and final error
0.040 m -> 0.021 m versus the field proxy, at ~2.4x per-step cost
(4.9 ms -> 12.1 ms) for the grid-indexed nearest-neighbour search.
rsasaki0109 added a commit that referenced this pull request on May 26, 2026:
Replace the hand-tuned representative-state continuity gate carried by the MegaParticles line (#86/#101/#104/#115) with a principled robust fixed-lag smoother, reporting raw max-posterior vs smoothed pose error separately.
The GPU runs the expensive part exactly as #86 (1,048,576 particles, distance-field likelihood, bucket-neighbor Stein motion, posterior smoothing) and emits one raw max-posterior representative pose per frame. A lightweight host backend keeps a sliding window of the last 10 frames and jointly optimises a smoothed pose chain by IRLS Gauss-Newton with switchable CV-motion factors (a genuine kidnap discontinuity breaks the link instead of being smeared) and Huber-robust measurement factors (one-frame spurious max-posterior spikes are rejected). A frame is finalized once it falls off the window head.
A robust smoother alone cannot distinguish a sustained new-location measurement (kidnap) from an outlier, so it stuck to the coasted old trajectory after the kidnap. Fix: a data-driven reset that fires only when measurements resume far from the coast after a measurement dropout, distinguishing a genuine relocalization from the high-confidence spurious-mode flips during tracking (those stay rejected as outliers).
Controlled comparison, 4 runs (GPU atomicAdd noise floor): in-track jitter (mean |d2 pos|) raw 4.31 -> smoothed ~0.06 (~70x, truth 0.0055), in-track RMSE raw 5.4 -> smoothed ~0.25 m, post-kidnap RMSE raw ~1.2-1.9 -> smoothed ~0.09 m, recovers the hidden kidnap in 0 frames; host backend adds negligible cost (GPU step ~5 ms).
Summary
Adds a GICP-style distribution-to-distribution (D2D) scan likelihood to the
MegaParticles localization line — the "GICP-like point-cloud likelihood"
follow-up to the Stein (#86), explicit-LSH (#101) and 6-DoF SE(3) (#104) demos.
The earlier demos score a range scan against a precomputed distance field:
every endpoint is penalised by its isotropic distance to the nearest wall. That
is cheap but blurs surface structure — a point sliding along a wall is
penalised as hard as a point moving into it.
Here the map is instead a point cloud with per-point surface-aware
covariances (the GICP "disk": small variance along the surface normal, large
along the tangent). Each particle transforms its scan into the world, matches
every endpoint to the nearest map point through a uniform grid index, and scores
the Mahalanobis residual under the combined covariance
M = (C_map + R C_scan Rᵀ)⁻¹ (Segal et al., Generalized-ICP, RSS 2009), summed over the scan. This is point-to-line / distribution-to-distribution: the
cost barely grows for tangential slip but rises sharply for normal-direction
error — the correct probabilistic weight near walls.
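The directional weighting described above can be checked with a small host-side sketch. It is not the PR's kernel: the names `Sym2`, `disk_cov`, `rotate_cov` and `d2d_cost` are hypothetical, the code is 2D (SE(2)) only, and the variances used in the note below are illustrative values, not the ones in gpu_megaparticles_gicp_mcl.cu.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [[a, b], [b, c]].
struct Sym2 { double a, b, c; };

// Disk-shaped covariance: variance var_n along the unit normal (nx, ny)
// and var_t along the tangent (-ny, nx).
Sym2 disk_cov(double nx, double ny, double var_n, double var_t) {
    assert(std::fabs(nx * nx + ny * ny - 1.0) < 1e-9);  // normal must be unit length
    const double tx = -ny, ty = nx;
    return { var_n * nx * nx + var_t * tx * tx,
             var_n * nx * ny + var_t * tx * ty,
             var_n * ny * ny + var_t * ty * ty };
}

// R C R^T for the rotation R = [[c, -s], [s, c]].
Sym2 rotate_cov(const Sym2& C, double theta) {
    const double c = std::cos(theta), s = std::sin(theta);
    return { c * c * C.a - 2.0 * c * s * C.b + s * s * C.c,
             c * s * (C.a - C.c) + (c * c - s * s) * C.b,
             s * s * C.a + 2.0 * c * s * C.b + c * c * C.c };
}

// Mahalanobis cost d^T (C_map + R C_scan R^T)^{-1} d for one matched pair,
// where d = q - (R p + t), p is the endpoint in the sensor frame and
// q is the matched map point.
double d2d_cost(double px, double py, double x, double y, double theta,
                double qx, double qy, const Sym2& C_map, const Sym2& C_scan) {
    const double c = std::cos(theta), s = std::sin(theta);
    const double dx = qx - (c * px - s * py + x);
    const double dy = qy - (s * px + c * py + y);
    const Sym2 Cs = rotate_cov(C_scan, theta);
    const Sym2 S = { C_map.a + Cs.a, C_map.b + Cs.b, C_map.c + Cs.c };
    const double det = S.a * S.c - S.b * S.b;  // > 0 when the sum is positive definite
    // Closed-form 2x2 inverse applied to d.
    return (S.c * dx * dx - 2.0 * S.b * dx * dy + S.a * dy * dy) / det;
}
```

With a wall whose normal is (0, 1), map variances of 0.01 (normal) and 1.0 (tangent), and an isotropic scan variance of 0.01, a 0.1 m slip along the wall costs about 0.0099 while a 0.1 m offset into the wall costs 0.5, roughly a 50-fold difference.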
To isolate the likelihood this is a controlled head-to-head: two filters,
each 1,048,576 particles, sharing the identical MegaParticles machinery
(global uniform init, Gauss-Newton particle motion, sparse bucket-neighbor Stein
attraction/repulsion, posterior smoothing, representative-state gate, hidden
kidnap + 15-frame scan blackout recovery). Only the per-particle scoring kernel
differs:
cloud, with a per-particle full 3×3 Gauss-Newton step driving the Stein motion.
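As a rough illustration of what a 3×3 Gauss-Newton step over an SE(2) pose involves, the sketch below accumulates the normal equations from weighted correspondences and solves them by Cramer's rule. It is a CPU sketch under assumptions: `Pose2`, `Corr`, `Mat3`, `solve3` and `gn_step` are hypothetical names, correspondences are taken as given, and the PR's kernel may accumulate and solve differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pose2 { double x, y, th; };
// One correspondence: sensor-frame endpoint p, matched map point q, and the
// symmetric 2x2 weight M = [[m00, m01], [m01, m11]] (inverse combined covariance).
struct Corr { double px, py, qx, qy, m00, m01, m11; };
struct Mat3 { double a[3][3]; };

double det3(const Mat3& A) {
    return A.a[0][0] * (A.a[1][1] * A.a[2][2] - A.a[1][2] * A.a[2][1])
         - A.a[0][1] * (A.a[1][0] * A.a[2][2] - A.a[1][2] * A.a[2][0])
         + A.a[0][2] * (A.a[1][0] * A.a[2][1] - A.a[1][1] * A.a[2][0]);
}

// Solve A d = b by Cramer's rule; returns false if A is near-singular.
bool solve3(const Mat3& A, const double b[3], double d[3]) {
    const double det = det3(A);
    if (std::fabs(det) < 1e-12) return false;
    for (int k = 0; k < 3; ++k) {
        Mat3 B = A;
        for (int i = 0; i < 3; ++i) B.a[i][k] = b[i];  // replace column k with b
        d[k] = det3(B) / det;
    }
    return true;
}

// One Gauss-Newton step on sum_i r_i^T M_i r_i with r_i = R(th) p_i + t - q_i.
Pose2 gn_step(const Pose2& pose, const std::vector<Corr>& cs) {
    assert(!cs.empty());
    Mat3 H = {};                       // J^T M J, zero-initialized
    double g[3] = {0.0, 0.0, 0.0};     // J^T M r
    const double c = std::cos(pose.th), s = std::sin(pose.th);
    for (const Corr& k : cs) {
        const double r[2] = { c * k.px - s * k.py + pose.x - k.qx,
                              s * k.px + c * k.py + pose.y - k.qy };
        // Jacobian of r with respect to (x, y, th).
        const double J[2][3] = { {1.0, 0.0, -s * k.px - c * k.py},
                                 {0.0, 1.0,  c * k.px - s * k.py} };
        const double M[2][2] = { {k.m00, k.m01}, {k.m01, k.m11} };
        for (int a = 0; a < 3; ++a)
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j) {
                    g[a] += J[i][a] * M[i][j] * r[j];
                    for (int b = 0; b < 3; ++b)
                        H.a[a][b] += J[i][a] * M[i][j] * J[j][b];
                }
    }
    const double rhs[3] = { -g[0], -g[1], -g[2] };
    double d[3];
    if (!solve3(H, rhs, d)) return pose;  // degenerate geometry: keep the pose
    return { pose.x + d[0], pose.y + d[1], pose.th + d[2] };
}
```

Because the system is only 3×3, a closed-form solve needs no dynamic memory, which is one reason a per-particle step of this kind is plausible at a million particles.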
The D2D likelihood is evaluated for one million particles by indexing the map
cloud (2,396 points) with a uniform grid, so each endpoint's nearest-neighbour
lookup only touches a 3×3 cell neighborhood.
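A uniform-grid lookup of this kind might look like the following host-side sketch. The layout (a counting-sort "start/items" table) and the names `GridIndex`, `build_grid` and `nearest` are assumptions for illustration, not the PR's data structure. The sketch also assumes all map points lie inside the grid and that the match radius does not exceed the cell size, which is what makes a 3×3 search sufficient.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Uniform grid over [x0, x0 + nx*cell) x [y0, y0 + ny*cell) in compressed form:
// the points of cell c are items[start[c]] .. items[start[c + 1] - 1].
struct GridIndex {
    double x0, y0, cell;
    int nx, ny;
    std::vector<int> start, items;
};

// Cell coordinate along one axis, clamped to the grid.
int cell_coord(double v, double v0, double cell, int n) {
    const double f = std::floor((v - v0) / cell);
    if (f < 0.0) return 0;
    if (f > n - 1) return n - 1;
    return static_cast<int>(f);
}

int cell_id(const GridIndex& g, const Pt& p) {
    return cell_coord(p.y, g.y0, g.cell, g.ny) * g.nx
         + cell_coord(p.x, g.x0, g.cell, g.nx);
}

GridIndex build_grid(const std::vector<Pt>& pts, double x0, double y0,
                     double cell, int nx, int ny) {
    assert(cell > 0.0 && nx > 0 && ny > 0);
    GridIndex g = { x0, y0, cell, nx, ny,
                    std::vector<int>(nx * ny + 1, 0), std::vector<int>(pts.size()) };
    for (const Pt& p : pts) ++g.start[cell_id(g, p) + 1];            // count per cell
    for (int c = 0; c < nx * ny; ++c) g.start[c + 1] += g.start[c];  // prefix sum
    std::vector<int> cursor(g.start.begin(), g.start.end() - 1);
    for (int i = 0; i < static_cast<int>(pts.size()); ++i)
        g.items[cursor[cell_id(g, pts[i])]++] = i;                   // scatter indices
    return g;
}

// Index of the nearest map point within max_r of q, or -1 if there is none.
// Only the 3x3 block of cells around q is visited.
int nearest(const GridIndex& g, const std::vector<Pt>& pts, const Pt& q, double max_r) {
    assert(max_r <= g.cell);
    const int cx = cell_coord(q.x, g.x0, g.cell, g.nx);
    const int cy = cell_coord(q.y, g.y0, g.cell, g.ny);
    int best = -1;
    double best_d2 = max_r * max_r;
    for (int iy = cy - 1; iy <= cy + 1; ++iy)
        for (int ix = cx - 1; ix <= cx + 1; ++ix) {
            if (ix < 0 || iy < 0 || ix >= g.nx || iy >= g.ny) continue;
            const int c = iy * g.nx + ix;
            for (int k = g.start[c]; k < g.start[c + 1]; ++k) {
                const Pt& m = pts[g.items[k]];
                const double d2 = (m.x - q.x) * (m.x - q.x) + (m.y - q.y) * (m.y - q.y);
                if (d2 <= best_d2) { best_d2 = d2; best = g.items[k]; }
            }
        }
    return best;
}
```

A return value of -1 corresponds to an unmatched ray. The per-query cost depends on how many points share the nine visited cells rather than on the total map size.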
Coarse-to-fine for robust recovery. A pure D2D likelihood (flat penalty +
zero gradient outside the match radius) re-localized the hidden kidnap only
intermittently: the sharper D2D contracts harder before the kidnap, leaving thin
global support, and a lost particle gets no gradient pull. So an unmatched ray
falls back to the distance-field endpoint log-likelihood (smooth long-range
pull) — the worst case becomes exactly the field filter, keeping global recovery
robust — while matched rays use the sharp surface-aware GICP term where the
accuracy gain comes from.
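The difference between the two treatments of an unmatched ray can be sketched per ray as below. The function names, the constant `flat_penalty`, and the Gaussian form of the field term with a `field_sigma` parameter are assumptions made for illustration; the PR text only states that unmatched rays use the distance-field endpoint log-likelihood.

```cpp
#include <cassert>
#include <cmath>

// Pure D2D score for one ray: an unmatched ray gets a constant penalty, so
// its contribution does not change as the particle moves (zero gradient).
double pure_d2d_loglik(bool matched, double d2d_cost, double flat_penalty) {
    return matched ? -0.5 * d2d_cost : -flat_penalty;
}

// Coarse-to-fine score for one ray: a matched ray uses the D2D Mahalanobis
// cost, an unmatched ray falls back to a Gaussian on the distance-field value
// at its endpoint (assumed form), which still varies with the particle pose.
double hybrid_loglik(bool matched, double d2d_cost,
                     double field_dist, double field_sigma) {
    assert(field_sigma > 0.0);
    if (matched) return -0.5 * d2d_cost;
    const double z = field_dist / field_sigma;
    return -0.5 * z * z;
}
```

For an unmatched ray, `hybrid_loglik` increases as `field_dist` shrinks, so a lost particle is still drawn toward structure, whereas `pure_d2d_loglik` returns the same value at any distance.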
Results (SE(2), 2 × 1,048,576 particles, identical machinery, hidden kidnap)
The field-proxy arm reproduces the original Stein MCL demo's ~0.097 m post-kidnap
RMSE (control validated). Numbers are stable across 4 runs to the GPU
atomicAdd-order noise floor (D2D post-kidnap RMSE 0.0639–0.0645 m). The
surface-aware likelihood roughly halves the final error (0.040 m -> 0.021 m)
while keeping the same robust 0-frame kidnap recovery, at ~2.4× per-step cost
(4.9 ms -> 12.1 ms) for the grid-indexed nearest-neighbour search.
Test plan
cmake .. && make gpu_megaparticles_gicp_mcl -j builds clean (CUDA C++14, --expt-relaxed-constexpr).
gif/gpu_megaparticles_gicp_mcl.gif (760×215, 1.6 MB).
Notes
.cu file; reuses include/cuda_check.cuh and include/cuda_video.h.
combined-covariance Mahalanobis scoring and per-particle Gauss-Newton are the
GICP D2D substance; the distance-field fallback is what makes it robust enough
for global kidnap recovery at this particle count.