Add GPU MegaParticles LSH neighbor index demo #101

Merged
rsasaki0109 merged 1 commit into master from feat/gpu-megaparticles-lsh
May 25, 2026

@rsasaki0109
Owner

Summary

MegaParticles-style relocalization with an explicit p-stable LSH neighbor index, replacing the fixed-grid neighbor stand-in of the earlier gpu_megaparticles_stein_mcl demo. The grid was a single axis-aligned partition, so particles near a cell boundary never aggregated with their true neighbors one cell over. This demo uses the actual Datar et al. (2004) p-stable LSH scheme: L=8 independent hash tables, each formed from K=3 random Gaussian projections of the 4-D pose feature (x, y, s·cos θ, s·sin θ) quantised at bin width r. Two particles are neighbors if they collide in at least one table; the random offsets and multiple tables recover the cross-boundary neighbors the grid misses.
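
The hashing scheme described above can be sketched in a few lines of host-side Python. This is only an illustration of p-stable LSH with K=3 projections and L=8 tables, not the demo's GPU kernel: the bin width `R = 1.0`, the heading scale `s = 1.0`, and all function names are assumptions made for the example.

```python
import math
import random

K, L = 3, 8   # projections per table, number of tables (values from the PR)
R = 1.0       # bin width r (assumed value for illustration)
DIM = 4       # pose feature (x, y, s*cos(theta), s*sin(theta))

def pose_feature(x, y, theta, s=1.0):
    """Embed a pose so heading wraps smoothly; s is an assumed heading scale."""
    return (x, y, s * math.cos(theta), s * math.sin(theta))

def make_tables(rng):
    """Draw L tables, each holding K (Gaussian vector, offset in [0, R]) pairs."""
    return [
        [([rng.gauss(0.0, 1.0) for _ in range(DIM)], rng.uniform(0.0, R))
         for _ in range(K)]
        for _ in range(L)
    ]

def hash_keys(feature, tables):
    """One K-tuple bucket key per table: floor((a . v + b) / R) per projection."""
    return [
        tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, feature)) + b) / R)
            for a, b in table
        )
        for table in tables
    ]

def are_neighbors(f1, f2, tables):
    """Neighbors if the full K-tuple key matches in at least one table."""
    k1, k2 = hash_keys(f1, tables), hash_keys(f2, tables)
    return any(a == b for a, b in zip(k1, k2))
```

A pair must agree on all K projections inside a table to share a bucket, but only one of the L tables has to agree. That combination is what lets a pair split by one table's bin boundary still be found through another table.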

Both filter paths are identical except for the neighbor structure: one million globally distributed particles, the same range-field likelihood, the same Gauss-Newton-like per-particle step, the same posterior smoothing, the same shared coarse-grid representative-state readout, and the same hidden-kidnap blackout. The only independent variable is grid-neighbor vs LSH-neighbor aggregation, so the reported neighbor recall and post-kidnap RMSE isolate the contribution of the explicit LSH index.

Neighbor recall is measured directly: on a sampled particle pool, brute-force kNN within a fixed feature radius defines the ground-truth neighbor set, and each method is scored by the fraction of true neighbors it recovers (same-grid-cell vs collide-in-any-LSH-table).
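
A minimal sketch of this recall measurement, assuming the ground truth is the set of all pairs within a fixed radius (one reading of "brute-force kNN within a fixed feature radius"). The helper names are hypothetical and the O(n²) loop is only meant for a small sampled pool.

```python
import math

def true_neighbor_pairs(points, radius):
    """Brute-force ground truth: all index pairs (i, j), i < j, within radius."""
    pairs = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                pairs.add((i, j))
    return pairs

def recall(points, radius, is_candidate):
    """Fraction of true neighbor pairs that the index under test also reports."""
    truth = true_neighbor_pairs(points, radius)
    if not truth:
        return float("nan")
    hits = sum(1 for i, j in truth if is_candidate(points[i], points[j]))
    return hits / len(truth)

def same_grid_cell(cell):
    """Fixed-grid stand-in: candidates must share one axis-aligned cell."""
    def pred(p, q):
        return all(math.floor(a / cell) == math.floor(b / cell)
                   for a, b in zip(p, q))
    return pred

# Two true pairs; the second straddles the cell boundary at x = 1.0.
pool = [(0.1, 0.0), (0.3, 0.0), (0.9, 0.0), (1.1, 0.0)]
grid_recall = recall(pool, radius=0.25, is_candidate=same_grid_cell(1.0))
print(grid_recall)  # 0.5
```

The toy pool shows the boundary miss in isolation: both pairs are equally close, but the grid only reports the one that happens to fall inside a single cell. Scoring an LSH index would pass a collide-in-any-table predicate as `is_candidate` instead.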

Results

| Metric | Fixed grid | Explicit LSH |
| --- | --- | --- |
| Neighbor recall vs brute-force kNN | 58.2% | 87.8% |
| Post-kidnap RMSE | 0.099 m | 0.088 m |
| Reacquisition after blackout | 0 frames | 0 frames |
| Avg GPU step | 4.9 ms | 9.6 ms |

The LSH index recovers about 30 percentage points more of the true neighbor set (58.2% to 87.8%; the multi-table OR overcomes the single grid's boundary misses), with comparable-to-slightly-better relocalization, at roughly 2× the per-step cost, attributed to the 8-table OR atomics. The demo reports this trade-off rather than hiding it.
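
The recall gain from the multi-table OR follows the standard LSH amplification argument. As a sketch, let p be the probability that a single random projection puts a given pair in the same bin (p depends on the pair's distance and on r, and is not measured in this PR). Assuming independent projections and tables:

```latex
P_{\text{table}} = p^{K},
\qquad
P_{\text{any}} = 1 - \left(1 - p^{K}\right)^{L},
\qquad
P_{\text{any}}\Big|_{K=3,\;L=8} = 1 - \left(1 - p^{3}\right)^{8}
```

For a purely illustrative p = 0.6, a single table collides with probability 0.6³ = 0.216, while at least one of 8 tables collides with probability 1 − 0.784⁸ ≈ 0.86. The same structure explains the cost: each particle is inserted into and queried from all L tables.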

Test plan

  • cmake .. && make gpu_megaparticles_lsh -j$(nproc) builds clean
  • ./bin/gpu_megaparticles_lsh runs end-to-end, writes the GIF
  • neighbor recall 58.2% → 87.8%, post-kidnap RMSE 0.099 → 0.088 m
  • git diff --check clean (no whitespace errors)
  • GIF ≤ 3 MB (1.5 MB), deployed to gh-pages, URL returns HTTP 200

[Image: demo]

Replace the fixed-grid neighbor stand-in of the earlier MegaParticles-style
demo with an explicit p-stable LSH neighbor index (Datar et al. 2004): L=8
independent hash tables, each from K=3 random Gaussian projections of the 4-D
pose feature quantised at bin width r, with collision in any table defining a
neighbor. A controlled head-to-head comparison runs two 1M-particle filters
with identical Stein machinery, likelihood, posterior smoothing, and
representative-state readout, so the only independent variable is the neighbor
structure. Neighbor recall vs brute-force kNN rises 58.2% -> 87.8% as the
random multi-table OR recovers the cross-boundary neighbors the single grid
misses; post-kidnap relocalization RMSE 0.099 -> 0.088 m, both reacquire in 0
frames. LSH costs 9.6 ms vs grid 4.9 ms per step (8-table OR atomics).

rsasaki0109 marked this pull request as ready for review May 25, 2026 08:12
rsasaki0109 merged commit 2c125b1 into master May 25, 2026
1 check passed
rsasaki0109 deleted the feat/gpu-megaparticles-lsh branch May 25, 2026 08:13

rsasaki0109 added a commit that referenced this pull request May 26, 2026
Replace the hand-tuned representative-state continuity gate carried by the
MegaParticles line (#86/#101/#104/#115) with a principled robust fixed-lag
smoother, reporting raw max-posterior vs smoothed pose error separately.

The GPU runs the expensive part exactly as #86 (1,048,576 particles,
distance-field likelihood, bucket-neighbor Stein motion, posterior smoothing)
and emits one raw max-posterior representative pose per frame.  A lightweight
host backend keeps a sliding window of the last 10 frames and jointly optimises
a smoothed pose chain by IRLS Gauss-Newton with switchable CV-motion factors
(a genuine kidnap discontinuity breaks the link instead of being smeared) and
Huber-robust measurement factors (one-frame spurious max-posterior spikes are
rejected).  A frame is finalized once it falls off the window head.
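
The spike rejection that Huber-robust factors provide can be illustrated with a much smaller problem than the pose-chain smoother: an IRLS estimate of a single scalar from measurements containing one outlier. This is a sketch of the reweighting idea only; the function names and the choice of `delta` are assumptions, and the referenced commit's actual factor graph is not shown on this page.

```python
def huber_weight(residual, delta):
    """IRLS weight for the Huber loss: 1 in the quadratic zone, delta/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a

def irls_mean(values, delta, iters=20):
    """Robust location estimate via repeated Huber-weighted means."""
    est = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(iters):
        w = [huber_weight(v - est, delta) for v in values]
        est = sum(wi * vi for wi, vi in zip(w, values)) / sum(w)
    return est

# Four consistent measurements and one spurious spike.
measurements = [1.0, 1.0, 1.0, 1.0, 50.0]
plain = sum(measurements) / len(measurements)  # pulled far toward the spike
robust = irls_mean(measurements, delta=0.5)    # stays close to 1.0
```

Each iteration downweights large residuals, so the single spike ends up with a weight of about 0.01 and the estimate settles near the consistent measurements. In a smoother the same weights would multiply the measurement terms of each Gauss-Newton step.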

A robust smoother alone cannot distinguish a sustained new-location measurement
(kidnap) from an outlier, so it stuck to the coasted old trajectory after the
kidnap.  Fix: a data-driven reset that fires only when measurements resume far
from the coast after a measurement dropout, distinguishing a genuine
relocalization from the high-confidence spurious-mode flips during tracking
(those stay rejected as outliers).
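
One way to read the reset rule above is as a small state machine: count dropout frames, and fire only on the first measurement after a dropout if it lands far from the coasted pose. The class below is a hypothetical sketch of that reading. The thresholds, the use of `None` for a missing measurement, and the position-only distance are all assumptions.

```python
import math

class ResetDetector:
    """Fires only when measurements resume far from the coast after a dropout."""

    def __init__(self, jump_threshold, min_dropout_frames):
        self.jump_threshold = jump_threshold          # metres (assumed unit)
        self.min_dropout_frames = min_dropout_frames  # frames without a measurement
        self.dropout_frames = 0

    def update(self, measurement, coasted_pose):
        """measurement is None during a dropout; returns True if a reset should fire."""
        if measurement is None:
            self.dropout_frames += 1
            return False
        resumed_after_dropout = self.dropout_frames >= self.min_dropout_frames
        self.dropout_frames = 0
        far = math.dist(measurement, coasted_pose) > self.jump_threshold
        return resumed_after_dropout and far

det = ResetDetector(jump_threshold=1.0, min_dropout_frames=2)
events = [
    det.update((0.0, 0.0), (0.0, 0.0)),  # normal tracking
    det.update((5.0, 5.0), (0.0, 0.0)),  # far jump with no dropout: treated as outlier
    det.update(None, (0.0, 0.0)),        # dropout frame
    det.update(None, (0.0, 0.0)),        # dropout frame
    det.update((5.0, 5.0), (0.0, 0.0)),  # resumes far away after dropout: reset
]
```

Gating on the dropout is what separates the two cases in this sketch: a far measurement during continuous tracking never triggers a reset and is left to the robust factors.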

Controlled comparison, 4 runs (GPU atomicAdd noise floor): in-track jitter
(mean |d2 pos|) raw 4.31 -> smoothed ~0.06 (~70x, truth 0.0055), in-track RMSE
raw 5.4 -> smoothed ~0.25 m, post-kidnap RMSE raw ~1.2-1.9 -> smoothed ~0.09 m,
recovers the hidden kidnap in 0 frames; host backend adds negligible cost
(GPU step ~5 ms).
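
The jitter figure above, "mean |d2 pos|", can be computed as the mean norm of the second finite difference of the position sequence. The sketch below assumes that interpretation, with uniform frame spacing and no division by the time step; the function name is hypothetical.

```python
import math

def mean_second_difference(positions):
    """Mean Euclidean norm of p[t+1] - 2*p[t] + p[t-1] over interior frames."""
    if len(positions) < 3:
        raise ValueError("need at least three positions")
    total = 0.0
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        d2 = [n - 2.0 * c + p for p, c, n in zip(prev, cur, nxt)]
        total += math.hypot(*d2)
    return total / (len(positions) - 2)

smooth = mean_second_difference([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)])
zigzag = mean_second_difference([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
print(smooth, zigzag)  # 0.0 2.0
```

A constant-velocity path scores zero, so the metric responds to frame-to-frame wobble rather than to motion itself.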