Voxel Densification for Serialized 3D Object Detection: Mitigating Sparsity via Pre-serialization Expansion
This repository is the official implementation of the paper "Voxel Densification for Serialized 3D Object Detection: Mitigating Sparsity via Pre-serialization Expansion".
Abstract: Recent advances in point cloud object detection have increasingly adopted Transformer-based and State Space Model (SSM)-based architectures to capture long-range dependencies. However, these serialized frameworks strictly maintain the consistency of input and output voxel dimensions, so they inherently lack the capability for voxel expansion. This limitation hinders performance, because expanding the voxel set is known to significantly enhance detection accuracy, particularly for sparse foreground objects. To bridge this gap, we propose a novel Voxel Densification Module (VDM). Unlike standard convolutional stems, VDM is explicitly designed to promote pre-serialization spatial expansion. It leverages sparse 3D convolutions to propagate foreground semantics to neighboring empty voxels, densifying the feature representation before it is flattened into a sequence. VDM serves two key functions: (1) enhancing spatial connectivity via voxel densification, and (2) aggregating fine-grained local context through residual sparse blocks. To offset the computational overhead of the increased voxel density, we introduce a strategic downsampling mechanism. We integrate VDM into both Transformer-based (DSVT) and SSM-based (LION) detectors. Extensive experiments demonstrate that VDM consistently improves detection accuracy across multiple benchmarks.
- [2026-02-05] Initial code release.
- Pre-serialization Spatial Expansion: Defines a new paradigm to explicitly expand the foreground voxel set before sequence flattening, addressing the sparsity limitation in serialized models.
- Generic Plugin: Seamlessly integrates with state-of-the-art serialized detectors, including Transformer-based (DSVT) and SSM-based (LION) frameworks.
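The expansion idea can be illustrated on voxel coordinates alone. The following is a minimal NumPy sketch, not the repository's implementation: it assumes a 3x3x3 kernel for the regular (non-submanifold) sparse convolution and models downsampling as integer division of voxel indices by a stride of 2. The names `densify` and `downsample` are hypothetical and features are ignored; only the set of active voxels is tracked.

```python
import numpy as np

# Offsets of a 3x3x3 neighbourhood, including the centre (0, 0, 0).
# The kernel size is an assumption made for illustration.
OFFSETS = np.array(
    [(dz, dy, dx) for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)],
    dtype=np.int64,
)


def densify(coords: np.ndarray, grid_shape: tuple[int, int, int]) -> np.ndarray:
    """Return the active voxel set after one regular 3x3x3 sparse convolution.

    coords has shape (N, 3) and holds integer (z, y, x) voxel indices.
    """
    # Every active voxel activates all 27 positions of its neighbourhood.
    candidates = (coords[:, None, :] + OFFSETS[None, :, :]).reshape(-1, 3)
    # Discard positions that fall outside the voxel grid.
    inside = np.all((candidates >= 0) & (candidates < np.array(grid_shape)), axis=1)
    # Neighbourhoods overlap, so keep each coordinate only once.
    return np.unique(candidates[inside], axis=0)


def downsample(coords: np.ndarray, stride: int = 2) -> np.ndarray:
    """Merge voxels into coarser cells by integer division of their indices."""
    return np.unique(coords // stride, axis=0)


if __name__ == "__main__":
    sparse = np.array([[4, 4, 4], [4, 4, 6]], dtype=np.int64)
    dense = densify(sparse, grid_shape=(8, 8, 8))
    coarse = downsample(dense, stride=2)
    print(len(sparse), len(dense), len(coarse))
```

In this toy case two isolated voxels grow to 45 active voxels after expansion and shrink to 12 after the stride-2 step, which shows why a downsampling stage is useful for keeping the serialized sequence length manageable.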
We provide the checkpoints and logs for our main models, including the full VDM and the Only-Densification (VDM-OD) variant.
| Model | Dataset | Split | Metric | Performance | Config | Baidu Pan | Hugging Face |
|---|---|---|---|---|---|---|---|
| VDM | Waymo | Val | L2 mAPH | 74.8 | waymo_vdm.yaml | Baidu Pan | HF |
| VDM-OD | Waymo | Val | L2 mAPH | 74.8 | waymo_vdm_od.yaml | Baidu Pan | HF |
| VDM | nuScenes | Val | mAP | 68.1 | nuscenes_vdm.yaml | Baidu Pan | HF |
| VDM-OD | nuScenes | Val | mAP | 68.5 | nuscenes_vdm_od.yaml | Baidu Pan | HF |
| VDM | Argoverse 2 | Val | mAP | 42.3 | argo2_vdm.yaml | Baidu Pan | HF |
| VDM-OD | Argoverse 2 | Val | mAP | 42.6 | argo2_vdm_od.yaml | Baidu Pan | HF |
| VDM | ONCE | Val | mAP | 67.6 | once_vdm.yaml | Baidu Pan | HF |
| VDM-OD | ONCE | Val | mAP | 66.1 | once_vdm_od.yaml | Baidu Pan | HF |
Note for Baidu Pan links:
- VDM extraction code: jk22
- VDM-OD extraction code: nb3f

Note for Hugging Face links:
- You can also browse the full model repository directly at hfffkk/VDM.
This code has been tested in the following environment. Other versions might also work, but we recommend matching these for the best compatibility:
- OS: Linux (Ubuntu)
- Python: 3.10
- CUDA: 11.8
- PyTorch: 2.1.0
- Spconv: 2.3.6 (spconv-cu118)
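To compare an existing environment against the versions listed above, a small check along the following lines may help. It is a sketch rather than part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes the pip distribution names are `torch` and `spconv-cu118`. Versions are matched by prefix so that local build tags such as `2.1.0+cu118` still count as a match.

```python
from importlib import metadata

# Expected versions from the tested environment listed above.
# The distribution names are assumptions about how the packages were installed.
EXPECTED = {"torch": "2.1.0", "spconv-cu118": "2.3.6"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each distribution name to (found_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Prefix match so build suffixes like "+cu118" do not cause a mismatch.
        report[name] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name}: found={found} ok={ok}")
```

A mismatch reported here is not necessarily a problem, since other versions might also work; it only flags where the environment differs from the tested one.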
Please refer to docs/INSTALL.md for detailed step-by-step installation instructions, including environment setup and dependency installation.
Please refer to docs/GETTING_STARTED.md to learn how to prepare the datasets and run the training and testing scripts.
Our codebase integrates and builds upon the LION framework and OpenPCDet. We sincerely thank the authors for their outstanding work:
@article{liu2024lion,
  title={LION: Linear Group RNN for 3D Object Detection in Point Clouds},
  author={Zhe Liu and Jinghua Hou and Xingyu Wang and Xiaoqing Ye and Jingdong Wang and Hengshuang Zhao and Xiang Bai},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}