CAPruner

This repository contains the official PyTorch implementation for the paper:

Shengli Zhou, Xiangchen Wang, Guanhua Chen, and Feng Zheng. 2026. CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics.

Overview

Large language models (LLMs) have recently been applied to 3D vision-language (3D-VL) tasks, in which spatial reasoning is required to identify target objects based on their positions relative to others (i.e., anchors). To facilitate effective scene layout understanding, scene graphs are commonly used to represent such spatial relations. However, reasoning over full graphs incurs high token costs and computational inefficiencies, motivating the use of scene graph pruning. Existing pruning methods predominantly rely on spatial proximity and often remove task-relevant relations, thereby undermining reliable spatial reasoning. To address these limitations, we derive a key requirement for scene graph pruning: preserving the spatial relations that are most relevant to the specific 3D-VL task. Guided by this insight, we propose the Conceptual-Adjacent Scene Graph Pruner (CAPruner). CAPruner integrates fuzzy semantic relevance with spatial proximity to estimate relation importance, enabling the selection of critical relations in a task-specific context. Moreover, to avoid costly relation-level annotations, CAPruner is trained by supervising the aggregated scores of each node's incident edges. Extensive experiments demonstrate that CAPruner effectively preserves relations essential for spatial reasoning, leading to substantial performance improvements of LLMs on 3D-VL tasks.
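The contrast between proximity-only pruning and pruning that also weighs semantic relevance can be illustrated with a toy sketch. This is not the paper's formulation: the function `prune_top_k`, the weight `alpha`, the `exp(-distance)` proximity term, and the linear mix of the two scores are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def prune_top_k(
    relevance: Sequence[Sequence[float]],
    distance: Sequence[Sequence[float]],
    k: int,
    alpha: float = 0.5,
) -> List[List[int]]:
    """For each object j, keep the k neighbors with the highest toy score.

    The score is a hypothetical mix: alpha * relevance + (1 - alpha) * proximity.
    """
    n = len(relevance)
    kept = []
    for j in range(n):
        scored = []
        for m in range(n):
            if m == j:
                continue  # no self-edges
            proximity = math.exp(-distance[j][m])  # closer objects score higher
            score = alpha * relevance[j][m] + (1.0 - alpha) * proximity
            scored.append((score, m))
        # Sort by descending score; break ties by the smaller object index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        kept.append([m for _, m in scored[:k]])
    return kept


# Toy scene: objects 0 and 1 are far apart but strongly related to the task.
relevance = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]]
distance = [[0.0, 3.0, 1.0], [3.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

mixed = prune_top_k(relevance, distance, k=1)
proximity_only = prune_top_k(relevance, distance, k=1, alpha=0.0)
print(mixed)           # [[1], [0], [1]]
print(proximity_only)  # [[2], [2], [0]]
```

In this toy scene, the proximity-only setting drops the edge between objects 0 and 1, while the mixed score keeps it.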

Environment Installation

CAPruner

Prepare the environment for CAPruner:

conda create -n capruner python=3.9.17
conda activate capruner
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

Data Preparation

Download the annotations from both Hugging Face and Yandex Disk and place them in the annotations/ directory. More details on data preparation can be found here.
Extract output_vlsat.zip under annotations/ (i.e., annotations/output_vlsat/).
Prepare bounding boxes for ground-truth (GT) and Mask3D (M3D) segmentations:

python preprocess/prepare_gt_boxes.py
python preprocess/prepare_m3d_boxes.py

Model Usage

Pretrain CAPruner on the ScanRefer, ScanQA, and Multi3DRefer datasets:

which_python=$(which python)
export PYTHONPATH=${PYTHONPATH}:${which_python}:.
python model/train.py --config config/capruner_m3d_pretrain.yml

Validate the model. First edit config/capruner_m3d_pretrain.yml so that pretrained points to the checkpoint path, then run:

python model/train.py --config config/capruner_m3d_pretrain.yml --val

Prune scene graphs using CAPruner (pruning method, dataset, and split are specified in the config file):

python model/inference.py --config config/capruner_m3d_pretrain.yml

To skip the previous steps, you can download the inference results produced by the official checkpoint from Hugging Face. Each file stores a tensor G of shape (num_questions, max_num_obj, num_edge_per_obj): entry G[i][j][k] is the index of an object that shares an edge with object j in the scene graph for the i-th 3D-VL task.
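As a minimal sketch of how the tensor G can be read, the snippet below turns G[i] into a list of (j, neighbor) edges. It works on nested lists, so a loaded tensor would first need to be converted (for example with `.tolist()`). The helper name `edges_for_task` and the `pad_value` argument are hypothetical; the README does not say how unused slots are encoded, so padding is skipped only if you pass a value explicitly.

```python
from typing import List, Optional, Sequence, Tuple


def edges_for_task(
    G: Sequence[Sequence[Sequence[int]]],
    i: int,
    pad_value: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return the (j, G[i][j][k]) pairs stored for the i-th 3D-VL task."""
    edges = []
    for j, neighbors in enumerate(G[i]):
        for neighbor in neighbors:
            # Assumption: unused slots, if any, hold pad_value.
            if pad_value is not None and neighbor == pad_value:
                continue
            edges.append((j, int(neighbor)))
    return edges


# Toy tensor with num_questions=1, max_num_obj=3, num_edge_per_obj=2.
toy_G = [[[1, 2], [0, 2], [0, 1]]]
toy_edges = edges_for_task(toy_G, 0)
print(toy_edges)  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```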

Generate GNN features for the pruned scene graphs (the base_path is the directory where the checkpoint is saved):

python preprocess/batch_gen_gnn_feat.py --base_path outputs/path_to_model_dir
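The commands in this section can also be collected in a small Python helper, sketched below. The step names (`train`, `validate`, `prune`, `gnn_features`) and the functions `build_steps` and `run_step` are hypothetical; only the script paths and flags come from this README. Steps are run one at a time because validation requires editing the config first.

```python
import os
import subprocess
import sys
from typing import Dict, List

CONFIG = "config/capruner_m3d_pretrain.yml"


def build_steps(model_dir: str, config: str = CONFIG) -> Dict[str, List[str]]:
    """Map a step name (hypothetical label) to the script arguments from this README."""
    return {
        "train": ["model/train.py", "--config", config],
        "validate": ["model/train.py", "--config", config, "--val"],
        "prune": ["model/inference.py", "--config", config],
        "gnn_features": ["preprocess/batch_gen_gnn_feat.py", "--base_path", model_dir],
    }


def run_step(name: str, model_dir: str, dry_run: bool = True) -> List[str]:
    """Build the command for one step and, unless dry_run, execute it from the repository root."""
    command = [sys.executable] + build_steps(model_dir)[name]
    if not dry_run:
        env = dict(os.environ)
        # Append the repository root (".") to PYTHONPATH, as the export line above does.
        env["PYTHONPATH"] = os.pathsep.join(
            p for p in (env.get("PYTHONPATH", ""), ".") if p
        )
        subprocess.run(command, check=True, env=env)
    return command


# Dry run: only prints the command; pass dry_run=False to execute it.
print(" ".join(run_step("prune", "outputs/path_to_model_dir")))
```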

Acknowledgement

We would like to thank the anonymous reviewers for their constructive feedback.

Citation

If you find this project useful in your research, please consider citing:

@misc{zhou2026capruner,
      title={CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models},
      author={Shengli Zhou and Xiangchen Wang and Guanhua Chen and Feng Zheng},
      year={2026},
      url={https://www.researchgate.net/publication/404391563_CAPruner_Conceptual-Adjacent_Scene_Graph_Pruner_for_Enhancing_3D_Spatial_Reasoning_of_Large_Language_Models},
}
