Xiao Cao, Yuze Li, Youmin Zhang, Jiayu Song, Cheng Yan, Wen Li, Lixin Duan
SIGGRAPH 2026
✨ Your star means a lot to us! Please give it a star if you find it helpful. 🌟
- 🎉 2026-4-30 — Our paper GeoQuery has been accepted to SIGGRAPH 2026 (Conference Track). See the arXiv preprint (DOI).
- ✅ 2026-5-27 — Official training code and model weights are released.
| Model | Description | Link |
|---|---|---|
| GeoQuery diffusion refiner | Checkpoint for GeoQuery image restoration and gsplat integration. Use with --geoquery_ckpt. | Hugging Face |
GeoQuery improves rendered novel views by querying geometry-aligned reference features inside a single-step diffusion image restoration model, and can be plugged into sparse-view 3D Gaussian Splatting pipelines.
Starting from a sparse training set, we optimize 3DGS and progressively refine it through iterative rendering and supervision updates. At each step, 3DGS produces an artifact-prone rendering; we estimate metric depth to build a geometric correspondence field. Geometry-Guided Cross-View Attention (GCA) retrieves proxy features from the reference view within a local neighborhood, and adaptive fusion integrates geometry-guided evidence into the diffusion backbone. The restored output serves as a pseudo-observation for subsequent 3DGS refinement.
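As a rough illustration of what a depth-based correspondence between two views involves, the sketch below back-projects reference pixels with their metric depth, moves them into the target camera frame, and projects them again. It is not the code in src/geometry_utils.py: the function name, the pinhole model, the `T_ref_to_tgt` convention, and the reference-to-target direction are all assumptions made for this example.

```python
import numpy as np

def reproject_reference_to_target(ref_depth, K_ref, K_tgt, T_ref_to_tgt):
    """Map every reference pixel to its location in the target view.

    Hypothetical helper, assuming pinhole cameras.
    ref_depth:     (H, W) metric depth in the reference camera frame.
    K_ref, K_tgt:  (3, 3) intrinsics.
    T_ref_to_tgt:  (4, 4) rigid transform from reference to target camera frame.
    Returns (H, W, 2) target pixel coordinates (x, y) and an (H, W) validity mask.
    """
    h, w = ref_depth.shape
    # Homogeneous pixel grid, shape (H, W, 3).
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64),
                         np.arange(h, dtype=np.float64))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    # Back-project: X_ref = depth * K_ref^{-1} [x, y, 1]^T.
    rays = pix @ np.linalg.inv(K_ref).T
    pts_ref = rays * ref_depth[..., None]
    # Move the 3D points into the target camera frame.
    R, t = T_ref_to_tgt[:3, :3], T_ref_to_tgt[:3, 3]
    pts_tgt = pts_ref @ R.T + t
    # Perspective projection with the target intrinsics.
    proj = pts_tgt @ K_tgt.T
    z = proj[..., 2]
    # Keep pixels with positive depth that land in front of the target camera.
    valid = (ref_depth > 0) & (z > 1e-6)
    safe_z = np.where(valid, z, 1.0)  # avoid dividing by zero on invalid pixels
    uv = proj[..., :2] / safe_z[..., None]
    return uv, valid
```

With identical intrinsics and an identity transform, every pixel maps to itself, which is a quick sanity check on the pose and intrinsics conventions before using real COLMAP data.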
| Component | Path | Description |
|---|---|---|
| GeoQuery model | src/model.py | Model definition and inference |
| Diffusion training | src/train_geoquery.py | Train the GeoQuery refiner |
| GCA | src/geometry_guided_attention.py | Geometry-Guided Cross-View Attention |
| AFF | src/adaptive_feature_fusion.py | Adaptive fusion (self-attn + GCA) |
| Correspondence | src/geometry_utils.py | Geometric correspondence field |
| 3DGS example | examples_geoquery/gsplat/ | Iterative gsplat + GeoQuery |
conda create -n geoquery python=3.10.16
conda activate geoquery
pip install -r requirements.txt

For the gsplat example:
pip install -r examples_geoquery/gsplat/requirements.txt

GeoQuery training uses paired rendered/target images and a geometry reference view. Each sample contains:
- `image`: the degraded/rendered target-view image used as input.
- `target_image`: the clean target-view supervision image.
- `ref_image`: a clean reference-view image from the same scene.
- `ref_depth`: metric depth for `ref_image`, stored as `.pfm`.
- `ref_depth_confidence`: a confidence map for `ref_depth`, stored as `.npy`.
- `prompt`: the text prompt for the one-step diffusion refiner.
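A small pre-flight check can catch incomplete samples before a long training run. The sketch below assumes each sample has been loaded as a Python dict keyed by the field names above. `check_sample` is a hypothetical helper, not part of the repository, and it only checks the two file extensions this README states explicitly.

```python
from pathlib import Path

# Field names taken from the list above.
REQUIRED_KEYS = (
    "image", "target_image", "ref_image",
    "ref_depth", "ref_depth_confidence", "prompt",
)
# Only the extensions stated in this README are enforced.
EXPECTED_SUFFIX = {"ref_depth": ".pfm", "ref_depth_confidence": ".npy"}

def check_sample(sample):
    """Return a list of problems found in one sample dict (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in sample]
    for key, suffix in EXPECTED_SUFFIX.items():
        value = sample.get(key)
        if value is not None and Path(value).suffix.lower() != suffix:
            problems.append(f"{key} should end in {suffix}: {value}")
    return problems
```

The check does not touch the filesystem, so it can run on a machine that does not have the data mounted.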
The reference depth can be produced by any accurate depth or MVS estimator. In
our pipeline, ref_depth can be prepared with methods such as MVSFormer++
or Depth Anything 3. The depth should be in the reference camera coordinate
system and should be geometrically aligned with ref_image. Confidence values
are expected in [0, 100]; --conf_threshold is applied after dividing the
confidence map by 100.
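The confidence convention can be sketched as follows: scale the stored map from [0, 100] to [0, 1], then compare it against the threshold. The function name and the strict `>` comparison are assumptions for this example. The training script may use `>=` or apply the mask differently.

```python
import numpy as np

def confidence_mask(confidence, conf_threshold):
    """Boolean mask of pixels whose scaled confidence exceeds the threshold.

    confidence:     array with values in [0, 100], as stored in the .npy file.
    conf_threshold: value in [0, 1], as passed to --conf_threshold.
    """
    conf = np.asarray(confidence, dtype=np.float32) / 100.0
    # Assumption: strict comparison; the real script may differ.
    return conf > conf_threshold

def mask_depth(depth, confidence, conf_threshold):
    """Zero out depth values that fall below the confidence threshold."""
    return np.where(confidence_mask(confidence, conf_threshold), depth, 0.0)
```

Under this assumption, `--conf_threshold 0.0` keeps every pixel with nonzero confidence.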
The training script also needs COLMAP cameras and poses for each scene through
--gt_colmap. The expected layout is:
path/to/colmap_root/
  scene_name/
    gaussian_splat/
      sparse/
        0/
          cameras.bin
          images.bin
Prepare a JSON dataset with train / test splits:
{
  "train": {
    "sample_id": {
      "image": "path/to/source.png",
      "target_image": "path/to/target.png",
      "ref_image": "path/to/reference.png",
      "ref_depth": "path/to/reference_depth.pfm",
      "ref_depth_confidence": "path/to/reference_confidence.npy",
      "prompt": "remove degradation"
    }
  },
  "test": {...}
}

Train GeoQuery (or use train_geoquery_diffusion.sh):
CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0} accelerate launch src/train_geoquery.py \
--mixed_precision=bf16 \
--output_dir="${OUTPUT_DIR:-outputs/geoquery}" \
--dataset_path="${DATASET_JSON:-path/to/dataset.json}" \
--max_train_steps 100005 \
--learning_rate 2e-5 \
--input_mode "resize" \
--train_batch_size=1 --dataloader_num_workers 8 \
--enable_xformers_memory_efficient_attention \
--checkpointing_steps=20000 --eval_freq 20000 --viz_freq 2000 \
--lambda_lpips 1.0 --lambda_l2 1.0 --lambda_gram 0.5 --gram_loss_warmup_steps 5000 \
--report_to "wandb" --tracker_project_name "${WANDB_PROJECT:-geoquery}" \
--tracker_run_name "${WANDB_RUN_NAME:-geoquery_train}" --timestep 199 \
--gt_colmap "${GTCOLMAP_ROOT:-path/to/colmap_root}" \
--conf_threshold 0.0 \
--neighborhood_size 3 \
--low_res_only

Resume from checkpoint:
--resume path/to/checkpoints/model_100001.pkl

Apply GeoQuery during iterative 3DGS updates (run_geoquery_gsplat.sh):
CUDA_VISIBLE_DEVICES=0 python examples_geoquery/gsplat/train_geoquery_gsplat.py default \
--data_dir path/to/scene \
--data_factor 4 \
--result_dir outputs/geoquery_gsplat/scene \
--ckpt path/to/3dgs_checkpoint.pt \
--geoquery_ckpt path/to/geoquery_checkpoint.pkl \
--n_views 9 \
--dataset_type mipnerf360 \
--no-normalize-world-space \
--low_res_only \
--window_size 3 \
--depth_dir path/to/reference_depths

`--window_size` should match the `--neighborhood_size` used in diffusion training.
The gsplat integration does not estimate stereo depth on the fly. It reads
precomputed reference depths from --depth_dir. Depth maps should be stored as
.npy files with the same base name as the training image, without the image
extension:
path/to/reference_depths/
  mipnerf360/
    garden/
      9_views/
        depth/
          _DSC8681.npy
          _DSC8682.npy
        confidence/
          _DSC8681.npy
          _DSC8682.npy
The general path rule is:
{depth_dir}/{dataset_type}/{scene_name}/{n_views}_views/depth/{image_base}.npy
where:
- `dataset_type` is the value passed by `--dataset_type` (`mipnerf360` or `dl3dv`; `auto` infers it from `--data_dir`).
- `scene_name` is `basename(data_dir)` for Mip-NeRF 360 scenes. For DL3DV-style paths ending in `gaussian_splat`, it is the parent directory name.
- `n_views` must match the `--n_views` argument.
- `image_base` is the training image filename without extension.
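The path rule can be written out as a short helper, which is handy for checking that precomputed depths sit where the loader will look. `reference_depth_path` is a hypothetical function written for this README, not the loader's own code. It expects an explicit `dataset_type` and does not reproduce the `auto` inference.

```python
from pathlib import Path

def reference_depth_path(depth_dir, dataset_type, data_dir, n_views, image_name):
    """Build {depth_dir}/{dataset_type}/{scene_name}/{n_views}_views/depth/{image_base}.npy."""
    data_dir = Path(data_dir)
    # DL3DV-style paths end in "gaussian_splat"; the scene is the parent directory.
    if data_dir.name == "gaussian_splat":
        scene_name = data_dir.parent.name
    else:
        scene_name = data_dir.name
    # Drop the image extension, e.g. "_DSC8681.JPG" -> "_DSC8681".
    image_base = Path(image_name).stem
    return (Path(depth_dir) / dataset_type / scene_name
            / f"{n_views}_views" / "depth" / f"{image_base}.npy")

# Example usage with placeholder paths:
path = reference_depth_path(
    "path/to/reference_depths", "mipnerf360", "path/to/garden", 9, "_DSC8681.JPG"
)
print(path.as_posix())
```

Calling `path.exists()` for every training image before launching the gsplat run gives an early warning about missing depth maps.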
The current gsplat loader uses the files in depth/; the optional
confidence/ folder is kept for consistency with common MVS/Depth Anything
outputs.
We gratefully acknowledge the projects and datasets that helped make GeoQuery possible, including MVSFormer++, Depth Anything 3, DIFIX3D+, gsplat, DL3DV-10K, and Mip-NeRF 360.
If you find this work useful, please cite:
@misc{cao2026geoquery,
title={GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction},
author={Xiao Cao and Yuze Li and Youmin Zhang and Jiayu Song and Cheng Yan and Wen Li and Lixin Duan},
year={2026},
eprint={2605.12399},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.12399},
}