This is the official repo for the mFUND pic2bridge project. The objective of the mFUND project (Linie 1) is to address the following questions:
- Can bridges be reconstructed based on a limited number of images?
- Can the elevation profile be reconstructed?
- How accurate is the estimated elevation profile?
We use Docker for easy containerization. Be sure to install according to this. If using a GPU, install the NVIDIA Container Toolkit afterwards.
When first using the repo, copy .devcontainer/devcontainer.json.EXAMPLE to .devcontainer/devcontainer.json.
This is your local file, which git ignores; it can contain paths on the local file system or the GPU configuration for the local machine or the deep learning server. The Remote Development extension must be installed to work inside the devcontainer. Once it is installed, the main image needs to be built so that the devcontainer can open automatically:
cd ${localWorkspaceFolder}
docker build -t pic_2_bridge .

We semantically segment the given images to derive the main bridge structures. The images and masks are then passed to VGGT to reconstruct the 3D scene. The classes within the point cloud are used to align the output to our defined coordinate system; the elevation profile is then projected onto a 2D plane and a polynomial curve is fitted to it. Because the length of the bridge is known, the final polynomial can be scaled accordingly. The config files for all required parameters are located in the config directory.
We have experimented with several networks, and DeepLabV3 has shown strong performance, so we are sticking with this architecture for now. Run it using:
python3 predict_images.py <input_image_dir> <output_image_dir>

So far, one of the best approaches to 3D reconstruction from a limited number of images is VGGT. For this reason, we use this method to reconstruct the 3D scene, but enrich it with the masks from the segmentation step. It can be run on its own using:
python3 reconstruct_scene.py <input_image_dir> <input_mask_dir> <output_data_dir>

We search for the largest cluster containing points belonging to the superstructure (ID=3). A plane is first fitted to this cluster using RANSAC, and the cluster is rotated so that the plane normal coincides with the z-axis. The xy orientation is then corrected using PCA to align the principal axis of the superstructure (the eigenvector with the largest eigenvalue) with the x-axis. Both transformations are applied to the whole scene, and the resulting oriented bounding box is used to cut out the corresponding bridge part without any limit in the z-direction.
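The alignment described above can be illustrated with a minimal NumPy sketch. This is not the code in align_scene.py: the function names, the RANSAC iteration count, and the inlier threshold are assumptions chosen for illustration, and the cluster search and the bounding-box crop are left out.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, threshold=0.05, seed=0):
    """Return (unit normal, point on plane, inlier mask) of the best plane found."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_normal, best_origin = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # collinear sample, no unique plane
            continue
        normal = normal / norm
        mask = np.abs((points - sample[0]) @ normal) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_normal, best_origin = mask, normal, sample[0]
    if best_normal is None:
        raise ValueError("no valid plane found")
    return best_normal, best_origin, best_mask

def rotation_to_z(normal):
    """Rotation matrix mapping the unit vector `normal` onto +z (Rodrigues formula)."""
    z = np.array([0.0, 0.0, 1.0])
    if normal[2] < 0:  # a plane normal has no sign, so pick the upward one
        normal = -normal
    v = np.cross(normal, z)
    s, c = np.linalg.norm(v), float(normal @ z)
    if s < 1e-12:  # already aligned with z
        return np.eye(3)
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s**2)

def rotation_pca_xy(points):
    """Rotation about z that turns the dominant xy direction onto the x-axis."""
    xy = points[:, :2] - points[:, :2].mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xy.T))
    major = eigvecs[:, np.argmax(eigvals)]
    angle = np.arctan2(major[1], major[0])
    c, s = np.cos(-angle), np.sin(-angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def align_scene(scene, superstructure):
    """Apply both rotations (plane levelling, then PCA yaw) to the whole scene."""
    normal, origin, inliers = fit_plane_ransac(superstructure)
    r_plane = rotation_to_z(normal)
    deck = (superstructure[inliers] - origin) @ r_plane.T
    rotation = rotation_pca_xy(deck) @ r_plane
    return (scene - origin) @ rotation.T, rotation
```

Here `scene` and `superstructure` are assumed to be `(N, 3)` arrays of xyz coordinates, with `superstructure` holding only the points of the largest ID=3 cluster. The returned rotation can be reused to transform the camera poses in the same way.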
The alignment step can be run using:
python3 align_scene.py <input_point_cloud_las> <input_camera_parameter.json> <output_point_cloud_las>

We use a polynomial fit to describe the valley profile. The superstructure and the points below it are fitted separately. The intersection points of the two fits delimit the line segment that should correspond to the original bridge length, which makes it easy to scale the result.
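The fitting and scaling idea can be sketched as follows. This is a hedged illustration rather than the contents of fit_valley.py: the function names and the default polynomial degrees are assumptions, and the inputs are assumed to be `(N, 2)` arrays of (x, z) coordinates taken from the aligned point cloud.

```python
import numpy as np
from numpy.polynomial import Polynomial

def fit_profiles(deck_xz, ground_xz, deck_deg=1, ground_deg=4):
    """Fit separate polynomials z(x) to the superstructure and the points below it."""
    # .convert() maps the fit back to plain power-series coefficients in x
    deck = Polynomial.fit(deck_xz[:, 0], deck_xz[:, 1], deck_deg).convert()
    ground = Polynomial.fit(ground_xz[:, 0], ground_xz[:, 1], ground_deg).convert()
    return deck, ground

def intersections(deck, ground, x_min, x_max):
    """Sorted real x-positions inside [x_min, x_max] where the two fits cross."""
    roots = (deck - ground).roots()
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[(real >= x_min) & (real <= x_max)])

def scale_factor(deck, ground, x_min, x_max, bridge_length):
    """Ratio of the known bridge length to the reconstructed deck segment length."""
    xs = intersections(deck, ground, x_min, x_max)
    if len(xs) < 2:
        raise ValueError("need at least two intersections to define the bridge segment")
    x0, x1 = xs[0], xs[-1]
    segment = np.hypot(x1 - x0, deck(x1) - deck(x0))  # length along the deck line
    return bridge_length / segment
```

Multiplying the reconstructed coordinates by the returned factor brings the unitless reconstruction to metric scale. Taking the outermost intersections is a simplification; a noisy ground fit may cross the deck more than twice.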
python3 fit_valley.py <input_point_cloud_las> <output_fit_dir>

To automatically run all the individual steps, simply run:
python3 main_valley_extract.py <input_image_dir> <output_dir>

We use the 3D LiDAR scans from SemanticBridge to verify the results.
We would like to thank the authors of the following works:
@inproceedings{wang2025vggt,
title={VGGT: Visual Geometry Grounded Transformer},
author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}

