Skip to content

Rework core code and implement a new CUDA based GPU backend#50

Open
bozbez wants to merge 64 commits intoemanuelev:develfrom
bozbez:master
Open

Rework core code and implement a new CUDA based GPU backend#50
bozbez wants to merge 64 commits intoemanuelev:develfrom
bozbez:master

Conversation

@bozbez
Copy link
Copy Markdown

@bozbez bozbez commented Jun 23, 2020

Benchmark results

Benchmarking results for the new CUDA backend, with 640x480 computation resolution and 1cm voxel size (left bars are for SDF, right for OFusion). Note the tracker has not (yet) been ported and was running on the CPU for all backends.

Summary of changes

  • Removed se_ prefix from root directories
  • Added .clang-format and auto formatted all except tools/ and apps/
  • Cleaned up various build warnings and simplified CMake files
  • Switched the Makefile to use Ninja
  • Moved tracker code to its own static library in tracking/
  • Templated octree on the node and block buffer type
  • Unified field implementations into voxel_traits
  • Extracted octree manipulating pipeline stages to new intermediate Backend class in backend/, which compiles to new static libraries supereight-backend-{openmp, cuda}-{sdf, ofusion}.a
  • Added allocator template to Image and added new 1D Buffer class
  • Added CUDA allocator and memory pool
  • Implemented CUDA accelerated allocation, integration and raycasting, tuned for desktop GPUs

Overview of new code structure

1

TODO

  • Move core buffer implementation from Image to Buffer, and make Image inherit Buffer
  • Extend Backend interface to include save/load, VTK dumping, etc
  • Add documentation to backend/
  • Clean up backend/src/openmp and extract common code to BackendBase and backend/src/common
  • Re-introduce the tests in core/test and extend for new functions
  • (Future) Port the CUDA tracking kernels from the slambench repo
  • (Future) Tune allocation for other GPUs than the GTX 1080 Ti
  • (Future) Investigate raycast optimizations, including voxel splatting and hardware trilinear interpolation

@bozbez bozbez changed the title Implement a CUDA backend Rework core code and implement a new CUDA based GPU backend Jul 27, 2020
@bozbez bozbez marked this pull request as ready for review July 28, 2020 10:56
@emanuelev emanuelev self-assigned this Aug 30, 2020
Copy link
Copy Markdown
Owner

@emanuelev emanuelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks a lot! Got few comments to get a discussion started, let me know what you think.

Comment thread CMakeLists.txt
add_subdirectory(se_core)
add_subdirectory(se_shared)
add_subdirectory(se_tools)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 -fdiagnostics-color=always -faligned-new")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason here to manually set CMAKE_CXX_FLAGS instead of add_compile_options?

Comment thread Makefile
all:
mkdir -p build/
cd build/ && cmake -DCMAKE_BUILD_TYPE=Release \
cd build/ && cmake -GNinja -DCMAKE_BUILD_TYPE=Release \
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure adding a dependency on Ninja would be helpful. Perhaps making it optional and reverting back to Make if not available?

Comment on lines +56 to +61
template<typename OctreeT, typename HashType, typename IncF>
SE_DEVICE_FUNC static void buildAllocationList(HashType* allocation_list,
int reserved, IncF get_idx, const OctreeT& octree,
const Eigen::Vector3f& world_vertex, const Eigen::Vector3f& direction,
const Eigen::Vector3f& camera_pos, float depth_sample,
float noise_factor);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. Usually traits class are quite lightweight and convey only type infos. I'd break this into two classes at least. Like having an ofusion/sdf class that specify and implements the buildAllocationList and raycast interface.


inline float3 voxelToPos(const int3 p, const float voxelSize){
return make_float3(p.x * voxelSize, p.y * voxelSize, p.z * voxelSize);
inline float3 voxelToPos(const int3 p, const float voxelSize) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be deleted, leftover of the pre-eigen port.

@@ -0,0 +1,188 @@
/*
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about moving this to backend. Do you see a way to keep it in the core lib? Ideally core is everything octree-related but denseslam independent. Backend at this stage is somehow a mix which I'd rather avoid (think of someone that wants to use the lib without the whole denseslam dependency).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants