Skip to content

Thrust sort bug#1

Merged
NeKon69 merged 4 commits intomainfrom
thrust-sort-bug
Sep 2, 2025
Merged

Thrust sort bug#1
NeKon69 merged 4 commits intomainfrom
thrust-sort-bug

Conversation

@NeKon69
Copy link
Copy Markdown
Owner

@NeKon69 NeKon69 commented Sep 2, 2025

Summary by CodeRabbit

  • New Features

    • Thread-safe multi-context OpenGL management for main, tessellation, and texture generation tasks.
    • Added GPU kernel to compute tangents/bitangents/UVs during tessellation.
  • Improvements

    • More reliable rendering via explicit context locking.
    • Dynamic, count-driven sphere tessellation for better scalability and stability.
    • Window system now exposes graphics context data for integrated workflows.
    • Synchronization added before rendering to prevent visual glitches.
  • Changes

    • Startup no longer forces a specific CUDA device; uses runtime defaults.
  • Chores

    • Build configuration updated for CUDA linkage and behavior.

…e_generation, separated what which class does, and finished new tessellation process, although hitting some nasty bug with thrust
…ake sure they actually produce the goddam sphere
…ces accessible in the worker thread via some shenanigans
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 2, 2025

Walkthrough

Adds multi-context GL management via a new graphics_data and gl_context_lock; updates window to create/share three GL contexts; adjusts renderer and sphere generation to lock appropriate contexts; revises game loop state handling; modifies CUDA buffer behavior and GL interop mapping; refactors tessellation to dynamic counts; updates sphere APIs and kernel declarations; minor build and formatting tweaks.

Changes

Cohort / File(s) Summary
Build system (CUDA linkage)
CMakeLists.txt
Disable CUDA static library creation; link against CUDA::cudart instead of CUDA::toolkit.
Core game loop and state init
src/core/game.cpp, include/core/game.h, include/game_states/playing_state.h, src/game_states/playing_state.cpp
Game loop now checks/handles empty stack and pop-after-input; playing_state constructor now accepts graphics::graphics_data&; call sites updated; minor private member formatting; update/draw order clarified; added sync before draw; disabled simulation update.
GL context infrastructure
include/graphics/gl_context_lock.h, include/window/fwd.h
New RAII GL context lock for MAIN/TESS/TEX_GEN with per-context mutexes; introduces graphics::graphics_data; forward declarations added.
Window and GL context management
include/window/gl_window.h, src/window/gl_window.cpp
Window now owns graphics_data with three GL contexts; creates shared contexts; new get_data() accessor; all operations reference data.window; cleanup for all contexts; improved error handling.
Renderer updates
include/rendering/renderer.h, src/rendering/renderer.cpp
render is no longer const; acquires MAIN GL context via gl_context_lock during rendering; rest of flow unchanged.
Sphere generation API and threading
include/sphere_generation/sphere_generator.h, src/sphere_generation/sphere_generator.cpp
sphere_generator::generate now requires graphics::graphics_data&; worker thread locks TESS GL context; removes per-thread CUDA device selection; launch flow otherwise intact.
Tessellation launcher and kernels
include/sphere_generation/kernel_launcher.h, src/sphere_generation/launch_tessellation.cu, include/sphere_generation/tessellation_kernel.h, src/sphere_generation/tessellation_kernel.cu
Remove launch_orthogonalization from header/impl; add calculate_tbn_and_uv kernel decl and call; switch to dynamic host-tracked counts and host-device sync; add edge comparator; change unique edge id types to int.
Icosahedron data manager
src/sphere_generation/icosahedron_data_manager.cpp
Triple capacities for edge-related buffers; upload base icosahedron vertices/indices to device in init; prepare allocates 3× buffers; cleanup frees new buffers.
CUDA buffer and GL interop
include/cuda_types/buffer.h, include/cuda_types/cuda_from_gl_data.h
Add host-only dereference operator for host-side buffers; constructor no longer creates stream for host buffers; device buffers still allocate with stream; GL interop now unmaps resource in ctor after mapping (leaving not-mapped state).
Sphere generation fwd/types
include/sphere_generation/fwd.h
Replace functor/three-way compare with <, ==, != operators for edge; update MAXIMUM_AMOUNT_OF_VERTICES to 10 * 4^MAX_STEPS + 2.
Miscellaneous
src/main.cpp
Remove cudaSetDevice(0); rely on default device.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant Game as core::game
  participant State as playing_state
  participant Renderer
  participant Window as gl_window
  Note over App,Renderer: Initialization
  App->>Game: construct(renderer)
  Game->>State: push playing_state(window_data, window_size)
  loop Main loop
    Game->>State: handle_input()
    alt input requests pop
      State-->>Game: true
      Game->>Game: pop_state()
      opt stack empty
        Game-->>App: exit
      end
    else continue
      Game->>State: update(dt)
      Game->>Renderer: render(queue, camera)
      Renderer->>Window: get_data()
      Renderer->>Renderer: gl_context_lock<MAIN>
      Renderer->>Renderer: issue GL draws
      Renderer-->>Renderer: unlock context
    end
  end
Loading
sequenceDiagram
  autonumber
  participant Play as playing_state
  participant SG as sphere_generator
  participant Win as gl_window
  participant Tess as gl_context_lock<TESS>
  participant Src as icosahedron_data_manager
  participant CU as launch_tessellation.cu
  participant Kern as CUDA Kernels
  Play->>Win: get_data()
  Play->>SG: generate(steps, stream, source, data)
  SG->>Tess: lock TESS context (SDL_GL_MakeCurrent)
  SG->>Src: prepare/create_context()
  SG->>CU: launch_tessellation(data_for_thread, steps, stream)
  CU->>Kern: generate_edges / unique edges / create_triangles
  CU->>Kern: calculate_tbn_and_uv(vertices, count)
  CU-->>SG: sync as needed
  SG-->>Tess: unlock context
Loading
sequenceDiagram
  autonumber
  participant Renderer
  participant Win as gl_window
  participant Lock as gl_context_lock<MAIN>
  Renderer->>Win: get_data()
  Renderer->>Lock: construct (locks MAIN)
  Renderer->>Renderer: set shaders, bind meshes
  Renderer->>Renderer: draw calls
  Renderer-->>Lock: destroy (releases)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I tap my paws on keys of light,
New threads weave GL by day and night.
Three contexts hum, in mutexed cheer,
A sphere is born, its edges clear.
With locks in place and kernels spun—
I twitch my nose: ship build, well done! 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch thrust-sort-bug

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot added a commit that referenced this pull request Sep 2, 2025
Docstrings generation was requested by @NeKon69.

* #1 (comment)

The following files were modified:

* `include/core/game.h`
* `include/cuda_types/buffer.h`
* `include/cuda_types/cuda_from_gl_data.h`
* `include/game_states/playing_state.h`
* `include/graphics/gl_context_lock.h`
* `include/rendering/renderer.h`
* `include/sphere_generation/fwd.h`
* `include/sphere_generation/kernel_launcher.h`
* `include/sphere_generation/sphere_generator.h`
* `include/sphere_generation/tessellation_kernel.h`
* `include/window/fwd.h`
* `include/window/gl_window.h`
* `src/core/game.cpp`
* `src/game_states/playing_state.cpp`
* `src/main.cpp`
* `src/rendering/renderer.cpp`
* `src/sphere_generation/icosahedron_data_manager.cpp`
* `src/sphere_generation/launch_tessellation.cu`
* `src/sphere_generation/sphere_generator.cpp`
* `src/sphere_generation/tessellation_kernel.cu`
* `src/window/gl_window.cpp`
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 2, 2025

Note

Generated docstrings for this pull request at #2

@NeKon69 NeKon69 merged commit 92dc147 into main Sep 2, 2025
1 check was pending
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/sphere_generation/tessellation_kernel.cu (1)

115-123: Potential -1 indexing into edge_to_vertex → undefined behavior. Guard or assert before use.

find_edge returns -1 on miss; changing IDs to int is fine, but indexing with a negative value is UB and will corrupt output.

Apply:

- int unique_id_01 = find_edge(unique_edges, num_unique_edges, e01);
- int unique_id_12 = find_edge(unique_edges, num_unique_edges, e12);
- int unique_id_20 = find_edge(unique_edges, num_unique_edges, e20);
+ int unique_id_01 = find_edge(unique_edges, num_unique_edges, e01);
+ int unique_id_12 = find_edge(unique_edges, num_unique_edges, e12);
+ int unique_id_20 = find_edge(unique_edges, num_unique_edges, e20);
+ if (unique_id_01 < 0 || unique_id_12 < 0 || unique_id_20 < 0) {
+   assert(!"create_triangles: edge lookup failed");
+   return; // prevents out-of-bounds writes; consider a better recovery if needed
+ }
 
- uint32_t new_i01 = edge_to_vertex[unique_id_01];
- uint32_t new_i12 = edge_to_vertex[unique_id_12];
- uint32_t new_i20 = edge_to_vertex[unique_id_20];
+ uint32_t new_i01 = edge_to_vertex[static_cast<uint32_t>(unique_id_01)];
+ uint32_t new_i12 = edge_to_vertex[static_cast<uint32_t>(unique_id_12)];
+ uint32_t new_i20 = edge_to_vertex[static_cast<uint32_t>(unique_id_20)];

And ensure device asserts are available:

#include <assert.h>
include/cuda_types/buffer.h (1)

29-31: Host-side stream removal introduces null deref and misuse of async ops on host memory

  • For host buffers, data_stream may be null (explicit ctor), but _memcpy/zero_data unconditionally use data_stream->stream() and async APIs. This will crash and/or use CUDA APIs invalid for host memory.

Use synchronous memcpy for host buffers and std::memset for zeroing; keep async path for device buffers.

-	void _memcpy(T *dst, T *src, size_t size, cudaMemcpyKind kind) {
-		CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, size, kind, data_stream->stream()));
-	}
+	void _memcpy(T *dst, T *src, size_t size, cudaMemcpyKind kind) {
+		if constexpr (Side == side::device) {
+			CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, size, kind, data_stream->stream()));
+		} else {
+			// Host-path: synchronous copy is sufficient and avoids stream dependency
+			CUDA_SAFE_CALL(cudaMemcpy(dst, src, size, kind));
+		}
+	}
-	explicit cuda_buffer(const size_t size) : _size(size) {
-		if constexpr (Side == side::device) {
-			data_stream = std::make_shared<cuda_stream>();
-		}
-		alloc();
-	}
+	explicit cuda_buffer(const size_t size) : _size(size) {
+		if constexpr (Side == side::device) {
+			data_stream = std::make_shared<cuda_stream>();
+		}
+		alloc();
+	}
-	void zero_data(size_t amount) {
-		cudaMemsetAsync(ptr, 0, amount, data_stream->stream());
-	}
+	void zero_data(size_t amount) {
+		if constexpr (Side == side::device) {
+			cudaMemsetAsync(ptr, 0, amount, data_stream->stream());
+		} else {
+			std::memset(ptr, 0, amount);
+		}
+	}

Add required header for std::memset (outside the changed hunk):

#include <cstring>

Also applies to: 58-63, 154-156

src/game_states/playing_state.cpp (2)

24-33: Out-of-bounds access: AM_POINT_LIGHTS (5) exceeds light_pos size (4).

Loop indexes 0..4 but light_pos has 4 elements, so i==4 dereferences past end → UB/crash. Derive the count from light_pos.

Apply this diff:

-namespace predef {
-static constexpr auto AM_POINT_LIGHTS = 5;
-}
+namespace predef {
+// Keep in sync with playing_state::light_pos
+static const auto AM_POINT_LIGHTS = playing_state::light_pos.size();
+}

133-143: Debounce the 'O' toggle; pressed_o never set true.

Without setting it on key-down, auto-repeat can flip sim state rapidly.

Apply this diff:

-        case SDL_SCANCODE_O:
-            if (pressed_o)
-                break;
+        case SDL_SCANCODE_O:
+            if (pressed_o)
+                break;
+            pressed_o = true;
             if (sim_state.running) {
                 system.pause();
                 sim_state.running = false;
             } else {
                 system.start();
                 sim_state.running = true;
             }
             break;
src/sphere_generation/launch_tessellation.cu (1)

87-89: Missing error handling for device-to-host copy

The code doesn't check if p_vertex_count is valid before copying from it. Add null pointer check or ensure it's always initialized.

+	if (p_vertex_count == nullptr) {
+		throw std::runtime_error("p_vertex_count is null");
+	}
	cudaMemcpyAsync(&num_vertices_cpu, p_vertex_count, sizeof(uint32_t), cudaMemcpyDeviceToHost,
					stream);
🧹 Nitpick comments (17)
include/window/fwd.h (1)

29-31: Forward declare only window types; drop stray graphics_data here

This declares raw::window::graphics_data, which differs from the graphics namespace type and risks confusion. Remove it or move a correct forward decl to the graphics header.

-struct graphics_data;
 class sdl_video;
 class gl_window;
include/window/gl_window.h (2)

12-12: Prefer project-include path over relative

Use a stable include path to avoid brittle .. traversal when include dirs change.

-#include "../graphics/gl_context_lock.h"
+#include "graphics/gl_context_lock.h"

44-45: Accessor signature should match the namespace decision

Align return type with the chosen qualification; consider const and non-const overloads if mutation isn’t always needed.

-	graphics::graphics_data& get_data();
+	raw::graphics::graphics_data& get_data();
+	const raw::graphics::graphics_data& get_data() const;
include/sphere_generation/fwd.h (1)

39-55: Switch to lexicographic operator< on edge is the right fix for Thrust sort.

Ordering matches make_canonical_edge and the HOST_DEVICE qualifiers keep it usable on device. Consider noexcept/constexpr for tiny perf/clarity.

Example:

- HOST_DEVICE bool operator<(const edge& other) const {
+ HOST_DEVICE constexpr bool operator<(const edge& other) const noexcept {
include/sphere_generation/kernel_launcher.h (1)

11-15: Typo/mismatch in parameter name: p_edges_cound → p_unique_edges_count (align with implementation and intent).

Name doesn’t affect ABI but harms readability and grep-ability alongside launch_tessellation.cu.

-								uint32_t *p_edges_cound, cudaStream_t &stream, uint32_t steps);
+								uint32_t *p_unique_edges_count, cudaStream_t &stream, uint32_t steps);
include/sphere_generation/tessellation_kernel.h (1)

27-28: Unify qualifier/style with surrounding declarations

Elsewhere in this header you use graphics::vertex (implicitly raw::graphics). Keep it consistent to reduce noise.

Apply:

-extern __global__ void calculate_tbn_and_uv(raw::graphics::vertex *vertices,
+extern __global__ void calculate_tbn_and_uv(graphics::vertex *vertices,
 											uint32_t			   num_input_vertices);
include/rendering/renderer.h (1)

26-26: Keep queue read-only and drop trailing whitespace

render doesn’t mutate the queue; make it const&, and fix the dangling space.

-	void render(raw::rendering::queue &command_queue, const core::camera::camera &camera) ;
+	void render(const raw::rendering::queue &command_queue, const core::camera::camera &camera);
include/sphere_generation/sphere_generator.h (1)

10-10: Reduce header coupling: avoid heavy include in a public header

This header only needs the graphics_data type; prefer forward-declaring it (include/window/fwd.h) and move gl_context_lock.h to the .cpp.

-#include "graphics/gl_context_lock.h"
+#include "window/fwd.h" // for raw::window::graphics_data forward decl

And in src/sphere_generation/sphere_generator.cpp, include gl_context_lock.h explicitly.

include/cuda_types/buffer.h (1)

46-50: Tiny ergonomics win

Consider also providing operator-> for host buffers to mirror pointer semantics.

+	__host__ T* operator->()
+		requires(Side == side::host)
+	{
+		return ptr;
+	}
+	__host__ const T* operator->() const
+		requires(Side == side::host)
+	{
+		return ptr;
+	}
src/rendering/renderer.cpp (1)

26-41: Reduce redundant shader state changes.

You set projection/view every command, even when consecutive commands use the same shader. Consider batching by shader or tracking the currently bound program to cut glUseProgram/uniform traffic.

src/game_states/playing_state.cpp (1)

188-192: Call sync before building the render queue for clarity/order.

It avoids any ambiguity about buffer visibility when the queue is constructed.

Apply this diff:

-    auto queue = build_rendering_queue();
-    sphere_gen.sync();
+    sphere_gen.sync();
+    auto queue = build_rendering_queue();
     renderer.render(queue, camera);
src/window/gl_window.cpp (2)

11-18: Duplicate GL context profile attribute setting.

SDL_GL_CONTEXT_PROFILE_MASK is set twice with the same value (lines 12 and 16). This is redundant.

Apply this diff to remove the duplicate:

 SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);
 SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1);
 gl::ATTR(SDL_GL_MULTISAMPLEBUFFERS, 1);
 gl::ATTR(SDL_GL_MULTISAMPLESAMPLES, 1024);
-gl::ATTR(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);
 gl::ATTR(SDL_GL_DOUBLEBUFFER, 1);
 gl::ATTR(SDL_GL_DEPTH_SIZE, 24);

36-49: Context creation follows a reasonable pattern but could be more robust.

The context creation sequence creates three contexts that should share resources. However, the sharing setup might need adjustment as noted above.

Consider this improved structure for proper context sharing:

 init_glad();
+SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1);
 data.tessellation_context = SDL_GL_CreateContext(data.window);
 if (!data.tessellation_context) {
     throw std::runtime_error("Tessellation OpenGL context could not be created! SDL_Error: " +
                              std::string(SDL_GetError()) + " in " + std::string(__FILE__) +
                              " on " + std::to_string(__LINE__) + " line");
 }
 data.texture_gen_context = SDL_GL_CreateContext(data.window);
src/sphere_generation/launch_tessellation.cu (2)

15-21: Unused comparator function

The edge_comparator function is defined but never used in this file. If it's intended for future use or external consumption, consider moving it to a header file. Otherwise, remove it to avoid dead code.


72-73: Inefficient sorting with thrust::sort_by_key

After synchronizing the stream to get num_unique_edges_cpu, you're using thrust::sort_by_key which is typically used when you need to sort values based on keys. If you're sorting edges by their properties, consider if thrust::sort would be more appropriate, or document why the key-value sort is necessary.

include/graphics/gl_context_lock.h (2)

63-65: Potential issue with destructor not checking for errors

The destructor calls SDL_GL_MakeCurrent(window, nullptr) without checking for errors. While this is generally acceptable in a destructor, consider logging errors for debugging purposes.

	~gl_context_lock() {
-		SDL_GL_MakeCurrent(window, nullptr);
+		if (SDL_GL_MakeCurrent(window, nullptr) < 0) {
+			// Log error but don't throw in destructor
+			// Consider adding a logging mechanism
+		}
	}

67-70: Move operations should be noexcept

The defaulted move constructor and assignment operator should be marked noexcept for better performance and exception safety.

	gl_context_lock(const gl_context_lock&)            = delete;
-	gl_context_lock(gl_context_lock&&)                = default;
+	gl_context_lock(gl_context_lock&&) noexcept       = default;
	gl_context_lock& operator=(const gl_context_lock&) = delete;
-	gl_context_lock& operator=(gl_context_lock&&)     = default;
+	gl_context_lock& operator=(gl_context_lock&&) noexcept = default;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 48f6dd4 and c681030.

⛔ Files ignored due to path filters (1)
  • src/shaders/objects/vertex_shader.glsl is excluded by !**/*.glsl
📒 Files selected for processing (22)
  • CMakeLists.txt (2 hunks)
  • include/core/game.h (1 hunks)
  • include/cuda_types/buffer.h (1 hunks)
  • include/cuda_types/cuda_from_gl_data.h (1 hunks)
  • include/game_states/playing_state.h (1 hunks)
  • include/graphics/gl_context_lock.h (1 hunks)
  • include/rendering/renderer.h (1 hunks)
  • include/sphere_generation/fwd.h (3 hunks)
  • include/sphere_generation/kernel_launcher.h (1 hunks)
  • include/sphere_generation/sphere_generator.h (1 hunks)
  • include/sphere_generation/tessellation_kernel.h (1 hunks)
  • include/window/fwd.h (1 hunks)
  • include/window/gl_window.h (2 hunks)
  • src/core/game.cpp (1 hunks)
  • src/game_states/playing_state.cpp (4 hunks)
  • src/main.cpp (0 hunks)
  • src/rendering/renderer.cpp (1 hunks)
  • src/sphere_generation/icosahedron_data_manager.cpp (3 hunks)
  • src/sphere_generation/launch_tessellation.cu (3 hunks)
  • src/sphere_generation/sphere_generator.cpp (1 hunks)
  • src/sphere_generation/tessellation_kernel.cu (2 hunks)
  • src/window/gl_window.cpp (1 hunks)
💤 Files with no reviewable changes (1)
  • src/main.cpp
🧰 Additional context used
🧬 Code graph analysis (17)
include/sphere_generation/tessellation_kernel.h (1)
src/sphere_generation/tessellation_kernel.cu (9)
  • void (11-14)
  • void (15-31)
  • void (33-72)
  • void (92-142)
  • void (143-166)
  • void (170-174)
  • void (178-281)
  • void (283-308)
  • calculate_tbn_and_uv (143-143)
include/sphere_generation/sphere_generator.h (3)
include/window/fwd.h (1)
  • raw (11-32)
include/cuda_types/stream.h (2)
  • raw (10-30)
  • cuda_stream (11-29)
src/sphere_generation/sphere_generator.cpp (2)
  • generate (10-28)
  • generate (10-12)
include/core/game.h (1)
include/core/game_state.h (1)
  • game_state (12-22)
include/game_states/playing_state.h (1)
src/game_states/playing_state.cpp (2)
  • playing_state (59-81)
  • playing_state (198-200)
include/rendering/renderer.h (2)
src/rendering/renderer.cpp (2)
  • render (22-43)
  • render (22-22)
include/core/camera/camera.h (1)
  • camera (44-93)
include/window/gl_window.h (3)
include/game_states/playing_state.h (1)
  • raw (21-70)
include/window/fwd.h (1)
  • raw (11-32)
src/window/gl_window.cpp (4)
  • gl_window (11-52)
  • gl_window (54-59)
  • get_data (65-67)
  • get_data (65-65)
include/cuda_types/cuda_from_gl_data.h (1)
src/cuda_types/resource.cpp (2)
  • unmap (13-28)
  • unmap (13-13)
include/cuda_types/buffer.h (2)
src/cuda_types/stream.cpp (3)
  • cuda_stream (11-14)
  • cuda_stream (32-36)
  • cuda_stream (51-54)
include/cuda_types/stream.h (1)
  • cuda_stream (11-29)
src/sphere_generation/sphere_generator.cpp (2)
src/cuda_types/stream.cpp (4)
  • sync (28-30)
  • sync (28-28)
  • stream (47-49)
  • stream (47-47)
include/cuda_types/cuda_from_gl_data.h (1)
  • stream (30-41)
include/graphics/gl_context_lock.h (5)
include/game_states/playing_state.h (1)
  • raw (21-70)
include/sphere_generation/sphere_generator.h (1)
  • raw (12-21)
include/window/fwd.h (1)
  • raw (11-32)
include/window/gl_window.h (1)
  • raw (15-62)
src/sphere_generation/sphere_generator.cpp (1)
  • lock (23-23)
include/sphere_generation/kernel_launcher.h (3)
include/sphere_generation/sphere_generator.h (1)
  • raw (12-21)
include/cuda_types/stream.h (1)
  • raw (10-30)
src/sphere_generation/launch_tessellation.cu (2)
  • launch_tessellation (22-100)
  • launch_tessellation (22-27)
src/rendering/renderer.cpp (1)
include/rendering/renderer.h (1)
  • renderer (15-27)
src/window/gl_window.cpp (2)
include/window/gl_window.h (1)
  • gl_window (16-61)
include/window/gl_loader.h (1)
  • init_glad (14-24)
src/sphere_generation/icosahedron_data_manager.cpp (1)
include/cuda_types/cuda_from_gl_data.h (1)
  • stream (30-41)
src/game_states/playing_state.cpp (1)
include/game_states/playing_state.h (1)
  • playing_state (22-68)
src/sphere_generation/launch_tessellation.cu (3)
include/cuda_types/cuda_from_gl_data.h (1)
  • stream (30-41)
src/cuda_types/stream.cpp (2)
  • stream (47-49)
  • stream (47-47)
src/sphere_generation/tessellation_kernel.cu (4)
  • generate_edges (15-15)
  • create_unique_midpoint_vertices (33-36)
  • create_triangles (92-94)
  • calculate_tbn_and_uv (143-143)
src/core/game.cpp (2)
include/core/game.h (1)
  • game (16-29)
include/rendering/renderer.h (1)
  • renderer (15-27)
🔇 Additional comments (23)
CMakeLists.txt (2)

7-7: Good call: no static CUDA libs

Disabling static CUDA library creation matches the executable setup.


65-65: No CUDA driver API usage found; CUDA::cudart suffices.

include/core/game.h (1)

19-19: Minor template cleanup

>> formatting fix is fine.

include/window/gl_window.h (1)

20-20: No change needed: graphics::graphics_data correctly resolves
Unqualified lookup inside raw::window finds the raw::graphics namespace, so graphics::graphics_data already refers to raw::graphics::graphics_data.

include/sphere_generation/fwd.h (1)

9-10: LGTM: pulling in common/fwd.h for UI et al.

This resolves downstream typedef visibility and is appropriate for both host and device builds.

src/sphere_generation/tessellation_kernel.cu (1)

22-27: LGTM: localizing triangle indices and edge_ptr.

No functional change; improves readability and register locality.

include/sphere_generation/kernel_launcher.h (1)

7-7: LGTM: include cuda_runtime.h here.

Needed for cudaStream_t in the declaration; avoids transitive include pitfalls.

include/sphere_generation/sphere_generator.h (1)

18-18: API change looks good

generate(...) now threading with GL-context locking; signature aligns with usage in playing_state.

src/rendering/renderer.cpp (1)

22-25: Good addition: RAII GL context lock around render.

Locking MAIN before any GL calls is correct and keeps render thread-safe with multi-context.

src/game_states/playing_state.cpp (3)

59-76: Constructor changes look consistent with new graphics_data flow.

Passing graphics_data through here aligns with the new context-locking model.


78-78: Verify sphere_gen.generate locks the correct GL context internally.

Given you pass graphics_data, ensure generate acquires the intended context (e.g., a worker/upload context) and that resources are shared with MAIN. If not, acquire the specific lock here.


96-99: Simulation step commented out—intentional?

If the sim is meant to run when sim_state.running is true, gate it rather than disabling wholesale.

Apply this (if intended):

-    // system.update_sim();
+    if (sim_state.running) {
+        system.update_sim();
+    }
src/core/game.cpp (2)

9-12: OK: pass graphics_data and window_size into playing_state.

Matches the new constructor signature.


14-35: Confirm delta_time.to_milli() semantics.

If to_milli() returns a value (and does not mutate in-place), this call is a no-op. If it mutates, all good. Please confirm and either remove it or capture the return.

src/sphere_generation/sphere_generator.cpp (2)

10-12: LGTM! Clean API extension for graphics context management.

The addition of the graphics_data parameter properly supports the new multi-context GL architecture.


20-27: GL context lock correctly replaces CUDA device setup.

The change from per-thread CUDA device initialization to GL context locking is appropriate for the multi-context architecture. The TESS context type properly isolates tessellation operations.

src/window/gl_window.cpp (1)

54-59: Destructor correctly cleans up all GL contexts.

The cleanup order is appropriate - contexts are destroyed before the window.

src/sphere_generation/icosahedron_data_manager.cpp (3)

48-53: 3x buffer allocation increase needs validation.

All edge-related buffers now allocate 3x the previous capacity. This appears to account for the maximum number of edges in a fully tessellated mesh (3 edges per triangle).

The allocation strategy is sound - each triangle can have up to 3 edges, so allocating MAXIMUM_AMOUNT_OF_TRIANGLES * 3 provides sufficient space for the worst case.


73-75: Buffer allocations in prepare() are consistent with init().

The 3x allocation pattern is correctly maintained in the prepare function.


92-94: Cleanup properly frees the newly allocated buffers.

All dynamically allocated buffers are correctly freed in the cleanup method.

src/sphere_generation/launch_tessellation.cu (2)

98-99: Good addition of stream synchronization

The addition of cudaStreamSynchronize after calculate_tbn_and_uv ensures the kernel completes before proceeding. This prevents potential race conditions.


69-71: Memory allocation for num_unique_edges_cpu is correct: it’s constructed with sizeof(uint32_t), matching the size passed to cudaMemcpyAsync.

include/graphics/gl_context_lock.h (1)

48-62: Good use of C++20 concepts with requires clauses

The use of requires clauses to constrain the template based on context_type provides compile-time safety and clear API design. This ensures each constructor can only be instantiated with the correct context type.

Comment on lines +46 to +50
__device__ __host__ T &operator*()
requires(Side == side::host)
{
return *ptr;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Misannotated operator: host-only op is marked device—can trigger illegal device deref*

operator* is intended for host-side buffers but is compiled for device as well due to device. Remove the device attribute.

-	__device__ __host__ T &operator*()
+	__host__ T &operator*()
 		requires(Side == side::host)
 	{
 		return *ptr;
 	}

Optional: add a const overload.

+	__host__ const T &operator*() const
+		requires(Side == side::host)
+	{
+		return *ptr;
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
__device__ __host__ T &operator*()
requires(Side == side::host)
{
return *ptr;
}
__host__ T &operator*()
requires(Side == side::host)
{
return *ptr;
}
__host__ const T &operator*() const
requires(Side == side::host)
{
return *ptr;
}
🤖 Prompt for AI Agents
In include/cuda_types/buffer.h around lines 46-50, the operator* is incorrectly
annotated with __device__ which causes host-only dereference to be compiled for
device and can trigger illegal device derefs; remove the __device__ attribute so
the operator is host-only (keep the requires(Side == side::host) constraint),
and add a const overload returning const T& with the same host-only guarantee.

Comment on lines +40 to 41
unmap();
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Bug: unmapping in ctor leaves data invalid; map() never refreshes pointer

After unmap(), data points to unmapped memory. Also, map() doesn’t call cudaGraphicsResourceGetMappedPointer, so data is never revalidated. This explains Thrust misuse/UB.

Minimal, safe fix: store size, null data on unmap, and reacquire pointer on map. Also delete copy ops, add move ctor.

 class cuda_from_gl_data {
 private:
 	cudaGraphicsResource_t		 cuda_resource = nullptr;
 	T*							 data		   = nullptr;
 	bool						 mapped		   = false;
 	std::shared_ptr<cuda_stream> stream;
+	size_t                       bytes_        = 0;

 public:
 	cuda_from_gl_data() = default;

 	cuda_from_gl_data(size_t* amount_of_bytes, UI buffer_object,
 					  std::shared_ptr<cuda_stream> stream = nullptr)
 		: stream(stream) {
@@
-		CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream()));
-		CUDA_SAFE_CALL(
-			cudaGraphicsResourceGetMappedPointer((void**)&data, amount_of_bytes, cuda_resource));
-		mapped = true;
-		unmap();
+		// Optionally discover size once; keep resource initially unmapped.
+		CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream()));
+		CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource));
+		if (amount_of_bytes) *amount_of_bytes = bytes_;
+		CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream->stream()));
+		data   = nullptr;
+		mapped = false;
 	};
+	cuda_from_gl_data(cuda_from_gl_data&& rhs) noexcept {
+		*this = std::move(rhs);
+	}
+	cuda_from_gl_data(const cuda_from_gl_data&)            = delete;
+	cuda_from_gl_data& operator=(const cuda_from_gl_data&) = delete;
 	cuda_from_gl_data& operator=(cuda_from_gl_data&& rhs) noexcept {
 		mapped			  = rhs.mapped;
 		rhs.mapped		  = false;
 		cuda_resource	  = rhs.cuda_resource;
 		data			  = rhs.data;
 		rhs.cuda_resource = nullptr;
 		rhs.data		  = nullptr;
+		bytes_            = rhs.bytes_;
+		rhs.bytes_        = 0;
+		stream            = std::move(rhs.stream);
 		return *this;
 	}
@@
 	void unmap() {
-		if (mapped)
-			CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, nullptr));
-		mapped = false;
+		if (mapped) {
+			CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream ? stream->stream() : 0));
+			mapped = false;
+			data   = nullptr;
+		}
 	}
 	void map() {
-		if (!mapped)
-			CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, nullptr));
-		mapped = true;
+		if (!mapped) {
+			CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream ? stream->stream() : 0));
+			CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource));
+			mapped = true;
+		}
 	}

Follow-up: audit all call sites to not use get_data() unless mapped.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
unmap();
};
class cuda_from_gl_data {
private:
cudaGraphicsResource_t cuda_resource = nullptr;
T* data = nullptr;
bool mapped = false;
std::shared_ptr<cuda_stream> stream;
size_t bytes_ = 0;
public:
cuda_from_gl_data() = default;
cuda_from_gl_data(size_t* amount_of_bytes, UI buffer_object,
std::shared_ptr<cuda_stream> stream = nullptr)
: stream(stream) {
// Optionally discover size once; keep resource initially unmapped.
CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream()));
CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource));
if (amount_of_bytes) *amount_of_bytes = bytes_;
CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream->stream()));
data = nullptr;
mapped = false;
}
cuda_from_gl_data(cuda_from_gl_data&& rhs) noexcept {
*this = std::move(rhs);
}
cuda_from_gl_data(const cuda_from_gl_data&) = delete;
cuda_from_gl_data& operator=(const cuda_from_gl_data&) = delete;
cuda_from_gl_data& operator=(cuda_from_gl_data&& rhs) noexcept {
mapped = rhs.mapped;
rhs.mapped = false;
cuda_resource = rhs.cuda_resource;
data = rhs.data;
rhs.cuda_resource = nullptr;
rhs.data = nullptr;
bytes_ = rhs.bytes_;
rhs.bytes_ = 0;
stream = std::move(rhs.stream);
return *this;
}
void unmap() {
if (mapped) {
CUDA_SAFE_CALL(cudaGraphicsUnmapResources(
1, &cuda_resource, stream ? stream->stream() : 0));
mapped = false;
data = nullptr;
}
}
void map() {
if (!mapped) {
CUDA_SAFE_CALL(cudaGraphicsMapResources(
1, &cuda_resource, stream ? stream->stream() : 0));
CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer(
(void**)&data, &bytes_, cuda_resource));
mapped = true;
}
}
};

Comment on lines +16 to +24
struct graphics_data {
SDL_Window* window;
std::mutex main_mutex;
SDL_GLContext main_context;
std::mutex tessellation_mutex;
SDL_GLContext tessellation_context;
std::mutex texture_gen_mutex;
SDL_GLContext texture_gen_context;
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding validation for SDL contexts

The graphics_data struct holds raw SDL contexts without any validation. Consider adding checks to ensure contexts are properly initialized before use.

Consider adding a method to validate contexts:

bool is_valid() const {
    return window != nullptr && 
           main_context != nullptr && 
           tessellation_context != nullptr && 
           texture_gen_context != nullptr;
}
🤖 Prompt for AI Agents
In include/graphics/gl_context_lock.h around lines 16 to 24, the graphics_data
struct stores raw SDL pointers/contexts without any validation; add a const
member function is_valid() that checks window and each SDL_GLContext field are
non-nullptr and return a bool, and call or assert this validation wherever
graphics_data is constructed or accessed to prevent use of uninitialized
contexts.

Comment on lines +31 to +45
void set_current_context(graphics_data& data) const {
using enum context_type;
bool result = false;
if constexpr (ctx_type == MAIN) {
result = SDL_GL_MakeCurrent(window, data.main_context);
} else if constexpr (ctx_type == TESS) {
result = SDL_GL_MakeCurrent(window, data.tessellation_context);
} else if constexpr (ctx_type == TEX_GEN) {
result = SDL_GL_MakeCurrent(window, data.texture_gen_context);
}
if (result == false) {
throw std::runtime_error(std::format(
"Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError()));
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

SDL_GL_MakeCurrent returns int, not bool

SDL_GL_MakeCurrent returns 0 on success and negative on error, not a boolean. The current comparison will throw an error on success.

	void set_current_context(graphics_data& data) const {
		using enum context_type;
-		bool result = false;
+		int result = -1;
		if constexpr (ctx_type == MAIN) {
			result = SDL_GL_MakeCurrent(window, data.main_context);
		} else if constexpr (ctx_type == TESS) {
			result = SDL_GL_MakeCurrent(window, data.tessellation_context);
		} else if constexpr (ctx_type == TEX_GEN) {
			result = SDL_GL_MakeCurrent(window, data.texture_gen_context);
		}
-		if (result == false) {
+		if (result < 0) {
			throw std::runtime_error(std::format(
				"Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError()));
		}
	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void set_current_context(graphics_data& data) const {
using enum context_type;
bool result = false;
if constexpr (ctx_type == MAIN) {
result = SDL_GL_MakeCurrent(window, data.main_context);
} else if constexpr (ctx_type == TESS) {
result = SDL_GL_MakeCurrent(window, data.tessellation_context);
} else if constexpr (ctx_type == TEX_GEN) {
result = SDL_GL_MakeCurrent(window, data.texture_gen_context);
}
if (result == false) {
throw std::runtime_error(std::format(
"Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError()));
}
}
void set_current_context(graphics_data& data) const {
using enum context_type;
int result = -1;
if constexpr (ctx_type == MAIN) {
result = SDL_GL_MakeCurrent(window, data.main_context);
} else if constexpr (ctx_type == TESS) {
result = SDL_GL_MakeCurrent(window, data.tessellation_context);
} else if constexpr (ctx_type == TEX_GEN) {
result = SDL_GL_MakeCurrent(window, data.texture_gen_context);
}
if (result < 0) {
throw std::runtime_error(std::format(
"Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError()));
}
}
🤖 Prompt for AI Agents
In include/graphics/gl_context_lock.h around lines 31 to 45, SDL_GL_MakeCurrent
is treated as returning bool but actually returns int (0 on success, negative on
error); change the result variable to int, assign the return value directly, and
treat any non‑zero/negative return as failure (e.g., if (result != 0) or if
(result < 0)); on failure throw the runtime_error with SDL_GetError() included
in the message and use a clearer error string.

Comment on lines 28 to 32
static const UI MAXIMUM_AMOUNT_OF_INDICES =
BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS));
static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS);
} // namespace predef
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

MAXIMUM_AMOUNT_OF_INDICES is wrong; should be triangles*3 (not +2). Also prefer integer pow to avoid double->int casts.

Incorrect upper bound risks buffer overruns or silent truncation. Use 4^N = 1u << (2u*N) and make these constexpr.

Apply:

- static const UI MAXIMUM_AMOUNT_OF_INDICES =
- 	BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
-static const UI MAXIMUM_AMOUNT_OF_VERTICES	= 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
-static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS);
+ static constexpr UI MAXIMUM_AMOUNT_OF_INDICES   =
+   BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS)) * 3u;
+ static constexpr UI MAXIMUM_AMOUNT_OF_VERTICES  =
+   10u * (1u << (2u * MAX_STEPS)) + 2u;
+ static constexpr UI MAXIMUM_AMOUNT_OF_TRIANGLES =
+   BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static const UI MAXIMUM_AMOUNT_OF_INDICES =
BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS));
static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS);
} // namespace predef
static constexpr UI MAXIMUM_AMOUNT_OF_INDICES =
BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS)) * 3u;
static constexpr UI MAXIMUM_AMOUNT_OF_VERTICES =
10u * (1u << (2u * MAX_STEPS)) + 2u;
static constexpr UI MAXIMUM_AMOUNT_OF_TRIANGLES =
BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS));
} // namespace predef
🤖 Prompt for AI Agents
In include/sphere_generation/fwd.h around lines 28–32, replace the current
static consts that use std::pow and incorrect indices formula: make them
constexpr, compute the power of 4 with an integer shift (4^N == 1u << (2u *
MAX_STEPS)) to avoid double→int casts, set MAXIMUM_AMOUNT_OF_TRIANGLES using
that integer power, set MAXIMUM_AMOUNT_OF_INDICES = MAXIMUM_AMOUNT_OF_TRIANGLES
* 3 (no +2), and compute MAXIMUM_AMOUNT_OF_VERTICES similarly using the integer
power and correct type UI.

Comment on lines +58 to +61
cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Host-to-device copies lack error handling.

The cudaMemcpy operations don't check for errors, which could lead to silent failures.

Wrap the memcpy operations with error checking:

-cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
-           num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
-cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
-           num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
+CUDA_SAFE_CALL(cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
+                          num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice));
+CUDA_SAFE_CALL(cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
+                          num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
CUDA_SAFE_CALL(cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice));
CUDA_SAFE_CALL(cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice));
🤖 Prompt for AI Agents
In src/sphere_generation/icosahedron_data_manager.cpp around lines 58-61, the
two cudaMemcpy host-to-device calls are un-checked; capture each call's
cudaError_t return value and verify it equals cudaSuccess. For each memcpy,
assign the result to a variable (e.g. err = cudaMemcpy(...)), and if err !=
cudaSuccess log or propagate the failure (use cudaGetErrorString(err) in the log
message) and return/throw/exit appropriately so failures are not silent; do this
for both the vertices and indices copies.

💡 Verification agent

🧩 Analysis chain

Verify vertex data size mismatch.

The vertices are of type graphics::vertex but you're copying glm::vec3 data. This size mismatch could cause buffer overruns.

The copy operation uses sizeof(glm::vec3) but the buffer is allocated for graphics::vertex type. According to the code, graphics::vertex contains position, normal, tangent, bitangent, and tex_coord fields, which is much larger than just glm::vec3. This will cause incomplete initialization of the vertex data.

Apply this fix:

-cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
-           num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
+cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
+           num_vertices_cpu * sizeof(graphics::vertex), cudaMemcpyHostToDevice);

🏁 Script executed:

#!/bin/bash
# Check the definition of graphics::vertex to understand its structure
ast-grep --pattern 'struct vertex {
  $$$
}'

# Also check for graphics::vertex in namespace
ast-grep --pattern 'namespace graphics {
  $$$
  struct vertex {
    $$$
  }
  $$$
}'

Length of output: 134


🏁 Script executed:

#!/bin/bash
# Search for generate_icosahedron_vertices definition and its return type
rg -n "generate_icosahedron_vertices" -C3 --type cpp

# Search for the definition of graphics::vertex
rg -n "struct vertex" -C3 --type cpp

# Search for the namespace graphics block
rg -n "namespace graphics" -C3 --type cpp

Length of output: 3681


🏁 Script executed:

#!/bin/bash
# Display the full definition of raw::graphics::vertex to inspect all fields
sed -n '1,200p' include/graphics/vertex.h

Length of output: 427


🏁 Script executed:

#!/bin/bash
# Inspect how vertices_handle is allocated and used
rg -n "vertices_handle" -C3 --type cpp,h

Length of output: 70


Correct vertex copy size
The cudaMemcpy and cudaMemcpyAsync calls only copy sizeof(glm::vec3) into buffers of raw::graphics::vertex (which include position, tex_coord, normal, tangent, bitangent), leading to incomplete initialization and potential overruns. Change both to use sizeof(raw::graphics::vertex):

--- src/sphere_generation/icosahedron_data_manager.cpp
@@ -58,7 +58,7 @@
- cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
-            num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
+ cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
+            num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice);

@@ -78,7 +78,7 @@
- cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
-                 num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice, stream->stream());
+ cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
+                 num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice, stream->stream());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
// Synchronous copy: use full vertex struct size
cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice);
cudaMemcpy(indices_handle.get_data(), std::data(generate_icosahedron_indices()),
num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
// Asynchronous copy: also use full vertex struct size
cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice, stream->stream());
🤖 Prompt for AI Agents
In src/sphere_generation/icosahedron_data_manager.cpp around lines 58 to 61, the
cudaMemcpy that copies vertex data currently uses sizeof(glm::vec3) which only
accounts for position and not the full raw::graphics::vertex structure; update
the copy size to num_vertices_cpu * sizeof(raw::graphics::vertex) (and replace
sizeof(glm::vec3) with sizeof(raw::graphics::vertex)). Also audit and change any
corresponding cudaMemcpyAsync calls to use sizeof(raw::graphics::vertex) so the
device buffer receives the full vertex (position, tex_coord, normal, tangent,
bitangent) and avoid partial initialization/overruns; leave the indices copy
calculation as-is using UI.

Comment on lines +33 to +37
cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t));
cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu(
sizeof(uint32_t));
*num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use stack allocation for single values instead of heap buffers

Using cuda_types::cuda_buffer for single uint32_t values adds unnecessary overhead. Consider using stack variables instead.

-	cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t));
-	cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu(
-		sizeof(uint32_t));
-	*num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
+	uint32_t num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
+	uint32_t num_unique_edges_cpu = 0;

Then update all usages to remove the dereference operator:

  • *num_triangles_cpunum_triangles_cpu
  • *num_unique_edges_cpunum_unique_edges_cpu
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t));
cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu(
sizeof(uint32_t));
*num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
// Before:
// cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t));
// cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu(
// sizeof(uint32_t));
// *num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
// auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
// After:
uint32_t num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
uint32_t num_unique_edges_cpu = 0;
auto blocks = (num_triangles_cpu + threads_per_block - 1) / threads_per_block;
🤖 Prompt for AI Agents
In src/sphere_generation/launch_tessellation.cu around lines 33 to 37, replace
the heap-allocated single-value cuda_buffer usages with plain stack uint32_t
variables (e.g., uint32_t num_triangles_cpu and uint32_t num_unique_edges_cpu),
assign predef::BASIC_AMOUNT_OF_TRIANGLES directly to num_triangles_cpu, compute
blocks using num_triangles_cpu (no dereference), and update all subsequent
references in this file from *num_triangles_cpu → num_triangles_cpu and
*num_unique_edges_cpu → num_unique_edges_cpu.

Comment on lines +37 to +40
auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
int base = 12;
CUDA_SAFE_CALL(
cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential integer overflow in blocks calculation

The calculation (*num_triangles_cpu + threads_per_block - 1) / threads_per_block can overflow if *num_triangles_cpu is close to UINT32_MAX. Consider using a safer calculation or adding an overflow check.

-	auto blocks		   = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
+	if (*num_triangles_cpu > UINT32_MAX - threads_per_block + 1) {
+		throw std::overflow_error("Triangle count too large for block calculation");
+	}
+	auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
int base = 12;
CUDA_SAFE_CALL(
cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream));
// guard against overflow when rounding up
if (*num_triangles_cpu > UINT32_MAX - threads_per_block + 1) {
throw std::overflow_error("Triangle count too large for block calculation");
}
auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
int base = 12;
CUDA_SAFE_CALL(
cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream));
🤖 Prompt for AI Agents
In src/sphere_generation/launch_tessellation.cu around lines 37 to 40, the
expression (*num_triangles_cpu + threads_per_block - 1) / threads_per_block can
overflow when *num_triangles_cpu is near UINT32_MAX; change the calculation to
do the addition in a wider integer (e.g. cast operands to uint64_t or size_t),
compute blocks as (uint64_t(*num_triangles_cpu) + threads_per_block - 1) /
threads_per_block, and then validate or clamp the result to the type used for
kernel launch (or return an error if it exceeds allowable launch limits) to
prevent overflow and undefined behavior.

gl::ATTR(SDL_GL_DEPTH_SIZE, 24);
gl_window::gl_window(const std::string &window_name) {
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify context sharing behavior for multiple GL contexts.

Setting SDL_GL_SHARE_WITH_CURRENT_CONTEXT to 1 before any context exists may not have the intended effect. This attribute typically needs to be set between context creations to enable sharing between the current context and the next one to be created.

The proper pattern for creating shared contexts should be:

  1. Create main context
  2. Make it current
  3. Set SDL_GL_SHARE_WITH_CURRENT_CONTEXT to 1
  4. Create tessellation context (will share with main)
  5. Create texture_gen context (will share with main)

Consider restructuring the context creation to ensure proper sharing.


🏁 Script executed:

#!/bin/bash
# Check if there are any other locations where SDL_GL_SHARE_WITH_CURRENT_CONTEXT is used
rg -n "SDL_GL_SHARE_WITH_CURRENT_CONTEXT" --type cpp

Length of output: 140


🏁 Script executed:

#!/bin/bash
# Display the first 200 lines of gl_window.cpp to inspect context creation sequence
sed -n '1,200p' src/window/gl_window.cpp

# Locate all uses of SDL_GL_CreateContext in this file to see when contexts are created
rg -n "SDL_GL_CreateContext" -C3 src/window/gl_window.cpp

Length of output: 4247


Set SDL_GL_SHARE_WITH_CURRENT_CONTEXT after making the main context current
Move the SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1) call to after

SDL_GL_MakeCurrent(data.window, data.main_context);

and before creating the tessellation and texture_gen contexts so they actually share lists with the main context.

🤖 Prompt for AI Agents
In src/window/gl_window.cpp around line 13, the
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1) call is placed too
early; move this call to immediately after SDL_GL_MakeCurrent(data.window,
data.main_context) and before creating the tessellation and texture_gen contexts
so those contexts share lists with the main context. Ensure the sequence is:
create main_context, make it current, set SDL_GL_SHARE_WITH_CURRENT_CONTEXT to
1, then create the other contexts (tessellation and texture_gen).

Comment on lines +50 to +51
gl::VIEW(0, 0, 2560, 1440);
SDL_GL_MakeCurrent(data.window, data.main_context);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Hardcoded viewport dimensions should match window creation parameters.

The viewport is set to fixed dimensions (2560x1440) which match the window creation, but this creates a maintenance issue if the resolution needs to change.

Consider using the window dimensions dynamically:

-gl::VIEW(0, 0, 2560, 1440);
+int width, height;
+SDL_GetWindowSize(data.window, &width, &height);
+gl::VIEW(0, 0, width, height);
🤖 Prompt for AI Agents
In src/window/gl_window.cpp around lines 50-51, the viewport is hardcoded to
2560x1440 which can drift from actual window size; replace the fixed dimensions
by querying the window's current size (e.g., call SDL_GetWindowSize or use the
stored width/height) after making the GL context current, then pass those
dynamic width/height values to gl::VIEW (or glViewport) so the viewport always
matches the actual window resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant