Conversation
…e_generation, separated what which class does, and finished new tessellation process, although hitting some nasty bug with thrust
…ake sure they actually produce the goddam sphere
…ces accessible in the worker thread via some shenanigans
WalkthroughAdds multi-context GL management via a new graphics_data and gl_context_lock; updates window to create/share three GL contexts; adjusts renderer and sphere generation to lock appropriate contexts; revises game loop state handling; modifies CUDA buffer behavior and GL interop mapping; refactors tessellation to dynamic counts; updates sphere APIs and kernel declarations; minor build and formatting tweaks. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App
participant Game as core::game
participant State as playing_state
participant Renderer
participant Window as gl_window
Note over App,Renderer: Initialization
App->>Game: construct(renderer)
Game->>State: push playing_state(window_data, window_size)
loop Main loop
Game->>State: handle_input()
alt input requests pop
State-->>Game: true
Game->>Game: pop_state()
opt stack empty
Game-->>App: exit
end
else continue
Game->>State: update(dt)
Game->>Renderer: render(queue, camera)
Renderer->>Window: get_data()
Renderer->>Renderer: gl_context_lock<MAIN>
Renderer->>Renderer: issue GL draws
Renderer-->>Renderer: unlock context
end
end
sequenceDiagram
autonumber
participant Play as playing_state
participant SG as sphere_generator
participant Win as gl_window
participant Tess as gl_context_lock<TESS>
participant Src as icosahedron_data_manager
participant CU as launch_tessellation.cu
participant Kern as CUDA Kernels
Play->>Win: get_data()
Play->>SG: generate(steps, stream, source, data)
SG->>Tess: lock TESS context (SDL_GL_MakeCurrent)
SG->>Src: prepare/create_context()
SG->>CU: launch_tessellation(data_for_thread, steps, stream)
CU->>Kern: generate_edges / unique edges / create_triangles
CU->>Kern: calculate_tbn_and_uv(vertices, count)
CU-->>SG: sync as needed
SG-->>Tess: unlock context
sequenceDiagram
autonumber
participant Renderer
participant Win as gl_window
participant Lock as gl_context_lock<MAIN>
Renderer->>Win: get_data()
Renderer->>Lock: construct (locks MAIN)
Renderer->>Renderer: set shaders, bind meshes
Renderer->>Renderer: draw calls
Renderer-->>Lock: destroy (releases)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Docstrings generation was requested by @NeKon69. * #1 (comment) The following files were modified: * `include/core/game.h` * `include/cuda_types/buffer.h` * `include/cuda_types/cuda_from_gl_data.h` * `include/game_states/playing_state.h` * `include/graphics/gl_context_lock.h` * `include/rendering/renderer.h` * `include/sphere_generation/fwd.h` * `include/sphere_generation/kernel_launcher.h` * `include/sphere_generation/sphere_generator.h` * `include/sphere_generation/tessellation_kernel.h` * `include/window/fwd.h` * `include/window/gl_window.h` * `src/core/game.cpp` * `src/game_states/playing_state.cpp` * `src/main.cpp` * `src/rendering/renderer.cpp` * `src/sphere_generation/icosahedron_data_manager.cpp` * `src/sphere_generation/launch_tessellation.cu` * `src/sphere_generation/sphere_generator.cpp` * `src/sphere_generation/tessellation_kernel.cu` * `src/window/gl_window.cpp`
|
Note Generated docstrings for this pull request at #2 |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/sphere_generation/tessellation_kernel.cu (1)
115-123: Potential -1 indexing into edge_to_vertex → undefined behavior. Guard or assert before use.find_edge returns -1 on miss; changing IDs to int is fine, but indexing with a negative value is UB and will corrupt output.
Apply:
- int unique_id_01 = find_edge(unique_edges, num_unique_edges, e01); - int unique_id_12 = find_edge(unique_edges, num_unique_edges, e12); - int unique_id_20 = find_edge(unique_edges, num_unique_edges, e20); + int unique_id_01 = find_edge(unique_edges, num_unique_edges, e01); + int unique_id_12 = find_edge(unique_edges, num_unique_edges, e12); + int unique_id_20 = find_edge(unique_edges, num_unique_edges, e20); + if (unique_id_01 < 0 || unique_id_12 < 0 || unique_id_20 < 0) { + assert(!"create_triangles: edge lookup failed"); + return; // prevents out-of-bounds writes; consider a better recovery if needed + } - uint32_t new_i01 = edge_to_vertex[unique_id_01]; - uint32_t new_i12 = edge_to_vertex[unique_id_12]; - uint32_t new_i20 = edge_to_vertex[unique_id_20]; + uint32_t new_i01 = edge_to_vertex[static_cast<uint32_t>(unique_id_01)]; + uint32_t new_i12 = edge_to_vertex[static_cast<uint32_t>(unique_id_12)]; + uint32_t new_i20 = edge_to_vertex[static_cast<uint32_t>(unique_id_20)];And ensure device asserts are available:
#include <assert.h>include/cuda_types/buffer.h (1)
29-31: Host-side stream removal introduces null deref and misuse of async ops on host memory
- For host buffers, data_stream may be null (explicit ctor), but _memcpy/zero_data unconditionally use data_stream->stream() and async APIs. This will crash and/or use CUDA APIs invalid for host memory.
Use synchronous memcpy for host buffers and std::memset for zeroing; keep async path for device buffers.
- void _memcpy(T *dst, T *src, size_t size, cudaMemcpyKind kind) { - CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, size, kind, data_stream->stream())); - } + void _memcpy(T *dst, T *src, size_t size, cudaMemcpyKind kind) { + if constexpr (Side == side::device) { + CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, size, kind, data_stream->stream())); + } else { + // Host-path: synchronous copy is sufficient and avoids stream dependency + CUDA_SAFE_CALL(cudaMemcpy(dst, src, size, kind)); + } + }- explicit cuda_buffer(const size_t size) : _size(size) { - if constexpr (Side == side::device) { - data_stream = std::make_shared<cuda_stream>(); - } - alloc(); - } + explicit cuda_buffer(const size_t size) : _size(size) { + if constexpr (Side == side::device) { + data_stream = std::make_shared<cuda_stream>(); + } + alloc(); + }- void zero_data(size_t amount) { - cudaMemsetAsync(ptr, 0, amount, data_stream->stream()); - } + void zero_data(size_t amount) { + if constexpr (Side == side::device) { + cudaMemsetAsync(ptr, 0, amount, data_stream->stream()); + } else { + std::memset(ptr, 0, amount); + } + }Add required header for std::memset (outside the changed hunk):
#include <cstring>Also applies to: 58-63, 154-156
src/game_states/playing_state.cpp (2)
24-33: Out-of-bounds access: AM_POINT_LIGHTS (5) exceeds light_pos size (4).Loop indexes 0..4 but light_pos has 4 elements, so i==4 dereferences past end → UB/crash. Derive the count from light_pos.
Apply this diff:
-namespace predef { -static constexpr auto AM_POINT_LIGHTS = 5; -} +namespace predef { +// Keep in sync with playing_state::light_pos +static const auto AM_POINT_LIGHTS = playing_state::light_pos.size(); +}
133-143: Debounce the 'O' toggle; pressed_o never set true.Without setting it on key-down, auto-repeat can flip sim state rapidly.
Apply this diff:
- case SDL_SCANCODE_O: - if (pressed_o) - break; + case SDL_SCANCODE_O: + if (pressed_o) + break; + pressed_o = true; if (sim_state.running) { system.pause(); sim_state.running = false; } else { system.start(); sim_state.running = true; } break;src/sphere_generation/launch_tessellation.cu (1)
87-89: Missing error handling for device-to-host copyThe code doesn't check if
p_vertex_countis valid before copying from it. Add null pointer check or ensure it's always initialized.+ if (p_vertex_count == nullptr) { + throw std::runtime_error("p_vertex_count is null"); + } cudaMemcpyAsync(&num_vertices_cpu, p_vertex_count, sizeof(uint32_t), cudaMemcpyDeviceToHost, stream);
🧹 Nitpick comments (17)
include/window/fwd.h (1)
29-31: Forward declare only window types; drop straygraphics_datahereThis declares
raw::window::graphics_data, which differs from the graphics namespace type and risks confusion. Remove it or move a correct forward decl to the graphics header.-struct graphics_data; class sdl_video; class gl_window;include/window/gl_window.h (2)
12-12: Prefer project-include path over relativeUse a stable include path to avoid brittle
..traversal when include dirs change.-#include "../graphics/gl_context_lock.h" +#include "graphics/gl_context_lock.h"
44-45: Accessor signature should match the namespace decisionAlign return type with the chosen qualification; consider const and non-const overloads if mutation isn’t always needed.
- graphics::graphics_data& get_data(); + raw::graphics::graphics_data& get_data(); + const raw::graphics::graphics_data& get_data() const;include/sphere_generation/fwd.h (1)
39-55: Switch to lexicographic operator< on edge is the right fix for Thrust sort.Ordering matches make_canonical_edge and the HOST_DEVICE qualifiers keep it usable on device. Consider noexcept/constexpr for tiny perf/clarity.
Example:
- HOST_DEVICE bool operator<(const edge& other) const { + HOST_DEVICE constexpr bool operator<(const edge& other) const noexcept {include/sphere_generation/kernel_launcher.h (1)
11-15: Typo/mismatch in parameter name: p_edges_cound → p_unique_edges_count (align with implementation and intent).Name doesn’t affect ABI but harms readability and grep-ability alongside launch_tessellation.cu.
- uint32_t *p_edges_cound, cudaStream_t &stream, uint32_t steps); + uint32_t *p_unique_edges_count, cudaStream_t &stream, uint32_t steps);include/sphere_generation/tessellation_kernel.h (1)
27-28: Unify qualifier/style with surrounding declarationsElsewhere in this header you use graphics::vertex (implicitly raw::graphics). Keep it consistent to reduce noise.
Apply:
-extern __global__ void calculate_tbn_and_uv(raw::graphics::vertex *vertices, +extern __global__ void calculate_tbn_and_uv(graphics::vertex *vertices, uint32_t num_input_vertices);include/rendering/renderer.h (1)
26-26: Keep queue read-only and drop trailing whitespacerender doesn’t mutate the queue; make it const&, and fix the dangling space.
- void render(raw::rendering::queue &command_queue, const core::camera::camera &camera) ; + void render(const raw::rendering::queue &command_queue, const core::camera::camera &camera);include/sphere_generation/sphere_generator.h (1)
10-10: Reduce header coupling: avoid heavy include in a public headerThis header only needs the graphics_data type; prefer forward-declaring it (include/window/fwd.h) and move gl_context_lock.h to the .cpp.
-#include "graphics/gl_context_lock.h" +#include "window/fwd.h" // for raw::window::graphics_data forward declAnd in src/sphere_generation/sphere_generator.cpp, include gl_context_lock.h explicitly.
include/cuda_types/buffer.h (1)
46-50: Tiny ergonomics winConsider also providing operator-> for host buffers to mirror pointer semantics.
+ __host__ T* operator->() + requires(Side == side::host) + { + return ptr; + } + __host__ const T* operator->() const + requires(Side == side::host) + { + return ptr; + }src/rendering/renderer.cpp (1)
26-41: Reduce redundant shader state changes.You set projection/view every command, even when consecutive commands use the same shader. Consider batching by shader or tracking the currently bound program to cut glUseProgram/uniform traffic.
src/game_states/playing_state.cpp (1)
188-192: Call sync before building the render queue for clarity/order.It avoids any ambiguity about buffer visibility when the queue is constructed.
Apply this diff:
- auto queue = build_rendering_queue(); - sphere_gen.sync(); + sphere_gen.sync(); + auto queue = build_rendering_queue(); renderer.render(queue, camera);src/window/gl_window.cpp (2)
11-18: Duplicate GL context profile attribute setting.
SDL_GL_CONTEXT_PROFILE_MASKis set twice with the same value (lines 12 and 16). This is redundant.Apply this diff to remove the duplicate:
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE); SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); gl::ATTR(SDL_GL_MULTISAMPLEBUFFERS, 1); gl::ATTR(SDL_GL_MULTISAMPLESAMPLES, 1024); -gl::ATTR(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE); gl::ATTR(SDL_GL_DOUBLEBUFFER, 1); gl::ATTR(SDL_GL_DEPTH_SIZE, 24);
36-49: Context creation follows a reasonable pattern but could be more robust.The context creation sequence creates three contexts that should share resources. However, the sharing setup might need adjustment as noted above.
Consider this improved structure for proper context sharing:
init_glad(); +SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); data.tessellation_context = SDL_GL_CreateContext(data.window); if (!data.tessellation_context) { throw std::runtime_error("Tessellation OpenGL context could not be created! SDL_Error: " + std::string(SDL_GetError()) + " in " + std::string(__FILE__) + " on " + std::to_string(__LINE__) + " line"); } data.texture_gen_context = SDL_GL_CreateContext(data.window);src/sphere_generation/launch_tessellation.cu (2)
15-21: Unused comparator functionThe
edge_comparatorfunction is defined but never used in this file. If it's intended for future use or external consumption, consider moving it to a header file. Otherwise, remove it to avoid dead code.
72-73: Inefficient sorting with thrust::sort_by_keyAfter synchronizing the stream to get
num_unique_edges_cpu, you're usingthrust::sort_by_keywhich is typically used when you need to sort values based on keys. If you're sorting edges by their properties, consider ifthrust::sortwould be more appropriate, or document why the key-value sort is necessary.include/graphics/gl_context_lock.h (2)
63-65: Potential issue with destructor not checking for errorsThe destructor calls
SDL_GL_MakeCurrent(window, nullptr)without checking for errors. While this is generally acceptable in a destructor, consider logging errors for debugging purposes.~gl_context_lock() { - SDL_GL_MakeCurrent(window, nullptr); + if (SDL_GL_MakeCurrent(window, nullptr) < 0) { + // Log error but don't throw in destructor + // Consider adding a logging mechanism + } }
67-70: Move operations should be noexceptThe defaulted move constructor and assignment operator should be marked
noexceptfor better performance and exception safety.gl_context_lock(const gl_context_lock&) = delete; - gl_context_lock(gl_context_lock&&) = default; + gl_context_lock(gl_context_lock&&) noexcept = default; gl_context_lock& operator=(const gl_context_lock&) = delete; - gl_context_lock& operator=(gl_context_lock&&) = default; + gl_context_lock& operator=(gl_context_lock&&) noexcept = default;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
src/shaders/objects/vertex_shader.glslis excluded by!**/*.glsl
📒 Files selected for processing (22)
CMakeLists.txt(2 hunks)include/core/game.h(1 hunks)include/cuda_types/buffer.h(1 hunks)include/cuda_types/cuda_from_gl_data.h(1 hunks)include/game_states/playing_state.h(1 hunks)include/graphics/gl_context_lock.h(1 hunks)include/rendering/renderer.h(1 hunks)include/sphere_generation/fwd.h(3 hunks)include/sphere_generation/kernel_launcher.h(1 hunks)include/sphere_generation/sphere_generator.h(1 hunks)include/sphere_generation/tessellation_kernel.h(1 hunks)include/window/fwd.h(1 hunks)include/window/gl_window.h(2 hunks)src/core/game.cpp(1 hunks)src/game_states/playing_state.cpp(4 hunks)src/main.cpp(0 hunks)src/rendering/renderer.cpp(1 hunks)src/sphere_generation/icosahedron_data_manager.cpp(3 hunks)src/sphere_generation/launch_tessellation.cu(3 hunks)src/sphere_generation/sphere_generator.cpp(1 hunks)src/sphere_generation/tessellation_kernel.cu(2 hunks)src/window/gl_window.cpp(1 hunks)
💤 Files with no reviewable changes (1)
- src/main.cpp
🧰 Additional context used
🧬 Code graph analysis (17)
include/sphere_generation/tessellation_kernel.h (1)
src/sphere_generation/tessellation_kernel.cu (9)
void(11-14)void(15-31)void(33-72)void(92-142)void(143-166)void(170-174)void(178-281)void(283-308)calculate_tbn_and_uv(143-143)
include/sphere_generation/sphere_generator.h (3)
include/window/fwd.h (1)
raw(11-32)include/cuda_types/stream.h (2)
raw(10-30)cuda_stream(11-29)src/sphere_generation/sphere_generator.cpp (2)
generate(10-28)generate(10-12)
include/core/game.h (1)
include/core/game_state.h (1)
game_state(12-22)
include/game_states/playing_state.h (1)
src/game_states/playing_state.cpp (2)
playing_state(59-81)playing_state(198-200)
include/rendering/renderer.h (2)
src/rendering/renderer.cpp (2)
render(22-43)render(22-22)include/core/camera/camera.h (1)
camera(44-93)
include/window/gl_window.h (3)
include/game_states/playing_state.h (1)
raw(21-70)include/window/fwd.h (1)
raw(11-32)src/window/gl_window.cpp (4)
gl_window(11-52)gl_window(54-59)get_data(65-67)get_data(65-65)
include/cuda_types/cuda_from_gl_data.h (1)
src/cuda_types/resource.cpp (2)
unmap(13-28)unmap(13-13)
include/cuda_types/buffer.h (2)
src/cuda_types/stream.cpp (3)
cuda_stream(11-14)cuda_stream(32-36)cuda_stream(51-54)include/cuda_types/stream.h (1)
cuda_stream(11-29)
src/sphere_generation/sphere_generator.cpp (2)
src/cuda_types/stream.cpp (4)
sync(28-30)sync(28-28)stream(47-49)stream(47-47)include/cuda_types/cuda_from_gl_data.h (1)
stream(30-41)
include/graphics/gl_context_lock.h (5)
include/game_states/playing_state.h (1)
raw(21-70)include/sphere_generation/sphere_generator.h (1)
raw(12-21)include/window/fwd.h (1)
raw(11-32)include/window/gl_window.h (1)
raw(15-62)src/sphere_generation/sphere_generator.cpp (1)
lock(23-23)
include/sphere_generation/kernel_launcher.h (3)
include/sphere_generation/sphere_generator.h (1)
raw(12-21)include/cuda_types/stream.h (1)
raw(10-30)src/sphere_generation/launch_tessellation.cu (2)
launch_tessellation(22-100)launch_tessellation(22-27)
src/rendering/renderer.cpp (1)
include/rendering/renderer.h (1)
renderer(15-27)
src/window/gl_window.cpp (2)
include/window/gl_window.h (1)
gl_window(16-61)include/window/gl_loader.h (1)
init_glad(14-24)
src/sphere_generation/icosahedron_data_manager.cpp (1)
include/cuda_types/cuda_from_gl_data.h (1)
stream(30-41)
src/game_states/playing_state.cpp (1)
include/game_states/playing_state.h (1)
playing_state(22-68)
src/sphere_generation/launch_tessellation.cu (3)
include/cuda_types/cuda_from_gl_data.h (1)
stream(30-41)src/cuda_types/stream.cpp (2)
stream(47-49)stream(47-47)src/sphere_generation/tessellation_kernel.cu (4)
generate_edges(15-15)create_unique_midpoint_vertices(33-36)create_triangles(92-94)calculate_tbn_and_uv(143-143)
src/core/game.cpp (2)
include/core/game.h (1)
game(16-29)include/rendering/renderer.h (1)
renderer(15-27)
🔇 Additional comments (23)
CMakeLists.txt (2)
7-7: Good call: no static CUDA libsDisabling static CUDA library creation matches the executable setup.
65-65: No CUDA driver API usage found; CUDA::cudart suffices.include/core/game.h (1)
19-19: Minor template cleanup
>>formatting fix is fine.include/window/gl_window.h (1)
20-20: No change needed:graphics::graphics_datacorrectly resolves
Unqualified lookup insideraw::windowfinds theraw::graphicsnamespace, sographics::graphics_dataalready refers toraw::graphics::graphics_data.include/sphere_generation/fwd.h (1)
9-10: LGTM: pulling in common/fwd.h for UI et al.This resolves downstream typedef visibility and is appropriate for both host and device builds.
src/sphere_generation/tessellation_kernel.cu (1)
22-27: LGTM: localizing triangle indices and edge_ptr.No functional change; improves readability and register locality.
include/sphere_generation/kernel_launcher.h (1)
7-7: LGTM: include cuda_runtime.h here.Needed for cudaStream_t in the declaration; avoids transitive include pitfalls.
include/sphere_generation/sphere_generator.h (1)
18-18: API change looks goodgenerate(...) now threading with GL-context locking; signature aligns with usage in playing_state.
src/rendering/renderer.cpp (1)
22-25: Good addition: RAII GL context lock around render.Locking MAIN before any GL calls is correct and keeps render thread-safe with multi-context.
src/game_states/playing_state.cpp (3)
59-76: Constructor changes look consistent with new graphics_data flow.Passing graphics_data through here aligns with the new context-locking model.
78-78: Verify sphere_gen.generate locks the correct GL context internally.Given you pass graphics_data, ensure generate acquires the intended context (e.g., a worker/upload context) and that resources are shared with MAIN. If not, acquire the specific lock here.
96-99: Simulation step commented out—intentional?If the sim is meant to run when sim_state.running is true, gate it rather than disabling wholesale.
Apply this (if intended):
- // system.update_sim(); + if (sim_state.running) { + system.update_sim(); + }src/core/game.cpp (2)
9-12: OK: pass graphics_data and window_size into playing_state.Matches the new constructor signature.
14-35: Confirm delta_time.to_milli() semantics.If to_milli() returns a value (and does not mutate in-place), this call is a no-op. If it mutates, all good. Please confirm and either remove it or capture the return.
src/sphere_generation/sphere_generator.cpp (2)
10-12: LGTM! Clean API extension for graphics context management.The addition of the
graphics_dataparameter properly supports the new multi-context GL architecture.
20-27: GL context lock correctly replaces CUDA device setup.The change from per-thread CUDA device initialization to GL context locking is appropriate for the multi-context architecture. The
TESScontext type properly isolates tessellation operations.src/window/gl_window.cpp (1)
54-59: Destructor correctly cleans up all GL contexts.The cleanup order is appropriate - contexts are destroyed before the window.
src/sphere_generation/icosahedron_data_manager.cpp (3)
48-53: 3x buffer allocation increase needs validation.All edge-related buffers now allocate 3x the previous capacity. This appears to account for the maximum number of edges in a fully tessellated mesh (3 edges per triangle).
The allocation strategy is sound - each triangle can have up to 3 edges, so allocating
MAXIMUM_AMOUNT_OF_TRIANGLES * 3provides sufficient space for the worst case.
73-75: Buffer allocations in prepare() are consistent with init().The 3x allocation pattern is correctly maintained in the prepare function.
92-94: Cleanup properly frees the newly allocated buffers.All dynamically allocated buffers are correctly freed in the cleanup method.
src/sphere_generation/launch_tessellation.cu (2)
98-99: Good addition of stream synchronizationThe addition of
cudaStreamSynchronizeaftercalculate_tbn_and_uvensures the kernel completes before proceeding. This prevents potential race conditions.
69-71: Memory allocation for num_unique_edges_cpu is correct: it’s constructed with sizeof(uint32_t), matching the size passed to cudaMemcpyAsync.include/graphics/gl_context_lock.h (1)
48-62: Good use of C++20 concepts with requires clausesThe use of
requiresclauses to constrain the template based oncontext_typeprovides compile-time safety and clear API design. This ensures each constructor can only be instantiated with the correct context type.
| __device__ __host__ T &operator*() | ||
| requires(Side == side::host) | ||
| { | ||
| return *ptr; | ||
| } |
There was a problem hiding this comment.
Misannotated operator: host-only op is marked device—can trigger illegal device deref*
operator* is intended for host-side buffers but is compiled for device as well due to device. Remove the device attribute.
- __device__ __host__ T &operator*()
+ __host__ T &operator*()
requires(Side == side::host)
{
return *ptr;
}Optional: add a const overload.
+ __host__ const T &operator*() const
+ requires(Side == side::host)
+ {
+ return *ptr;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| __device__ __host__ T &operator*() | |
| requires(Side == side::host) | |
| { | |
| return *ptr; | |
| } | |
| __host__ T &operator*() | |
| requires(Side == side::host) | |
| { | |
| return *ptr; | |
| } | |
| __host__ const T &operator*() const | |
| requires(Side == side::host) | |
| { | |
| return *ptr; | |
| } |
🤖 Prompt for AI Agents
In include/cuda_types/buffer.h around lines 46-50, the operator* is incorrectly
annotated with __device__ which causes host-only dereference to be compiled for
device and can trigger illegal device derefs; remove the __device__ attribute so
the operator is host-only (keep the requires(Side == side::host) constraint),
and add a const overload returning const T& with the same host-only guarantee.
| unmap(); | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Bug: unmapping in ctor leaves data invalid; map() never refreshes pointer
After unmap(), data points to unmapped memory. Also, map() doesn’t call cudaGraphicsResourceGetMappedPointer, so data is never revalidated. This explains Thrust misuse/UB.
Minimal, safe fix: store size, null data on unmap, and reacquire pointer on map. Also delete copy ops, add move ctor.
class cuda_from_gl_data {
private:
cudaGraphicsResource_t cuda_resource = nullptr;
T* data = nullptr;
bool mapped = false;
std::shared_ptr<cuda_stream> stream;
+ size_t bytes_ = 0;
public:
cuda_from_gl_data() = default;
cuda_from_gl_data(size_t* amount_of_bytes, UI buffer_object,
std::shared_ptr<cuda_stream> stream = nullptr)
: stream(stream) {
@@
- CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream()));
- CUDA_SAFE_CALL(
- cudaGraphicsResourceGetMappedPointer((void**)&data, amount_of_bytes, cuda_resource));
- mapped = true;
- unmap();
+ // Optionally discover size once; keep resource initially unmapped.
+ CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream()));
+ CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource));
+ if (amount_of_bytes) *amount_of_bytes = bytes_;
+ CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream->stream()));
+ data = nullptr;
+ mapped = false;
};
+ cuda_from_gl_data(cuda_from_gl_data&& rhs) noexcept {
+ *this = std::move(rhs);
+ }
+ cuda_from_gl_data(const cuda_from_gl_data&) = delete;
+ cuda_from_gl_data& operator=(const cuda_from_gl_data&) = delete;
cuda_from_gl_data& operator=(cuda_from_gl_data&& rhs) noexcept {
mapped = rhs.mapped;
rhs.mapped = false;
cuda_resource = rhs.cuda_resource;
data = rhs.data;
rhs.cuda_resource = nullptr;
rhs.data = nullptr;
+ bytes_ = rhs.bytes_;
+ rhs.bytes_ = 0;
+ stream = std::move(rhs.stream);
return *this;
}
@@
void unmap() {
- if (mapped)
- CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, nullptr));
- mapped = false;
+ if (mapped) {
+ CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream ? stream->stream() : 0));
+ mapped = false;
+ data = nullptr;
+ }
}
void map() {
- if (!mapped)
- CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, nullptr));
- mapped = true;
+ if (!mapped) {
+ CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream ? stream->stream() : 0));
+ CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource));
+ mapped = true;
+ }
}Follow-up: audit all call sites to not use get_data() unless mapped.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| unmap(); | |
| }; | |
| class cuda_from_gl_data { | |
| private: | |
| cudaGraphicsResource_t cuda_resource = nullptr; | |
| T* data = nullptr; | |
| bool mapped = false; | |
| std::shared_ptr<cuda_stream> stream; | |
| size_t bytes_ = 0; | |
| public: | |
| cuda_from_gl_data() = default; | |
| cuda_from_gl_data(size_t* amount_of_bytes, UI buffer_object, | |
| std::shared_ptr<cuda_stream> stream = nullptr) | |
| : stream(stream) { | |
| // Optionally discover size once; keep resource initially unmapped. | |
| CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &cuda_resource, stream->stream())); | |
| CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void**)&data, &bytes_, cuda_resource)); | |
| if (amount_of_bytes) *amount_of_bytes = bytes_; | |
| CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &cuda_resource, stream->stream())); | |
| data = nullptr; | |
| mapped = false; | |
| } | |
| cuda_from_gl_data(cuda_from_gl_data&& rhs) noexcept { | |
| *this = std::move(rhs); | |
| } | |
| cuda_from_gl_data(const cuda_from_gl_data&) = delete; | |
| cuda_from_gl_data& operator=(const cuda_from_gl_data&) = delete; | |
| cuda_from_gl_data& operator=(cuda_from_gl_data&& rhs) noexcept { | |
| mapped = rhs.mapped; | |
| rhs.mapped = false; | |
| cuda_resource = rhs.cuda_resource; | |
| data = rhs.data; | |
| rhs.cuda_resource = nullptr; | |
| rhs.data = nullptr; | |
| bytes_ = rhs.bytes_; | |
| rhs.bytes_ = 0; | |
| stream = std::move(rhs.stream); | |
| return *this; | |
| } | |
| void unmap() { | |
| if (mapped) { | |
| CUDA_SAFE_CALL(cudaGraphicsUnmapResources( | |
| 1, &cuda_resource, stream ? stream->stream() : 0)); | |
| mapped = false; | |
| data = nullptr; | |
| } | |
| } | |
| void map() { | |
| if (!mapped) { | |
| CUDA_SAFE_CALL(cudaGraphicsMapResources( | |
| 1, &cuda_resource, stream ? stream->stream() : 0)); | |
| CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer( | |
| (void**)&data, &bytes_, cuda_resource)); | |
| mapped = true; | |
| } | |
| } | |
| }; |
| struct graphics_data { | ||
| SDL_Window* window; | ||
| std::mutex main_mutex; | ||
| SDL_GLContext main_context; | ||
| std::mutex tessellation_mutex; | ||
| SDL_GLContext tessellation_context; | ||
| std::mutex texture_gen_mutex; | ||
| SDL_GLContext texture_gen_context; | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider adding validation for SDL contexts
The graphics_data struct holds raw SDL contexts without any validation. Consider adding checks to ensure contexts are properly initialized before use.
Consider adding a method to validate contexts:
bool is_valid() const {
return window != nullptr &&
main_context != nullptr &&
tessellation_context != nullptr &&
texture_gen_context != nullptr;
}🤖 Prompt for AI Agents
In include/graphics/gl_context_lock.h around lines 16 to 24, the graphics_data
struct stores raw SDL pointers/contexts without any validation; add a const
member function is_valid() that checks window and each SDL_GLContext field are
non-nullptr and return a bool, and call or assert this validation wherever
graphics_data is constructed or accessed to prevent use of uninitialized
contexts.
| void set_current_context(graphics_data& data) const { | ||
| using enum context_type; | ||
| bool result = false; | ||
| if constexpr (ctx_type == MAIN) { | ||
| result = SDL_GL_MakeCurrent(window, data.main_context); | ||
| } else if constexpr (ctx_type == TESS) { | ||
| result = SDL_GL_MakeCurrent(window, data.tessellation_context); | ||
| } else if constexpr (ctx_type == TEX_GEN) { | ||
| result = SDL_GL_MakeCurrent(window, data.texture_gen_context); | ||
| } | ||
| if (result == false) { | ||
| throw std::runtime_error(std::format( | ||
| "Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError())); | ||
| } | ||
| } |
There was a problem hiding this comment.
SDL_GL_MakeCurrent returns int, not bool
SDL_GL_MakeCurrent returns 0 on success and negative on error, not a boolean. The current comparison will throw an error on success.
void set_current_context(graphics_data& data) const {
using enum context_type;
- bool result = false;
+ int result = -1;
if constexpr (ctx_type == MAIN) {
result = SDL_GL_MakeCurrent(window, data.main_context);
} else if constexpr (ctx_type == TESS) {
result = SDL_GL_MakeCurrent(window, data.tessellation_context);
} else if constexpr (ctx_type == TEX_GEN) {
result = SDL_GL_MakeCurrent(window, data.texture_gen_context);
}
- if (result == false) {
+ if (result < 0) {
throw std::runtime_error(std::format(
"Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError()));
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| void set_current_context(graphics_data& data) const { | |
| using enum context_type; | |
| bool result = false; | |
| if constexpr (ctx_type == MAIN) { | |
| result = SDL_GL_MakeCurrent(window, data.main_context); | |
| } else if constexpr (ctx_type == TESS) { | |
| result = SDL_GL_MakeCurrent(window, data.tessellation_context); | |
| } else if constexpr (ctx_type == TEX_GEN) { | |
| result = SDL_GL_MakeCurrent(window, data.texture_gen_context); | |
| } | |
| if (result == false) { | |
| throw std::runtime_error(std::format( | |
| "Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError())); | |
| } | |
| } | |
| void set_current_context(graphics_data& data) const { | |
| using enum context_type; | |
| int result = -1; | |
| if constexpr (ctx_type == MAIN) { | |
| result = SDL_GL_MakeCurrent(window, data.main_context); | |
| } else if constexpr (ctx_type == TESS) { | |
| result = SDL_GL_MakeCurrent(window, data.tessellation_context); | |
| } else if constexpr (ctx_type == TEX_GEN) { | |
| result = SDL_GL_MakeCurrent(window, data.texture_gen_context); | |
| } | |
| if (result < 0) { | |
| throw std::runtime_error(std::format( | |
| "Failed to Set Current Context, IDK what to do, bye bye! {}\n", SDL_GetError())); | |
| } | |
| } |
🤖 Prompt for AI Agents
In include/graphics/gl_context_lock.h around lines 31 to 45, SDL_GL_MakeCurrent
is treated as returning bool but actually returns int (0 on success, negative on
error); change the result variable to int, assign the return value directly, and
treat any non‑zero/negative return as failure (e.g., if (result != 0) or if
(result < 0)); on failure throw the runtime_error with SDL_GetError() included
in the message and use a clearer error string.
| static const UI MAXIMUM_AMOUNT_OF_INDICES = | ||
| BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2; | ||
| static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)); | ||
| static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2; | ||
| static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS); | ||
| } // namespace predef |
There was a problem hiding this comment.
MAXIMUM_AMOUNT_OF_INDICES is wrong; should be triangles*3 (not +2). Also prefer integer pow to avoid double->int casts.
Incorrect upper bound risks buffer overruns or silent truncation. Use 4^N = 1u << (2u*N) and make these constexpr.
Apply:
- static const UI MAXIMUM_AMOUNT_OF_INDICES =
- BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
-static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2;
-static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS);
+ static constexpr UI MAXIMUM_AMOUNT_OF_INDICES =
+ BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS)) * 3u;
+ static constexpr UI MAXIMUM_AMOUNT_OF_VERTICES =
+ 10u * (1u << (2u * MAX_STEPS)) + 2u;
+ static constexpr UI MAXIMUM_AMOUNT_OF_TRIANGLES =
+ BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static const UI MAXIMUM_AMOUNT_OF_INDICES = | |
| BASIC_AMOUNT_OF_TRIANGLES * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2; | |
| static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)); | |
| static const UI MAXIMUM_AMOUNT_OF_VERTICES = 10 * static_cast<UI>(std::pow(4, MAX_STEPS)) + 2; | |
| static const UI MAXIMUM_AMOUNT_OF_TRIANGLES = BASIC_AMOUNT_OF_TRIANGLES * std::pow(4, MAX_STEPS); | |
| } // namespace predef | |
| static constexpr UI MAXIMUM_AMOUNT_OF_INDICES = | |
| BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS)) * 3u; | |
| static constexpr UI MAXIMUM_AMOUNT_OF_VERTICES = | |
| 10u * (1u << (2u * MAX_STEPS)) + 2u; | |
| static constexpr UI MAXIMUM_AMOUNT_OF_TRIANGLES = | |
| BASIC_AMOUNT_OF_TRIANGLES * (1u << (2u * MAX_STEPS)); | |
| } // namespace predef |
🤖 Prompt for AI Agents
In include/sphere_generation/fwd.h around lines 28–32, replace the current
static consts that use std::pow and incorrect indices formula: make them
constexpr, compute the power of 4 with an integer shift (4^N == 1u << (2u *
MAX_STEPS)) to avoid double→int casts, set MAXIMUM_AMOUNT_OF_TRIANGLES using
that integer power, set MAXIMUM_AMOUNT_OF_INDICES = MAXIMUM_AMOUNT_OF_TRIANGLES
* 3 (no +2), and compute MAXIMUM_AMOUNT_OF_VERTICES similarly using the integer
power and correct type UI.
| cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()), | ||
| num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice); | ||
| cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()), | ||
| num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Host-to-device copies lack error handling.
The cudaMemcpy operations don't check for errors, which could lead to silent failures.
Wrap the memcpy operations with error checking:
-cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
- num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
-cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
- num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice);
+CUDA_SAFE_CALL(cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
+ num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice));
+CUDA_SAFE_CALL(cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()),
+ num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()), | |
| num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice); | |
| cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()), | |
| num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice); | |
| CUDA_SAFE_CALL(cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()), | |
| num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice)); | |
| CUDA_SAFE_CALL(cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()), | |
| num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice)); |
🤖 Prompt for AI Agents
In src/sphere_generation/icosahedron_data_manager.cpp around lines 58-61, the
two cudaMemcpy host-to-device calls are un-checked; capture each call's
cudaError_t return value and verify it equals cudaSuccess. For each memcpy,
assign the result to a variable (e.g. err = cudaMemcpy(...)), and if err !=
cudaSuccess log or propagate the failure (use cudaGetErrorString(err) in the log
message) and return/throw/exit appropriately so failures are not silent; do this
for both the vertices and indices copies.
💡 Verification agent
🧩 Analysis chain
Verify vertex data size mismatch.
The vertices are of type graphics::vertex but you're copying glm::vec3 data. This size mismatch could cause buffer overruns.
The copy operation uses sizeof(glm::vec3) but the buffer is allocated for graphics::vertex type. According to the code, graphics::vertex contains position, normal, tangent, bitangent, and tex_coord fields, which is much larger than just glm::vec3. This will cause incomplete initialization of the vertex data.
Apply this fix:
-cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
- num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
+cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()),
+ num_vertices_cpu * sizeof(graphics::vertex), cudaMemcpyHostToDevice);🏁 Script executed:
#!/bin/bash
# Check the definition of graphics::vertex to understand its structure
ast-grep --pattern 'struct vertex {
$$$
}'
# Also check for graphics::vertex in namespace
ast-grep --pattern 'namespace graphics {
$$$
struct vertex {
$$$
}
$$$
}'Length of output: 134
🏁 Script executed:
#!/bin/bash
# Search for generate_icosahedron_vertices definition and its return type
rg -n "generate_icosahedron_vertices" -C3 --type cpp
# Search for the definition of graphics::vertex
rg -n "struct vertex" -C3 --type cpp
# Search for the namespace graphics block
rg -n "namespace graphics" -C3 --type cppLength of output: 3681
🏁 Script executed:
#!/bin/bash
# Display the full definition of raw::graphics::vertex to inspect all fields
sed -n '1,200p' include/graphics/vertex.hLength of output: 427
🏁 Script executed:
#!/bin/bash
# Inspect how vertices_handle is allocated and used
rg -n "vertices_handle" -C3 --type cpp,hLength of output: 70
Correct vertex copy size
The cudaMemcpy and cudaMemcpyAsync calls only copy sizeof(glm::vec3) into buffers of raw::graphics::vertex (which include position, tex_coord, normal, tangent, bitangent), leading to incomplete initialization and potential overruns. Change both to use sizeof(raw::graphics::vertex):
--- src/sphere_generation/icosahedron_data_manager.cpp
@@ -58,7 +58,7 @@
- cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
- num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice);
+ cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
+ num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice);
@@ -78,7 +78,7 @@
- cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
- num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice, stream->stream());
+ cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()),
+ num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice, stream->stream());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cudaMemcpy(vertices_handle.get_data(), (void *)std::data(generate_icosahedron_vertices()), | |
| num_vertices_cpu * sizeof(glm::vec3), cudaMemcpyHostToDevice); | |
| cudaMemcpy(indices_handle.get_data(), (void *)std::data(generate_icosahedron_indices()), | |
| num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice); | |
| // Synchronous copy: use full vertex struct size | |
| cudaMemcpy(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()), | |
| num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice); | |
| cudaMemcpy(indices_handle.get_data(), std::data(generate_icosahedron_indices()), | |
| num_triangles_cpu * 3 * sizeof(UI), cudaMemcpyHostToDevice); | |
| // Asynchronous copy: also use full vertex struct size | |
| cudaMemcpyAsync(vertices_handle.get_data(), std::data(generate_icosahedron_vertices()), | |
| num_vertices_cpu * sizeof(raw::graphics::vertex), cudaMemcpyHostToDevice, stream->stream()); |
🤖 Prompt for AI Agents
In src/sphere_generation/icosahedron_data_manager.cpp around lines 58 to 61, the
cudaMemcpy that copies vertex data currently uses sizeof(glm::vec3) which only
accounts for position and not the full raw::graphics::vertex structure; update
the copy size to num_vertices_cpu * sizeof(raw::graphics::vertex) (and replace
sizeof(glm::vec3) with sizeof(raw::graphics::vertex)). Also audit and change any
corresponding cudaMemcpyAsync calls to use sizeof(raw::graphics::vertex) so the
device buffer receives the full vertex (position, tex_coord, normal, tangent,
bitangent) and avoid partial initialization/overruns; leave the indices copy
calculation as-is using UI.
| cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t)); | ||
| cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu( | ||
| sizeof(uint32_t)); | ||
| *num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES; | ||
| auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Use stack allocation for single values instead of heap buffers
Using cuda_types::cuda_buffer for single uint32_t values adds unnecessary overhead. Consider using stack variables instead.
- cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t));
- cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu(
- sizeof(uint32_t));
- *num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
+ uint32_t num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES;
+ uint32_t num_unique_edges_cpu = 0;Then update all usages to remove the dereference operator:
*num_triangles_cpu→num_triangles_cpu*num_unique_edges_cpu→num_unique_edges_cpu
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t)); | |
| cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu( | |
| sizeof(uint32_t)); | |
| *num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES; | |
| auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; | |
| // Before: | |
| // cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_triangles_cpu(sizeof(uint32_t)); | |
| // cuda_types::cuda_buffer<uint32_t, cuda_types::side::host> num_unique_edges_cpu( | |
| // sizeof(uint32_t)); | |
| // *num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES; | |
| // auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; | |
| // After: | |
| uint32_t num_triangles_cpu = predef::BASIC_AMOUNT_OF_TRIANGLES; | |
| uint32_t num_unique_edges_cpu = 0; | |
| auto blocks = (num_triangles_cpu + threads_per_block - 1) / threads_per_block; |
🤖 Prompt for AI Agents
In src/sphere_generation/launch_tessellation.cu around lines 33 to 37, replace
the heap-allocated single-value cuda_buffer usages with plain stack uint32_t
variables (e.g., uint32_t num_triangles_cpu and uint32_t num_unique_edges_cpu),
assign predef::BASIC_AMOUNT_OF_TRIANGLES directly to num_triangles_cpu, compute
blocks using num_triangles_cpu (no dereference), and update all subsequent
references in this file from *num_triangles_cpu → num_triangles_cpu and
*num_unique_edges_cpu → num_unique_edges_cpu.
| auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; | ||
| int base = 12; | ||
| CUDA_SAFE_CALL( | ||
| cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream)); |
There was a problem hiding this comment.
Potential integer overflow in blocks calculation
The calculation (*num_triangles_cpu + threads_per_block - 1) / threads_per_block can overflow if *num_triangles_cpu is close to UINT32_MAX. Consider using a safer calculation or adding an overflow check.
- auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;
+ if (*num_triangles_cpu > UINT32_MAX - threads_per_block + 1) {
+ throw std::overflow_error("Triangle count too large for block calculation");
+ }
+ auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; | |
| int base = 12; | |
| CUDA_SAFE_CALL( | |
| cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream)); | |
| // guard against overflow when rounding up | |
| if (*num_triangles_cpu > UINT32_MAX - threads_per_block + 1) { | |
| throw std::overflow_error("Triangle count too large for block calculation"); | |
| } | |
| auto blocks = (*num_triangles_cpu + threads_per_block - 1) / threads_per_block; | |
| int base = 12; | |
| CUDA_SAFE_CALL( | |
| cudaMemcpyAsync(p_vertex_count, &base, sizeof(uint32_t), cudaMemcpyHostToDevice, stream)); |
🤖 Prompt for AI Agents
In src/sphere_generation/launch_tessellation.cu around lines 37 to 40, the
expression (*num_triangles_cpu + threads_per_block - 1) / threads_per_block can
overflow when *num_triangles_cpu is near UINT32_MAX; change the calculation to
do the addition in a wider integer (e.g. cast operands to uint64_t or size_t),
compute blocks as (uint64_t(*num_triangles_cpu) + threads_per_block - 1) /
threads_per_block, and then validate or clamp the result to the type used for
kernel launch (or return an error if it exceeds allowable launch limits) to
prevent overflow and undefined behavior.
| gl::ATTR(SDL_GL_DEPTH_SIZE, 24); | ||
| gl_window::gl_window(const std::string &window_name) { | ||
| SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE); | ||
| SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify context sharing behavior for multiple GL contexts.
Setting SDL_GL_SHARE_WITH_CURRENT_CONTEXT to 1 before any context exists may not have the intended effect. This attribute typically needs to be set between context creations to enable sharing between the current context and the next one to be created.
The proper pattern for creating shared contexts should be:
- Create main context
- Make it current
- Set
SDL_GL_SHARE_WITH_CURRENT_CONTEXTto 1 - Create tessellation context (will share with main)
- Create texture_gen context (will share with main)
Consider restructuring the context creation to ensure proper sharing.
🏁 Script executed:
#!/bin/bash
# Check if there are any other locations where SDL_GL_SHARE_WITH_CURRENT_CONTEXT is used
rg -n "SDL_GL_SHARE_WITH_CURRENT_CONTEXT" --type cppLength of output: 140
🏁 Script executed:
#!/bin/bash
# Display the first 200 lines of gl_window.cpp to inspect context creation sequence
sed -n '1,200p' src/window/gl_window.cpp
# Locate all uses of SDL_GL_CreateContext in this file to see when contexts are created
rg -n "SDL_GL_CreateContext" -C3 src/window/gl_window.cppLength of output: 4247
Set SDL_GL_SHARE_WITH_CURRENT_CONTEXT after making the main context current
Move the SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1) call to after
SDL_GL_MakeCurrent(data.window, data.main_context);and before creating the tessellation and texture_gen contexts so they actually share lists with the main context.
🤖 Prompt for AI Agents
In src/window/gl_window.cpp around line 13, the
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1) call is placed too
early; move this call to immediately after SDL_GL_MakeCurrent(data.window,
data.main_context) and before creating the tessellation and texture_gen contexts
so those contexts share lists with the main context. Ensure the sequence is:
create main_context, make it current, set SDL_GL_SHARE_WITH_CURRENT_CONTEXT to
1, then create the other contexts (tessellation and texture_gen).
| gl::VIEW(0, 0, 2560, 1440); | ||
| SDL_GL_MakeCurrent(data.window, data.main_context); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Hardcoded viewport dimensions should match window creation parameters.
The viewport is set to fixed dimensions (2560x1440) which match the window creation, but this creates a maintenance issue if the resolution needs to change.
Consider using the window dimensions dynamically:
-gl::VIEW(0, 0, 2560, 1440);
+int width, height;
+SDL_GetWindowSize(data.window, &width, &height);
+gl::VIEW(0, 0, width, height);🤖 Prompt for AI Agents
In src/window/gl_window.cpp around lines 50-51, the viewport is hardcoded to
2560x1440 which can drift from actual window size; replace the fixed dimensions
by querying the window's current size (e.g., call SDL_GetWindowSize or use the
stored width/height) after making the GL context current, then pass those
dynamic width/height values to gl::VIEW (or glViewport) so the viewport always
matches the actual window resolution.
Summary by CodeRabbit
New Features
Improvements
Changes
Chores