From 4bb5640675962660a996471eb58c7f1d34abe2bf Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Mon, 10 Nov 2025 20:51:56 -0500 Subject: [PATCH 1/8] feat: add TODO for fixing bicubic interpolation boundary issues --- bicubic_fix_todo.md | 65 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 bicubic_fix_todo.md diff --git a/bicubic_fix_todo.md b/bicubic_fix_todo.md new file mode 100644 index 00000000..548c90e0 --- /dev/null +++ b/bicubic_fix_todo.md @@ -0,0 +1,65 @@ +# Bicubic Interpolation Boundary TODO + +## Public Evidence & Current Status +- The repository README already flags the issue: *“The Resize and RandomResizedCrop operators incorrectly interpolate pixel values near the boundary of an image or tensor when using cubic interpolation.”* (`README.md:67`). No fix is merged yet. +- Users report visible bands or flattened edges when calling any front-end that forwards to these operators with `Interp.CUBIC` (Python, C++, Torch bindings, etc.). + +## Impacted APIs And Code Paths + +| Public entry point | Binding file | C++ operator | Legacy implementation | Notes | +| --- | --- | --- | --- | --- | +| `cvcuda.resize` / `resize_into` (tensor & var-shape) | `python/mod_cvcuda/OpResize.cpp:32-206` | `cvcuda::Resize` (`src/cvcuda/priv/OpResize.cpp:24-63`) | `legacy::ResizeVarShape::infer` (`src/cvcuda/priv/legacy/resize_var_shape.cu:431-470`) which launches `resize_bicubic` (`:164-229`) | This is the path used by most samples/tests. | +| `cvcuda.random_resized_crop` / `_into` | `python/mod_cvcuda/OpRandomResizedCrop.cpp:32-213` | `cvcuda::RandomResizedCrop` (`src/cvcuda/priv/OpRandomResizedCrop.cpp:24-63`) | `legacy::RandomResizedCrop` + `RandomResizedCropVarShape` (tensor kernel in `src/cvcuda/priv/legacy/random_resized_crop.cu:180-223`, var-shape version in `random_resized_crop_var_shape.cu:185-231`) | Same bicubic math is duplicated here. 
| +| `cvcuda.pillowresize` | `python/mod_cvcuda/OpPillowResize.cpp:43-229` | `cvcuda::PillowResize` (`src/cvcuda/priv/OpPillowResize.cpp:31-91`) | Uses `legacy::PillowResize` (`src/cvcuda/priv/legacy/pillow_resize*.cu`). These kernels have their own filter precomputation and are not part of the README warning, but they should be re-tested once the shared fix is ready. | + +> **Test coverage gap:** `tests/cvcuda/python/test_opresize.py:51-226` and `tests/cvcuda/python/test_oprandomresizedcrop.py:51-145` only assert shape/layout/dtype and never validate pixel correctness at the borders, so the regression slipped through. + +## Root Cause (Legacy Bicubic Kernels) + +### `resize_var_shape.cu` +- In `resize_bicubic` the source coordinate is computed in floating-point (`fy`/`fx`), then the integer anchor is clamped into `[1, H-3]` or `[1, W-3]` (`src/cvcuda/priv/legacy/resize_var_shape.cu:186-205`). The fractional offsets are left as if the coordinate had *not* been clamped. When the original sample location lies outside the safe region, the kernel still reads a clamped 4×4 neighborhood but applies weights derived from the unclamped offsets, producing biased contributions along the top/bottom edges. +- Horizontally the code multiplies `fx` by `((sx >= 1) && (sx < width - 3))` (`:201-205`). Effectively `fx` becomes `0` when the unclamped coordinate is out of range, collapsing the cubic polynomial to the center tap and causing “flattened” edges. +- The same math exists in the var-shape specialization and both are guarded by `LEGACY_BICUBIC_MATH_VS`, which optionally applies `abs()` to the accumulated value. That conditional does not fix the weighting error. + +### `random_resized_crop.cu` and `_var_shape.cu` +- The bicubic branch duplicates the exact same logic, with the only difference being the extra crop offsets (`top/left`). See `src/cvcuda/priv/legacy/random_resized_crop.cu:180-205` and `random_resized_crop_var_shape.cu:185-210`. 
Consequently RandomResizedCrop shows the same boundary artifacts whenever the chosen crop touches the image edges (a common case when scale≈1). + +### Why PillowResize is likely unaffected +- `legacy::PillowResize` precomputes filter taps per output pixel (`pillow_resize_var_shape.cu:41-150`) and keeps consistent bounds/weights, so it does not share this particular bug. Earlier reports conflated PillowResize with Resize because both expose bicubic interpolation to Python; this TODO focuses on the README-confirmed `Resize`/`RandomResizedCrop` paths. + +## Fix Plan +1. **Recompute fractional offsets after clamping** + - Store the unclamped integer coordinate before applying `cuda::max/min`. + - After clamping, add `(original_s - clamped_s)` back into `fx/fy` (or equivalently clamp using border accessors so that fractional offsets remain untouched). + - Apply this to both axes in: + - `src/cvcuda/priv/legacy/resize_var_shape.cu:186-205` (tensor + var-shape kernel). + - `src/cvcuda/priv/legacy/random_resized_crop.cu:180-205`. + - `src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu:185-210`. +2. **Consider border wrappers instead of manual clamping** + - `cuda::BorderVarShapeWrap` is already used for AREA interpolation (`resize_var_shape.cu:465-468`). Extending bicubic to read through a border wrapper simplifies the math and avoids manual conditionals. +3. **Audit other consumers** + - Search for `LEGACY_BICUBIC_MATH` and `fx *= (` patterns to ensure no other kernels copy this bug. +4. **Remove the `fx *= condition` workaround** + - Once rate-limited clamping is correct, that line should be deleted; it currently hides the bug by forcing `fx=0`. +5. **Optional refactor** + - Extract shared bicubic weight computation into a helper to keep Resize and RandomResizedCrop in sync and make future maintenance easier. + +## Validation Plan +1. 
**Unit tests (Python)** + - Extend `tests/cvcuda/python/test_opresize.py` with a bicubic testcase that resizes a horizontal gradient tensor, compares against a CPU reference (e.g., OpenCV or Pillow), and asserts per-pixel error near the first/last 4 rows/columns. + - Do the same for `tests/cvcuda/python/test_oprandomresizedcrop.py`, forcing crop parameters that align the output border with the source border. +2. **Unit tests (C++)** + - Add a small golden test under `tests/cvcuda/cpp` (or similar) that feeds a known 5×5 image through `cvcuda::Resize` and checks the CUDA output against a precomputed host result. +3. **Visual sanity** + - Reproduce the before/after behavior using a script that calls `cvcuda.resize` on a checkerboard and dumps the first/last rows to confirm the band disappears. +4. **Performance regression check** + - Run existing resize/random-resized-crop benchmarks (`bench/BenchResize*.cpp`, `bench/BenchRandomResizedCrop.cpp`) to make sure the extra math does not degrade throughput noticeably. +5. **Documentation** + - Once validated, remove or update the warning in `README.md:67` and mention the fix in the release notes. + +## TODO Checklist +- [ ] Patch `resize_bicubic` (tensor + var-shape paths) to rebase fractional offsets after clamping. +- [ ] Patch `random_resized_crop` (tensor + var-shape) to use the corrected math. +- [ ] Add regression tests that compare against a CPU bicubic reference and fail if boundary pixels deviate beyond a tolerance. +- [ ] Verify PillowResize to ensure no regressions and document if it shares or does not share the issue. +- [ ] Update README/release notes once the bug is fixed. 
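Item 1 of the validation plan compares boundary pixels against a CPU bicubic reference. The following is a minimal pure-Python sketch of such a reference for a single row. It assumes the half-pixel mapping `(dst + 0.5) * scale - 0.5` used by the kernels in this series, Keys' cubic kernel with `A = -0.75` (the constant in those kernels), and clamp-to-edge handling applied to each tap index rather than to the anchor. `keys_weight` and `resize_bicubic_1d` are hypothetical helpers, and agreement with OpenCV or Pillow output has not been verified here.

```python
import math

A = -0.75  # Keys parameter; matches `const float A = -0.75f;` in the kernels


def keys_weight(t):
    """Keys cubic convolution kernel W(t) for parameter A."""
    t = abs(t)
    if t <= 1.0:
        return (A + 2.0) * t ** 3 - (A + 3.0) * t ** 2 + 1.0
    if t < 2.0:
        return A * t ** 3 - 5.0 * A * t ** 2 + 8.0 * A * t - 4.0 * A
    return 0.0


def resize_bicubic_1d(src, dst_len):
    """Resample one row with clamp-to-edge borders applied per tap."""
    n = len(src)
    scale = n / dst_len
    out = []
    for d in range(dst_len):
        x = (d + 0.5) * scale - 0.5  # half-pixel-centre mapping
        base = math.floor(x)
        fx = x - base  # always in [0, 1): the anchor itself is never clamped
        acc = 0.0
        for k in (-1, 0, 1, 2):
            idx = min(max(base + k, 0), n - 1)  # clamp the tap index only
            acc += src[idx] * keys_weight(fx - k)
        out.append(acc)
    return out


ramp = [float(i) for i in range(8)]
print(resize_bicubic_1d(ramp, 8) == ramp)  # True: scale 1 reproduces the input
```

Because the kernels combine the per-axis weights as a product (`wx[cx + 1] * wy[cy + 1]` in `CubicResize`), a 2-D float reference can apply this function to every row and then to every column. Integer outputs would additionally need the kernels' rounding and saturation, which this sketch omits.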
From fb973982863c9758bb0ece1d87986930e8c2a3e0 Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 14:52:03 -0500 Subject: [PATCH 2/8] fix: clamp sy and sx values in bicubic interpolation to prevent out-of-bounds access --- src/cvcuda/priv/legacy/random_resized_crop.cu | 9 ++++++--- src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu | 9 ++++++--- src/cvcuda/priv/legacy/resize_var_shape.cu | 9 ++++++--- 3 files changed, 18 insertions(+), 9 deletions(-) diff --git a/src/cvcuda/priv/legacy/random_resized_crop.cu b/src/cvcuda/priv/legacy/random_resized_crop.cu index 8280595d..febb387e 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop.cu @@ -179,7 +179,9 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, int2 srcSi float fy = (float)((dst_y + 0.5f) * scale_y - 0.5f + top); int sy = cuda::round(fy); fy -= sy; - sy = cuda::max(1, cuda::min(sy, height - 3)); + const int syClamped = cuda::max(1, cuda::min(sy, height - 3)); + fy += static_cast<float>(sy - syClamped); + sy = syClamped; const float A = -0.75f; @@ -194,8 +196,9 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, int2 srcSi float fx = (float)((dst_x + 0.5f) * scale_x - 0.5f + left); int sx = cuda::round(fx); fx -= sx; - fx *= ((sx >= 1) && (sx < width - 3)); - sx = cuda::max(1, cuda::min(sx, width - 3)); + const int sxClamped = cuda::max(1, cuda::min(sx, width - 3)); + fx += static_cast<float>(sx - sxClamped); + sx = sxClamped; float cX[4]; cX[0] = ((A * (fx + 1.0f) - 5.0f * A) * (fx + 1.0f) + 8.0f * A) * (fx + 1.0f) - 4.0f * A; diff --git a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu index dbd3a49e..3f2441e6 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu @@ -191,7 +191,9 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, const int
float fy = (float)((dst_y + 0.5f) * scale_y - 0.5f + top); int sy = cuda::round(fy); fy -= sy; - sy = cuda::max(1, cuda::min(sy, height - 3)); + const int syClamped = cuda::max(1, cuda::min(sy, height - 3)); + fy += static_cast<float>(sy - syClamped); + sy = syClamped; const float A = -0.75f; @@ -206,8 +208,9 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, const int float fx = (float)((dst_x + 0.5f) * scale_x - 0.5f + left); int sx = cuda::round(fx); fx -= sx; - fx *= ((sx >= 1) && (sx < width - 3)); - sx = cuda::max(1, cuda::min(sx, width - 3)); + const int sxClamped = cuda::max(1, cuda::min(sx, width - 3)); + fx += static_cast<float>(sx - sxClamped); + sx = sxClamped; float cX[4]; cX[0] = ((A * (fx + 1.0f) - 5.0f * A) * (fx + 1.0f) + 8.0f * A) * (fx + 1.0f) - 4.0f * A; diff --git a/src/cvcuda/priv/legacy/resize_var_shape.cu b/src/cvcuda/priv/legacy/resize_var_shape.cu index f0d7a293..c9e3d252 100644 --- a/src/cvcuda/priv/legacy/resize_var_shape.cu +++ b/src/cvcuda/priv/legacy/resize_var_shape.cu @@ -186,7 +186,9 @@ __global__ void resize_bicubic(cuda::ImageBatchVarShapeWrap src, cuda:: float fy = (float)((dst_y + 0.5f) * scale_y - 0.5f); int sy = cuda::round(fy); fy -= sy; - sy = cuda::max(1, cuda::min(sy, height - 3)); + const int syClamped = cuda::max(1, cuda::min(sy, height - 3)); + fy += static_cast<float>(sy - syClamped); // rebase fractional offset after clamp + sy = syClamped; const float A = -0.75f; @@ -201,8 +203,9 @@ __global__ void resize_bicubic(cuda::ImageBatchVarShapeWrap src, cuda:: float fx = (float)((dst_x + 0.5f) * scale_x - 0.5f); int sx = cuda::round(fx); fx -= sx; - fx *= ((sx >= 1) && (sx < width - 3)); - sx = cuda::max(1, cuda::min(sx, width - 3)); + const int sxClamped = cuda::max(1, cuda::min(sx, width - 3)); + fx += static_cast<float>(sx - sxClamped); + sx = sxClamped; float cX[4]; cX[0] = ((A * (fx + 1.0f) - 5.0f * A) * (fx + 1.0f) + 8.0f * A) * (fx + 1.0f) - 4.0f * A; From 5b422917f0ff205e38ea8638c974c8503604ae97 Mon Sep 17 00:00:00 2001
From: sezer-muhammed Date: Tue, 11 Nov 2025 16:12:25 -0500 Subject: [PATCH 3/8] chore: bump version to 0.15.1 and add diagnostic prints for bicubic kernels --- CMakeLists.txt | 6 +++--- src/cvcuda/priv/legacy/random_resized_crop.cu | 5 +++++ src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu | 5 +++++ src/cvcuda/priv/legacy/resize_var_shape.cu | 6 ++++++ 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 236ae142..e61b5c3b 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -22,9 +22,9 @@ if(CMAKE_CUDA_ARCHITECTURES) endif() project(cvcuda - LANGUAGES C CXX - VERSION 0.15.0 - DESCRIPTION "CUDA-accelerated Computer Vision algorithms" + LANGUAGES C CXX + VERSION 0.15.1 + DESCRIPTION "CUDA-accelerated Computer Vision algorithms" ) # Make sure the cuda host compiler agrees with what we're using, diff --git a/src/cvcuda/priv/legacy/random_resized_crop.cu b/src/cvcuda/priv/legacy/random_resized_crop.cu index febb387e..e7d0daf4 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop.cu @@ -27,6 +27,7 @@ #include #include +#include using namespace nvcv; using namespace nvcv::legacy::cuda_op; @@ -205,6 +206,10 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, int2 srcSi cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; + if (dst_x == 0 && dst_y == 0 && batch_idx == 0) + { + printf("[cvcuda diag v0.15.1] resize_cubic_v1 kernel active (tensor). 
sy=%d fy=%f sx=%d fx=%f top=%d left=%d\n", sy, fy, sx, fx, top, left); + } #pragma unroll for (int row = 0; row < 4; ++row) { diff --git a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu index 3f2441e6..a15a186c 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu @@ -27,6 +27,7 @@ #include #include +#include using namespace nvcv; using namespace nvcv::legacy::cuda_op; @@ -217,6 +218,10 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, const int cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; + if (dst_x == 0 && dst_y == 0 && batch_idx == 0) + { + printf("[cvcuda diag v0.15.1] resize_cubic_v1 kernel active (var-shape). sy=%d fy=%f sx=%d fx=%f top=%d left=%d\n", sy, fy, sx, fx, top, left); + } #pragma unroll for (int row = 0; row < 4; ++row) { diff --git a/src/cvcuda/priv/legacy/resize_var_shape.cu b/src/cvcuda/priv/legacy/resize_var_shape.cu index c9e3d252..c5d6cc5f 100644 --- a/src/cvcuda/priv/legacy/resize_var_shape.cu +++ b/src/cvcuda/priv/legacy/resize_var_shape.cu @@ -27,6 +27,7 @@ #include #include +#include using namespace nvcv::legacy::cuda_op; using namespace nvcv::legacy::helpers; @@ -212,6 +213,11 @@ __global__ void resize_bicubic(cuda::ImageBatchVarShapeWrap src, cuda:: cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; + // Diagnostic print: only one thread to reduce spam + if (dst_x == 0 && dst_y == 0 && batch_idx == 0) + { + printf("[cvcuda diag v0.15.1] resize_bicubic kernel active (var-shape). 
sy=%d fy=%f sx=%d fx=%f\n", sy, fy, sx, fx); + } #pragma unroll for (int row = 0; row < 4; ++row) { From 7aefa1f8c2b3b88d1839c6d21ee5507203da910b Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 16:43:45 -0500 Subject: [PATCH 4/8] fix: optimize cubic interpolation calculations by using const for srcCoord and adjusting iSrcCoord bounds --- src/cvcuda/priv/OpResize.cu | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/src/cvcuda/priv/OpResize.cu b/src/cvcuda/priv/OpResize.cu index 448ff958..7144cd06 100644 --- a/src/cvcuda/priv/OpResize.cu +++ b/src/cvcuda/priv/OpResize.cu @@ -354,17 +354,15 @@ __global__ void CubicResize(SrcWrapper src, DstWrapper dst, int2 srcSize, int2 d if (dstCoord.y < dstSize.y && dstCoord.x < dstSize.x) { - float2 srcCoord = (cuda::DropCast<2>(dstCoord) + .5f) * scaleRatio - .5f; - int3 iSrcCoord{(int)floor(srcCoord.x), (int)floor(srcCoord.y), dstCoord.z}; - - float fx = srcCoord.x - iSrcCoord.x; - float fy = srcCoord.y - iSrcCoord.y; - - fx = (iSrcCoord.x < 1 || iSrcCoord.x >= srcSize.x - 3) ? 
0 : fx; + const float2 srcCoord = (cuda::DropCast<2>(dstCoord) + .5f) * scaleRatio - .5f; + int3 iSrcCoord{(int)floor(srcCoord.x), (int)floor(srcCoord.y), dstCoord.z}; iSrcCoord.y = cuda::max(1, cuda::min(iSrcCoord.y, srcSize.y - 3)); iSrcCoord.x = cuda::max(1, cuda::min(iSrcCoord.x, srcSize.x - 3)); + const float fx = srcCoord.x - iSrcCoord.x; + const float fy = srcCoord.y - iSrcCoord.y; + float wx[4]; float wy[4]; From a99e678827abb84237e8db9af34b670d8085978c Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 17:23:18 -0500 Subject: [PATCH 5/8] fix: improve cubic interpolation by adjusting source coordinate calculations and clamping bounds --- src/cvcuda/priv/OpResize.cu | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/src/cvcuda/priv/OpResize.cu b/src/cvcuda/priv/OpResize.cu index 7144cd06..e29919b0 100644 --- a/src/cvcuda/priv/OpResize.cu +++ b/src/cvcuda/priv/OpResize.cu @@ -355,13 +355,13 @@ __global__ void CubicResize(SrcWrapper src, DstWrapper dst, int2 srcSize, int2 d if (dstCoord.y < dstSize.y && dstCoord.x < dstSize.x) { const float2 srcCoord = (cuda::DropCast<2>(dstCoord) + .5f) * scaleRatio - .5f; - int3 iSrcCoord{(int)floor(srcCoord.x), (int)floor(srcCoord.y), dstCoord.z}; + int3 baseCoord{(int)floor(srcCoord.x), (int)floor(srcCoord.y), dstCoord.z}; - iSrcCoord.y = cuda::max(1, cuda::min(iSrcCoord.y, srcSize.y - 3)); - iSrcCoord.x = cuda::max(1, cuda::min(iSrcCoord.x, srcSize.x - 3)); + const float fx = srcCoord.x - baseCoord.x; + const float fy = srcCoord.y - baseCoord.y; - const float fx = srcCoord.x - iSrcCoord.x; - const float fy = srcCoord.y - iSrcCoord.y; + const int xMax = srcSize.x - 1; + const int yMax = srcSize.y - 1; float wx[4]; float wy[4]; @@ -374,10 +374,13 @@ __global__ void CubicResize(SrcWrapper src, DstWrapper dst, int2 srcSize, int2 d #pragma unroll for (int cy = -1; cy <= 2; cy++) { + const int sy = cuda::min(cuda::max(baseCoord.y + cy, 0), yMax); #pragma unroll for (int cx = -1; 
cx <= 2; cx++) { - sum += src[int3{iSrcCoord.x + cx, iSrcCoord.y + cy, iSrcCoord.z}] * (wx[cx + 1] * wy[cy + 1]); + const int sx = cuda::min(cuda::max(baseCoord.x + cx, 0), xMax); + + sum += src[int3{sx, sy, baseCoord.z}] * (wx[cx + 1] * wy[cy + 1]); } } From f965e35768e4e32a9c2d14487a1c10b1d69fd6a8 Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 18:05:13 -0500 Subject: [PATCH 6/8] fix: remove diagnostic print statements from bicubic interpolation kernels --- bicubic_fix_todo.md | 65 ------------------- src/cvcuda/priv/legacy/random_resized_crop.cu | 4 -- .../legacy/random_resized_crop_var_shape.cu | 4 -- src/cvcuda/priv/legacy/resize_var_shape.cu | 5 -- 4 files changed, 78 deletions(-) delete mode 100644 bicubic_fix_todo.md diff --git a/bicubic_fix_todo.md b/bicubic_fix_todo.md deleted file mode 100644 index 548c90e0..00000000 --- a/bicubic_fix_todo.md +++ /dev/null @@ -1,65 +0,0 @@ -# Bicubic Interpolation Boundary TODO - -## Public Evidence & Current Status -- The repository README already flags the issue: *“The Resize and RandomResizedCrop operators incorrectly interpolate pixel values near the boundary of an image or tensor when using cubic interpolation.”* (`README.md:67`). No fix is merged yet. -- Users report visible bands or flattened edges when calling any front-end that forwards to these operators with `Interp.CUBIC` (Python, C++, Torch bindings, etc.). - -## Impacted APIs And Code Paths - -| Public entry point | Binding file | C++ operator | Legacy implementation | Notes | -| --- | --- | --- | --- | --- | -| `cvcuda.resize` / `resize_into` (tensor & var-shape) | `python/mod_cvcuda/OpResize.cpp:32-206` | `cvcuda::Resize` (`src/cvcuda/priv/OpResize.cpp:24-63`) | `legacy::ResizeVarShape::infer` (`src/cvcuda/priv/legacy/resize_var_shape.cu:431-470`) which launches `resize_bicubic` (`:164-229`) | This is the path used by most samples/tests. 
| -| `cvcuda.random_resized_crop` / `_into` | `python/mod_cvcuda/OpRandomResizedCrop.cpp:32-213` | `cvcuda::RandomResizedCrop` (`src/cvcuda/priv/OpRandomResizedCrop.cpp:24-63`) | `legacy::RandomResizedCrop` + `RandomResizedCropVarShape` (tensor kernel in `src/cvcuda/priv/legacy/random_resized_crop.cu:180-223`, var-shape version in `random_resized_crop_var_shape.cu:185-231`) | Same bicubic math is duplicated here. | -| `cvcuda.pillowresize` | `python/mod_cvcuda/OpPillowResize.cpp:43-229` | `cvcuda::PillowResize` (`src/cvcuda/priv/OpPillowResize.cpp:31-91`) | Uses `legacy::PillowResize` (`src/cvcuda/priv/legacy/pillow_resize*.cu`). These kernels have their own filter precomputation and are not part of the README warning, but they should be re-tested once the shared fix is ready. | - -> **Test coverage gap:** `tests/cvcuda/python/test_opresize.py:51-226` and `tests/cvcuda/python/test_oprandomresizedcrop.py:51-145` only assert shape/layout/dtype and never validate pixel correctness at the borders, so the regression slipped through. - -## Root Cause (Legacy Bicubic Kernels) - -### `resize_var_shape.cu` -- In `resize_bicubic` the source coordinate is computed in floating-point (`fy`/`fx`), then the integer anchor is clamped into `[1, H-3]` or `[1, W-3]` (`src/cvcuda/priv/legacy/resize_var_shape.cu:186-205`). The fractional offsets are left as if the coordinate had *not* been clamped. When the original sample location lies outside the safe region, the kernel still reads a clamped 4×4 neighborhood but applies weights derived from the unclamped offsets, producing biased contributions along the top/bottom edges. -- Horizontally the code multiplies `fx` by `((sx >= 1) && (sx < width - 3))` (`:201-205`). Effectively `fx` becomes `0` when the unclamped coordinate is out of range, collapsing the cubic polynomial to the center tap and causing “flattened” edges. 
-- The same math exists in the var-shape specialization and both are guarded by `LEGACY_BICUBIC_MATH_VS`, which optionally applies `abs()` to the accumulated value. That conditional does not fix the weighting error. - -### `random_resized_crop.cu` and `_var_shape.cu` -- The bicubic branch duplicates the exact same logic, with the only difference being the extra crop offsets (`top/left`). See `src/cvcuda/priv/legacy/random_resized_crop.cu:180-205` and `random_resized_crop_var_shape.cu:185-210`. Consequently RandomResizedCrop shows the same boundary artifacts whenever the chosen crop touches the image edges (a common case when scale≈1). - -### Why PillowResize is likely unaffected -- `legacy::PillowResize` precomputes filter taps per output pixel (`pillow_resize_var_shape.cu:41-150`) and keeps consistent bounds/weights, so it does not share this particular bug. Earlier reports conflated PillowResize with Resize because both expose bicubic interpolation to Python; this TODO focuses on the README-confirmed `Resize`/`RandomResizedCrop` paths. - -## Fix Plan -1. **Recompute fractional offsets after clamping** - - Store the unclamped integer coordinate before applying `cuda::max/min`. - - After clamping, add `(original_s - clamped_s)` back into `fx/fy` (or equivalently clamp using border accessors so that fractional offsets remain untouched). - - Apply this to both axes in: - - `src/cvcuda/priv/legacy/resize_var_shape.cu:186-205` (tensor + var-shape kernel). - - `src/cvcuda/priv/legacy/random_resized_crop.cu:180-205`. - - `src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu:185-210`. -2. **Consider border wrappers instead of manual clamping** - - `cuda::BorderVarShapeWrap` is already used for AREA interpolation (`resize_var_shape.cu:465-468`). Extending bicubic to read through a border wrapper simplifies the math and avoids manual conditionals. -3. 
**Audit other consumers** - - Search for `LEGACY_BICUBIC_MATH` and `fx *= (` patterns to ensure no other kernels copy this bug. -4. **Remove the `fx *= condition` workaround** - - Once rate-limited clamping is correct, that line should be deleted; it currently hides the bug by forcing `fx=0`. -5. **Optional refactor** - - Extract shared bicubic weight computation into a helper to keep Resize and RandomResizedCrop in sync and make future maintenance easier. - -## Validation Plan -1. **Unit tests (Python)** - - Extend `tests/cvcuda/python/test_opresize.py` with a bicubic testcase that resizes a horizontal gradient tensor, compares against a CPU reference (e.g., OpenCV or Pillow), and asserts per-pixel error near the first/last 4 rows/columns. - - Do the same for `tests/cvcuda/python/test_oprandomresizedcrop.py`, forcing crop parameters that align the output border with the source border. -2. **Unit tests (C++)** - - Add a small golden test under `tests/cvcuda/cpp` (or similar) that feeds a known 5×5 image through `cvcuda::Resize` and checks the CUDA output against a precomputed host result. -3. **Visual sanity** - - Reproduce the before/after behavior using a script that calls `cvcuda.resize` on a checkerboard and dumps the first/last rows to confirm the band disappears. -4. **Performance regression check** - - Run existing resize/random-resized-crop benchmarks (`bench/BenchResize*.cpp`, `bench/BenchRandomResizedCrop.cpp`) to make sure the extra math does not degrade throughput noticeably. -5. **Documentation** - - Once validated, remove or update the warning in `README.md:67` and mention the fix in the release notes. - -## TODO Checklist -- [ ] Patch `resize_bicubic` (tensor + var-shape paths) to rebase fractional offsets after clamping. -- [ ] Patch `random_resized_crop` (tensor + var-shape) to use the corrected math. -- [ ] Add regression tests that compare against a CPU bicubic reference and fail if boundary pixels deviate beyond a tolerance. 
-- [ ] Verify PillowResize to ensure no regressions and document if it shares or does not share the issue. -- [ ] Update README/release notes once the bug is fixed. diff --git a/src/cvcuda/priv/legacy/random_resized_crop.cu b/src/cvcuda/priv/legacy/random_resized_crop.cu index e7d0daf4..5dce4d6c 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop.cu @@ -206,10 +206,6 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, int2 srcSi cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; - if (dst_x == 0 && dst_y == 0 && batch_idx == 0) - { - printf("[cvcuda diag v0.15.1] resize_cubic_v1 kernel active (tensor). sy=%d fy=%f sx=%d fx=%f top=%d left=%d\n", sy, fy, sx, fx, top, left); - } #pragma unroll for (int row = 0; row < 4; ++row) { diff --git a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu index a15a186c..f78d84a7 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu @@ -218,10 +218,6 @@ __global__ void resize_cubic_v1(const SrcWrapper src, DstWrapper dst, const int cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; - if (dst_x == 0 && dst_y == 0 && batch_idx == 0) - { - printf("[cvcuda diag v0.15.1] resize_cubic_v1 kernel active (var-shape). 
sy=%d fy=%f sx=%d fx=%f top=%d left=%d\n", sy, fy, sx, fx, top, left); - } #pragma unroll for (int row = 0; row < 4; ++row) { diff --git a/src/cvcuda/priv/legacy/resize_var_shape.cu b/src/cvcuda/priv/legacy/resize_var_shape.cu index c5d6cc5f..f0f547e4 100644 --- a/src/cvcuda/priv/legacy/resize_var_shape.cu +++ b/src/cvcuda/priv/legacy/resize_var_shape.cu @@ -213,11 +213,6 @@ __global__ void resize_bicubic(cuda::ImageBatchVarShapeWrap src, cuda:: cX[1] = ((A + 2.0f) * fx - (A + 3.0f)) * fx * fx + 1.0f; cX[2] = ((A + 2.0f) * (1.0f - fx) - (A + 3.0f)) * (1.0f - fx) * (1.0f - fx) + 1.0f; cX[3] = 1.0f - cX[0] - cX[1] - cX[2]; - // Diagnostic print: only one thread to reduce spam - if (dst_x == 0 && dst_y == 0 && batch_idx == 0) - { - printf("[cvcuda diag v0.15.1] resize_bicubic kernel active (var-shape). sy=%d fy=%f sx=%d fx=%f\n", sy, fy, sx, fx); - } #pragma unroll for (int row = 0; row < 4; ++row) { From 6312375827f5f3918bf61b4b6e16187ecfbeff3b Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 18:08:17 -0500 Subject: [PATCH 7/8] fix: downgrade project version from 0.15.1 to 0.15.0 --- CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index e61b5c3b..605c5030 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -23,7 +23,7 @@ endif() project(cvcuda LANGUAGES C CXX - VERSION 0.15.1 + VERSION 0.15.0 DESCRIPTION "CUDA-accelerated Computer Vision algorithms" ) From fc5a342d734a29c478627abc6f28419ba01f3bb7 Mon Sep 17 00:00:00 2001 From: sezer-muhammed Date: Tue, 11 Nov 2025 18:19:45 -0500 Subject: [PATCH 8/8] fix: remove unnecessary #include from legacy CUDA files --- CMakeLists.txt | 6 +++--- src/cvcuda/priv/legacy/random_resized_crop.cu | 1 - src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu | 1 - src/cvcuda/priv/legacy/resize_var_shape.cu | 1 - 4 files changed, 3 insertions(+), 6 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 605c5030..4f749fc7 100644 --- 
a/CMakeLists.txt +++ b/CMakeLists.txt @@ -22,9 +22,9 @@ if(CMAKE_CUDA_ARCHITECTURES) endif() project(cvcuda - LANGUAGES C CXX - VERSION 0.15.0 - DESCRIPTION "CUDA-accelerated Computer Vision algorithms" + LANGUAGES C CXX + VERSION 0.15.0 + DESCRIPTION "CUDA-accelerated Computer Vision algorithms" ) # Make sure the cuda host compiler agrees with what we're using, diff --git a/src/cvcuda/priv/legacy/random_resized_crop.cu b/src/cvcuda/priv/legacy/random_resized_crop.cu index 5dce4d6c..febb387e 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop.cu @@ -27,7 +27,6 @@ #include #include -#include using namespace nvcv; using namespace nvcv::legacy::cuda_op; diff --git a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu index f78d84a7..3f2441e6 100644 --- a/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu +++ b/src/cvcuda/priv/legacy/random_resized_crop_var_shape.cu @@ -27,7 +27,6 @@ #include #include -#include using namespace nvcv; using namespace nvcv::legacy::cuda_op; diff --git a/src/cvcuda/priv/legacy/resize_var_shape.cu b/src/cvcuda/priv/legacy/resize_var_shape.cu index f0f547e4..c9e3d252 100644 --- a/src/cvcuda/priv/legacy/resize_var_shape.cu +++ b/src/cvcuda/priv/legacy/resize_var_shape.cu @@ -27,7 +27,6 @@ #include #include -#include using namespace nvcv::legacy::cuda_op; using namespace nvcv::legacy::helpers;
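The series ends with two different boundary strategies. The legacy kernels (patch 2) clamp the anchor to `[1, size - 3]` and rebase `fx`/`fy`, while `CubicResize` in `OpResize.cu` (patch 5) keeps the fractional offset from `floor` and clamps each tap index. The pure-Python sketch below evaluates the `cX[0..3]` formulas copied from the legacy kernels on one row under all three strategies, including the original `fx *=` mask. Assumptions: `math.floor` stands in for the kernels' `cuda::round` call (both give anchor `0` at `x == 0`), the same `cX` formulas are used for every strategy even though `CubicResize` computes its `wx`/`wy` in code not shown in the diff, and all function names are hypothetical.

```python
import math

A = -0.75  # same constant as `const float A = -0.75f;` in the legacy kernels


def legacy_weights(fx):
    """cX[0..3] as written in the legacy kernels; taps sit at offsets -1, 0, 1, 2."""
    c0 = ((A * (fx + 1.0) - 5.0 * A) * (fx + 1.0) + 8.0 * A) * (fx + 1.0) - 4.0 * A
    c1 = ((A + 2.0) * fx - (A + 3.0)) * fx * fx + 1.0
    c2 = ((A + 2.0) * (1.0 - fx) - (A + 3.0)) * (1.0 - fx) * (1.0 - fx) + 1.0
    c3 = 1.0 - c0 - c1 - c2
    return [c0, c1, c2, c3]


def sample_legacy_masked(src, x):
    """Before the series: zero fx near the border, then clamp the anchor to [1, n-3]."""
    n = len(src)
    s = math.floor(x)
    fx = x - s
    if not (1 <= s < n - 3):
        fx = 0.0  # mirrors `fx *= ((sx >= 1) && (sx < width - 3));`
    s = max(1, min(s, n - 3))
    w = legacy_weights(fx)
    return sum(src[s - 1 + k] * w[k] for k in range(4))


def sample_rebased(src, x):
    """Patch 2: clamp the anchor, then add (s - s_clamped) back into fx."""
    n = len(src)
    s = math.floor(x)
    fx = x - s
    s_clamped = max(1, min(s, n - 3))
    fx += s - s_clamped  # can leave [0, 1], e.g. fx == -1.0 at x == 0
    w = legacy_weights(fx)
    return sum(src[s_clamped - 1 + k] * w[k] for k in range(4))


def sample_per_tap_clamp(src, x):
    """Patch 5 style: fx stays in [0, 1); each tap index is clamped to [0, n-1]."""
    n = len(src)
    s = math.floor(x)
    w = legacy_weights(x - s)
    return sum(src[min(max(s - 1 + k, 0), n - 1)] * w[k] for k in range(4))


ramp = [float(i) for i in range(8)]
for strategy in (sample_legacy_masked, sample_rebased, sample_per_tap_clamp):
    print(strategy.__name__, strategy(ramp, 0.0))
# sample_legacy_masked 1.0
# sample_rebased -3.0
# sample_per_tap_clamp 0.0
```

Under these assumptions, sampling the ramp at `x = 0` gives `1.0` with the original mask (the flattened edge described in the TODO), `-3.0` with the rebased offset, and `0.0` with per-tap clamping. The rebased result is off because `fx = -1` lies outside the `[0, 1]` interval for which the `cX` polynomials reproduce Keys' kernel: for example, `cX[0]` evaluates to `3` where the kernel weight for a tap at distance 0 is `1`. Away from the borders the three strategies agree. This is consistent with option 2 of the TODO's fix plan (reading through a border wrapper), but it has not been checked against actual CUDA output.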