Conversation
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
Those were forgotten to add to the migration guide (cherry picked from commit 53ef7fd) Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
This avoids one of the last remaining patches they need to apply to CCCL
* Fix `not_fn` Our implementation of `perfect_forward` stores the functor in a tuple. However, that seems to break with e.g. device lambdas with captures * Drop some thrust macros from repo.toml (cherry picked from commit 185832a) Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
…rs try to use pack indexing (NVIDIA#4278) (NVIDIA#4284) (cherry picked from commit 863bb97) Co-authored-by: Eric Niebler <eniebler@nvidia.com>
Fixes NVIDIA#4318 (cherry picked from commit ed275a8) Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Co-authored-by: Federico Busato <50413820+fbusato@users.noreply.github.com>
…IDIA#4347) Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com>
…uition (NVIDIA#4279) (NVIDIA#4287) * tweak the cccl compiler version check macros to better agree with intuition prior to this commit, a compiler check such as: ```c++ ``` would fail if the compiler was actually v19.1. that is because 19.1 is greater than 19. what the author of this code probably intended was to check only the compiler's major version number, in which case the check would have succeed. this commit changes the behavior of the following macros when only a major version number is specified: * `_CCCL_COMPILER` * `_CCCL_CUDA_COMPILER` * `_CCCL_CUDACC_BELOW` * `_CCCL_CUDACC_AT_LEAST` * guard `_CCCL_COMPILER(FOO)` with an extra set of parens Co-authored-by: Eric Niebler <eniebler@nvidia.com>
…DIA#4394) * Make compiler version comparisons safer (NVIDIA#4185) * Make compiler version comparisons safer * remove MSVC2017 check * use `_CCCL_HAS_CUDA_COMPILER()` instead of `_CCCL_CUDACC()` * Fix compile issues --------- Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com> Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
…IA#4425) * Fix uninitialized read in local atomic code path. * PTX assumes '=' operands are always overwritten. For this code path the predicated mov instruction will only sometimes overwrite the original value. The compiler may or may not initialize `__temp`. This patch fixes this by always signing a 0 or 1 to the output register removing the need to initialize `__temp`. An alternative is to use `+` instead of `=` on the output operand. * Create a test to cover the PTX path of local storage atomics regardless of CTK version * Disable test for nvrtc * Use new additional compile flags * Try and fix checking for compile flags * Update comments in is_local codepath * Make test compatible with older NVCC * Revert "Try and fix checking for compile flags" This reverts commit a846ea5. * Remove unroll pragma, it is unneeded for repro. --------- Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> (cherry picked from commit 3200156) Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
…ync` policy (NVIDIA#4204) (NVIDIA#4483) * [Thrust] Perform asynchronous allocations by default for the `par_nosync` policy. This will make algorithms (like scans) that don't have a computation-dependent result but do temporary allocation properly asynchronous under `par_nosync`. * Cleanup * Apply suggestions from code review to `par_nosync` async allocation Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> * Switch from `reinterpret_pointer_cast` to `raw_pointer_cast` when we're going to `void*`. * [Thrust] Pass a raw pointer instead of a Thrust pointer to `cudaFree`. * Run pre-commit. * [Thrust]: Correct comment on `par_nosync` fallback path. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> (cherry picked from commit fbf517d) Co-authored-by: Bryce Adelstein Lelbach aka wash <brycelelbach@gmail.com>
* Drop invalid relative includes. (NVIDIA#4468) They can pull in stale headers * Drop cudax includes too
…A#4492) * Use `cudaStream_t` for `thrust::device.on(...)`. This was recently switched to use `cuda::stream_ref`, which broke users that have their own custom stream wrappers (nvbench, rmm, probably others). There's no real benefit to using a stream_ref here, and it breaks existing implicit conversions. * Add test to ensure that stream wrappers work with thrust::device.on
…IDIA#4516) Co-authored-by: Giannis Gonidelis <gonidelis@hotmail.com>
…NVIDIA#4526) We cannot control the predicate used nor the iterator. For example rapids uses device only predicates a lot (cherry picked from commit a7d76b3) Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
We already dropped it in main
* Avoid deprecated CUDART usage. (NVIDIA#4505) * Disable NVTX tests for NVHPC in C++20 (NVIDIA#4686) The nvtx headers contain a variable named `module` which nvc++ rejects as a c++20 keyword --------- Co-authored-by: Allison Piper <alliepiper16@gmail.com>
…ts (NVIDIA#4594) (NVIDIA#4801) * Switch cuCtxCreate to cuDevicePrimaryCtxRetain * Correct release argument * Set the retained context current (cherry picked from commit cb926fb) Co-authored-by: pciolkosz <pciolkosz@nvidia.com>
…ht architectures (NVIDIA#4440) (NVIDIA#4850)
…mpiler issues (NVIDIA#4586) (NVIDIA#4853) * Always bypass automatic atomic storage checks to prevent potential compiler issues * Waive atomic.local.pass.cpp if `_LIBCUDACXX_ATOMIC_UNSAFE_AUTOMATIC_STORAGE` is set. --------- (cherry picked from commit 549dd45) Co-authored-by: Yunsong Wang <yunsongw@nvidia.com> Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
Depending on certain device properties, `cuda::std::addressof` returns a valid device pointer even when called on host. Avoid spurious test failures by checking that if that is the case we get the correct result;
The warning seems bogus, but there is no reason not to work around it.
…A#4990) (NVIDIA#4994) (cherry picked from commit 8c1195a) Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com> Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
Use a custom memory_resource to touch the first byte of each page
HPX implementation of copy/copy_n
HPX implementation of count/count_if
HPX implementation of equal
HPX implementation of merge
Unwrap contiguous iterator
|
The mere construction of a fork-join executor prevents other executors from making progress. |
a2db469 to
2175af8
Compare
Yes, that's a 'feature' and not a bug ;-) |
In that case any choice of executor should be removed entirely from the backend, as it will always lead to incorrect programs. |
If there is a way to customize the executor in the benchmark without adding it to the backend itself - sure, I'd agree. |
|
This PR makes the For things to work, the
I'm not aware of such a way. |
No description provided.