
Fix Pascal flashprefill multi-arch link #296

Merged: davide221 merged 1 commit into main from fix/pascal-multiarch-q4km-draft on May 28, 2026.


@davide221 (Contributor)

Summary

  • Keep the Pascal scalar FlashPrefill kernels defined in every CUDA architecture compilation pass, so default multi-arch builds no longer emit host launchers that reference missing device symbols.
  • Use the full-warp *_sync vote and shuffle intrinsics, and give the Pascal kernels unique names so their symbols no longer collide with the Volta F16 kernels.
  • Switch the Qwen3.6 DFlash default docs and scripts to the uploaded Q4_K_M draft GGUF.
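The first two summary items can be illustrated with a minimal CUDA sketch. It is not the code from this PR: the kernel name pascal_scalar_rowsum_f32, the host helper run_rowsum, and the row-sum workload are hypothetical, and it assumes a multi-arch build such as -gencode arch=compute_61,code=sm_61 -gencode arch=compute_86,code=sm_86. The kernel definition carries no __CUDA_ARCH__ guard, so every device pass emits the symbol the host launcher expects. The reduction and vote use __shfl_xor_sync and __any_sync with a full 0xffffffff mask, because the legacy __shfl and __any are not available when targeting compute capability 7.x and higher.

```cuda
// Hypothetical sketch: names and workload are illustrative, not taken from the PR.
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

// Full-warp mask: all 32 lanes take part in every *_sync call below.
constexpr unsigned kFullMask = 0xffffffffu;

// Butterfly reduction with the sync shuffle; afterwards every lane holds the warp sum.
__device__ float warp_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_xor_sync(kFullMask, v, offset);
    return v;
}

// No __CUDA_ARCH__ guard around the definition: every device pass (sm_61, sm_86, ...)
// emits this symbol, so the host-side launcher never refers to a missing kernel.
// The "pascal_" prefix keeps the name distinct from any Volta F16 kernel.
__global__ void pascal_scalar_rowsum_f32(const float* x, float* out, int n) {
    const int lane = threadIdx.x;  // launched with exactly one 32-thread warp
    float acc = 0.0f;
    int saw_negative = 0;
    for (int i = lane; i < n; i += 32) {
        acc += x[i];
        saw_negative |= (x[i] < 0.0f);
    }
    acc = warp_sum(acc);
    // Warp-wide vote, also with the full mask.
    const int any_negative = __any_sync(kFullMask, saw_negative);
    if (lane == 0) {
        out[0] = acc;
        out[1] = any_negative ? 1.0f : 0.0f;
    }
}

// Host helper: returns the row sum (NAN on a CUDA error) and reports
// whether any element was negative.
float run_rowsum(const std::vector<float>& h_x, bool* any_negative) {
    const int n = static_cast<int>(h_x.size());
    float* d_x = nullptr;
    float* d_out = nullptr;
    float h_out[2] = {NAN, 0.0f};
    bool ok = cudaMalloc(&d_x, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_out, 2 * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_x, h_x.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        pascal_scalar_rowsum_f32<<<1, 32>>>(d_x, d_out, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, 2 * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_out);
    *any_negative = ok && h_out[1] != 0.0f;
    return ok ? h_out[0] : NAN;
}
```

Because the definition is unconditional, the body must compile for every listed architecture, which is why it sticks to scalar FP32 math and the *_sync intrinsics that exist on both sm_6x and sm_7x+.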

Verification

  • RunPod A40, CUDA 12.4: cmake --build server/build --target dflash_server -j8 completed, including the final dflash_server link step
  • local: git diff --check
  • local: bash -n harness/clients/common.sh harness/benchmarks/run_lucebox_vs_llamacpp.sh
  • local: python3 -m py_compile server/scripts/bench_agent.py server/scripts/bench_he.py server/scripts/bench_llm.py


@cubic-dev-ai (bot) left a comment:


No issues found across 13 files


davide221 merged commit 3ba525e into main on May 28, 2026.
3 checks passed