Cap SM120 forward kernels to consumer shared memory #14

Merged
ivandobskygithub merged 1 commit into main from codex/fix-flashattention-shared-memory-error-d38dl5
Nov 25, 2025
@ivandobskygithub
Owner

Summary

  • reduce SM120 forward mainloop stages to one to shrink shared-memory usage on consumer Blackwell GPUs
  • add a static assertion to block SM120 instantiations that exceed the ~100KB shared-memory budget
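
The two changes above can be illustrated with a minimal compile-time sketch. It is not the code from this PR: `FwdTileConfig`, `Sm120FwdKernel`, the 128x128x128 tile shape, the simplified "one Q tile plus `kStages` K and V tiles" accounting, and the exact 100 KiB budget are all assumptions chosen for illustration (the PR only states "~100KB").

```cpp
#include <cstddef>
#include <cstdint>

// Assumed budget: the PR cites "~100KB"; the exact limit used in the real
// kernels is not shown in the PR text, so 100 KiB is a placeholder.
constexpr std::size_t kSm120SmemBudgetBytes = 100 * 1024;

// Hypothetical tile configuration. The accounting is simplified to one Q tile
// plus kStages pipelined K tiles and kStages pipelined V tiles.
template <int kBlockM, int kBlockN, int kHeadDim, int kStages, typename Element>
struct FwdTileConfig {
    static constexpr std::size_t kSmemQ =
        std::size_t(kBlockM) * kHeadDim * sizeof(Element);
    static constexpr std::size_t kSmemKV =
        2 * std::size_t(kStages) * kBlockN * kHeadDim * sizeof(Element);
    static constexpr std::size_t kSmemBytes = kSmemQ + kSmemKV;
    static constexpr bool kFitsSm120 = kSmemBytes <= kSm120SmemBudgetBytes;
};

// Hypothetical kernel wrapper: instantiating it with an oversized
// configuration fails at compile time instead of at kernel launch.
template <typename Config>
struct Sm120FwdKernel {
    static_assert(Config::kFitsSm120,
                  "SM120 forward kernel exceeds the consumer shared-memory budget");
};

using Half = std::uint16_t;  // 2-byte stand-in for fp16/bf16 elements

using OneStage = FwdTileConfig<128, 128, 128, 1, Half>;
using TwoStage = FwdTileConfig<128, 128, 128, 2, Half>;

// One stage: 32 KiB (Q) + 64 KiB (K and V) = 96 KiB, within the budget.
static_assert(OneStage::kSmemBytes == 96 * 1024, "unexpected one-stage size");
// Two stages: 32 KiB (Q) + 128 KiB (K and V) = 160 KiB, over the budget.
static_assert(!TwoStage::kFitsSm120, "two stages should not fit");

// Compiles: the one-stage configuration passes the assertion.
template struct Sm120FwdKernel<OneStage>;
// Uncommenting the next line would trigger the static_assert.
// template struct Sm120FwdKernel<TwoStage>;
```

Under these assumed tile sizes, dropping from two stages to one is what brings the footprint under the budget, and the assertion turns any later regression into a build error rather than a runtime launch failure.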

Testing

  • pytest tests/hopper/test_tile_size_shared_memory.py -q

@ivandobskygithub ivandobskygithub merged commit e3c646c into main Nov 25, 2025
1 check passed