
[CK_TILE] Use Unified Workspace for FMHA BWD #182

Open
DDEle wants to merge 1 commit into ck_improve_main from yiding12/fmha-bwd-workspace

DDEle (Collaborator) commented Apr 23, 2026

Bump composable_kernel submodule to mono-split/users/yiding12/fmha-bwd-workspace
HEAD and adapt the FMHA BWD host wrappers to the new unified workspace API:

  • Replace dq_acc tensor argument with workspace_ptr in get_ck_fmha_bwd_args
    / get_ck_fmha_varlen_bwd_args
  • Drop dq_acc strides that have been removed from fmha_bwd_args
  • In mha_bwd / mha_varlen_bwd, allocate the device workspace based on
    fmha_bwd_launcher::workspace_size and call launcher.prepare_workspace()
  • Invoke launcher.run(args, stream_config) instead of fmha_bwd(...)
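
For reviewers, the new call sequence in mha_bwd / mha_varlen_bwd can be sketched roughly as below. This is only an illustration under stated assumptions: the `mock_*` types are hypothetical stand-ins written for this sketch, not the composable_kernel definitions. The real `fmha_bwd_args` and `fmha_bwd_launcher` have many more fields and their exact signatures may differ. The size formula, the zero-fill in `prepare_workspace`, and the float return value of `run` are also assumptions, and a host byte vector stands in for the device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for fmha_bwd_args: a single workspace pointer takes
// the place of the former dq_acc tensor pointer and its strides.
struct mock_fmha_bwd_args {
    void* workspace_ptr = nullptr;
    int batch = 0;
    int nhead = 0;
    int seqlen_q = 0;
    int hdim_q = 0;
};

// Hypothetical stand-in for the stream configuration.
struct mock_stream_config {};

// Hypothetical stand-in for fmha_bwd_launcher.
struct mock_fmha_bwd_launcher {
    std::size_t workspace_size = 0;
    bool prepared = false;

    // Assumed sizing rule: one float accumulator per dq element.
    explicit mock_fmha_bwd_launcher(const mock_fmha_bwd_args& a)
        : workspace_size(sizeof(float) * static_cast<std::size_t>(a.batch) *
                         static_cast<std::size_t>(a.nhead) *
                         static_cast<std::size_t>(a.seqlen_q) *
                         static_cast<std::size_t>(a.hdim_q)) {}

    // Assumed behavior: zero the workspace before the first launch.
    void prepare_workspace(void* ptr) {
        if (ptr != nullptr && workspace_size > 0) {
            std::memset(ptr, 0, workspace_size);
        }
        prepared = true;
    }

    // Returns 0.0f on success and -1.0f if the workspace was not set up.
    float run(const mock_fmha_bwd_args& a, const mock_stream_config&) const {
        const bool has_buffer = (workspace_size == 0) || (a.workspace_ptr != nullptr);
        return (prepared && has_buffer) ? 0.0f : -1.0f;
    }
};

// Sketch of the host wrapper flow: size query, allocation, prepare, run.
float mha_bwd_sketch(int batch, int nhead, int seqlen_q, int hdim_q,
                     std::vector<std::uint8_t>& workspace) {
    mock_fmha_bwd_args args;
    args.batch = batch;
    args.nhead = nhead;
    args.seqlen_q = seqlen_q;
    args.hdim_q = hdim_q;

    mock_fmha_bwd_launcher launcher(args);

    // Allocate based on the size reported by the launcher.
    workspace.resize(launcher.workspace_size);
    args.workspace_ptr = workspace.data();

    launcher.prepare_workspace(args.workspace_ptr);
    return launcher.run(args, mock_stream_config{});
}
```

The point of the sketch is the ordering: the wrapper no longer builds a dq_acc tensor itself, it asks the launcher how much workspace is needed, hands over one pointer, and lets the launcher prepare it before running.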

