Description
When I tried to reproduce the paper's results on an ARM-based Linux system, flash attention was not supported, so I replaced it with SDPA. In theory the two computations are equivalent and any numerical difference should be negligible, but I was unable to reproduce the effect described in the paper.
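One way to test the equivalence claim in isolation is a sketch like the one below. It assumes PyTorch 2.0 or newer. The tensor shapes, the tolerance, and the helper names `reference_attention` and `sdpa_flash_layout` are illustrative and not taken from the paper's code. It also assumes the replaced call used the flash-attn layout `(batch, seq_len, heads, head_dim)`. SDPA expects `(batch, heads, seq_len, head_dim)`, so a missing transpose is one possible cause worth ruling out.

```python
import math

import torch
import torch.nn.functional as F


def reference_attention(q, k, v, causal=False):
    """Plain softmax attention; inputs are (batch, heads, seq_len, head_dim)."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if causal:
        seq_len = q.size(-2)
        # True above the diagonal marks future positions to be masked out.
        future = torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device).triu(1)
        scores = scores.masked_fill(future, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)


def sdpa_flash_layout(q, k, v, causal=False):
    """Hypothetical drop-in: takes (batch, seq_len, heads, head_dim) inputs,
    transposes to the layout SDPA expects, and transposes the result back."""
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=causal
    )
    return out.transpose(1, 2)


torch.manual_seed(0)
# Illustrative shapes: batch=2, seq_len=16, heads=4, head_dim=32.
q, k, v = (torch.randn(2, 16, 4, 32) for _ in range(3))

ref = reference_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=True
).transpose(1, 2)
out = sdpa_flash_layout(q, k, v, causal=True)

max_abs_diff = (out - ref).abs().max().item()
print(f"max abs diff vs. reference: {max_abs_diff:.2e}")
```

If the difference is small here but the end-to-end results still diverge, the remaining suspects would be things this sketch does not cover, such as the dtype used in the real model (fp16 or bf16), dropout, or a custom softmax scale or attention mask.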