
llama-finetune broken on modern transformers: cascading failures in SET_ROWS, FLASH_ATTN_EXT, and graph allocation #18805


Description

@pestopoppa

Summary

llama-finetune fails on all of the modern transformer architectures tested (LLaMA 3.2, Qwen2.5, SmolLM2) due to three interconnected issues in the backward-pass infrastructure. The failures affect builds b7405 through b7717 (current master).

Environment

  • Build: b7717 (537d424), Linux x86_64, GCC 13.3.0
  • Model: Llama-3.2-1B-Instruct-f32.gguf

Failures

| # | Trigger | Location | Cause |
|---|---------|----------|-------|
| 1 | Default | ggml.c:6928 | GGML_OP_SET_ROWS not in the allowed view ops, no backward implementation |
| 2 | --flash-attn | ggml.c:ggml_compute_backward() | GGML_OP_FLASH_ATTN_EXT missing backward case |
| 3 | Flash attention disabled | ggml.c:6763 | Graph node overflow from standard attention |
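
Failures 1 and 2 come down to ops that reach the backward pass without a gradient rule. A minimal sketch of the pattern, assuming a switch-based dispatch in ggml_compute_backward() (the surrounding structure and the abort path here are my assumptions, not quoted from ggml.c):

```c
#include <stdio.h>
#include <stdlib.h>
#include "ggml.h"

// Sketch of the failure mode: an op that appears in the finetune graph but
// has no gradient rule falls through to the unsupported-op path and aborts.
static void backward_dispatch_sketch(struct ggml_tensor * tensor) {
    switch (tensor->op) {
        // ... ops with implemented gradients are handled above ...
        case GGML_OP_SET_ROWS:        // failure 1: no backward implementation
        case GGML_OP_FLASH_ATTN_EXT:  // failure 2: no backward case
        default:
            fprintf(stderr, "unsupported op for backward pass: %d\n", (int) tensor->op);
            abort();
    }
}
```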

Reproduction

./build/bin/llama-finetune \
  -m Llama-3.2-1B-Instruct-f32.gguf \
  -f train.txt -c 256 --epochs 1
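
Failure 3 is a capacity problem rather than a missing gradient: with flash attention disabled, the standard attention path plus its backward pass emits more nodes than the training graph was sized for. A minimal sketch, assuming the graph is currently allocated with the default node count, of how a larger gradient-enabled graph can be requested via ggml_new_graph_custom (whether llama-finetune exposes a knob for this is not confirmed here; the 8192 figure is purely illustrative):

```c
#include "ggml.h"

// Sketch only: reserve more node slots than GGML_DEFAULT_GRAPH_SIZE so a deep
// standard-attention backward graph fits without overflowing.
static struct ggml_cgraph * make_large_training_graph(struct ggml_context * ctx) {
    const size_t n_nodes = 8192; // illustrative; default is GGML_DEFAULT_GRAPH_SIZE
    return ggml_new_graph_custom(ctx, n_nodes, /*grads=*/true);
}
```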

Related

- #15279: same view_src assertion (closed as completed)
- #15090: same graph overflow (closed as stale)
