seglen eviction for mamba radix cache by XinyiQiao · Pull Request #3 · abdelfattah-lab/sglang

XinyiQiao · 2026-03-25T20:41:23Z

Motivation

This PR adds a new radix cache eviction policy "seglen" (segment length) for hybrid models using MambaRadixCache.

The approach is inspired by Marconi, a prefix-caching scheme for hybrid LLMs. seglen heuristically approximates Marconi’s FLOPs-efficiency score: it keeps the core recomputation-cost intuition while avoiding the model-architecture-specific marginal-FLOPs calculations that complicate the implementation.

seglen ranks eviction candidates by their replay length (the distance to the nearest reusable Mamba ancestor) combined with recency. Compared with pure LRU, this is intended to make eviction decisions more aware of recomputation cost for hybrid models.

sglang serve \
  --model-path Qwen/Qwen3.5-9B \
  --mamba-scheduler-strategy extra_buffer \
  --radix-eviction-policy seglen \
  --marconi-eff-weight 0.85
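
The ranking described above (replay length blended with recency under an efficiency weight) can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Candidate` type, the `_rank01` and `pick_victim` helpers, the rank normalization, and the linear blend are all assumptions, and `eff_weight` merely stands in for the efficiency weight. The sketch also assumes that a longer replay length makes a node more worth keeping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical evictable node; not the MambaRadixCache node type."""
    name: str
    replay_len: int          # tokens to replay from the nearest reusable Mamba ancestor
    last_access_time: float  # larger means more recently used


def _rank01(values: List[float]) -> List[float]:
    """Map values to ranks in [0, 1]; the smallest value gets 0, equal values share a rank."""
    ordered = sorted(set(values))
    if len(ordered) == 1:
        return [0.0 for _ in values]
    pos = {v: i / (len(ordered) - 1) for i, v in enumerate(ordered)}
    return [pos[v] for v in values]


def pick_victim(cands: List[Candidate], eff_weight: float) -> Candidate:
    """Return the candidate with the lowest keep-score (assumed scoring rule)."""
    if not cands:
        raise ValueError("no eviction candidates")
    eff = _rank01([c.replay_len for c in cands])        # costlier replay -> higher rank
    rec = _rank01([c.last_access_time for c in cands])  # more recent -> higher rank
    scores = [eff_weight * e + (1.0 - eff_weight) * r for e, r in zip(eff, rec)]
    return cands[min(range(len(cands)), key=scores.__getitem__)]


cands = [
    Candidate("A", replay_len=10, last_access_time=3.0),
    Candidate("B", replay_len=500, last_access_time=1.0),
    Candidate("C", replay_len=200, last_access_time=2.0),
]
victim_seglen = pick_victim(cands, eff_weight=0.85)  # cheap-to-replay node goes first
victim_lru = pick_victim(cands, eff_weight=0.0)      # weight 0 reduces to plain LRU
```

Under these assumptions, a weight of 0 reduces the choice to LRU, while a weight near 1 protects nodes that are expensive to recompute even if they have not been touched recently.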

Modifications

Add seglen as a supported radix eviction policy for hybrid SSM models.
Implement seglen full-KV eviction and Mamba-state eviction in MambaRadixCache.
Add seglen_eff_weight to control the balance between replay-length efficiency and recency.
Update match_prefix behavior so seglen refreshes only the matched last node instead of all matched ancestors.
Add validation so --radix-eviction-policy=seglen is only allowed for hybrid SSM models.
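
The match_prefix change listed above can be pictured with a small sketch. The `Node` type and the `refresh_on_match` helper are hypothetical and exist only for illustration; the actual MambaRadixCache node structure and method signatures are not shown in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical radix-tree node with a parent pointer."""
    name: str
    parent: Optional["Node"] = None
    last_access_time: float = 0.0


def refresh_on_match(last_node: Node, now: float, policy: str) -> None:
    """Update access times after a prefix match that ends at last_node."""
    if policy == "seglen":
        # seglen: refresh only the deepest matched node
        last_node.last_access_time = now
        return
    # other policies (assumed here): refresh every node on the matched path
    node: Optional[Node] = last_node
    while node is not None:
        node.last_access_time = now
        node = node.parent


root = Node("root")
a = Node("a", parent=root)
b = Node("b", parent=a)

refresh_on_match(b, now=5.0, policy="seglen")
seglen_times = (root.last_access_time, a.last_access_time, b.last_access_time)

refresh_on_match(b, now=7.0, policy="lru")
lru_times = (root.last_access_time, a.last_access_time, b.last_access_time)
```

One plausible reading of the design is that leaving ancestors untouched keeps their recency from being inflated by every descendant hit, so the recency term reflects direct reuse of each node.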

Benchmarking and Profiling

Benchmark results on H100 show that seglen delivers substantial TTFT improvements on prefix-heavy workloads, while still providing a modest TTFT improvement on the ShareGPT regression dataset with low prefix-hit rate.

-29.5% TTFT on prefix-heavy datasets
-26.1% TTFT on SWE-bench datasets
-3.4% TTFT on ShareGPT as a regression check (~1% prefix hit)

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

XinyiQiao added 30 commits March 3, 2026 11:51

MARCONI RFG

803970d

marconi RFC

3b618df

update RFC

16b14e4

marconi eviction scoring

6ff7f92

fix test

fab99e5

fix tests

e5867f9

fix bug

bf82a63

add ds

0473747

add log

c1c572c

update ds

b66f140

update ds

2131337

rank and sort once

459ddc8

fix marconi flop utils

7b8c9f2

fix tests

e44d62e

updage ds

7fdafa2

use smaller ds

6d20eb7

revert ds

152bfab

capture ssm state

2f24b08

add log

1f234b0

dummy change

7830ca2

add logs

a035486

remove dummy change

21024e2

fix cuda capturing error

fa3d1f3

remove padding

575a744

remove padding

5d4387b

fix cuda error

22e3272

remove padding

5da06cb

add logging

718a94f

remove logs

07f0570

remove arg

3f57524

XinyiQiao added 11 commits March 18, 2026 10:46

update RFC

529cc1a

Merge remote-tracking branch 'origin/main' into marconi-integration

2cfb13a

add seglen

93f5e5d

add multigroup ds

3042c6a

marconi utility

787aeca

last access time

fc1556b

add benchmark script

d6db848

add small ds

61f8fb0

update script

4c5bf76

seglen cleanup

d54f146

marconi cleanup

6e0f647

github-actions Bot added the documentation Improvements or additions to documentation label Mar 25, 2026

XinyiQiao added 9 commits March 25, 2026 16:48

delete files

9f254a3

update comment

5ef2db6

update efficiency rank

59b094b

update import

9923ac6

update comment

dd63bd6

fix evict full under eviction

c9f8a63

fix under eviction for mamba

48dc73e

use min to pick best candidate

80b5589

add unit test

8103b79

XinyiQiao changed the title ~~Seglen eviction~~ seglen eviction for mamba radix cache Mar 29, 2026

fix precommit

39fba05

XinyiQiao added enhancement New feature or request and removed documentation Improvements or additions to documentation labels Mar 29, 2026

XinyiQiao marked this pull request as ready for review March 29, 2026 22:11

XinyiQiao added 4 commits April 5, 2026 22:57

add tests

ee92383

remove log

ac10f53

Merge branch 'main' into seglen-eviction

eb5dd65

addr comments

606177d

XinyiQiao wants to merge 57 commits into main from seglen-eviction
