Refactor model-call request control into RequestAdmissionController #657

@eric-tramel

Description

Priority Level

High

Epic: #645
Depends on: #644, #654
Feeds: #635
Related: #641, #646, #647, #648, #649, #650, #651
Target branch: epic/645-async-scheduling while the epic is active.

Source of Truth

Specification details live in PR #658 under plans/645/, especially request-admission.md, contracts.md, capacity-model.md, migration-and-cleanup.md, benchmark-plan.md, and issue-map.md. This issue tracks the implementation slice and gates only.

Implementation Scope

Refactor model-call throttling into the plan-defined request-admission layer.

This issue owns:

  • ModelRequestExecutor as the durable model-call boundary.
  • RequestAdmissionController and concrete V1 AdaptiveRequestAdmissionController.
  • Request resource/domain DTOs, request leases, request decisions, read-only pressure snapshots, and internal request queue/policy/state components described by the plan.
  • Lease-based acquire/release for concrete provider/model/domain calls, including dynamic zero/one/many calls from arbitrary generator Python.
  • Rename/removal of durable ThrottleManager, ThrottleDomain, ThrottleConfig, RunConfig.throttle, and throttle_manager.py production surfaces.
  • Request-admission configuration naming aligned with #654 (Implement async capacity model and runtime snapshots).
  • Benchmark hooks/evidence for request-admission overhead and dynamic-call workloads.

This issue must not make task admission reserve future model calls or move DAG scheduling into request admission.
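The lease-based acquire/release boundary described above could look roughly like the following. This is a minimal sketch, not the design in plans/645/: the class shapes, method names (`acquire`, `release`, `execute`), the outcome strings, `RateLimitError`, and `FixedLimitAdmissionController` are all assumptions made for illustration, and the controller uses a fixed per-key limit rather than the adaptive V1 policy.

```python
import asyncio
from dataclasses import dataclass
from itertools import count
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider rate-limit error."""


@dataclass(frozen=True)
class RequestResourceKey:
    """Assumed key shape: one admission domain per provider/model/domain."""
    provider: str
    model: str
    domain: str


@dataclass
class RequestLease:
    """Assumed lease handle; `released` guards against double release."""
    lease_id: int
    key: RequestResourceKey
    released: bool = False


class FixedLimitAdmissionController:
    """Illustrative controller: fixed per-key limit, no AIMD or fairness queue."""

    def __init__(self, limit_per_key: int) -> None:
        self._limit = limit_per_key
        self._semaphores: dict[RequestResourceKey, asyncio.Semaphore] = {}
        self._ids = count(1)
        self.in_flight = 0
        self.outcomes: list[str] = []

    async def acquire(self, key: RequestResourceKey) -> RequestLease:
        sem = self._semaphores.setdefault(key, asyncio.Semaphore(self._limit))
        await sem.acquire()
        self.in_flight += 1
        return RequestLease(next(self._ids), key)

    def release(self, lease: RequestLease, outcome: str) -> None:
        if lease.released:  # exact-once: a repeated release is a no-op
            return
        lease.released = True
        self.in_flight -= 1
        self.outcomes.append(outcome)
        self._semaphores[lease.key].release()


class ModelRequestExecutor:
    """Wraps one concrete model call in exactly one lease."""

    def __init__(self, controller: FixedLimitAdmissionController) -> None:
        self._controller = controller

    async def execute(
        self, key: RequestResourceKey, call: Callable[[], Awaitable[str]]
    ) -> str:
        lease = await self._controller.acquire(key)
        outcome = "failure"  # default for any non-rate-limit error
        try:
            result = await call()
            outcome = "success"
            return result
        except RateLimitError:
            outcome = "rate_limited"
            raise
        except asyncio.TimeoutError:
            outcome = "timeout"
            raise
        except asyncio.CancelledError:
            outcome = "cancelled"
            raise
        finally:
            # Runs on every path, so the acquired lease is always returned.
            self._controller.release(lease, outcome)


async def generate(
    executor: ModelRequestExecutor, key: RequestResourceKey, n_calls: int
) -> list[str]:
    """Stand-in for generator code that picks its call count at runtime."""

    async def fake_call() -> str:
        await asyncio.sleep(0)
        return "ok"

    return [await executor.execute(key, fake_call) for _ in range(n_calls)]
```

Because the lease is acquired per concrete call inside `execute`, a generator that makes zero, one, or many calls needs no up-front reservation, which is consistent with the constraint that task admission must not reserve future model calls.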

Quality Gates

  • Request admission and task admission remain separate layers with separate units of scheduling.
  • ModelRequestExecutor maps concrete model requests to request resource keys and releases the exact acquired lease on success, rate limit, non-rate-limit failure, timeout, cancellation, and unexpected exception paths.
  • No second public wrapper or double-admission queue is introduced around the existing AIMD behavior.
  • RequestPressureSnapshotProvider is read-only and has no mutation/admission methods.
  • Tests cover request queue ordering, AIMD transitions, exact-once release, no permit leaks, cancellation, timeouts, rate-limit cooldown, non-rate-limit failures, dynamic zero/one/many request generators, and no bypass of request fairness.
  • Production/current docs have no durable legacy throttle names when this issue closes.
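Two of the gates above, AIMD transitions and a read-only pressure snapshot, can be illustrated with a small sketch. The field names, the `snapshot()` method, the `AimdState` and `SnapshotView` classes, and the numeric parameters (additive step of 1, multiplicative factor of 0.5) are assumptions for illustration only; the existing AIMD behavior and the real contracts are defined by the codebase and plans/645/, and rate-limit cooldown is omitted here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RequestPressureSnapshot:
    """Immutable point-in-time view; field names are assumed."""
    limit: int
    in_flight: int
    queued: int


class RequestPressureSnapshotProvider(Protocol):
    """Read-only surface: one query method, no mutation or admission methods."""

    def snapshot(self) -> RequestPressureSnapshot: ...


class AimdState:
    """Additive-increase / multiplicative-decrease concurrency limit."""

    def __init__(
        self,
        initial: int = 4,
        minimum: int = 1,
        maximum: int = 8,
        decrease_factor: float = 0.5,
    ) -> None:
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.decrease_factor = decrease_factor
        self.in_flight = 0
        self.queued = 0

    def on_success(self) -> None:
        # Additive increase, capped at the configured maximum.
        self.limit = min(self.maximum, self.limit + 1)

    def on_rate_limit(self) -> None:
        # Multiplicative decrease, floored at the configured minimum.
        self.limit = max(self.minimum, int(self.limit * self.decrease_factor))


class SnapshotView:
    """Exposes the state through the read-only provider protocol only."""

    def __init__(self, state: AimdState) -> None:
        self._state = state

    def snapshot(self) -> RequestPressureSnapshot:
        s = self._state
        return RequestPressureSnapshot(s.limit, s.in_flight, s.queued)
```

Handing schedulers a `SnapshotView` rather than the state object itself keeps the mutation methods out of reach, and the frozen dataclass means a snapshot taken earlier does not change when the limit later moves.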

Validation

Run the focused request-admission/model-client tests, the async generator integration tests with dynamic request counts, and make check. Collect the benchmark evidence required by plans/645/benchmark-plan.md, and run stale-term searches for ThrottleManager, ThrottleDomain, ThrottleConfig, RunConfig.throttle, and throttle_manager.py.
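The stale-term search can be done with any text search tool; as one possible sketch, a small Python helper could scan a directory tree for the five legacy names listed above. The function name, the choice of file suffixes, and which directories count as production or current docs are assumptions here, not something the issue specifies.

```python
from pathlib import Path

# Legacy names taken from the Validation section of this issue.
STALE_TERMS = (
    "ThrottleManager",
    "ThrottleDomain",
    "ThrottleConfig",
    "RunConfig.throttle",
    "throttle_manager.py",
)


def find_stale_terms(root, suffixes=(".py", ".md")):
    """Return (path, line number, term) for each legacy name found under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for term in STALE_TERMS:
                if term in line:
                    hits.append((str(path), lineno, term))
    return hits
```

An empty result for the production and current-docs directories would satisfy the stale-term gate; planning or migration notes that mention the old names on purpose would need to be scanned separately or excluded.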

Metadata

Assignees

No one assigned

    Labels

    plan (Agent-assisted development plan), task (Internal development task)
