Source of Truth
Specification details live in PR #658 under plans/645/, especially request-admission.md, contracts.md, capacity-model.md, migration-and-cleanup.md, benchmark-plan.md, and issue-map.md. This issue tracks the implementation slice and gates only.
Implementation Scope
Refactor model-call throttling into the plan-defined request-admission layer.
This issue owns:
ModelRequestExecutor as the durable model-call boundary.
RequestAdmissionController and concrete V1 AdaptiveRequestAdmissionController.
Request resource/domain DTOs, request leases, request decisions, read-only pressure snapshots, and internal request queue/policy/state components described by the plan.
Lease-based acquire/release for concrete provider/model/domain calls, including a dynamic number of calls (zero, one, or many) made from arbitrary Python generator code.
Renaming or removing the durable production surfaces ThrottleManager, ThrottleDomain, ThrottleConfig, RunConfig.throttle, and throttle_manager.py.
Benchmark hooks/evidence for request-admission overhead and dynamic-call workloads.
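The lease and AIMD pieces listed above can be pictured as a small per-key ledger: each concrete call takes one lease against its provider/model/domain key, and releasing that lease with an outcome adjusts the key's concurrency limit. The sketch below is synchronous and omits queueing. RequestResourceKey, RequestLease, Outcome, AimdLeaseLedger and every method name are hypothetical, and the AIMD rule it uses (+1 on success, halve on rate limit, leave unchanged on other failures) is one illustrative variant, not necessarily the rule in plans/645/request-admission.md.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    SUCCESS = "success"
    RATE_LIMITED = "rate_limited"
    OTHER_FAILURE = "other_failure"  # non-rate-limit failure, timeout, cancellation


@dataclass(frozen=True)
class RequestResourceKey:
    # Hypothetical key shape: one admission domain per provider/model/domain.
    provider: str
    model: str
    domain: str


@dataclass
class RequestLease:
    key: RequestResourceKey
    lease_id: int
    released: bool = False


@dataclass
class AimdState:
    limit: int
    in_flight: int = 0


class AimdLeaseLedger:
    """Hypothetical per-key AIMD limit with exact-once lease release."""

    def __init__(self, initial_limit: int = 4, min_limit: int = 1,
                 max_limit: int = 32, decrease_factor: float = 0.5) -> None:
        self._initial = initial_limit
        self._min = min_limit
        self._max = max_limit
        self._factor = decrease_factor
        self._states: Dict[RequestResourceKey, AimdState] = {}
        self._ids = itertools.count(1)

    def state(self, key: RequestResourceKey) -> AimdState:
        return self._states.setdefault(key, AimdState(self._initial))

    def try_acquire(self, key: RequestResourceKey) -> Optional[RequestLease]:
        s = self.state(key)
        if s.in_flight >= s.limit:
            return None  # a real controller would queue the request instead
        s.in_flight += 1
        return RequestLease(key, next(self._ids))

    def release(self, lease: RequestLease, outcome: Outcome) -> None:
        if lease.released:
            raise RuntimeError(f"lease {lease.lease_id} released twice")
        lease.released = True
        s = self.state(lease.key)
        s.in_flight -= 1
        if outcome is Outcome.SUCCESS:
            s.limit = min(self._max, s.limit + 1)  # additive increase
        elif outcome is Outcome.RATE_LIMITED:
            s.limit = max(self._min, int(s.limit * self._factor))  # multiplicative decrease
        # Other failures leave the limit unchanged in this sketch.
```

Rejecting a second release outright, rather than ignoring it, makes double-release bugs visible in the exact-once and no-leak tests.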
This issue must not make task admission reserve future model calls or move DAG scheduling into request admission.
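The boundary above can be illustrated with generator code that takes a request lease only at the moment it actually makes a model call, so nothing is reserved for calls that may never happen. In this minimal sketch, CountingAdmission, fake_model_call and generate are hypothetical stand-ins, and a plain asyncio.Semaphore takes the place of the real request-admission controller.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager


class CountingAdmission:
    """Hypothetical stand-in for request admission: one lease per concrete call."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.acquired = 0
        self.in_flight = 0

    @asynccontextmanager
    async def lease(self, key: tuple):
        # The key is ignored by this stub; a real controller would route on it.
        async with self._sem:
            self.acquired += 1
            self.in_flight += 1
            try:
                yield
            finally:
                self.in_flight -= 1


async def fake_model_call(prompt: str) -> str:
    await asyncio.sleep(0)
    return prompt.upper()


async def generate(row: dict, admission: CountingAdmission):
    """Arbitrary generator code: the number of model calls depends on the data."""
    for prompt in row["prompts"]:  # zero, one, or many prompts
        async with admission.lease(("provider-a", "model-a", "chat")):
            result = await fake_model_call(prompt)
        yield result  # the lease is already released while the consumer runs
```

Releasing the lease before `yield` keeps a slow consumer from holding request capacity it is not using.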
Quality Gates
Request admission and task admission remain separate layers with separate units of scheduling.
ModelRequestExecutor maps concrete model requests to request resource keys and releases the exact acquired lease on success, rate limit, non-rate-limit failure, timeout, cancellation, and unexpected exception paths.
No second public wrapper or double-admission queue is introduced around the existing AIMD behavior.
RequestPressureSnapshotProvider is read-only and has no mutation/admission methods.
Tests cover request queue ordering, AIMD transitions, exact-once release, no permit leaks, cancellation, timeouts, rate-limit cooldown, non-rate-limit failures, dynamic zero/one/many request generators, and no bypass of request fairness.
Production code and current docs contain no durable legacy throttle names when this issue closes.
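The release gate on ModelRequestExecutor can be met by acquiring a lease before the call and releasing it in a `finally` block, recording which path was taken. In this minimal asyncio sketch, only the ModelRequestExecutor name comes from this issue; RateLimitError, RecordingController, the `execute`/`acquire`/`release` signatures and the outcome strings are hypothetical.

```python
from __future__ import annotations

import asyncio
import itertools
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


@dataclass
class RequestLease:
    key: tuple[str, str, str]
    lease_id: int


@dataclass
class RecordingController:
    """Hypothetical stub controller: counts outstanding leases and records each release."""
    in_flight: int = 0
    released: list[tuple[int, str]] = field(default_factory=list)
    _ids: Any = field(default_factory=lambda: itertools.count(1))

    async def acquire(self, key: tuple[str, str, str]) -> RequestLease:
        self.in_flight += 1
        return RequestLease(key, next(self._ids))

    def release(self, lease: RequestLease, outcome: str) -> None:
        self.in_flight -= 1
        self.released.append((lease.lease_id, outcome))


class ModelRequestExecutor:
    """Sketch: one lease per concrete call, released exactly once on every path."""

    def __init__(self, controller: RecordingController, timeout_s: float) -> None:
        self._controller = controller
        self._timeout_s = timeout_s

    async def execute(self, provider: str, model: str, domain: str,
                      call: Callable[[], Awaitable[Any]]) -> Any:
        key = (provider, model, domain)  # hypothetical request resource key
        # If acquire itself is cancelled, nothing was acquired and nothing is released.
        lease = await self._controller.acquire(key)
        outcome = "unexpected"  # kept only if a non-Exception BaseException escapes
        try:
            result = await asyncio.wait_for(call(), self._timeout_s)
            outcome = "success"
            return result
        except RateLimitError:
            outcome = "rate_limited"
            raise
        except asyncio.TimeoutError:
            outcome = "timeout"
            raise
        except asyncio.CancelledError:
            outcome = "cancelled"
            raise
        except Exception:
            outcome = "failure"
            raise
        finally:
            # Exactly one release for the lease acquired above, whatever happened.
            self._controller.release(lease, outcome)
```

Every exception is re-raised after its outcome is recorded, so the executor reports to admission without swallowing errors that the caller should see.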
Validation
Run focused request-admission/model-client tests, async generator integration tests with dynamic request counts, make check, benchmark evidence required by plans/645/benchmark-plan.md, and stale-term searches for ThrottleManager, ThrottleDomain, ThrottleConfig, RunConfig.throttle, and throttle_manager.py.
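The stale-term search can be scripted so it runs the same way locally and in CI. This is a minimal sketch: find_stale_terms, the scanned suffixes, and the default exclusion of `plans/` (on the assumption that the migration plan legitimately names the legacy surfaces) are all hypothetical choices, not part of the plan.

```python
from __future__ import annotations

from pathlib import Path

STALE_TERMS = ("ThrottleManager", "ThrottleDomain", "ThrottleConfig",
               "RunConfig.throttle", "throttle_manager.py")
# Hypothetical selection of text files worth scanning.
DEFAULT_SUFFIXES = (".py", ".md", ".toml", ".yaml", ".yml", ".txt")


def find_stale_terms(root, suffixes=DEFAULT_SUFFIXES, exclude_dirs=(".git", "plans")):
    """Return (relative_path, line_no, term) hits; line_no 0 means a file-name hit."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if any(part in exclude_dirs for part in rel.parts):
            continue
        if not path.is_file():
            continue
        if path.name == "throttle_manager.py":
            hits.append((rel.as_posix(), 0, "throttle_manager.py"))
        if path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for term in STALE_TERMS:
                if term in line:  # plain substring match, so "." is literal
                    hits.append((rel.as_posix(), line_no, term))
    return hits
```

An empty result from `find_stale_terms(".")` would satisfy the stale-term part of the validation step under these assumptions.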
Priority Level
High
Epic: #645
Depends on: #644, #654
Feeds: #635
Related: #641, #646, #647, #648, #649, #650, #651
Target branch: epic/645-async-scheduling while the epic is active.