Refactor model-call request control into RequestAdmissionController #657

## Priority Level

High

Epic: #645
Depends on: #644, #654
Feeds: #635
Related: #641, #646, #647, #648, #649, #650, #651
Target branch: `epic/645-async-scheduling` while the epic is active.

## Source of Truth

Specification details live in PR #658 under `plans/645/`, especially `request-admission.md`, `contracts.md`, `capacity-model.md`, `migration-and-cleanup.md`, `benchmark-plan.md`, and `issue-map.md`. This issue tracks the implementation slice and gates only.

## Implementation Scope

Refactor model-call throttling into the plan-defined request-admission layer.

This issue owns:

- `ModelRequestExecutor` as the durable model-call boundary.
- `RequestAdmissionController` and the concrete V1 implementation, `AdaptiveRequestAdmissionController`.
- Request resource/domain DTOs, request leases, request decisions, read-only pressure snapshots, and internal request queue/policy/state components described by the plan.
- Lease-based acquire/release for concrete provider/model/domain calls, including dynamic zero/one/many calls from arbitrary generator Python.
- Renaming or removing the durable `ThrottleManager`, `ThrottleDomain`, `ThrottleConfig`, `RunConfig.throttle`, and `throttle_manager.py` production surfaces.
- Request-admission configuration naming aligned with #654.
- Benchmark hooks/evidence for request-admission overhead and dynamic-call workloads.

This issue must not make task admission reserve future model calls or move DAG scheduling into request admission.
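
The lease-based acquire/release boundary described above could look roughly like the sketch below. It is illustrative only: `RequestResourceKey`, `RequestLease`, `FixedLimitAdmissionController`, and the `acquire`/`release`/`execute` signatures are assumptions made for this example, not the contracts in `plans/645/`. A fixed per-key limit stands in for the adaptive V1 policy.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RequestResourceKey:
    """Hypothetical key for one provider/model/domain lane."""
    provider: str
    model: str
    domain: str


@dataclass(frozen=True)
class RequestLease:
    """Hypothetical handle proving that one permit is held for `key`."""
    key: RequestResourceKey
    lease_id: int


class FixedLimitAdmissionController:
    """Toy stand-in for a RequestAdmissionController (fixed limit per key)."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._semaphores: dict[RequestResourceKey, asyncio.Semaphore] = {}
        self._active: set[int] = set()
        self._next_id = 0

    async def acquire(self, key: RequestResourceKey) -> RequestLease:
        sem = self._semaphores.setdefault(key, asyncio.Semaphore(self._limit))
        await sem.acquire()  # if cancelled here, no permit is taken
        self._next_id += 1
        self._active.add(self._next_id)
        return RequestLease(key, self._next_id)

    def release(self, lease: RequestLease) -> None:
        # Exact-once: an unknown or already released lease is an error.
        if lease.lease_id not in self._active:
            raise RuntimeError(f"lease {lease.lease_id} is not active")
        self._active.remove(lease.lease_id)
        self._semaphores[lease.key].release()

    @property
    def active_leases(self) -> int:
        return len(self._active)


class ModelRequestExecutor:
    """Acquires a lease per concrete call and releases it on every exit path."""

    def __init__(self, controller: FixedLimitAdmissionController) -> None:
        self._controller = controller

    async def execute(self, key: RequestResourceKey,
                      call: Callable[[], Awaitable[T]]) -> T:
        lease = await self._controller.acquire(key)
        try:
            return await call()
        finally:
            # Runs on success, failure, timeout, and cancellation alike.
            self._controller.release(lease)


async def run_generator(executor: ModelRequestExecutor,
                        key: RequestResourceKey, n_calls: int) -> list:
    """Stand-in for generator code that issues a data-dependent number of calls."""
    async def call() -> str:
        await asyncio.sleep(0)
        return "ok"
    return await asyncio.gather(*(executor.execute(key, call) for _ in range(n_calls)))


async def demo() -> tuple:
    controller = FixedLimitAdmissionController(limit=2)
    executor = ModelRequestExecutor(controller)
    key = RequestResourceKey("provider-a", "model-x", "chat")

    async def boom() -> None:
        raise ValueError("non-rate-limit failure")

    async def slow() -> None:
        await asyncio.sleep(10)

    # Zero, one, and many calls from the same generator code path.
    counts = [len(await run_generator(executor, key, n)) for n in (0, 1, 5)]
    try:
        await executor.execute(key, boom)
    except ValueError:
        pass
    try:
        await asyncio.wait_for(executor.execute(key, slow), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    return counts, controller.active_leases
```

Because admission happens per concrete call inside `execute`, the caller never needs to declare how many calls it will make, which is consistent with keeping task admission from reserving future model calls. `asyncio.run(demo())` should leave no active leases after the failure and timeout paths.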

## Quality Gates

- Request admission and task admission remain separate layers with separate units of scheduling.
- `ModelRequestExecutor` maps concrete model requests to request resource keys and releases the exact acquired lease on success, rate limit, non-rate-limit failure, timeout, cancellation, and unexpected exception paths.
- No second public wrapper or double-admission queue is introduced around the existing AIMD behavior.
- `RequestPressureSnapshotProvider` is read-only and has no mutation/admission methods.
- Tests cover request queue ordering, AIMD transitions, exact-once release, no permit leaks, cancellation, timeouts, rate-limit cooldown, non-rate-limit failures, dynamic zero/one/many request generators, and no bypass of request fairness.
- Production/current docs have no durable legacy throttle names when this issue closes.
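
As a rough illustration of two of the gates above (AIMD transitions and a read-only pressure snapshot), the sketch below uses textbook additive-increase/multiplicative-decrease. The parameter values, the `AimdLimit` class, the snapshot fields, and the `snapshot()` method name are all hypothetical; the project's existing AIMD behavior and the real `RequestPressureSnapshotProvider` contract are defined by the plan, not by this example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RequestPressureSnapshot:
    """Immutable view of one resource's state (field names are hypothetical)."""
    limit: int
    successes: int
    rate_limits: int


class RequestPressureSnapshotProvider(Protocol):
    """Read-only surface: a single query method, no acquire/release/mutation."""

    def snapshot(self) -> RequestPressureSnapshot: ...


class AimdLimit:
    """Textbook AIMD: add `increase` on success, multiply by `decrease_factor` on rate limit."""

    def __init__(self, initial: int = 4, minimum: int = 1, maximum: int = 64,
                 increase: int = 1, decrease_factor: float = 0.5) -> None:
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.decrease_factor = decrease_factor
        self.limit = initial
        self.successes = 0
        self.rate_limits = 0

    def on_success(self) -> None:
        # Additive increase, clamped to the configured maximum.
        self.successes += 1
        self.limit = min(self.maximum, self.limit + self.increase)

    def on_rate_limit(self) -> None:
        # Multiplicative decrease, clamped to the configured minimum.
        self.rate_limits += 1
        self.limit = max(self.minimum, int(self.limit * self.decrease_factor))

    def snapshot(self) -> RequestPressureSnapshot:
        # Returns a frozen copy, so observers cannot mutate controller state.
        return RequestPressureSnapshot(self.limit, self.successes, self.rate_limits)
```

A transition test under these assumptions might start at a limit of 4, record one success (limit 5), then one rate limit (limit 2), and check that repeated rate limits never push the limit below `minimum`.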

## Validation

Before closing this issue, run or produce:

- Focused request-admission and model-client tests.
- Async generator integration tests with dynamic request counts.
- `make check`.
- The benchmark evidence required by `plans/645/benchmark-plan.md`.
- Stale-term searches for `ThrottleManager`, `ThrottleDomain`, `ThrottleConfig`, `RunConfig.throttle`, and `throttle_manager.py`.
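
The stale-term search could be scripted along the lines of the sketch below. The function name, the scanned suffixes, and the choice of root directory are assumptions for illustration; which directories count as production or current docs (as opposed to historical plan material) is left to the caller.

```python
from pathlib import Path

# Legacy names listed in this issue.
STALE_TERMS = (
    "ThrottleManager",
    "ThrottleDomain",
    "ThrottleConfig",
    "RunConfig.throttle",
    "throttle_manager.py",
)


def find_stale_terms(root: Path, suffixes=(".py", ".md")):
    """Return (relative_path, line_number, term) for each literal match under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for term in STALE_TERMS:
                if term in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, term))
    return hits


if __name__ == "__main__":
    for rel_path, lineno, term in find_stale_terms(Path(".")):
        print(f"{rel_path}:{lineno}: {term}")
```

This is a literal substring match, so it will not catch indirect uses such as attribute access on a differently named variable (for example `config.throttle`); a separate search may be needed for those.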
