
fix(tasks): global concurrent-task ceiling on the HTTP create path (closes #98) #119

Open
umi-appcoder[bot] wants to merge 1 commit into main from fix/concurrent-task-ceiling


umi-appcoder (bot) commented Jun 17, 2026

Contributor

What

Adds a global soft ceiling on concurrently live tasks created through the HTTP path; create requests get HTTP 429 once the ceiling is reached. Closes #98. This is PR 1 of the reliability-hardening series.

Why

The MCP orchestrator caps live sub-agents (KC_MAX_SUBAGENTS), but the HTTP create_task path had no cap. A webhook/cron storm — or a buggy POST loop — could spawn unbounded tmux sessions and OOM/CPU-starve a 2-3 CPU pod.

How

Enforced once, inside ClaudeTaskManager (single source of truth, covers all 5 create callers):

  • MAX_TASKS (env KC_MAX_TASKS, default 12)
  • count_live_tasks() — counts live kube-coder-* tmux sessions
  • at_capacity() / _capacity_rejection()
  • create_task / create_terminal_task refuse at capacity, returning a {status: 'rejected', task_id: null, error} meta without creating a task dir or tmux session.
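For readers who want a concrete picture of the pieces listed above, here is a minimal sketch, not the code in this PR. It assumes the live count comes from `tmux list-sessions -F '#{session_name}'`, that a tmux failure counts as zero live tasks, and that "at capacity" means `live >= limit`. The `run` parameter, the `SESSION_PREFIX` constant, the error wording, and the module-level function layout are illustrative.

```python
import os
import subprocess

# Env var name and default come from the PR description; the rest is a sketch.
MAX_TASKS = int(os.environ.get("KC_MAX_TASKS", "12"))
SESSION_PREFIX = "kube-coder-"  # assumed prefix, per "kube-coder-* tmux sessions"


def count_live_tasks(run=subprocess.run) -> int:
    """Count tmux sessions whose name starts with SESSION_PREFIX.

    `run` is injectable so the parsing can be tested without tmux.
    """
    try:
        proc = run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return 0  # assumption: tmux missing means nothing is live
    if proc.returncode != 0:
        return 0  # assumption: no tmux server running means nothing is live
    names = (line.strip() for line in proc.stdout.splitlines())
    return sum(1 for name in names if name.startswith(SESSION_PREFIX))


def at_capacity(live: int, limit: int = MAX_TASKS) -> bool:
    """True once the live count has reached the ceiling."""
    return live >= limit


def capacity_rejection(live: int, limit: int = MAX_TASKS) -> dict:
    """Build a rejected meta in the shape the PR describes."""
    return {
        "status": "rejected",
        "task_id": None,
        "error": f"task capacity reached ({live}/{limit} live tasks); retry later",
    }
```

In this sketch a create path would call `count_live_tasks()` first and return `capacity_rejection(...)` before touching the filesystem or tmux, which is what keeps a rejected request from leaving a task dir behind.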

Every user-facing create handler (dashboard, terminal, desktop action, webhook fire, cron fire) translates a rejected meta to HTTP 429 with a clear message. Webhook/cron paths reject before publishing trigger.fired.
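The translation step might look roughly like the sketch below. It is framework-agnostic and hypothetical: `response_for`, `fire_trigger`, the `(status, body)` return shape, the 200 success code, and the `publish` callback are all assumptions made for illustration. Only the rejected meta shape, the 429 status, and the `trigger.fired` event name come from the description.

```python
from http import HTTPStatus
from typing import Callable


def response_for(meta: dict) -> tuple[int, dict]:
    """Map a task meta to (status_code, body); rejected metas become 429."""
    if meta.get("status") == "rejected":
        message = meta.get("error") or "task capacity reached"
        return HTTPStatus.TOO_MANY_REQUESTS.value, {"error": message}
    return HTTPStatus.OK.value, meta  # assumption: success is reported as 200


def fire_trigger(
    create_task: Callable[[dict], dict],
    publish: Callable[[str, dict], None],
    payload: dict,
) -> tuple[int, dict]:
    """Hypothetical webhook/cron path: return early on rejection, so no event is published."""
    meta = create_task(payload)
    status, body = response_for(meta)
    if status == HTTPStatus.TOO_MANY_REQUESTS.value:
        return status, body  # rejected: trigger.fired is never published
    publish("trigger.fired", {"task_id": meta.get("task_id")})
    return status, body
```

Returning before the publish call is the point of the ordering: subscribers never see a `trigger.fired` event for a task that was not created.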

Tests

tests/task_capacity_test.py (7): session-count parsing, at_capacity boundary, rejection payload, create_task/create_terminal_task rejected-at-capacity (no dir left behind), under-capacity passthrough. Also updated two assistant tests that assumed tmux new-session was the first subprocess call (now preceded by the cap's tmux list-sessions). Full Python suite 473, OK.
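The ordering change that affected the two assistant tests can be illustrated with a small, self-contained sketch. `create_session`, `find_call`, and `recorded_calls` are hypothetical stand-ins, not functions from this repository. The idea is to look up the `tmux new-session` call by subcommand instead of assuming it is the first recorded `subprocess.run` call.

```python
import subprocess
from unittest import mock


def create_session(name: str) -> None:
    """Hypothetical stand-in: capacity probe first, then session creation."""
    subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name}"],
        capture_output=True, text=True, check=False,
    )
    subprocess.run(["tmux", "new-session", "-d", "-s", name], check=False)


def find_call(calls: list, subcommand: str) -> list:
    """Return the argv of the first recorded call with the given tmux subcommand."""
    for argv in calls:
        if len(argv) > 1 and argv[1] == subcommand:
            return argv
    raise AssertionError(f"no tmux {subcommand} call recorded")


def recorded_calls(name: str) -> list:
    """Run create_session with subprocess.run mocked; return each call's argv."""
    with mock.patch("subprocess.run") as mock_run:
        mock_run.return_value = mock.Mock(returncode=0, stdout="")
        create_session(name)
        return [call.args[0] for call in mock_run.call_args_list]
```

A test written this way keeps passing if further probes are added ahead of session creation, whereas an index-based assertion such as `call_args_list[0]` would need updating each time.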

Closes #98

🤖 Generated with Claude Code

The MCP orchestrator caps live sub-agents (KC_MAX_SUBAGENTS), but the
HTTP create path had no cap — a webhook/cron storm or a buggy POST loop
could spawn unbounded tmux sessions and OOM/CPU-starve a 2-3 CPU pod.

Add a soft ceiling enforced once, inside ClaudeTaskManager:
- MAX_TASKS (env KC_MAX_TASKS, default 12)
- count_live_tasks() — counts live kube-coder-* tmux sessions
- at_capacity() / _capacity_rejection()
- create_task / create_terminal_task refuse at capacity, returning a
  {'status':'rejected', 'task_id':None, 'error':...} meta WITHOUT
  creating a task dir or tmux session.

Every user-facing create handler (dashboard, terminal, desktop action,
webhook fire, cron fire) translates a rejected meta to HTTP 429 with a
clear message. The webhook/cron paths reject before publishing
trigger.fired.

Tests: tests/task_capacity_test.py (7) — count parsing, at_capacity
boundary, rejection payload, create_task/terminal rejected-at-capacity
(no dir left), under-capacity passthrough. Updated two assistant tests
that assumed `tmux new-session` was the first subprocess call (it's now
preceded by the cap's `tmux list-sessions`). Full suite 473, OK.

Closes #98
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
