fix(tasks): global concurrent-task ceiling on the HTTP create path (closes #98) #119
Open
umi-appcoder[bot] wants to merge 1 commit into
The MCP orchestrator caps live sub-agents (KC_MAX_SUBAGENTS), but the
HTTP create path had no cap — a webhook/cron storm or a buggy POST loop
could spawn unbounded tmux sessions and OOM/CPU-starve a 2-3 CPU pod.
Add a soft ceiling enforced once, inside ClaudeTaskManager:
- MAX_TASKS (env KC_MAX_TASKS, default 12)
- count_live_tasks() — counts live kube-coder-* tmux sessions
- at_capacity() / _capacity_rejection()
- create_task / create_terminal_task refuse at capacity, returning a
{'status':'rejected', 'task_id':None, 'error':...} meta WITHOUT
creating a task dir or tmux session.
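As a rough illustration of the pieces listed above, here is a minimal sketch of the capacity check. It is not the PR's code: the `tmux list-sessions -F '#{session_name}'` invocation, the `>=` boundary, the fail-open behaviour when tmux is unavailable, and the helper name `parse_live_sessions` are all assumptions made for the example.

```python
import os
import subprocess

# Assumption: prefix and default follow the description above.
SESSION_PREFIX = "kube-coder-"
MAX_TASKS = int(os.environ.get("KC_MAX_TASKS", "12"))


def parse_live_sessions(list_sessions_output: str) -> int:
    """Count session names (one per line) that carry the task prefix."""
    return sum(
        1
        for line in list_sessions_output.splitlines()
        if line.strip().startswith(SESSION_PREFIX)
    )


def count_live_tasks() -> int:
    """Ask tmux for session names and count the task sessions."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return 0  # assumption: fail open if tmux cannot be executed
    if result.returncode != 0:
        return 0  # tmux exits non-zero when no server is running
    return parse_live_sessions(result.stdout)


def at_capacity(live: int, limit: int = MAX_TASKS) -> bool:
    """Assumed boundary: the ceiling is reached once live == limit."""
    return live >= limit


def capacity_rejection(live: int, limit: int = MAX_TASKS) -> dict:
    """Build the rejected meta; no task dir or tmux session is created."""
    return {
        "status": "rejected",
        "task_id": None,
        "error": f"task capacity reached ({live}/{limit} live tasks)",
    }
```

A create function under this sketch would call `count_live_tasks()` first and return `capacity_rejection(...)` before touching the filesystem or tmux.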
Every user-facing create handler (dashboard, terminal, desktop action,
webhook fire, cron fire) translates a rejected meta to HTTP 429 with a
clear message. The webhook/cron paths reject before publishing
trigger.fired.
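The handler-side translation could look roughly like the sketch below. It is framework-agnostic and hypothetical: `respond_to_create`, `fire_trigger`, the injected callables, the 200 success status, and the response body shape are assumptions for illustration, not the handlers in this PR.

```python
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]


def respond_to_create(meta: Dict[str, Any]) -> Response:
    """Map a create result to (status_code, body); rejected becomes 429."""
    if meta.get("status") == "rejected":
        return 429, {"error": meta.get("error", "task capacity reached")}
    return 200, meta  # assumption: success status is 200


def fire_trigger(
    at_capacity: Callable[[], bool],
    publish: Callable[[str], None],
    create: Callable[[], Dict[str, Any]],
) -> Response:
    """Webhook/cron path: reject before publishing trigger.fired."""
    if at_capacity():
        return 429, {"error": "task capacity reached"}
    publish("trigger.fired")
    # create() may still reject if another task started in the meantime.
    return respond_to_create(create())
```

Checking capacity before `publish` keeps subscribers from seeing a `trigger.fired` event for a task that was never started.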
Tests: tests/task_capacity_test.py (7) — count parsing, at_capacity
boundary, rejection payload, create_task/terminal rejected-at-capacity
(no dir left), under-capacity passthrough. Updated two assistant tests
that assumed `tmux new-session` was the first subprocess call (it's now
preceded by the cap's `tmux list-sessions`). Full suite 473, OK.
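To illustrate why tests that assumed `tmux new-session` came first needed updating, the sketch below records the order of subprocess calls with `unittest.mock`. `create_session` is a hypothetical stand-in for the real create path, and the exact tmux arguments are assumptions.

```python
import subprocess
from unittest import mock


def create_session(name: str, limit: int = 12) -> bool:
    """Hypothetical stand-in: check the cap, then start the session."""
    listing = subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    live = sum(
        1 for line in listing.stdout.splitlines() if line.startswith("kube-coder-")
    )
    if live >= limit:
        return False
    subprocess.run(["tmux", "new-session", "-d", "-s", name], check=False)
    return True


def recorded_tmux_subcommands(name: str) -> list:
    """Run create_session with subprocess.run mocked; return tmux subcommands in call order."""
    fake_run = mock.Mock(return_value=mock.Mock(returncode=0, stdout=""))
    with mock.patch("subprocess.run", fake_run):
        create_session(name)
    return [call.args[0][1] for call in fake_run.call_args_list]
```

A test written this way asserts on the subcommand of each recorded call rather than on the first call alone, so it keeps passing if further checks are added ahead of `new-session`.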
Closes #98
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds a global soft ceiling on concurrently-live tasks created through the HTTP path, returning 429 when exceeded. Closes #98 — PR 1 of the reliability-hardening series.
Why
The MCP orchestrator caps live sub-agents (`KC_MAX_SUBAGENTS`), but the HTTP `create_task` path had no cap. A webhook/cron storm, or a buggy POST loop, could spawn unbounded tmux sessions and OOM/CPU-starve a 2-3 CPU pod.
How
Enforced once, inside `ClaudeTaskManager` (single source of truth, covers all 5 create callers):
- `MAX_TASKS` (env `KC_MAX_TASKS`, default 12)
- `count_live_tasks()`: counts live `kube-coder-*` tmux sessions
- `at_capacity()` / `_capacity_rejection()`
- `create_task` / `create_terminal_task` refuse at capacity, returning a `{status: 'rejected', task_id: null, error}` meta without creating a task dir or tmux session.
Every user-facing create handler (dashboard, terminal, desktop action, webhook fire, cron fire) translates a rejected meta to HTTP 429 with a clear message. Webhook/cron paths reject before publishing `trigger.fired`.
Tests
`tests/task_capacity_test.py` (7): session-count parsing, `at_capacity` boundary, rejection payload, `create_task` / `create_terminal_task` rejected-at-capacity (no dir left behind), under-capacity passthrough. Also updated two assistant tests that assumed `tmux new-session` was the first subprocess call (now preceded by the cap's `tmux list-sessions`). Full Python suite 473, OK.
Closes #98
🤖 Generated with Claude Code