fix(launcher): explicit [headroom] route_upstream toggle, default off#84

Merged: mbachaud merged 1 commit into master from fix/headroom-route-upstream-toggle on May 12, 2026

@mbachaud (Owner)

Summary

Adds a config-level toggle that controls whether the launcher rewrites helix's chat upstream to go through the Headroom proxy. The toggle is off by default. This came out of operator pain: running without Headroom installed locally occasionally produced 30s+ stalls or ECONNREFUSED on chat calls. The cause was the launcher preemptively redirecting helix's upstream to a proxy that had never started.

The bug this fixes

_should_route_helix_upstream_via_headroom (launcher/app.py:287) was returning True whenever:

  • cfg.server.upstream parsed cleanly,
  • cfg.server.upstream was not a loopback host (so any remote API),
  • cfg.server.upstream wasn't already pointing at the configured Headroom port.
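The pre-fix predicate, as described above, can be sketched roughly as follows. This is a hypothetical reconstruction, not the code at launcher/app.py:287. The function name should_route_pre_fix, the loopback host set, and the default port 8787 (taken from the rewritten URL in the failure chain) are all assumptions. The point it illustrates is that nothing in it consults headroom.enabled.

```python
from urllib.parse import urlparse

# Hosts treated as loopback in this sketch (assumption; the real guard may differ).
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def should_route_pre_fix(upstream: str, headroom_port: int = 8787) -> bool:
    """Hypothetical reconstruction of the pre-fix routing predicate.

    Returns True for any parseable, non-loopback upstream that is not already
    pointing at the Headroom port. It never checks headroom.enabled or whether
    a Headroom proxy will actually be running.
    """
    try:
        parsed = urlparse(upstream)
        port = parsed.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    host = parsed.hostname
    if not parsed.scheme or not host:
        return False  # did not parse cleanly
    if host in LOOPBACK_HOSTS:
        return False  # loopback upstreams stay direct
    if port == headroom_port:
        return False  # already routed through Headroom
    return True  # any remote API: route, regardless of Headroom's state
```

With the config shown below, should_route_pre_fix("https://api.openai.com/v1") is True even though Headroom is disabled and not installed.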

It did not check cfg.headroom.enabled, nor any "is Headroom actually going to be available?" predicate. So a config like:

[server]
upstream = "https://api.openai.com/v1"
[headroom]
enabled = false   # default

with headroom-ai not installed (pip show headroom-ai → not found) would silently:

  1. Set HELIX_SERVER_UPSTREAM = http://127.0.0.1:8787 and OPENAI_TARGET_API_URL = <real upstream> in env (launcher/app.py:327-333).
  2. Skip starting Headroom because is_headroom_installed() returned False at launcher/app.py:363-365.
  3. Start helix as a child with the rewritten env. Helix's load_config then read HELIX_SERVER_UPSTREAM at config.py:616-617 and set cfg.server.upstream = http://127.0.0.1:8787.
  4. Every /v1/chat/completions call dialled a dead local port and failed with ECONNREFUSED. No log line connected the failure to the routing decision.
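The failure chain above can be reduced to a few lines. This is a minimal sketch under stated assumptions:

- A plain dict stands in for the child process environment.
- helix's load_config treats HELIX_SERVER_UPSTREAM as a straight override of [server] upstream, as step 3 describes.
- build_child_env and effective_upstream are hypothetical names, not functions from the launcher or helix.

```python
from typing import Mapping

# Loopback address the launcher rewrote helix's upstream to (step 1).
HEADROOM_URL = "http://127.0.0.1:8787"


def build_child_env(base: Mapping[str, str], upstream: str, route: bool) -> dict[str, str]:
    """Step 1 (hypothetical): copy the parent env and, if routing, rewrite the upstream."""
    env = dict(base)
    if route:
        env["HELIX_SERVER_UPSTREAM"] = HEADROOM_URL  # helix now dials the local proxy...
        env["OPENAI_TARGET_API_URL"] = upstream      # ...which would forward to the real API
    return env


def effective_upstream(config_upstream: str, env: Mapping[str, str]) -> str:
    """Step 3 (hypothetical): the env var, when present, wins over the TOML value."""
    return env.get("HELIX_SERVER_UPSTREAM", config_upstream)


# Pre-fix: the routing predicate said yes, but step 2 skipped starting Headroom,
# so nothing is listening on the port helix is about to dial.
child_env = build_child_env({}, "https://api.openai.com/v1", route=True)
print(effective_upstream("https://api.openai.com/v1", child_env))
```

With route=False the child env is left unchanged, and effective_upstream returns the configured remote URL.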

The existing test test_remote_upstream_routes_helix_via_headroom (tests/test_launcher_app.py:339) pinned this exact buggy behavior. It constructed a config with default-disabled headroom and a remote upstream, then asserted that routing was ON.

What changed

  1. New [headroom] route_upstream bool, default False (config.py:HeadroomConfig). Wired through the TOML parser at config.py:783-804. Documented inline in helix.toml.

  2. Routing decision rewritten (launcher/app.py:287) with explicit precedence:

    • auto_override=False (from HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO=0) → off, always.
    • auto_override=True (from HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO=1) → on, even if config says off.
    • auto_override=None (env unset) → defer to cfg.headroom.route_upstream (default off).
    • Existing loopback / parse / double-route guards stay as final filters.
  3. enabled and route_upstream are now separate concerns. enabled controls the proxy lifecycle (start / adopt the process); route_upstream controls whether helix's chat upstream is rewritten to dial the proxy. An operator can now:

    • Run Headroom + dashboard without rerouting chat through it (enabled=true, route_upstream=false).
    • Or vice versa (less common but legal): assume Headroom is externally managed, just rewrite helix's upstream (enabled=false, route_upstream=true).
    • Default install behaves as if neither is set (no proxy lifecycle, no upstream rewriting).
  4. Tests rewritten (tests/test_launcher_app.py:338):

    • test_remote_upstream_does_not_route_when_route_upstream_disabled — the new default-off contract.
    • test_remote_upstream_routes_when_route_upstream_opted_in — explicit opt-in still works.
    • test_env_var_false_forces_off_even_when_config_opted_in — per-launch kill switch.
    • test_env_var_true_forces_on_even_when_config_disabled — per-launch opt-in symmetric.
    • test_local_ollama_upstream_stays_direct — loopback never gets rewritten, regardless.

Per-launch override (immediate workaround, also works post-merge)

$env:HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO = "0"
helix-launcher

This was already implemented (_env_truthy at launcher/app.py:275-280); the new code preserves it via the auto_override arg.
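The body of _env_truthy is not shown here. A hypothetical tri-state reader that maps the env var onto the auto_override argument might look like this. The function name read_auto_override and the accepted truthy / falsy spellings are assumptions.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}   # assumed spellings
_FALSY = {"0", "false", "no", "off"}   # assumed spellings


def read_auto_override(env: Mapping[str, str] = os.environ,
                       name: str = "HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO") -> Optional[bool]:
    """Map the env var to auto_override: True / False, or None when unset.

    Unrecognised values are treated as unset here (an assumption), so a typo
    falls back to the config value instead of forcing routing either way.
    """
    raw = env.get(name)
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    return None
```

Keeping None distinct from False is what lets an unset variable defer to cfg.headroom.route_upstream, while "0" remains a hard kill switch.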

What this does NOT change

  • The headroom-ai library import path (headroom_bridge.py / compress_text) is untouched. That surface already degrades gracefully. is_headroom_available() probes both the import and the HELIX_DISABLE_HEADROOM env opt-out. When the library isn't installed, it falls back to content[:target_chars] truncation.
  • The orphan-adopt path in _maybe_build_headroom. Operators with an externally-managed Headroom can still have the launcher adopt it. The decoupling above means orphan adoption no longer implicitly enables upstream rewriting — operators who want that should set route_upstream = true explicitly.
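The probe-then-truncate pattern described for headroom_bridge.py can be sketched as follows. This is only the shape of the fallback, and several details are assumptions:

- The importable module name headroom and its compress entry point are hypothetical. The PR names only the headroom-ai distribution, compress_text, is_headroom_available and HELIX_DISABLE_HEADROOM.
- Treating any non-empty HELIX_DISABLE_HEADROOM value as an opt-out is also an assumption.

```python
import importlib
import importlib.util
import os


def is_headroom_available(module_name: str = "headroom") -> bool:
    """True only if the opt-out is unset and the library is importable."""
    if os.environ.get("HELIX_DISABLE_HEADROOM"):  # any non-empty value opts out (assumption)
        return False
    return importlib.util.find_spec(module_name) is not None


def compress_text(content: str, target_chars: int, module_name: str = "headroom") -> str:
    """Compress via the library when present; otherwise fall back to truncation."""
    if is_headroom_available(module_name):
        lib = importlib.import_module(module_name)
        return lib.compress(content, target_chars)  # hypothetical library call
    return content[:target_chars]
```

Because find_spec only locates the module without importing it, the probe stays cheap. Missing-library installs silently take the truncation path.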

Test plan

  • pytest tests/test_launcher_app.py tests/test_config.py — 54/54 pass.
  • pytest tests/test_launcher_app.py tests/test_launcher_supervisor.py tests/test_headroom_supervisor.py tests/test_config.py tests/test_headroom_bridge.py — 125/125 pass, 7 skipped.
  • Smoke test: run helix-launcher in a fresh shell and confirm HELIX_SERVER_UPSTREAM is NOT set when route_upstream defaults to false.

Related

  • Builds on the analysis from the headroom-disabled audit; this is the toggle I described in chat. No issue # yet — open one if you want it tracked.

Pre-fix _should_route_helix_upstream_via_headroom returned True for any
remote (non-loopback) upstream as long as HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO
wasn't explicitly set falsy. So an operator with
  cfg.server.upstream  = "https://api.openai.com/v1"
  cfg.headroom.enabled = false           (default!)
  cfg.headroom.installed = no            (no headroom-ai package)
would have the launcher rewrite HELIX_SERVER_UPSTREAM to
http://127.0.0.1:8787 and start helix pointing at a Headroom proxy that
was never started — every chat call then failed with ECONNREFUSED, with
no clear diagnostic line tying it back to the routing decision.

Adds a separate [headroom] route_upstream bool (default false) gating
the rewrite. enabled / route_upstream are distinct concerns: lifecycle
vs chat-redirect. Operators can now run the proxy + dashboard without
the chat redirect, or explicitly opt the redirect in without changing
the proxy lifecycle.

HELIX_HEADROOM_ROUTE_UPSTREAM_AUTO remains as a per-launch override.
Precedence:
  1. auto_override=False (env=0)  -> off
  2. auto_override=True  (env=1)  -> on (even with route_upstream=false)
  3. auto_override=None  (unset)  -> defer to cfg.headroom.route_upstream

The previous test that pinned the bug
(test_remote_upstream_routes_helix_via_headroom) is replaced with four
tests covering the new precedence rules + the loopback-stays-direct
invariant.

Tests: 125 passed / 7 skipped across launcher + config + headroom test
modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@mbachaud force-pushed the fix/headroom-route-upstream-toggle branch from 2ee5699 to c5d9eb8 (May 12, 2026 23:00)
@mbachaud merged commit e0e7e24 into master (May 12, 2026); 3 checks passed
@mbachaud deleted the fix/headroom-route-upstream-toggle branch (May 12, 2026 23:01)