
Security + multi-tenant: API_AUTH_TOKEN auth, POST /stop, and tenant metrics#95

Open
cbaugus wants to merge 6 commits into main from dev

@cbaugus cbaugus commented Mar 3, 2026

Summary

  • API_AUTH_TOKEN bearer token auth on POST /config and POST /stop: when the env var is set, both endpoints require Authorization: Bearer <token> and return 401 if the header is missing or does not match. Backwards-compatible: when unset, endpoints are open (closes Security: Authenticate POST /config endpoint #91)
  • POST /stop endpoint — stops all workers, transitions node to idle, returns JSON summary (closes Add POST /stop endpoint for test cancellation #93)
  • Optional tenant label on Prometheus metrics — set metadata.tenant in the YAML config and every request metric gets a tenant label so per-tenant request counts are queryable for billing
  • Tenant-scoped stop: POST /stop {"tenant": "acme"} returns 409 Conflict if a different tenant's test is active, preventing one client from stopping another's test
  • Nomad HCL job files and Consul KV YAML config example added

What changed

src/yaml_config.rs

  • YamlMetadata gains pub tenant: Option<String> — optional, fully backwards-compatible

src/metrics.rs

Six metrics now carry a "tenant" label alongside "region":

  • REQUEST_TOTAL — primary billing counter
  • REQUEST_STATUS_CODES
  • CONCURRENT_REQUESTS
  • REQUEST_DURATION_SECONDS
  • REQUEST_ERRORS_BY_CATEGORY
  • SCENARIO_REQUESTS_TOTAL
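
Adding a label changes the arity every reader and writer of these metrics must use. The sketch below is a toy stand-in for a labelled counter family, not the prometheus crate's API. `LabelledCounter` and its methods are hypothetical names, used only to show why a write with `["local", ""]` and a read with `["local"]` cannot both succeed once `tenant` sits next to `region`.

```rust
use std::collections::HashMap;

/// Toy model of a labelled counter family (hypothetical, for illustration only).
pub struct LabelledCounter {
    label_names: Vec<&'static str>,
    values: HashMap<Vec<String>, u64>,
}

impl LabelledCounter {
    pub fn new(label_names: &[&'static str]) -> Self {
        Self {
            label_names: label_names.to_vec(),
            values: HashMap::new(),
        }
    }

    /// Builds the lookup key, rejecting a wrong number of label values.
    fn key(&self, label_values: &[&str]) -> Result<Vec<String>, String> {
        if label_values.len() != self.label_names.len() {
            return Err(format!(
                "expected {} label values, got {}",
                self.label_names.len(),
                label_values.len()
            ));
        }
        Ok(label_values.iter().map(|v| v.to_string()).collect())
    }

    /// Increments the series identified by `label_values`.
    pub fn inc(&mut self, label_values: &[&str]) -> Result<(), String> {
        let key = self.key(label_values)?;
        *self.values.entry(key).or_insert(0) += 1;
        Ok(())
    }

    /// Reads the series identified by `label_values` (0 if never incremented).
    pub fn get(&self, label_values: &[&str]) -> Result<u64, String> {
        let key = self.key(label_values)?;
        Ok(self.values.get(&key).copied().unwrap_or(0))
    }
}
```

With `LabelledCounter::new(&["region", "tenant"])`, a call such as `get(&["local"])` returns an error, while `get(&["local", ""])` reads the series written when no tenant is set.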

src/worker.rs

  • WorkerConfig and ScenarioWorkerConfig gain pub tenant: String
  • All metric recording call sites updated to pass tenant
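
A minimal sketch of how the optional YAML field could become the non-optional worker field, assuming the structs carry only the `tenant` field shown here (the real structs have more). `worker_config_for` is a hypothetical helper, not a function from this PR.

```rust
/// Subset of the YAML metadata: only the new field is modelled here.
#[derive(Debug, Clone, Default)]
pub struct YamlMetadata {
    pub tenant: Option<String>,
}

/// Subset of the worker config: only the new field is modelled here.
#[derive(Debug, Clone)]
pub struct WorkerConfig {
    pub tenant: String,
}

/// Hypothetical helper: an absent tenant becomes the empty string,
/// which is the label value used when no tenant is set.
pub fn worker_config_for(meta: &YamlMetadata) -> WorkerConfig {
    WorkerConfig {
        tenant: meta.tenant.clone().unwrap_or_default(),
    }
}
```

Keeping `tenant` as a plain `String` on the worker side means metric call sites can pass `&config.tenant` directly without handling `None` on every request.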

src/main.rs

  • TestState tracks tenant: Option<String> for the active run
  • Config-watcher extracts tenant from YAML and passes it to workers
  • Metrics updater reads active tenant from TestState; resets RPS delta on tenant change
  • GET /health exposes "tenant": null | "acme"
  • POST /config — auth check before body consumption
  • POST /stop — optional {"tenant": "..."} body; 409 if tenant mismatch; clears tenant on stop

Auth behaviour

| API_AUTH_TOKEN set? | Request has correct header? | Result |
|---|---|---|
| No | N/A | Open |
| Yes | Yes | Proceeds |
| Yes | No / missing | 401 |
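
The auth table above reduces to a small pure function. This is a sketch only: `AuthOutcome` and `check_bearer` are hypothetical names, and the real handler in src/main.rs may be structured differently. `configured` stands for the value of API_AUTH_TOKEN (None when unset) and `header` for the raw Authorization header value.

```rust
/// Possible results of the auth check (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AuthOutcome {
    /// API_AUTH_TOKEN is unset: the endpoint stays open.
    Open,
    /// Token is configured and the header carries the same token.
    Authorized,
    /// Token is configured but the header is missing or wrong: respond 401.
    Unauthorized,
}

/// Mirrors the three rows of the auth table.
pub fn check_bearer(configured: Option<&str>, header: Option<&str>) -> AuthOutcome {
    let expected = match configured {
        None => return AuthOutcome::Open,
        Some(token) => token,
    };
    // Only the "Bearer <token>" form is accepted; anything else is rejected.
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) if presented == expected => AuthOutcome::Authorized,
        _ => AuthOutcome::Unauthorized,
    }
}
```

Running this check before the request body is read means an unauthenticated caller is rejected without the server consuming its payload.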

POST /stop tenant scoping

| Body | Active tenant | Result |
|---|---|---|
| (empty) | any | Stop all workers |
| {"tenant":"acme"} | "acme" | Stop |
| {"tenant":"acme"} | "other" or none | 409 Conflict |
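
The scoping rules above can be sketched as a single decision function. `StopDecision` and `decide_stop` are hypothetical names. `requested` is the tenant from the optional JSON body and `active` is the tenant tracked in TestState.

```rust
/// What POST /stop should do (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum StopDecision {
    /// Stop the workers and transition to idle.
    Stop,
    /// Respond 409 Conflict and leave the running test alone.
    Conflict,
}

/// Mirrors the three rows of the tenant scoping table.
pub fn decide_stop(requested: Option<&str>, active: Option<&str>) -> StopDecision {
    match requested {
        // No body: stop unconditionally, whoever owns the test.
        None => StopDecision::Stop,
        // Body names the tenant that owns the active test.
        Some(req) if active == Some(req) => StopDecision::Stop,
        // Body names a tenant, but a different one (or none) is active.
        Some(_) => StopDecision::Conflict,
    }
}
```

For example, `decide_stop(Some("acme"), Some("other"))` yields `StopDecision::Conflict`, which is the case that keeps one client from stopping another client's test.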

YAML example

metadata:
  name: "API smoke test"
  tenant: "acme-corp"     # optional — adds tenant label to all metrics
config:
  baseUrl: "https://api.acme.com"
  workers: 50
  duration: "30m"
load:
  model: "rps"
  target: 500

Prometheus query for billing

# Total requests for a specific tenant
sum(rust_loadtest_requests_total{tenant="acme-corp"})

# Per-tenant RPS
sum(rate(rust_loadtest_requests_total{tenant="acme-corp"}[1m]))

Test plan

  • CI passes on dev
  • POST /config with tenant YAML → GET /health shows "tenant": "acme"
  • Prometheus shows tenant="acme" label on requests_total
  • POST /stop {"tenant":"acme"} stops correct test
  • POST /stop {"tenant":"other"} returns 409 when different tenant active
  • POST /stop (no body) stops regardless of tenant

Related issues

🤖 Generated with Claude Code

cbaugus and others added 2 commits March 3, 2026 13:54
- POST /config and POST /stop now check Authorization: Bearer <token>
  when API_AUTH_TOKEN env var is set; returns 401 if missing/invalid.
  Fully backwards-compatible: when unset, endpoints remain open.
- POST /stop sends stop signal to all workers, aborts handles, and
  transitions node_state to "idle". Returns JSON summary with last
  known RPS and worker count.
- Updated help text to document API_AUTH_TOKEN and POST /stop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- YamlMetadata gains an optional `tenant` field (metadata.tenant in YAML).
  Backwards-compatible: omit the field and behaviour is unchanged.
- Six Prometheus metrics now carry a `tenant` label alongside `region`:
  REQUEST_TOTAL, REQUEST_STATUS_CODES, CONCURRENT_REQUESTS,
  REQUEST_DURATION_SECONDS, REQUEST_ERRORS_BY_CATEGORY,
  SCENARIO_REQUESTS_TOTAL.  Empty string when no tenant is set.
- WorkerConfig and ScenarioWorkerConfig gain a `tenant: String` field
  threaded through all worker-spawning sites (config-watcher, startup,
  standby, scenario paths) and all metric recording call sites.
- TestState tracks the active tenant; GET /health exposes it as
  `"tenant": null | "acme"` so the web layer can see who owns the node.
- POST /stop accepts an optional JSON body `{"tenant": "acme"}`.
  When supplied, the endpoint returns 409 Conflict if the active test
  belongs to a different tenant, preventing one client from stopping
  another client's test.  Omit the body to stop unconditionally.
- Metrics updater resets RPS delta tracking when the active tenant
  changes, preventing phantom RPS spikes at test boundaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cbaugus cbaugus changed the title from "Security: API_AUTH_TOKEN auth + POST /stop endpoint" to "Security + multi-tenant: API_AUTH_TOKEN auth, POST /stop, and tenant metrics" Mar 3, 2026
cbaugus and others added 4 commits March 3, 2026 14:38
…olations

- Move `new_tenant` extraction before worker spawning block so all
  three ScenarioWorkerConfig/WorkerConfig sites can reference it
- Collapse rustfmt two-line splits: current_tenant, active, curr_requests,
  and tenant:.clone().unwrap_or_default() in struct literal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Split REQUEST_TOTAL chained calls in worker.rs to match rustfmt style
- Reformat body_bytes/stop_tenant block and curr_requests chain in main.rs
  to match rustfmt's expected indentation
- Replace test_realistic_user_pool httpbin.org dependency with wiremock
  mock server to eliminate CI flakiness from live endpoint unavailability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 4 metric helpers used the old label arity after tenant was added.
Workers write with &["local", ""] but reads used &["local"] causing
InconsistentCardinality panics in prometheus.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_mixed_methods_scenario: mock all 4 methods locally; fix the
  buggy assertion that panicked on out-of-bounds index when fewer
  than 4 steps were returned (change OR to assert_eq!(len, 4))
- test_case_insensitive_methods: mock GET /get and POST /post locally
  so the case-folding logic in executor.rs is tested without a live
  network dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>