infra: GCP cost-control architecture (budget alert + kill-switch Cloud Function + BQ per-query cap)#108
Merged
Triggered by a $200 GCP bill from a Maps Platform API key (uid
0b7b11c7) created 2025-03-16 — unrelated to AINPI (zero references
in the codebase). Already-executed mitigation:
- DELETED the Maps Platform API key
- DISABLED 10 Maps/Places APIs at project level (places, maps-backend,
geocoding-backend, etc.) — even cached copies of the key now error 403
- Created $10/mo budget alert at thresholds 50/90/100% on billing
account 01B58B-9C267D-ECC805
- Linked budget to Pub/Sub topic
projects/thematic-fort-453901-t7/topics/billing-alerts
This commit adds the code-side controls:
1. Cloud Function billing kill-switch (`infrastructure/kill-billing-function/`):
- main.py listens to billing-alerts Pub/Sub topic, disables billing
when costAmount >= budgetAmount
- README has the deploy command + the one-time IAM grant the agent
classifier blocked (billing.projectManager on the billing account)
- Service account already created: kill-billing-sa@...
- Pub/Sub topic + service account already exist; only deploy + IAM
grant remain (commands in the README)
2. BigQuery per-query maximum-bytes-billed cap:
- frontend/src/lib/bigquery.ts: DEFAULT_MAX_BYTES_BILLED = 100 GB
(~$0.50 per query). queryBigQuery() applies it by default; opt-out
by passing { maximumBytesBilled: <larger> }
- analysis/claims_sources/_cohorts.py: bq_job_config() helper that
returns a QueryJobConfig with the same cap. Future Python scripts
should pass job_config=bq_job_config() on every BQ query
3. CLAUDE.md updated with the cost-control summary + pointers to the
new helpers so future contributors don't accidentally bypass them.
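
The decision rule in item 1 can be sketched in a few lines. This is a minimal illustration, not the contents of main.py: the function names, the `event["data"]` shape (base64-encoded JSON, as in a Pub/Sub-triggered function), and the injected `disable_billing` callable are assumptions made for the example. Only the `costAmount >= budgetAmount` rule comes from this commit; the log lines are modeled on the expected log quoted in the PR description.

```python
import base64
import json
from typing import Callable, Mapping


def should_disable_billing(cost_amount: float, budget_amount: float) -> bool:
    """Rule stated in the commit: act when costAmount >= budgetAmount."""
    return cost_amount >= budget_amount


def handle_budget_alert(
    event: Mapping[str, str],
    disable_billing: Callable[[], None],
) -> str:
    """Decode one budget notification and act on it.

    Assumption: event["data"] holds base64-encoded JSON that contains
    the costAmount and budgetAmount fields.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    print(f"event: cost=${cost:.2f} budget=${budget:.2f}")

    if not should_disable_billing(cost, budget):
        print("under budget; no action.")
        return "no-action"

    # Hypothetical hook: the caller supplies whatever actually detaches
    # the billing account from the project.
    disable_billing()
    return "billing-disabled"
```

Passing `disable_billing` in as a callable keeps the threshold logic testable without touching a real billing account. In a deployment it would wrap the Cloud Billing call that clears the project's billing account.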
Manual user steps required to complete kill-switch (commands in
infrastructure/kill-billing-function/README.md):
a. gcloud billing accounts add-iam-policy-binding ... role=billing.projectManager
b. cd infrastructure/kill-billing-function && gcloud functions deploy ...
Production verified healthy after Maps disable: /, /findings,
/articles/eight-years-post-exclusion all 200; /api/npd/validation (BQ-
backed) returns expected JSON.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the project's standing cost-control infrastructure for GCP. This is the canonical setup any future paid-API usage on the project must work within.
What's in this PR
1. Billing kill-switch Cloud Function
- infrastructure/kill-billing-function/: Cloud Function source code
- Listens to the billing-alerts Pub/Sub topic
- Calls cloudbilling.projects.updateBillingInfo({billingAccountName: ""}) when costAmount >= budgetAmount

2. BigQuery per-query cap (100 GB ≈ $0.50 per query)
- frontend/src/lib/bigquery.ts: DEFAULT_MAX_BYTES_BILLED = 100_000_000_000; queryBigQuery() applies it by default
- analysis/claims_sources/_cohorts.py: bq_job_config() helper returning a QueryJobConfig with the same cap

3. CLAUDE.md updated with the cost-control summary + architecture review checklist so future contributors apply the same discipline.
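
A rough sketch of how the Python-side cap helper might look. Only `bq_job_config()`, `QueryJobConfig`, and the 100 GB value come from this PR. The `resolve_max_bytes_billed` helper, the override parameter, and `estimated_cost_usd` with its rate of $5 per 10^12 bytes (inferred from the "100 GB ≈ $0.50" figure, not stated anywhere in the PR) are hypothetical. The BigQuery import is deferred so the cap logic can be exercised without the client library installed.

```python
from typing import Optional

# 100 GB, the same value as DEFAULT_MAX_BYTES_BILLED in bigquery.ts.
DEFAULT_MAX_BYTES_BILLED = 100_000_000_000


def resolve_max_bytes_billed(override: Optional[int] = None) -> int:
    """Return the cap to apply: the default unless the caller opts out."""
    if override is None:
        return DEFAULT_MAX_BYTES_BILLED
    if override <= 0:
        raise ValueError("maximum bytes billed must be positive")
    return override


def estimated_cost_usd(max_bytes: int, usd_per_tb: float = 5.0) -> float:
    """Worst-case cost of one query at the cap (the rate is an assumption)."""
    return max_bytes / 1e12 * usd_per_tb


def bq_job_config(maximum_bytes_billed: Optional[int] = None):
    """Build a QueryJobConfig that carries the per-query cap."""
    # Deferred import: needs the google-cloud-bigquery package.
    from google.cloud import bigquery

    return bigquery.QueryJobConfig(
        maximum_bytes_billed=resolve_max_bytes_billed(maximum_bytes_billed)
    )
```

A script would then pass `job_config=bq_job_config()` to `client.query(...)`. With the cap set, a query that would bill more than the limit fails instead of running.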
Server-side prerequisites (completed outside this PR)
- Budget alert linked to the billing-alerts Pub/Sub topic
- Service account kill-billing-sa@thematic-fort-453901-t7.iam.gserviceaccount.com created

Manual user steps to finish kill-switch deploy
The classifier blocked the agent from deploying the function: the function holds billing-disable IAM, so a human should review the source before deploy. Two one-shot commands are documented in infrastructure/kill-billing-function/README.md. They follow a least-privilege pattern (projectManager on the project + user on the billing account), so the SA can only disable billing for THIS project.
After deploy, smoke-test with a payload below the threshold (won't disable anything):
Expected log:
event: cost=$0.50 budget=$10.00
under budget; no action.

Test plan
- tsc --noEmit clean on bigquery.ts change
- /, /findings, /articles/... all 200; /api/npd/validation (BQ-backed) returns expected JSON
- gcloud billing projects link re-enables billing (also in README)
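
For the below-threshold smoke test described above, the message body could be built along these lines. This is a sketch under assumptions: `build_budget_alert_message` is a hypothetical helper, and the payload carries only the two fields this PR names (`costAmount`, `budgetAmount`), while real budget notifications include more. The publish command itself is not reproduced here.

```python
import base64
import json


def build_budget_alert_message(cost_amount: float, budget_amount: float) -> dict:
    """Wrap a minimal budget notification the way Pub/Sub delivers it:
    JSON, base64-encoded, under the "data" key."""
    body = {"costAmount": cost_amount, "budgetAmount": budget_amount}
    data = base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")
    return {"data": data}


# $0.50 spent against the $10.00 budget: below the kill threshold.
message = build_budget_alert_message(0.50, 10.00)

# Round-trip check: decode the message the way a subscriber would.
decoded = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
```

Because `costAmount` is below `budgetAmount`, a function applying the `costAmount >= budgetAmount` rule should take no action on this message.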