Unify gapp setup and gapp ci setup under a single CLI surface #44

@krisrowe

Problem

The bootstrap flow for a new project today requires three local commands, and the ordering of gapp setup before gapp ci setup is enforced only by an error message:

gapp init                            # local — write gapp.yaml
gapp setup --project PROJECT_ID      # local, as Owner — APIs, bucket, label
gapp ci setup                        # local, as Owner — WIF, deploy SA, workflow

If a user runs gapp ci setup before gapp setup, setup_ci() fails at the project-resolution step (gapp/admin/sdk/ci.py:614-617) with the error:

No GCP project found for '': ...
Run 'gapp setup --project ' first.

That error message is the only mechanism enforcing the ordering. The two commands have no code-level coupling — gapp ci setup does not invoke setup() internally.
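
A minimal sketch of that coupling, assuming project resolution is a label lookup. The names resolve_project and setup_ci_sketch, the dict standing in for GCP labels, and the exact message text are all hypothetical; the real logic lives in setup_ci() and may differ.

```python
class ProjectNotFoundError(RuntimeError):
    """Raised when no GCP project is labeled for the solution (hypothetical type)."""


def resolve_project(solution: str, labeled_projects: dict[str, str]) -> str:
    # labeled_projects stands in for the label lookup: solution name -> project ID.
    project_id = labeled_projects.get(solution)
    if project_id is None:
        # Illustrative wording only; the real message is quoted above.
        raise ProjectNotFoundError(
            f"No GCP project found for '{solution}'. "
            "Run 'gapp setup --project <PROJECT_ID>' first."
        )
    return project_id


def setup_ci_sketch(solution: str, labeled_projects: dict[str, str]) -> str:
    # Nothing here calls the foundation setup; the only ordering guard
    # is the exception raised by resolve_project().
    project_id = resolve_project(solution, labeled_projects)
    return f"CI bootstrap would run against {project_id}"
```

The point of the sketch is that the guard only checks the label. Nothing verifies the rest of what gapp setup provisions.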

What this design gets right

  • Separation of concerns is clean: gapp setup provisions the GCP foundation (APIs, bucket, labels, build SA perms). gapp ci setup provisions the CI bootstrap (WIF pool/provider, deploy SA, IAM bindings, GitHub workflow file). These are genuinely different concerns.
  • Idempotency works in both directions. gapp setup re-runs harmlessly on every CI deploy. gapp ci setup can be re-run to add a binding or refresh the workflow file.
  • CI's gapp setup runs as the deploy SA with intentionally narrow IAM. enable_api() silently no-ops on PERMISSION_DENIED because the deploy SA doesn't have serviceusage.serviceUsageAdmin (broad and dangerous). The local-Owner-first pattern is the deliberate way to keep CI's blast radius small while still letting the framework own API enablement.
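
The PERMISSION_DENIED behavior in the last bullet could look roughly like this. It is a sketch, not the actual enable_api() body: the function name enable_api_sketch, the injectable run parameter, and the string match on stderr are assumptions made so the example can be exercised without gcloud installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run_gcloud(args: Sequence[str]) -> subprocess.CompletedProcess:
    # Default runner: shells out to the real gcloud CLI.
    return subprocess.run(["gcloud", *args], capture_output=True, text=True)


def enable_api_sketch(project_id: str, api: str, run: Runner = _run_gcloud) -> bool:
    """Return True if the API was enabled, False if skipped for lack of permission."""
    result = run(["services", "enable", api, "--project", project_id])
    if result.returncode == 0:
        return True
    if "PERMISSION_DENIED" in (result.stderr or ""):
        # The narrow deploy SA cannot enable APIs; treat that as a no-op
        # and rely on the earlier local Owner run having enabled them.
        return False
    # Any other failure is unexpected and should surface to the caller.
    raise RuntimeError(f"Failed to enable {api}: {(result.stderr or '').strip()}")
```

Returning False rather than raising is what lets the same setup path run both locally as Owner and in CI as the deploy SA.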

Where it gets uncomfortable

The "labels must exist before ci setup runs" dependency is documented in the error message. But there are hidden dependencies the message doesn't surface:

  1. API enablement: gapp ci setup calls _get_project_number() which runs gcloud projects describe, which requires cloudresourcemanager.googleapis.com to be enabled on the target project. This API is only enabled when gapp setup runs (it's in the foundation API list). If gapp ci setup is run on a project where the API hasn't been enabled, the failure is opaque (gcloud returns a non-zero exit, and the wrapping subprocess exception doesn't preserve stderr at the call site).

  2. Bucket for terraform state: gapp setup creates the GCS bucket that gapp deploy later uses for terraform state. gapp ci setup doesn't need it itself, but the subsequent CI deploy will fail without it.

  3. Build SA permissions: ensure_build_permissions() runs during gapp setup to grant the Cloud Build SA the perms it needs. CI deploys need those grants in place.

So the implicit contract is bigger than "labels first" — it's really "all the foundation gapp setup provides, first." The user has no way to discover this from the error message alone; they just have to know to run gapp setup first.
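
On the opaque failure in item 1: one way to keep stderr visible at the call site is to capture it and attach it to the raised exception. This is a sketch under assumptions; CommandError and run_checked are hypothetical names, not existing gapp helpers.

```python
import subprocess
from typing import Sequence


class CommandError(RuntimeError):
    """Carries the failing command, its exit code, and its stderr (hypothetical type)."""

    def __init__(self, cmd: Sequence[str], returncode: int, stderr: str):
        self.cmd = list(cmd)
        self.returncode = returncode
        self.stderr = stderr
        super().__init__(
            f"{' '.join(self.cmd)} exited with {returncode}: {stderr.strip()}"
        )


def run_checked(cmd: Sequence[str]) -> str:
    # capture_output=True keeps stderr so it can be included in the error.
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise CommandError(cmd, result.returncode, result.stderr)
    return result.stdout.strip()
```

A project-number lookup could then call run_checked(["gcloud", "projects", "describe", project_id, "--format=value(projectNumber)"]), and a disabled cloudresourcemanager.googleapis.com would show up in the exception text instead of being swallowed.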

Proposed direction

Collapse the two commands into a single CLI surface, with CI provisioning as an opt-in scope:

gapp setup --project PROJECT_ID            # foundation only (current `gapp setup` behavior)
gapp setup --project PROJECT_ID --ci       # foundation + CI bootstrap

The --ci path runs everything setup does today, then layers the CI-specific provisioning (WIF, deploy SA, IAM, workflow) on top. Internally the logic stays modular (functions can still be named setup_foundation / setup_ci and called in sequence), but the user-facing surface becomes one command with one clear ordering: there's no second command to forget, and no error message playing the role of contract enforcer.

This also addresses the hidden-dependency leak: any prerequisite the CI bootstrap needs (current or future) is automatically satisfied because foundation setup always runs first under --ci.

Why --ci as a flag rather than a positional or subcommand

  • A subcommand (gapp setup ci) reads as a sub-operation of setup, which is fine, but it still encourages the mental model of "two steps." A flag reads as "same operation, broader scope."
  • A positional arg conflates with --project semantics and is less discoverable.
  • Other flags (--env, --force) already exist; --ci slots into the same pattern.

Open question: whether --ci should require explicit opt-in or default-on-when-detected (e.g., if a CI repo is configured via gapp ci init). Default-off is safer.
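
For the default-off option, the flag could be a plain boolean switch. The sketch below uses argparse purely for illustration; whatever CLI framework gapp/admin/cli/main.py actually uses is not shown in this issue, so build_parser and its wiring are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gapp")
    sub = parser.add_subparsers(dest="command", required=True)

    setup = sub.add_parser("setup", help="Provision the GCP foundation")
    setup.add_argument("--project", help="Target GCP project ID")
    # store_true makes the flag default to False, i.e. explicit opt-in.
    setup.add_argument(
        "--ci",
        action="store_true",
        help="Also provision the CI bootstrap (WIF, deploy SA, IAM, workflow)",
    )
    return parser
```

With this shape, gapp setup --project PROJECT_ID parses to ci=False and adding --ci flips it to True, which maps directly onto an SDK keyword argument.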

Tradeoffs

  • Breaking CLI change. Any existing caller of gapp ci setup (scripts, docs, agent skills) needs to migrate to gapp setup --ci. A deprecation notice from the gapp ci setup alias can guide users for one major-version cycle, after which the alias is removed.
  • Documentation and skill updates. All references to gapp ci setup in docs, skill args: descriptions, agent-context files, and the README would need updating.
  • Refactoring scope. setup_ci() in gapp/admin/sdk/ci.py calls into GappSDK for project resolution and re-uses naming conventions. The actual extraction into a setup(..., ci=False) form on GappSDK is straightforward — the function bodies already exist and can be composed.
  • Not version-pegged. This proposal isn't tied to any specific gapp release. The right home depends on what other breaking changes are coalescing — could ship standalone in the next major, or batched with other CLI-surface cleanups when there's a forcing function.

Alternative considered: leave the split, improve the error message

Add the API enablement and bucket dependencies to the setup_ci() error message, so users hitting it learn the full scope of what gapp setup provides. This is a smaller change but doesn't address the underlying "two commands, implicit ordering" usability cost. A new user still has to read the error, do the right thing, then come back. The unified command removes the need to discover the dependency at all.
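
For comparison, the message-only alternative might look like the following. The wording, the FOUNDATION_PREREQUISITES tuple, and the function name are illustrative assumptions, and <PROJECT_ID> is a placeholder.

```python
# What gapp setup provides, per the dependencies listed earlier in this issue.
FOUNDATION_PREREQUISITES = (
    "project label used for project resolution",
    "foundation APIs, including cloudresourcemanager.googleapis.com",
    "GCS bucket for terraform state",
    "Cloud Build SA permissions",
)


def missing_foundation_message(solution: str) -> str:
    lines = [
        f"No GCP project found for '{solution}'.",
        "gapp ci setup depends on everything gapp setup provisions:",
    ]
    lines += [f"  - {item}" for item in FOUNDATION_PREREQUISITES]
    lines.append("Run 'gapp setup --project <PROJECT_ID>' first.")
    return "\n".join(lines)
```

Even with the fuller message, the list has to be kept in sync by hand whenever the foundation grows, which is the maintenance cost the unified command avoids.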

Work breakdown

  • Add --ci flag to gapp setup CLI (gapp/admin/cli/main.py)
  • Refactor setup_ci() in gapp/admin/sdk/ci.py so the steps after project resolution become callable as a discrete _provision_ci_resources(project_id, ...) function
  • In GappSDK.setup(), after the existing foundation work, conditionally call _provision_ci_resources() when ci=True
  • Wire the CLI flag through to the SDK kwarg
  • Keep gapp ci setup as a deprecated alias for one major-version cycle — print a deprecation notice and forward to gapp setup --ci. Remove in the major after that.
  • Update gapp/admin/sdk/ci.py:614-617 error message — the "run gapp setup --project X first" guidance is no longer needed under the unified command. Decide whether to keep it for the deprecation-alias path.
  • Update README.md lifecycle examples
  • Update CONTRIBUTING.md if it documents the bootstrap order
  • Update the gapp:deploy skill (its SKILL.md describes the bootstrap flow and references gapp ci setup as a separate step)
  • Update any in-repo example workflows or scripts that reference gapp ci setup
  • Add tests covering the unified gapp setup --ci path against the existing _DEPLOY_SA_ROLES provisioning + WIF binding fixtures
  • Add a test that asserts the deprecated gapp ci setup alias still works (during the deprecation window) and forwards correctly
  • Add a CONTRIBUTING.md note (or equivalent) documenting the rationale for keeping foundation + CI as a single CLI surface, so this design decision is captured at the point of contributor onboarding rather than only in this issue
