diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 4d853d1..38aef62 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -22,5 +22,6 @@ "secrets", "autoscaling" ], - "mcpServers": "./.claude-mcp.json" + "mcpServers": "./.claude-mcp.json", + "hooks": "./hooks/hooks.json" } diff --git a/.codex-plugin/plugin.json b/.codex-plugin/plugin.json index d33ed12..b4a6dd4 100644 --- a/.codex-plugin/plugin.json +++ b/.codex-plugin/plugin.json @@ -23,6 +23,7 @@ "autoscaling" ], "skills": "./skills/", + "rules": "./rules/", "mcpServers": "./.mcp.json", "apps": "./.app.json", "interface": { diff --git a/.gitignore b/.gitignore index 1b3c42a..218ebcb 100644 --- a/.gitignore +++ b/.gitignore @@ -38,6 +38,9 @@ Thumbs.db .idea/ .vscode/ +# Claude Code local session data (hooks are distributed via .claude-plugin/plugin.json) +.claude/ + # Control Plane local artifacts *-bootstrap.json *.bak.yaml diff --git a/GEMINI.md b/GEMINI.md index 990ff7f..4a10549 100644 --- a/GEMINI.md +++ b/GEMINI.md @@ -18,6 +18,20 @@ The plugin auto-configures the Control Plane MCP Server. Your `CPLN_TOKEN` (prom **Never write a cpln command from memory.** Before constructing a command, consult `rules/cli-conventions.md` (command structure, shared flags, resource command map, hallucination traps) and `skills/cpln/SKILL.md` (setup, workflows, examples). Verify exact flag names with `cpln --help` or the MCP suggest tool (`mcp__cpln__cpln_suggest`). +## CLI Guardrails + +These commands do not exist — never generate them: + +- `cpln secret create` → use type-specific: `cpln secret create-opaque`, `create-aws`, `create-tls`, etc. 
+- `cpln apply` without `--file` → always: `cpln apply --file manifest.yaml` +- `cpln list` → use `cpln get` (no args = list all) + +These are too destructive to run without explicit user confirmation in the conversation: + +- `cpln gvc delete-all-workloads` — destroys every workload in the GVC +- `cpln volumeset shrink` — permanent data loss on the old volume +- Any `cpln delete` — surface the org, GVC, resource name, and blast radius before proceeding + ## Key Conventions - CLI commands use `cpln` prefix (e.g., `cpln apply --file manifest.yaml`) diff --git a/agents/k8s-migrator.md b/agents/k8s-migrator.md index f5786db..8da9151 100644 --- a/agents/k8s-migrator.md +++ b/agents/k8s-migrator.md @@ -146,6 +146,16 @@ This approach requires manual work to parameterize the converted output, but giv ## Docker Compose Migration (`cpln stack`) +> **Firewall default mismatch — read before writing native manifests.** +> `cpln stack` defaults external outbound to **open** for all services that expose ports. Native Control Plane workload manifests default external outbound to **blocked**. If you are writing CPLN manifests by hand (rather than using `cpln stack` directly), you must add explicit outbound rules for every external API, database, or service your workload calls — otherwise it silently cannot reach anything outside the platform. This is the most common failure mode for manual Docker Compose migrations. +> +> ```yaml +> firewallConfig: +> external: +> outboundAllowCIDR: +> - 0.0.0.0/0 # or restrict to specific CIDRs/hostnames +> ``` + ### Key Differences 1. 
**Service URLs must be rewritten**: `http://service-name:port` → `http://workload-name.GVC_NAME.cpln.local:port` diff --git a/agents/secret-setup-wizard.md b/agents/secret-setup-wizard.md index f2e45e9..089b10f 100644 --- a/agents/secret-setup-wizard.md +++ b/agents/secret-setup-wizard.md @@ -126,7 +126,7 @@ Use `cpln://secret/NAME` to reference the full secret, or `cpln://secret/NAME.KE | Secret Type | Available Keys | Example | |:---|:---|:---| -| opaque | `payload` (decoded if base64 runtime decode enabled), or omit key for raw JSON with `payload` + `encoding` | `cpln://secret/my-api-key.payload` | +| opaque | `payload` — **see encoding warning below** | `cpln://secret/my-api-key.payload` | | dictionary | user-defined keys | `cpln://secret/db-config.DB_HOST` | | userpass | `username`, `password` | `cpln://secret/creds.password` | | tls | `key`, `cert`, `chain` | `cpln://secret/my-tls.cert` | @@ -137,6 +137,8 @@ Use `cpln://secret/NAME` to reference the full secret, or `cpln://secret/NAME.KE | nats-account | `accountId`, `privateKey` | `cpln://secret/my-nats.accountId` | | any type | omit key for full secret as JSON | `cpln://secret/my-secret` | +**Opaque `.payload` encoding warning:** If the secret was created with base64 encoding (common when storing binary content — certs, keys, binary tokens — via the console or API), the `.payload` reference returns the base64-encoded string, not the decoded value. The workload receives it as a base64 string and typically fails with a cryptographic or parse error. To get the decoded value at runtime, the secret must have runtime decoding enabled (`encoding: base64` + runtime decode on the secret spec), or use the full secret reference (`cpln://secret/NAME`) and decode in application code. For plaintext secrets (API keys, connection strings, passwords), `.payload` works as expected. 
**Before injecting an opaque secret as `.payload`, ask the user: was this secret created with base64 encoding?** + **As volume mount:** Export the workload, add a volume, and apply: ```bash diff --git a/agents/workload-troubleshooter/diagnostics.md b/agents/workload-troubleshooter/diagnostics.md index bc6f569..f5227b6 100644 --- a/agents/workload-troubleshooter/diagnostics.md +++ b/agents/workload-troubleshooter/diagnostics.md @@ -73,6 +73,8 @@ cpln policy add-binding my-secret-policy --permission reveal --identity //gvc/MY **Or use `mcp__cpln__create_policy`** — creates the policy with bindings in one call. Params: `name` (required), `targetKind` (required), `targetLinks` (optional), `addPermissions` (optional array of permission strings), `addIdentities` (optional array of identity links), `org` (uses session context if set, required otherwise). +**If `cpln apply` fails on a policy manifest with a validation error and the YAML looks correct:** check that `targetKind` is a valid resource kind, all `principalLinks` use full resource paths (`//gvc/GVC/identity/NAME`), and `permissions` values are valid for the target kind. The API auto-sorts permissions alphabetically — ordering is not a cause of validation errors. + ## C. Port Mismatch **Symptoms**: Workload shows healthy but returns 502/503, or traffic doesn't reach the container. diff --git a/hooks/hooks.json b/hooks/hooks.json index c179962..6d97e4d 100644 --- a/hooks/hooks.json +++ b/hooks/hooks.json @@ -19,7 +19,44 @@ "command": "input=$(cat); cmd=$(echo \"$input\" | jq -r '.tool_input.command // empty'); if echo \"$cmd\" | grep -qE 'cpln\\s+apply' && ! echo \"$cmd\" | grep -qE '--file|--f\\b|-f\\b'; then echo 'BLOCK: cpln apply requires --file flag. 
Usage: cpln apply --file manifest.yaml' >&2; exit 1; fi" } ] + }, + { + "matcher": "Bash", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); cmd=$(echo \"$input\" | jq -r '.tool_input.command // empty'); if echo \"$cmd\" | grep -qE 'cpln\\s+gvc\\s+delete-all-workloads'; then echo 'BLOCK: cpln gvc delete-all-workloads destroys every workload in the GVC. This command is too destructive to run from the AI layer. Confirm the org, GVC, and full blast radius in the conversation, then run this command manually in your terminal.' >&2; exit 1; fi" + } + ] + }, + { + "matcher": "Bash", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); cmd=$(echo \"$input\" | jq -r '.tool_input.command // empty'); if echo \"$cmd\" | grep -qE 'cpln\\s+volumeset\\s+shrink'; then echo 'BLOCK: cpln volumeset shrink causes permanent data loss on the old volume. This command is too destructive to run from the AI layer. Confirm the org, GVC, volumeset name, and new size in the conversation, then run this command manually in your terminal.' >&2; exit 1; fi" + } + ] + }, + { + "matcher": "Bash", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); cmd=$(echo \"$input\" | jq -r '.tool_input.command // empty'); if echo \"$cmd\" | grep -qE 'cpln\\s+\\w+\\s+list\\b'; then echo 'BLOCK: cpln list does not exist. Use cpln get (with no arguments to list all, or with a name to get one).' >&2; exit 1; fi" + } + ] + }, + { + "matcher": "Bash", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); cmd=$(echo \"$input\" | jq -r '.tool_input.command // empty'); if echo \"$cmd\" | grep -qE 'cpln\\s+(workload|gvc|secret|identity|domain|policy|volumeset|serviceaccount|cloudaccount|agent|group|ipset|mk8s|image)\\s+delete\\b'; then echo 'WARNING: Destructive delete detected. Verify the correct org, GVC (if applicable), and resource name before proceeding. This action cannot be undone.' 
>&2; fi" + } + ] } ] } } + diff --git a/rules/cli-conventions.md b/rules/cli-conventions.md index 41f6968..e0e4b06 100644 --- a/rules/cli-conventions.md +++ b/rules/cli-conventions.md @@ -195,7 +195,7 @@ cpln logs '{gvc="GVC", workload="WORKLOAD"}' --org ORG --tail | Type | Command | Required Flags | |------|---------|---------------| -| Opaque | `create-opaque` | `--file` or `--payload` | +| Opaque | `create-opaque` | `--name`, `--file` (path or `-` for stdin). No `--payload` flag — write value to a file or pipe via stdin. Add `--encoding plain` for plaintext values (default encoding is base64). | | Dictionary | `create-dictionary` | `--entry KEY=VAL` (repeatable) | | Username/Password | `create-userpass` | `--username`, `--password` | | AWS | `create-aws` | `--access-key`, `--secret-key` | @@ -283,6 +283,7 @@ Flags: `--address`, `--location`, `--replica`. | `cpln workload update --identity X` | `cpln workload update REF --set spec.identityLink=//identity/X` | | `cpln secret update --data '{}'` | `cpln secret edit REF` or `cpln apply --file` | | `cpln gvc update --location LOC` | `cpln gvc update REF --set 'spec.staticPlacement.locationLinks+=//location/LOC'` | +| `spec.containers[N].resources.requests/limits` in manifests | Use flat `cpu` and `memory` directly on the container: `cpu: 50m` / `memory: 128Mi`. Kubernetes-style nested `resources` is not a valid field and will cause a 400 at apply time. | ## The Verification Rule diff --git a/rules/cpln-guardrails.md b/rules/cpln-guardrails.md index 12da727..a9dd31e 100644 --- a/rules/cpln-guardrails.md +++ b/rules/cpln-guardrails.md @@ -71,7 +71,7 @@ If any of those is unclear, ask. Propose what looks right and request confirmati > Before I run this, I want to confirm the target. Your active profile appears to be `` (org: ``, GVC: ``). Should I use that, or a different org / profile / GVC? 
-For **read-only** commands (`get`, `query`, `audit`, `logs`, `permissions`, `access-report`, `eventlog`), defaulting to the active profile is acceptable — but **announce the target before running**: *"Using profile `` → org ``, GVC ``…"* — so the user can correct course before output is produced. +For **read-only** commands (`get`, `query`, `audit`, `logs`, `permissions`, `access-report`, `eventlog`), defaulting to the active profile is acceptable — but **announce the exact target before running and pause one turn for correction**: *"Using profile `` → org ``, GVC ``. I'll run the read on my next turn; let me know if that's the wrong environment."* Do not run the command in the same turn as the announcement. This one-turn pause is especially important in multi-GVC or multi-org environments, where reading the wrong environment leads to debugging the wrong workload, a common and expensive mistake. **Why this rule exists.** Operating on the wrong org or GVC has caused production deletes, cross-environment secret leaks, and accidental cross-tenant changes. The cost of asking is one extra turn; the cost of acting on the wrong context is irreversible. @@ -106,6 +106,26 @@ Missing any one step = silent failure at runtime. This is the #1 support issue. - **Cron**: Deploys to ALL GVC locations, no overrides. Cannot expose ports. - **Workload type is immutable** after creation. Changing type requires delete + recreate. +### Resource Protection — Suggest Before Any Production Resource + +Before creating or modifying any resource a user identifies as production-critical, proactively suggest the `cpln/protected` tag. This is a platform-level safeguard that causes the API to reject any delete attempt — it works regardless of who (or what) tries to delete the resource, and does not require a conversation. 
+ +```bash +# Protect a workload +cpln workload tag WORKLOAD --tag cpln/protected=true --gvc GVC --org ORG + +# Protect a GVC +cpln gvc tag GVC --tag cpln/protected=true --org ORG + +# Protect a volumeset +cpln volumeset tag VS --tag cpln/protected=true --gvc GVC --org ORG + +# Remove protection before a confirmed intentional delete +cpln workload tag WORKLOAD --remove-tag cpln/protected --gvc GVC --org ORG +``` + +When a delete is requested on a protected resource: (1) surface the protection, (2) confirm the user explicitly wants to remove it, (3) remove the tag, (4) proceed with the normal destructive-operation confirmation flow. + ### Destructive Operations — Always Confirm With Blast Radius Some operations cannot be undone, or have effects that reach beyond the resource being changed. **Before any destructive operation listed below, the AI MUST present a structured summary AND wait for explicit user confirmation — even when permissions are set to bypass / auto-approve.** Permission mode is about Claude Code's tool-prompt UX; this rule is conversation-level safety and is independent. Bypass permissions does NOT authorize destructive product operations. 
@@ -337,6 +357,7 @@ Before submitting work with Control Plane: - [ ] Service account keys in CI/CD (not user tokens) - [ ] No `docker.io/` prefix on external images - [ ] `cpln apply --ready` used for deployments +- [ ] For distroless or minimal Alpine images: confirm `sleep` binary is present, or set a custom preStop hook — if `sleep` is absent in any container, all containers receive SIGKILL immediately on shutdown, bypassing the grace period ## Resources diff --git a/rules/workload-manifest-reference.md b/rules/workload-manifest-reference.md index c0ec975..99002db 100644 --- a/rules/workload-manifest-reference.md +++ b/rules/workload-manifest-reference.md @@ -118,6 +118,7 @@ Probe types: exactly one of `exec`, `grpc`, `tcpSocket`, `httpGet` (xor constrai | Error | Fix | |:---|:---| +| `spec.containers[N].resources` present | Remove it — Control Plane does not use Kubernetes-style `resources.requests/limits`. Set `cpu` and `memory` directly on the container object: `cpu: 50m`, `memory: 128Mi`. This returns a 400 with `"resources" is not allowed`. | | Memory-to-CPU ratio exceeded | 1024Mi memory needs at least 128m CPU (ratio 8:1) | | GPU with Capacity AI | Disable Capacity AI when using GPU | | Concurrency on standard/stateful | Use rps, cpu, memory, latency, or keda instead | diff --git a/skills/access-control/SKILL.md b/skills/access-control/SKILL.md index c5c900e..bc245c1 100644 --- a/skills/access-control/SKILL.md +++ b/skills/access-control/SKILL.md @@ -113,7 +113,7 @@ bindings: ``` **Constraints:** -- Each binding's permissions must be **sorted alphabetically and unique** (validation rule). +- Each binding's permissions must be **unique**. The API auto-sorts them alphabetically — you don't need to sort manually. - A policy can have up to **50 bindings**, each with up to **200 principal links**. - The same principal can appear in multiple bindings (different permission sets). 
@@ -278,6 +278,7 @@ bindings: ## Gotchas - **Policies fail silently when wrong.** A typo in `targetKind`, a missing principal link, or an invalid permission name produces a policy that exists but grants nothing. Always verify with `cpln policy access-report POLICY_NAME` after creation. +- **Permission ordering doesn't matter — the API auto-sorts.** You do not need to sort permissions alphabetically in your manifests; the platform sorts them on write. Duplicate permissions in the same binding will cause a validation error. - **Built-in policies cannot be modified or deleted.** Origins `builtin` are read-only; create your own with `default` origin. - **`reveal` (not `read`) is the permission for accessing secret values.** This is the most common permission-name mistake. - **Identity links are GVC-scoped.** Use `//gvc/GVC/identity/NAME`, not `//identity/NAME`. diff --git a/skills/autoscaling-capacity/SKILL.md b/skills/autoscaling-capacity/SKILL.md index 7cf078c..0c51a95 100644 --- a/skills/autoscaling-capacity/SKILL.md +++ b/skills/autoscaling-capacity/SKILL.md @@ -170,7 +170,19 @@ spec: ### Event-Driven KEDA (Standard, Redis Queue) +**Prerequisite:** KEDA must be enabled on the GVC before any workload can use `metric: keda`. Applying a workload with `metric: keda` to a GVC without KEDA enabled will silently not scale — no error event, the workload just ignores queue depth. 
+ +```yaml +# Step 1: Enable KEDA on the GVC (one-time setup) +kind: gvc +name: my-gvc +spec: + keda: + enabled: true +``` + ```yaml +# Step 2: Configure the workload kind: workload name: queue-processor spec: diff --git a/skills/cpln/SKILL.md b/skills/cpln/SKILL.md index 14446ca..7863b32 100644 --- a/skills/cpln/SKILL.md +++ b/skills/cpln/SKILL.md @@ -105,6 +105,13 @@ cpln policy add-binding secret-access \ # Inject the secret into the workload cpln workload update my-app --gvc my-gvc \ --set spec.containers.main.env.DB_PASSWORD.value=cpln://secret/db-password.payload + +# ALWAYS verify the injection landed — --set exits 0 even if the container name +# doesn't match, silently writing to a path that doesn't exist in the spec. +cpln workload get my-app --gvc my-gvc -o json \ + | jq '.spec.containers[] | select(.name == "main") | .env' +# If DB_PASSWORD is absent from the output, the container name was wrong. +# Re-run with the correct name from: cpln workload get my-app --gvc my-gvc -o json | jq '[.spec.containers[].name]' ``` ## Workflow: GitOps with cpln apply diff --git a/skills/workload-security/SKILL.md b/skills/workload-security/SKILL.md index f383c98..858f15a 100644 --- a/skills/workload-security/SKILL.md +++ b/skills/workload-security/SKILL.md @@ -319,7 +319,7 @@ Full `spec.rolloutOptions` configuration: ### Critical Warnings -- If `sleep` is not available in **any** container, ALL containers receive SIGKILL immediately +- If `sleep` is not available in **any** container, ALL containers receive SIGKILL immediately — the entire grace period is skipped. This silently affects distroless images, scratch-based images, and some minimal Alpine builds. Verify with `cpln workload exec WORKLOAD --gvc GVC -- which sleep` before relying on the grace period. If `sleep` is absent, either add it to the image or configure an explicit preStop hook that does not depend on it. 
- If a custom preStop hook throws an error in **any** container, ALL containers receive SIGKILL immediately ### Custom PreStop Hook