Skip to content

Leverage Deployment Stacks for idempotent operations#44

Merged
arnaudlh merged 20 commits into
mainfrom
copilot/leverage-deployment-stacks-idempotency
Jun 4, 2026
Merged

Leverage Deployment Stacks for idempotent operations#44
arnaudlh merged 20 commits into
mainfrom
copilot/leverage-deployment-stacks-idempotency

Conversation

Copilot AI commented May 5, 2026

Copy link
Copy Markdown
Contributor

The destroy workflow deletes a single resource group but leaves behind soft-deleted resources (Key Vault with purge protection), subscription-scoped resources, and multi-RG deployments — making destroy + redeploy non-idempotent.

Changes

Deploy workflow (git-ape-deploy.exampleyml)

  • Default deploy method changed from az deployment sub create to az stack sub create --action-on-unmanage deleteAll
  • New "Capture managed resources" step walks stack resources or deployment operations post-deploy
  • state.json now populated with stackId, deployMethod, managedResources[], resourceGroups[], subscriptions[], externalReferences[]
  • metadata.json gains deployMethod and resourceGroups[] on commit

Destroy workflow (git-ape-destroy.exampleyml)

  • Stack path: az stack sub delete --action-on-unmanage deleteAll when stackId present — single command covers all managed resources regardless of scope
  • Fallback path: preserved legacy sub-scoped resource cleanup → az group delete for pre-stack deployments
  • Soft-delete purge loop: iterates managedResources[].softDeletable, purges non-protected Key Vaults, marks purge-protected as retained-soft-deleted
  • Deployment history cleanup: az deployment sub delete to prevent 800/scope accumulation
  • New terminal statuses: partially-destroyed, retained-soft-deleted

State schema (website/docs/deployment/state.md)

  • Documented extended state.json schema with destroy strategy selection logic
  • Updated lifecycle diagram with new terminal states
  • Added deployMethod field to metadata.json spec

Example state.json (post-deploy)

{
  "stackId": "/subscriptions/.../providers/Microsoft.Resources/deploymentStacks/deploy-20260218-143022",
  "deployMethod": "stack",
  "managedResources": [
    {
      "id": "/subscriptions/.../Microsoft.KeyVault/vaults/kv-api-dev-eus",
      "type": "Microsoft.KeyVault/vaults",
      "scope": "resourceGroup",
      "softDeletable": true,
      "purgeProtected": true
    }
  ],
  "resourceGroups": ["rg-api-dev-eastus"],
  "subscriptions": ["00000000-..."],
  "externalReferences": []
}

This implements Phase 1 (schema + state capture) and Phase 2 (Deployment Stacks integration) from the issue. Phase 3 (extract destroy to standalone script) and Phase 4 (fixture validation) are deferred.

Copilot AI linked an issue May 5, 2026 that may be closed by this pull request
…tate schema

- Deploy workflow: use `az stack sub create` with `--action-on-unmanage deleteAll`
  as the default deployment method, with `az deployment sub create` as fallback
- Deploy workflow: add managed resources capture step after deploy that walks
  deployment operations or stack resources to populate state.managedResources[]
- Destroy workflow: use `az stack sub delete` when stackId is present in state,
  covering multi-RG, sub-scope, and MG-scope resources uniformly
- Destroy workflow: add soft-delete purge loop for Key Vault and Cognitive Services
- Destroy workflow: add deployment history cleanup step
- Destroy workflow: support new terminal statuses: `partially-destroyed` and
  `retained-soft-deleted`
- State schema: extend state.json with stackId, deployMethod, managedResources[],
  resourceGroups[], subscriptions[], externalReferences[]
- Metadata schema: add deployMethod and resourceGroups[] fields
- Documentation: update deployment state docs with new schema, statuses, and
  destroy strategy selection logic
- Regenerate workflow documentation pages

Agent-Logs-Url: https://github.com/Azure/git-ape/sessions/d2d1da54-9a38-41ef-9254-b5f585eab10e

Co-authored-by: arnaudlh <20535201+arnaudlh@users.noreply.github.com>
Copilot AI changed the title [WIP] Leverage deployment stacks for idempotency in destroy flow Leverage Deployment Stacks for idempotent destroy May 5, 2026
Copilot AI requested a review from arnaudlh May 5, 2026 02:27
- introduce azure-stack-deploy and azure-stack-destroy skills (bash + pwsh)
- destroy: fast async mode (default) polls resource groups, --wait for sync
- align workflows + agents + docs with new skills
- bump plugin to 0.1.0

🚀 - Generated by Copilot
@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

⚠️ Documentation Staleness Warning

Source files (agents, skills, workflows, or config) changed in this PR, but the generated documentation is out of date.

Changed docs that need regeneration:

  • website/docs/agents/azure-resource-deployer.md
  • website/docs/agents/azure-template-generator.md
  • website/docs/reference/marketplace.md
  • website/docs/reference/plugin-json.md
  • website/docs/skills/overview.md
  • website/docs/workflows/daily-repo-status-lock.md
  • website/docs/workflows/git-ape-actionlint.md
  • website/docs/workflows/git-ape-deploy.md
  • website/docs/workflows/git-ape-destroy.md
  • website/docs/workflows/git-ape-release.md
  • website/docs/workflows/issue-triage-agent-lock.md

To fix: Run the following command and commit the results:

node scripts/generate-docs.js

This is an advisory check — it does not block the PR.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Git-Ape’s deploy/destroy flows to use Azure Deployment Stacks as the primary lifecycle primitive, aiming to make destroy + redeploy idempotent across resource groups, subscription-scope resources, and soft-deletable services. It also introduces local “stack deploy/destroy” skills and extends the persisted deployment state schema to record stack identity and managed resources for deterministic teardown.

Changes:

  • Switch deploy from az deployment sub create to az stack sub create (with state capture of managed resources / RGs).
  • Switch destroy to prefer az stack sub delete --action-on-unmanage deleteAll, add soft-delete purge sweep + subscription deployment-history cleanup.
  • Add /azure-stack-deploy and /azure-stack-destroy skills (bash + PowerShell) and update docs/agent guidance + version bumps.
Show a summary per file
File Description
website/docs/workflows/git-ape-destroy.md Documents new stack-first destroy workflow, purge sweep, and new terminal statuses.
website/docs/workflows/git-ape-deploy.md Documents stack-first deploy workflow and managed-resource/state capture.
website/docs/skills/overview.md Adds “General Skills” entries for stack deploy/destroy.
website/docs/skills/azure-stack-destroy.md New docs page for stack-based destroy skill.
website/docs/skills/azure-stack-deploy.md New docs page for stack-based deploy skill.
website/docs/deployment/state.md Extends state/metadata schema docs and lifecycle diagram for stack + new destroy outcomes.
website/docs/agents/git-ape.md Updates agent guidance to prefer stacks and soft-delete purge on destroy.
website/docs/agents/azure-template-generator.md Updates generated-agent guidance to prefer stacks (fallback to legacy deployment).
website/docs/agents/azure-resource-deployer.md Updates deployer guidance to use stack validate/create and verify extended state.json.
plugin.json Bumps plugin version to 0.1.0.
.github/workflows/git-ape-destroy.exampleyml Implements stack delete path, purge sweep, deployment history cleanup, and new statuses.
.github/workflows/git-ape-deploy.exampleyml Implements stack validate/create and writes extended state.json + metadata updates.
.github/skills/azure-stack-destroy/SKILL.md Adds new user-invocable destroy skill spec and usage.
.github/skills/azure-stack-destroy/scripts/destroy-stack.sh Adds local bash destroy implementation (stack delete + purge + state updates).
.github/skills/azure-stack-destroy/scripts/destroy-stack.ps1 Adds local PowerShell destroy implementation (stack delete + purge + state updates).
.github/skills/azure-stack-deploy/SKILL.md Adds new user-invocable deploy skill spec and usage.
.github/skills/azure-stack-deploy/scripts/deploy-stack.sh Adds local bash deploy implementation (stack create + managed resource capture + state writes).
.github/skills/azure-stack-deploy/scripts/deploy-stack.ps1 Adds local PowerShell deploy implementation (stack create + managed resource capture + state writes).
.github/scripts/deployment-manager.sh Re-scopes manager script to inventory-only and points deploy/destroy to new skills.
.github/plugin/marketplace.json Bumps marketplace metadata version to 0.1.0.
.github/copilot-instructions.md Updates guidance to use stack deploy/destroy skills in local + CI flows.
.github/agents/git-ape.agent.md Mirrors website agent docs: stacks preferred + purge sweep guidance.
.github/agents/azure-template-generator.agent.md Mirrors website generator docs: stacks preferred + fallback guidance.
.github/agents/azure-resource-deployer.agent.md Mirrors website deployer docs: stack validate/create + extended state verification.

Copilot's findings

Comments suppressed due to low confidence (2)

.github/skills/azure-stack-deploy/scripts/deploy-stack.sh:233

  • RESOURCE_GROUPS is derived via jq capture("/resourceGroups/(?<rg>[^/]+)") over every managedResources[].id, which will error on subscription-scoped resource IDs (no /resourceGroups/). This can make the deploy skill fail while writing state.json even though the deployment itself succeeded; filter to RG-scoped IDs or use a non-throwing match (capture(...)?/try).

RESOURCE_GROUPS=$(echo "$MANAGED_RESOURCES" | jq -c '[.[].id | capture("/resourceGroups/(?<rg>[^/]+)") | .rg] | unique')
[[ "$(echo "$RESOURCE_GROUPS" | jq 'length')" == "0" && -n "$RG_NAME" ]] && RESOURCE_GROUPS="[\"$RG_NAME\"]"

.github/skills/azure-stack-destroy/scripts/destroy-stack.sh:298

  • grep -oE '(?<=locations/)[^/]+' uses a PCRE lookbehind, but -E (ERE) doesn’t support lookbehind. This will fail to extract the Cognitive Services location and silently skip purge. Use grep -oP or another non-lookbehind parsing approach.
            "Microsoft.CognitiveServices/accounts")
                if [[ "$PURGE_PROTECTED" != "true" ]]; then
                    LOC=$(echo "$RES_ID" | grep -oE '(?<=locations/)[^/]+' || echo "")
                    if [[ -n "$LOC" ]]; then
                        az cognitiveservices account purge --name "$RES_NAME" --location "$LOC" \
  • Files reviewed: 24/24 changed files
  • Comments generated: 10

Comment thread .github/workflows/git-ape-deploy.exampleyml
Comment thread .github/workflows/git-ape-deploy.exampleyml Outdated
Comment thread .github/workflows/git-ape-deploy.exampleyml
Comment thread .github/skills/azure-stack-deploy/scripts/deploy-stack.sh Outdated
Comment thread .github/skills/azure-stack-destroy/scripts/destroy-stack.sh
Comment thread .github/skills/azure-stack-destroy/scripts/destroy-stack.sh Outdated
Comment thread .github/skills/azure-stack-destroy/scripts/destroy-stack.ps1 Outdated
Comment thread website/docs/workflows/git-ape-deploy.md
Comment thread website/docs/deployment/state.md Outdated
Comment thread website/docs/deployment/state.md Outdated
- register both skills in evals manifest at expanded tier
- add 5-task eval for azure-stack-destroy (positive-local, positive-stack, negative-deploy, negative-non-gitape, negative-off-topic)
- add eval suite for azure-stack-deploy

🧪 - Generated by Copilot
- add stack-to-subscription fallback when az stack sub create fails
- query actual purgeProtected status for soft-deletable resources
- fix disown/wait bug that swallowed background process exit codes

🐛 - Generated by Copilot

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 35/35 changed files
  • Comments generated: 5

Comment thread .github/workflows/git-ape-deploy.exampleyml Outdated
Comment thread .github/workflows/git-ape-destroy.exampleyml Outdated
Comment thread .github/skills/azure-stack-destroy/scripts/destroy-stack.sh Outdated
Comment thread .github/skills/azure-stack-destroy/scripts/destroy-stack.ps1 Outdated
Comment thread .github/skills/azure-stack-deploy/scripts/deploy-stack.ps1
- resolve Cognitive Services purge location via list-deleted, not ID parsing
- add subscription-validate fallback when stack validate is unavailable
- classify nested/extension resources by last providers segment in deploy-stack.ps1
- regenerate workflow/skill docs to match fixed sources

🛠️ - Generated by Copilot

@sendtoshailesh sendtoshailesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this — the overall direction makes sense, and I like that the PR moves deploy/destroy onto Deployment Stacks, captures richer state, and updates docs/evals alongside the workflow changes. I reviewed the workflow changes, stack deploy/destroy scripts, state handling, and the related documentation.

What looks good:

  • stack-first deploy path with explicit fallback to subscription deployments
  • destroy flow now addresses soft-deletable resources and subscription-scope cleanup
  • docs/evals were updated with the new stack-based model

Blocking concerns:

  1. .github/workflows/git-ape-destroy.exampleyml injects workflow_dispatch inputs directly into a shell run: block (inputs.confirm, inputs.deployment_id). Because the detect-destroys job runs before the azure-destroy environment gate and inherits workflow-level id-token: write / contents: write, a crafted manual-dispatch input can break out of the shell assignment and execute arbitrary commands in the runner context. These inputs need to be passed via env: (or otherwise safely quoted/validated) before this can ship.
  2. .github/workflows/git-ape-deploy.exampleyml interpolates matrix.deployment_id and steps.deploy.outputs.deploy_error directly into actions/github-script. Those values are derived from repo-controlled deployment directory names / command output, so a value containing quotes or template-literal syntax can break out of the JavaScript string. Please pass them through env/process.env or JSON-encode them first.

Non-blocking follow-up:

  • The local destroy scripts only emit destroyed / retained-soft-deleted / destroy-failed, while the workflow and docs also define partially-destroyed. That means local CLI/VS Code destroys do not currently preserve the same terminal state model as CI, which weakens the parity claim in this PR.

Overall verdict: strong feature direction, but I think the workflow injection issues are release blockers, so I’m requesting changes.

@arnaudlh

arnaudlh commented Jun 4, 2026

Copy link
Copy Markdown
Member

Thanks for the thorough review, @sendtoshailesh — all three points are valid. Addressed as follows:

1. Shell injection in git-ape-destroy.exampleyml (blocking) — fixed.
inputs.confirm / inputs.deployment_id are no longer interpolated into the run: block. They're now passed via env: (INPUT_CONFIRM, INPUT_DEPLOYMENT_ID) so the shell never expands them. I also added an allow-list charset validation (^[A-Za-z0-9._-]+$) on the deployment ID before it's used, and the JSON array is now built with jq -cn --arg instead of string-concatenation (which was also a JSON-injection vector). This all runs before the azure-destroy environment gate, so closing it here is the right place.

2. JS injection in git-ape-deploy.exampleyml github-script (blocking) — fixed.
matrix.deployment_id, steps.deploy.outputs.deploy_error, and the other repo-/command-derived values are no longer interpolated into the script body. They're passed via env: and read with process.env.*. runUrl now uses the context object instead of interpolating github.*. The earlier github-script step (comment-trigger check) only reads context.payload, so it was already safe.

3. Local destroy state parity (non-blocking) — acknowledged.
You're right that the local scripts only emit destroyed / retained-soft-deleted / destroy-failed while CI/docs also define partially-destroyed. I'll align the local scripts to emit partially-destroyed (stack deleted but a soft-delete purge failed) so the terminal state model matches CI, rather than narrowing the docs — keeping the parity claim true. Tracking this as a follow-up in the same PR.

Pushing the workflow fixes now.

arnaudlh added 2 commits June 4, 2026 09:57
- pass workflow_dispatch inputs via env and validate deployment_id charset
- build destroy ID JSON array with jq instead of string concat
- read github-script values from process.env instead of interpolation
- track delete attempts and emit partially-destroyed on incomplete delete
- aligns local CLI/VS Code destroy state model with CI workflow
@arnaudlh arnaudlh requested a review from sendtoshailesh June 4, 2026 01:58

@sendtoshailesh sendtoshailesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review of the latest fixes:

Previously raised issues

  • ✅ Fixed: Shell injection risk in git-ape-destroy.exampleyml manual-dispatch handling. The workflow now passes dispatch inputs via env, validates deployment_id, and no longer interpolates raw inputs.* directly in shell.
  • ✅ Fixed: JavaScript/string injection risk in git-ape-deploy.exampleyml. The actions/github-script step now reads dynamic values from process.env instead of interpolating them into the script body.
  • ✅ Fixed: Local destroy scripts now emit partially-destroyed when a delete was attempted but did not complete, matching CI state semantics.

New issue found

  • .github/workflows/git-ape-destroy.exampleyml: in the fallback Delete subscription-scoped resources step, FAILED is incremented inside a while read loop fed by a pipe. In bash that loop runs in a subshell, so the parent shell never sees the incremented value. As written, the workflow will report FAILED=0 even when az resource delete calls fail, which can mask leftover subscription-scoped resources and lead to incorrect destroy status/reporting. Please switch to process substitution (while ...; do ...; done < <(...)) or another non-subshell iteration pattern.

Overall verdict
All previously raised blocking issues are resolved, but the new destroy-workflow correctness bug above is still significant enough that this PR is not merge-ready yet.

- replace piped while-read loop with process substitution so the counter survives
- pass sub_resources step output via env to harden against injection

🐛 - Generated by Copilot
@arnaudlh

arnaudlh commented Jun 4, 2026

Copy link
Copy Markdown
Member

Thanks for the review @sendtoshailesh — addressed the subshell counter bug you flagged.

Fix (commit 236e8406): In the Delete subscription-scoped resources (fallback) step of git-ape-destroy.exampleyml, the FAILED counter was incremented inside a piped ... | while read loop. Because the piped loop runs in a subshell, every increment was lost when the subshell exited, so the post-loop if [[ "$FAILED" -gt 0 ]] warning could never fire even when deletes failed.

Changes:

  • Replaced the pipe with process substitution (done < <(echo "$SUB_RESOURCES" | jq -r '.[].id')) so the counter survives in the parent shell.
  • Moved the steps.check.outputs.sub_resources value into an env: block instead of interpolating ${{ }} directly into the shell body, hardening against injection.
  • Added an explanatory comment so the pattern isn't reintroduced.

I also scanned the rest of the workflow and the integration-tester scripts — this was the only piped-loop counter of its kind.

@arnaudlh arnaudlh requested a review from sendtoshailesh June 4, 2026 10:56

@sendtoshailesh sendtoshailesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed: The fallback subscription-resource delete loop in .github/workflows/git-ape-destroy.exampleyml now uses process substitution (done < <(...)) instead of a piped while read, so FAILED increments stay in the parent shell and destroy result reporting reflects actual deletion failures.

New issues found: None in the latest diff. I re-scanned the full change set and did not find any new material bugs, security issues, or logic regressions.

Overall verdict: Merge-ready from my side.

@arnaudlh arnaudlh merged commit 927b578 into main Jun 4, 2026
24 checks passed
@arnaudlh arnaudlh deleted the copilot/leverage-deployment-stacks-idempotency branch June 4, 2026 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Leverage Deployment Stacks for idempotency

4 participants