Leverage Deployment Stacks for idempotent operations#44
Conversation
…tate schema - Deploy workflow: use `az stack sub create` with `--action-on-unmanage deleteAll` as the default deployment method, with `az deployment sub create` as fallback - Deploy workflow: add managed resources capture step after deploy that walks deployment operations or stack resources to populate state.managedResources[] - Destroy workflow: use `az stack sub delete` when stackId is present in state, covering multi-RG, sub-scope, and MG-scope resources uniformly - Destroy workflow: add soft-delete purge loop for Key Vault and Cognitive Services - Destroy workflow: add deployment history cleanup step - Destroy workflow: support new terminal statuses: `partially-destroyed` and `retained-soft-deleted` - State schema: extend state.json with stackId, deployMethod, managedResources[], resourceGroups[], subscriptions[], externalReferences[] - Metadata schema: add deployMethod and resourceGroups[] fields - Documentation: update deployment state docs with new schema, statuses, and destroy strategy selection logic - Regenerate workflow documentation pages Agent-Logs-Url: https://github.com/Azure/git-ape/sessions/d2d1da54-9a38-41ef-9254-b5f585eab10e Co-authored-by: arnaudlh <20535201+arnaudlh@users.noreply.github.com>
- introduce azure-stack-deploy and azure-stack-destroy skills (bash + pwsh) - destroy: fast async mode (default) polls resource groups, --wait for sync - align workflows + agents + docs with new skills - bump plugin to 0.1.0 🚀 - Generated by Copilot
|
🔧 - Generated by Copilot
There was a problem hiding this comment.
Pull request overview
Updates Git-Ape’s deploy/destroy flows to use Azure Deployment Stacks as the primary lifecycle primitive, aiming to make destroy + redeploy idempotent across resource groups, subscription-scope resources, and soft-deletable services. It also introduces local “stack deploy/destroy” skills and extends the persisted deployment state schema to record stack identity and managed resources for deterministic teardown.
Changes:
- Switch deploy from
az deployment sub createtoaz stack sub create(with state capture of managed resources / RGs). - Switch destroy to prefer
az stack sub delete --action-on-unmanage deleteAll, add soft-delete purge sweep + subscription deployment-history cleanup. - Add
/azure-stack-deployand/azure-stack-destroyskills (bash + PowerShell) and update docs/agent guidance + version bumps.
Show a summary per file
| File | Description |
|---|---|
website/docs/workflows/git-ape-destroy.md |
Documents new stack-first destroy workflow, purge sweep, and new terminal statuses. |
website/docs/workflows/git-ape-deploy.md |
Documents stack-first deploy workflow and managed-resource/state capture. |
website/docs/skills/overview.md |
Adds “General Skills” entries for stack deploy/destroy. |
website/docs/skills/azure-stack-destroy.md |
New docs page for stack-based destroy skill. |
website/docs/skills/azure-stack-deploy.md |
New docs page for stack-based deploy skill. |
website/docs/deployment/state.md |
Extends state/metadata schema docs and lifecycle diagram for stack + new destroy outcomes. |
website/docs/agents/git-ape.md |
Updates agent guidance to prefer stacks and soft-delete purge on destroy. |
website/docs/agents/azure-template-generator.md |
Updates generated-agent guidance to prefer stacks (fallback to legacy deployment). |
website/docs/agents/azure-resource-deployer.md |
Updates deployer guidance to use stack validate/create and verify extended state.json. |
plugin.json |
Bumps plugin version to 0.1.0. |
.github/workflows/git-ape-destroy.exampleyml |
Implements stack delete path, purge sweep, deployment history cleanup, and new statuses. |
.github/workflows/git-ape-deploy.exampleyml |
Implements stack validate/create and writes extended state.json + metadata updates. |
.github/skills/azure-stack-destroy/SKILL.md |
Adds new user-invocable destroy skill spec and usage. |
.github/skills/azure-stack-destroy/scripts/destroy-stack.sh |
Adds local bash destroy implementation (stack delete + purge + state updates). |
.github/skills/azure-stack-destroy/scripts/destroy-stack.ps1 |
Adds local PowerShell destroy implementation (stack delete + purge + state updates). |
.github/skills/azure-stack-deploy/SKILL.md |
Adds new user-invocable deploy skill spec and usage. |
.github/skills/azure-stack-deploy/scripts/deploy-stack.sh |
Adds local bash deploy implementation (stack create + managed resource capture + state writes). |
.github/skills/azure-stack-deploy/scripts/deploy-stack.ps1 |
Adds local PowerShell deploy implementation (stack create + managed resource capture + state writes). |
.github/scripts/deployment-manager.sh |
Re-scopes manager script to inventory-only and points deploy/destroy to new skills. |
.github/plugin/marketplace.json |
Bumps marketplace metadata version to 0.1.0. |
.github/copilot-instructions.md |
Updates guidance to use stack deploy/destroy skills in local + CI flows. |
.github/agents/git-ape.agent.md |
Mirrors website agent docs: stacks preferred + purge sweep guidance. |
.github/agents/azure-template-generator.agent.md |
Mirrors website generator docs: stacks preferred + fallback guidance. |
.github/agents/azure-resource-deployer.agent.md |
Mirrors website deployer docs: stack validate/create + extended state verification. |
Copilot's findings
Comments suppressed due to low confidence (2)
.github/skills/azure-stack-deploy/scripts/deploy-stack.sh:233
RESOURCE_GROUPSis derived viajq capture("/resourceGroups/(?<rg>[^/]+)")over everymanagedResources[].id, which will error on subscription-scoped resource IDs (no/resourceGroups/). This can make the deploy skill fail while writingstate.jsoneven though the deployment itself succeeded; filter to RG-scoped IDs or use a non-throwing match (capture(...)?/try).
RESOURCE_GROUPS=$(echo "$MANAGED_RESOURCES" | jq -c '[.[].id | capture("/resourceGroups/(?<rg>[^/]+)") | .rg] | unique')
[[ "$(echo "$RESOURCE_GROUPS" | jq 'length')" == "0" && -n "$RG_NAME" ]] && RESOURCE_GROUPS="[\"$RG_NAME\"]"
.github/skills/azure-stack-destroy/scripts/destroy-stack.sh:298
grep -oE '(?<=locations/)[^/]+'uses a PCRE lookbehind, but-E(ERE) doesn’t support lookbehind. This will fail to extract the Cognitive Services location and silently skip purge. Usegrep -oPor another non-lookbehind parsing approach.
"Microsoft.CognitiveServices/accounts")
if [[ "$PURGE_PROTECTED" != "true" ]]; then
LOC=$(echo "$RES_ID" | grep -oE '(?<=locations/)[^/]+' || echo "")
if [[ -n "$LOC" ]]; then
az cognitiveservices account purge --name "$RES_NAME" --location "$LOC" \
- Files reviewed: 24/24 changed files
- Comments generated: 10
- register both skills in evals manifest at expanded tier - add 5-task eval for azure-stack-destroy (positive-local, positive-stack, negative-deploy, negative-non-gitape, negative-off-topic) - add eval suite for azure-stack-deploy 🧪 - Generated by Copilot
- add stack-to-subscription fallback when az stack sub create fails - query actual purgeProtected status for soft-deletable resources - fix disown/wait bug that swallowed background process exit codes 🐛 - Generated by Copilot
- resolve Cognitive Services purge location via list-deleted, not ID parsing - add subscription-validate fallback when stack validate is unavailable - classify nested/extension resources by last providers segment in deploy-stack.ps1 - regenerate workflow/skill docs to match fixed sources 🛠️ - Generated by Copilot
sendtoshailesh
left a comment
There was a problem hiding this comment.
Thanks for pushing this — the overall direction makes sense, and I like that the PR moves deploy/destroy onto Deployment Stacks, captures richer state, and updates docs/evals alongside the workflow changes. I reviewed the workflow changes, stack deploy/destroy scripts, state handling, and the related documentation.
What looks good:
- stack-first deploy path with explicit fallback to subscription deployments
- destroy flow now addresses soft-deletable resources and subscription-scope cleanup
- docs/evals were updated with the new stack-based model
Blocking concerns:
.github/workflows/git-ape-destroy.exampleymlinjectsworkflow_dispatchinputs directly into a shellrun:block (inputs.confirm,inputs.deployment_id). Because thedetect-destroysjob runs before theazure-destroyenvironment gate and inherits workflow-levelid-token: write/contents: write, a crafted manual-dispatch input can break out of the shell assignment and execute arbitrary commands in the runner context. These inputs need to be passed viaenv:(or otherwise safely quoted/validated) before this can ship..github/workflows/git-ape-deploy.exampleymlinterpolatesmatrix.deployment_idandsteps.deploy.outputs.deploy_errordirectly intoactions/github-script. Those values are derived from repo-controlled deployment directory names / command output, so a value containing quotes or template-literal syntax can break out of the JavaScript string. Please pass them throughenv/process.envor JSON-encode them first.
Non-blocking follow-up:
- The local destroy scripts only emit
destroyed/retained-soft-deleted/destroy-failed, while the workflow and docs also definepartially-destroyed. That means local CLI/VS Code destroys do not currently preserve the same terminal state model as CI, which weakens the parity claim in this PR.
Overall verdict: strong feature direction, but I think the workflow injection issues are release blockers, so I’m requesting changes.
|
Thanks for the thorough review, @sendtoshailesh — all three points are valid. Addressed as follows: 1. Shell injection in 2. JS injection in 3. Local destroy state parity (non-blocking) — acknowledged. Pushing the workflow fixes now. |
- pass workflow_dispatch inputs via env and validate deployment_id charset - build destroy ID JSON array with jq instead of string concat - read github-script values from process.env instead of interpolation
- track delete attempts and emit partially-destroyed on incomplete delete - aligns local CLI/VS Code destroy state model with CI workflow
sendtoshailesh
left a comment
There was a problem hiding this comment.
Follow-up review of the latest fixes:
Previously raised issues
- ✅ Fixed: Shell injection risk in
git-ape-destroy.exampleymlmanual-dispatch handling. The workflow now passes dispatch inputs viaenv, validatesdeployment_id, and no longer interpolates rawinputs.*directly in shell. - ✅ Fixed: JavaScript/string injection risk in
git-ape-deploy.exampleyml. Theactions/github-scriptstep now reads dynamic values fromprocess.envinstead of interpolating them into the script body. - ✅ Fixed: Local destroy scripts now emit
partially-destroyedwhen a delete was attempted but did not complete, matching CI state semantics.
New issue found
- ❌
.github/workflows/git-ape-destroy.exampleyml: in the fallbackDelete subscription-scoped resourcesstep,FAILEDis incremented inside awhile readloop fed by a pipe. In bash that loop runs in a subshell, so the parent shell never sees the incremented value. As written, the workflow will reportFAILED=0even whenaz resource deletecalls fail, which can mask leftover subscription-scoped resources and lead to incorrect destroy status/reporting. Please switch to process substitution (while ...; do ...; done < <(...)) or another non-subshell iteration pattern.
Overall verdict
All previously raised blocking issues are resolved, but the new destroy-workflow correctness bug above is still significant enough that this PR is not merge-ready yet.
- replace piped while-read loop with process substitution so the counter survives - pass sub_resources step output via env to harden against injection 🐛 - Generated by Copilot
|
Thanks for the review @sendtoshailesh — addressed the subshell counter bug you flagged. Fix (commit Changes:
I also scanned the rest of the workflow and the integration-tester scripts — this was the only piped-loop counter of its kind. |
sendtoshailesh
left a comment
There was a problem hiding this comment.
✅ Fixed: The fallback subscription-resource delete loop in .github/workflows/git-ape-destroy.exampleyml now uses process substitution (done < <(...)) instead of a piped while read, so FAILED increments stay in the parent shell and destroy result reporting reflects actual deletion failures.
New issues found: None in the latest diff. I re-scanned the full change set and did not find any new material bugs, security issues, or logic regressions.
Overall verdict: Merge-ready from my side.
The destroy workflow deletes a single resource group but leaves behind soft-deleted resources (Key Vault with purge protection), subscription-scoped resources, and multi-RG deployments — making destroy + redeploy non-idempotent.
Changes
Deploy workflow (
git-ape-deploy.exampleyml)az deployment sub createtoaz stack sub create --action-on-unmanage deleteAllstate.jsonnow populated withstackId,deployMethod,managedResources[],resourceGroups[],subscriptions[],externalReferences[]metadata.jsongainsdeployMethodandresourceGroups[]on commitDestroy workflow (
git-ape-destroy.exampleyml)az stack sub delete --action-on-unmanage deleteAllwhenstackIdpresent — single command covers all managed resources regardless of scopeaz group deletefor pre-stack deploymentsmanagedResources[].softDeletable, purges non-protected Key Vaults, marks purge-protected asretained-soft-deletedaz deployment sub deleteto prevent 800/scope accumulationpartially-destroyed,retained-soft-deletedState schema (
website/docs/deployment/state.md)state.jsonschema with destroy strategy selection logicdeployMethodfield tometadata.jsonspecExample state.json (post-deploy)
{ "stackId": "/subscriptions/.../providers/Microsoft.Resources/deploymentStacks/deploy-20260218-143022", "deployMethod": "stack", "managedResources": [ { "id": "/subscriptions/.../Microsoft.KeyVault/vaults/kv-api-dev-eus", "type": "Microsoft.KeyVault/vaults", "scope": "resourceGroup", "softDeletable": true, "purgeProtected": true } ], "resourceGroups": ["rg-api-dev-eastus"], "subscriptions": ["00000000-..."], "externalReferences": [] }This implements Phase 1 (schema + state capture) and Phase 2 (Deployment Stacks integration) from the issue. Phase 3 (extract destroy to standalone script) and Phase 4 (fixture validation) are deferred.