diff --git a/.github/agents/azure-resource-deployer.agent.md b/.github/agents/azure-resource-deployer.agent.md index dadde28..9ac62da 100644 --- a/.github/agents/azure-resource-deployer.agent.md +++ b/.github/agents/azure-resource-deployer.agent.md @@ -108,21 +108,33 @@ Use mcp_azure_mcp_search with "deploy" intent to execute template deployment **Option B: Azure CLI (Fallback)** -**Always use subscription-level deployment** — the ARM template includes resource group creation, so we deploy at subscription scope: +**Always deploy as an Azure Deployment Stack at subscription scope** — the ARM template includes resource group creation, and Stacks give us idempotent multi-scope lifecycle management with a single destroy call: ```bash -# Subscription-level deployment (creates RG + all resources atomically) -az deployment sub create \ +# Subscription-scope Deployment Stack (creates RG + all resources atomically, +# tracked as a single lifecycle unit). +az stack sub create \ --name "{deployment-id}" \ --location {location} \ --template-file {template.json} \ --parameters @{parameters.json} \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ + --description "Git-Ape deployment {deployment-id}" \ + --tags "managedBy=git-ape" "deploymentId={deployment-id}" \ + --yes \ --output json ``` -**DO NOT use `az deployment group create`** — our templates always include the resource group as a resource. Subscription-level deployment handles everything in one command. +**Why Stacks (and not `az deployment sub create`):** +- The stack is the single unit of lifecycle — one create, one update, one destroy. +- `--action-on-unmanage deleteAll` guarantees destroy removes every managed resource across every scope (subscription, multiple RGs, sub-scope role/policy assignments) in one synchronous call. +- No orphans, idempotent re-runs, no soft-deleted surprises hiding in the subscription after an RG-delete. 
+- See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30) for the rationale. -Capture the deployment operation ID for tracking. +**DO NOT use `az deployment group create` or `az deployment sub create`** — always go through the stack. + +Capture the `stackId` from the response — it becomes the single source of truth stored in `state.json` for the destroy workflow. ### 3. Monitor Progress @@ -330,16 +342,16 @@ if [[ "$USER_CHOICE" == "A" ]]; then read CONFIRMATION if [[ "$CONFIRMATION" == "confirm rollback" ]]; then - # Delete resources - az resource delete --ids {resource-id-1} {resource-id-2} - - # If RG was created new, delete it - if [[ "$RG_NEW" == "true" ]]; then - az group delete --name {rg-name} --yes --no-wait - fi - + # Delete the deployment stack — this removes every managed resource + # across all scopes (RGs, sub-scope role assignments, etc.) in one call. + az stack sub delete \ + --name "{deployment-id}" \ + --action-on-unmanage deleteAll \ + --bypass-stack-out-of-sync-error true \ + --yes + # Log rollback - echo "Rollback completed" >> .azure/deployments/{deployment-id}/deployment.log + echo "Rollback completed (stack deleted)" >> .azure/deployments/{deployment-id}/deployment.log fi fi ``` diff --git a/.github/agents/azure-template-generator.agent.md b/.github/agents/azure-template-generator.agent.md index 79e8b4d..6121c5e 100644 --- a/.github/agents/azure-template-generator.agent.md +++ b/.github/agents/azure-template-generator.agent.md @@ -135,7 +135,7 @@ see [git-ape.agent.md](git-ape.agent.md). 
- Resource Group is a `Microsoft.Resources/resourceGroups` resource inside the template - Other resources go inside a nested `Microsoft.Resources/deployments` with `"resourceGroup"` property - Use `subscriptionResourceId()` for RG references, regular `resourceId()` inside nested -- Deploy with `az deployment sub create` (not `az deployment group create`) +- Deploy with `az stack sub create --action-on-unmanage deleteAll` (not `az deployment group create` or `az deployment sub create`) - `uniqueString()` uses `subscription().subscriptionId` instead of `resourceGroup().id` **Nested Template Requirements:** @@ -660,25 +660,32 @@ After showing the preview, provide the complete ARM template: ## Deployment Commands -**Azure CLI (Subscription-level deployment):** +**Always deploy as an Azure Deployment Stack at subscription scope.** Stacks track every resource the template creates (across every scope) as a single lifecycle unit, so destroy is one idempotent call. See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30). + +**Azure CLI (Deployment Stack at subscription scope):** ```bash -az deployment sub create \ +az stack sub create \ --name {deployment-id} \ --location {location} \ --template-file template.json \ - --parameters @parameters.json + --parameters @parameters.json \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ + --yes ``` **PowerShell:** ```powershell -New-AzSubscriptionDeployment ` +New-AzSubscriptionDeploymentStack ` -Name {deployment-id} ` -Location {location} ` -TemplateFile template.json ` - -TemplateParameterFile parameters.json + -TemplateParameterFile parameters.json ` + -ActionOnUnmanage deleteAll ` + -DenySettingsMode none ``` -**Note:** We use subscription-level deployments so the resource group is created as part of the template. No need to create the RG separately. 
+**Note:** We deploy as a **subscription-scope Deployment Stack** so the resource group is created as part of the template and the whole deployment becomes a single lifecycle unit. Destroy with `az stack sub delete --action-on-unmanage deleteAll`. ```` ## Constraints diff --git a/.github/agents/git-ape.agent.md b/.github/agents/git-ape.agent.md index d206482..0916e4e 100644 --- a/.github/agents/git-ape.agent.md +++ b/.github/agents/git-ape.agent.md @@ -354,12 +354,12 @@ The deployment plan MUST start with a clear "Target Environment" table: **Delegate to:** `azure-resource-deployer` The deployer will: -- Execute the ARM template as a **subscription-level deployment** (`az deployment sub create`) -- The ARM template includes resource group creation — everything deploys atomically +- Execute the ARM template as a **subscription-scope Azure Deployment Stack** (`az stack sub create --action-on-unmanage deleteAll`) +- The ARM template includes resource group creation — everything deploys atomically, tracked as a single lifecycle unit - Monitor deployment progress in real-time - Handle any deployment failures - Verify resource creation via Azure Resource Graph -- Capture deployment outputs (resource IDs, endpoints, etc.) +- Capture deployment outputs (resource IDs, endpoints, etc.) and the `stackId` for the destroy workflow **Deployment Monitoring:** Always poll deployment state every **30 seconds** using `sleep 30` between checks. No exponential backoff — use a fixed 30-second interval for all resources regardless of type or expected duration. Check both the top-level deployment and nested deployment statuses on every poll. diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 2c29d37..be805ff 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -171,7 +171,7 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`: **What it does:** 1. 
Detects which deployment directories changed in the PR 2. Logs into Azure via OIDC -3. Validates each ARM template (`az deployment sub validate`) +3. Validates each ARM template (`az stack sub validate`) 4. Runs what-if analysis (`az deployment sub what-if`) 5. Reads the architecture diagram from the deployment directory 6. Posts a detailed plan as a **PR comment** (validation result + what-if + architecture) @@ -192,12 +192,17 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`: **What it does:** 1. Detects deployment directories to execute 2. Logs into Azure via OIDC -3. Validates the template one more time -4. Runs `az deployment sub create` to deploy +3. Validates the template one more time (`az stack sub validate`) +4. Deploys as an **Azure Deployment Stack** (`az stack sub create --action-on-unmanage deleteAll`) 5. Runs integration tests (lists deployed resources, tests HTTP endpoints) -6. Commits `state.json` with deployment result back to the repo +6. Commits `state.json` (including `stackId` and `managedResources[]`) back to the repo 7. Posts deployment result as a PR comment (on `/deploy` trigger) +**Why Deployment Stacks:** +- The stack is the single unit of lifecycle — create, update, and destroy operate on it, not on the underlying RGs. +- `deleteAll` on unmanage guarantees destruction cleans up every managed resource across every scope (subscription, multiple RGs, role/policy assignments at sub scope) in one call. No orphans, idempotent re-runs. +- See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30) for the rationale. + **Requires:** GitHub environment `azure-deploy` (for environment protection rules) **Safety:** @@ -213,11 +218,15 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`: **What it does:** 1. Detects deployments where `metadata.json` status changed to `destroy-requested` -2. Reads `state.json` to find the resource group name -3. 
Inventories all resources in the resource group -4. Deletes the resource group (`az group delete` — synchronous, waits for completion) +2. Reads `state.json` to find the deployment stack name (`deploymentId`) and `stackId` +3. Calls `az stack sub show` to inventory the stack's managed resources +4. Calls `az stack sub delete --action-on-unmanage deleteAll` — removes every resource the stack manages, across all scopes, in one synchronous call 5. Updates `state.json` and `metadata.json` with `destroyed` status and commits to repo +**Idempotency:** +- If the stack is already gone, the workflow records `already-destroyed` and succeeds cleanly. +- No RG-delete fallback path, no subscription-scope resource sweep — Stacks handle multi-scope destruction natively. + **Destroy flow:** 1. Agent or user creates a PR that sets `metadata.json` status to `destroy-requested` 2. PR is reviewed and approved (human gate for destructive action) @@ -413,10 +422,14 @@ jobs: subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Deploy run: | - az deployment sub create \ + az stack sub create \ + --name ${{ env.DEPLOYMENT_ID }} \ --location ${{ env.LOCATION }} \ --template-file .azure/deployments/${{ env.DEPLOYMENT_ID }}/template.json \ - --parameters @.azure/deployments/${{ env.DEPLOYMENT_ID }}/parameters.json + --parameters @.azure/deployments/${{ env.DEPLOYMENT_ID }}/parameters.json \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ + --yes ``` **Transitioning from Service Principal secrets to OIDC:** diff --git a/.github/workflows/git-ape-deploy.exampleyml b/.github/workflows/git-ape-deploy.exampleyml index 48c6d71..a3355d6 100644 --- a/.github/workflows/git-ape-deploy.exampleyml +++ b/.github/workflows/git-ape-deploy.exampleyml @@ -195,12 +195,15 @@ jobs: tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - - name: Validate before deploy + - name: Validate before deploy (stack) run: | - az deployment sub validate \ 
+ az stack sub validate \ + --name "${{ matrix.deployment_id }}" \ --location "${{ steps.params.outputs.location }}" \ --template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \ --parameters @"${{ steps.params.outputs.deploy_dir }}/parameters.json" \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ --output json - name: Run Microsoft Defender for DevOps template analyzer @@ -234,17 +237,26 @@ jobs: echo "Security scan passed — no errors found" fi - - name: Deploy to Azure + - name: Deploy to Azure (Deployment Stack) id: deploy run: | - echo "🚀 Starting deployment: ${{ matrix.deployment_id }}" + STACK_NAME="${{ matrix.deployment_id }}" + echo "🚀 Starting stack deployment: $STACK_NAME" START_TIME=$(date +%s) - DEPLOY_OUTPUT=$(az deployment sub create \ - --name "${{ matrix.deployment_id }}" \ + # Create/update the subscription-scope Deployment Stack. + # --action-on-unmanage deleteAll binds the whole stack (RG + contents) + # to a single lifecycle so destroy is idempotent across all scopes. + DEPLOY_OUTPUT=$(az stack sub create \ + --name "$STACK_NAME" \ --location "${{ steps.params.outputs.location }}" \ --template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \ --parameters @"${{ steps.params.outputs.deploy_dir }}/parameters.json" \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ + --description "Git-Ape deployment $STACK_NAME" \ + --tags "managedBy=git-ape" "deploymentId=$STACK_NAME" \ + --yes \ --output json 2>&1) EXIT_CODE=$? 
@@ -260,27 +272,40 @@ jobs: echo "EOF" >> "$GITHUB_OUTPUT" echo "" echo "==========================================" - echo "❌ DEPLOYMENT FAILED" + echo "❌ STACK DEPLOYMENT FAILED" echo "==========================================" echo "$DEPLOY_OUTPUT" echo "==========================================" - echo "::error::Deployment failed — see output above for details" + echo "::error::Stack deployment failed — see output above for details" exit 1 fi echo "deploy_status=succeeded" >> "$GITHUB_OUTPUT" - # Extract outputs - OUTPUTS=$(echo "$DEPLOY_OUTPUT" | jq -r '.properties.outputs // {}') + # Capture the stack resource id — this is the single source of truth + # for destroy. Stored in state.json as `stackId`. + STACK_ID=$(echo "$DEPLOY_OUTPUT" | jq -r '.id // empty') + echo "stack_id=$STACK_ID" >> "$GITHUB_OUTPUT" + + # Extract template outputs from the stack + OUTPUTS=$(echo "$DEPLOY_OUTPUT" | jq -r '.outputs // .properties.outputs // {}') echo "deploy_outputs<<EOF" >> "$GITHUB_OUTPUT" echo "$OUTPUTS" >> "$GITHUB_OUTPUT" echo "EOF" >> "$GITHUB_OUTPUT" - # Extract resource group name + # Extract resource group name (for integration tests) RG_NAME=$(echo "$OUTPUTS" | jq -r '.resourceGroupName.value // empty') echo "resource_group=$RG_NAME" >> "$GITHUB_OUTPUT" - echo "✅ Deployment succeeded in ${DURATION}s" + # Capture the list of managed resources from the stack — this is the + # authoritative manifest for everything the stack will delete on destroy. 
+ MANAGED=$(echo "$DEPLOY_OUTPUT" | jq -c '[(.resources // .properties.resources // [])[] | {id: .id, status: .status}]') + echo "managed_resources<<EOF" >> "$GITHUB_OUTPUT" + echo "$MANAGED" >> "$GITHUB_OUTPUT" + echo "EOF" >> "$GITHUB_OUTPUT" + + echo "✅ Stack deployed in ${DURATION}s — stackId: $STACK_ID" + echo " Managed resources: $(echo "$MANAGED" | jq 'length')" - name: Run integration tests id: tests @@ -349,25 +374,49 @@ jobs: DEPLOY_DIR="${{ steps.params.outputs.deploy_dir }}" STATUS="${{ steps.deploy.outputs.deploy_status || 'failed' }}" TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ) - - # Create/update state.json - cat > "$DEPLOY_DIR/state.json" < "$DEPLOY_DIR/state.json" - name: Commit deployment state if: always() diff --git a/.github/workflows/git-ape-destroy.exampleyml b/.github/workflows/git-ape-destroy.exampleyml index 1afc7ae..eb0c71e 100644 --- a/.github/workflows/git-ape-destroy.exampleyml +++ b/.github/workflows/git-ape-destroy.exampleyml @@ -131,17 +131,24 @@ jobs: exit 1 fi + # Stacks-only: stackId is the single source of truth. If it's missing + # this deployment wasn't created via Deployment Stacks and can't be + # destroyed by this workflow. 
+ STACK_ID=$(jq -r '.stackId // empty' "$STATE_FILE") + STACK_NAME=$(jq -r '.deploymentId // empty' "$STATE_FILE") RG_NAME=$(jq -r '.resourceGroup // empty' "$STATE_FILE") - if [[ -z "$RG_NAME" ]]; then - echo "::error::No resource group found in state file" + if [[ -z "$STACK_ID" && -z "$STACK_NAME" ]]; then + echo "::error::state.json has no stackId or deploymentId — cannot destroy" echo "found=false" >> "$GITHUB_OUTPUT" exit 1 fi echo "found=true" >> "$GITHUB_OUTPUT" + echo "stack_id=$STACK_ID" >> "$GITHUB_OUTPUT" + echo "stack_name=$STACK_NAME" >> "$GITHUB_OUTPUT" echo "resource_group=$RG_NAME" >> "$GITHUB_OUTPUT" - echo "Will destroy resource group: $RG_NAME" + echo "Will destroy deployment stack: $STACK_NAME (${STACK_ID:-by name})" - name: Azure Login (OIDC) if: steps.state.outputs.found == 'true' @@ -151,135 +158,67 @@ jobs: tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - - name: Build destroy plan + - name: Inventory managed resources id: check if: steps.state.outputs.found == 'true' run: | - RG="${{ steps.state.outputs.resource_group }}" - DEPLOYMENT_ID="${{ matrix.deployment_id }}" - - # Check if resource group exists - EXISTS=$(az group exists --name "$RG") - echo "exists=$EXISTS" >> "$GITHUB_OUTPUT" + STACK_NAME="${{ steps.state.outputs.stack_name }}" - if [[ "$EXISTS" != "true" ]]; then - echo "Resource group $RG does not exist (already deleted?)" + # Read live managed-resource list from the stack itself. + # Stacks are idempotent: if the stack is already gone we record that and exit cleanly. + if ! 
STACK_JSON=$(az stack sub show --name "$STACK_NAME" --output json 2>/dev/null); then + echo "Stack $STACK_NAME not found (already destroyed?)" + echo "exists=false" >> "$GITHUB_OUTPUT" echo "resource_count=0" >> "$GITHUB_OUTPUT" - echo "sub_count=0" >> "$GITHUB_OUTPUT" exit 0 fi - # Inventory RG resources - RESOURCES=$(az resource list --resource-group "$RG" \ - --query "[].{name:name, type:type, id:id, provisioningState:provisioningState}" \ - --output json 2>/dev/null || echo "[]") - RESOURCE_COUNT=$(echo "$RESOURCES" | jq 'length') + echo "exists=true" >> "$GITHUB_OUTPUT" - echo "resource_count=$RESOURCE_COUNT" >> "$GITHUB_OUTPUT" + RESOURCES=$(echo "$STACK_JSON" | jq -c '[(.resources // [])[] | {id: .id, status: .status}]') + COUNT=$(echo "$RESOURCES" | jq 'length') + + echo "resource_count=$COUNT" >> "$GITHUB_OUTPUT" echo "resources<<EOF" >> "$GITHUB_OUTPUT" echo "$RESOURCES" >> "$GITHUB_OUTPUT" echo "EOF" >> "$GITHUB_OUTPUT" - echo "Resource group $RG has $RESOURCE_COUNT resources" - echo "$RESOURCES" | jq -r '.[] | " - \(.type)/\(.name) (\(.provisioningState))"' - - # Query deployment operations to find subscription-scoped resources - # These are NOT deleted by az group delete (e.g. role assignments, policy assignments) - SUB_RESOURCES="[]" - - OPS=$(az deployment operation sub list \ - --name "$DEPLOYMENT_ID" \ - --query "[?properties.provisioningState=='Succeeded' && properties.targetResource.id != null].properties.targetResource" \ - -o json 2>/dev/null || echo "[]") - - if [[ "$OPS" != "[]" ]]; then - # Find subscription-scoped authorization/policy resources (role assignments, etc.) 
- # These live outside the RG and survive az group delete - SUB_RESOURCES=$(echo "$OPS" | jq -c '[ - .[] | select( - (.resourceType // "" | test("Microsoft.Authorization|Microsoft.Policy")) and - (.id // "" | test("/resourceGroups/") | not) - ) - ]') - - # Check nested deployments for RG-scoped role assignments too - NESTED_NAMES=$(echo "$OPS" | jq -r '[ - .[] | select(.resourceType == "Microsoft.Resources/deployments") - ] | .[].resourceName // empty') - - for NESTED_NAME in $NESTED_NAMES; do - NESTED_OPS=$(az deployment operation group list \ - --resource-group "$RG" --name "$NESTED_NAME" \ - --query "[?properties.provisioningState=='Succeeded' && properties.targetResource.id != null].properties.targetResource" \ - -o json 2>/dev/null || echo "[]") - - # Role assignments scoped to resources within the RG - NESTED_AUTH=$(echo "$NESTED_OPS" | jq -c '[ - .[] | select( - (.resourceType // "" | test("Microsoft.Authorization")) - ) - ]') - - SUB_RESOURCES=$(jq -n --argjson a "$SUB_RESOURCES" --argjson b "$NESTED_AUTH" '$a + $b') - done - fi - - SUB_COUNT=$(echo "$SUB_RESOURCES" | jq 'length') - - echo "sub_count=$SUB_COUNT" >> "$GITHUB_OUTPUT" - echo "sub_resources<<EOF" >> "$GITHUB_OUTPUT" - echo "$SUB_RESOURCES" >> "$GITHUB_OUTPUT" - echo "EOF" >> "$GITHUB_OUTPUT" - echo "" echo "=== Destroy Plan ===" - echo "Resource group: $RG ($RESOURCE_COUNT resources)" - echo "Subscription-scoped resources: $SUB_COUNT" - if [[ "$SUB_COUNT" -gt 0 ]]; then - echo "$SUB_RESOURCES" | jq -r '.[] | " - \(.resourceType): \(.resourceName) (\(.id))"' - fi + echo "Stack: $STACK_NAME" + echo "Managed resources: $COUNT" + echo "$RESOURCES" | jq -r '.[] | " - \(.id) [\(.status)]"' echo "===================" - - name: Delete subscription-scoped resources - id: destroy_sub - if: steps.check.outputs.exists == 'true' && steps.check.outputs.sub_count != '0' - run: | - echo "🗑️ Deleting subscription-scoped resources first..." 
- FAILED=0 - - echo '${{ steps.check.outputs.sub_resources }}' | jq -r '.[].id' | while read -r RESOURCE_ID; do - echo " Deleting: $RESOURCE_ID" - if ! az resource delete --ids "$RESOURCE_ID" 2>&1; then - echo "::warning::Failed to delete $RESOURCE_ID" - FAILED=$((FAILED + 1)) - fi - done - - if [[ "$FAILED" -gt 0 ]]; then - echo "::warning::$FAILED subscription-scoped resource(s) failed to delete" - fi - - - name: Delete resource group + - name: Delete deployment stack id: destroy if: steps.check.outputs.exists == 'true' run: | - RG="${{ steps.state.outputs.resource_group }}" - echo "🗑️ Deleting resource group: $RG" - echo "This will block until the resource group is fully deleted..." + STACK_NAME="${{ steps.state.outputs.stack_name }}" + echo "🗑️ Deleting deployment stack: $STACK_NAME" + echo " --action-on-unmanage deleteAll — removes every resource (across RGs / sub scope) the stack manages" + echo " This will block until all managed resources are fully deleted..." START_TIME=$(date +%s) - az group delete --name "$RG" --yes 2>&1 || { + # --bypass-stack-out-of-sync-error: a destroyed run is one-shot; we + # don't need the safety check that protects against stale manifests + # during iterative updates. + if ! 
az stack sub delete \ + --name "$STACK_NAME" \ + --action-on-unmanage deleteAll \ + --bypass-stack-out-of-sync-error true \ + --yes 2>&1; then echo "destroy_status=failed" >> "$GITHUB_OUTPUT" - echo "::error::Failed to delete resource group $RG" + echo "::error::Failed to delete deployment stack $STACK_NAME" exit 1 - } + fi END_TIME=$(date +%s) DURATION=$((END_TIME - START_TIME)) echo "destroy_status=succeeded" >> "$GITHUB_OUTPUT" echo "destroy_duration=${DURATION}s" >> "$GITHUB_OUTPUT" - echo "✅ Resource group deleted in ${DURATION}s: $RG" + echo "✅ Stack deleted in ${DURATION}s: $STACK_NAME" - name: Update deployment state if: always() && steps.state.outputs.found == 'true' @@ -322,11 +261,11 @@ jobs: if: always() run: | DEPLOY_ID="${{ matrix.deployment_id }}" + STACK="${{ steps.state.outputs.stack_name }}" RG="${{ steps.state.outputs.resource_group }}" STATUS="${{ steps.destroy.outputs.destroy_status }}" DURATION="${{ steps.destroy.outputs.destroy_duration }}" RESOURCE_COUNT="${{ steps.check.outputs.resource_count }}" - SUB_COUNT="${{ steps.check.outputs.sub_count }}" EXISTS="${{ steps.check.outputs.exists }}" RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}" @@ -334,11 +273,12 @@ jobs: echo "Git-Ape Destroy Summary" echo "============================================" echo "Deployment: $DEPLOY_ID" + echo "Stack: $STACK" echo "Resource Group: $RG" if [[ "$EXISTS" == "false" ]]; then - echo "Result: Already destroyed" + echo "Result: Already destroyed (stack not found)" elif [[ "$STATUS" == "succeeded" ]]; then - echo "Result: ✅ Destroyed ($RESOURCE_COUNT RG resources + $SUB_COUNT subscription-scoped)" + echo "Result: ✅ Destroyed ($RESOURCE_COUNT managed resources)" echo "Duration: $DURATION" else echo "Result: ❌ Failed" @@ -355,16 +295,16 @@ jobs: if [[ -z "$SLACK_WEBHOOK_URL" ]]; then exit 0; fi DEPLOY_ID="${{ matrix.deployment_id }}" - RG="${{ steps.state.outputs.resource_group }}" + STACK="${{ 
steps.state.outputs.stack_name }}" STATUS="${{ steps.destroy.outputs.destroy_status }}" RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}" if [[ "$STATUS" == "succeeded" ]]; then EMOJI="🗑️" - MSG="Resource group *$RG* ($DEPLOY_ID) destroyed" + MSG="Deployment stack *$STACK* ($DEPLOY_ID) destroyed" else EMOJI="❌" - MSG="Destroy failed for *$RG* ($DEPLOY_ID)" + MSG="Destroy failed for stack *$STACK* ($DEPLOY_ID)" fi curl -sf -X POST "$SLACK_WEBHOOK_URL" \ diff --git a/.github/workflows/git-ape-plan.exampleyml b/.github/workflows/git-ape-plan.exampleyml index a7d6c35..5467996 100644 --- a/.github/workflows/git-ape-plan.exampleyml +++ b/.github/workflows/git-ape-plan.exampleyml @@ -353,16 +353,21 @@ jobs: tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - - name: Validate template + - name: Validate template (stack) id: validate if: steps.azure_login.outcome == 'success' run: | - echo "### Validating ARM template..." + echo "### Validating deployment stack..." - RESULT=$(az deployment sub validate \ + # az stack sub validate mirrors az deployment sub validate but also + # verifies stack-specific settings (action-on-unmanage, deny settings). + RESULT=$(az stack sub validate \ + --name "${{ matrix.deployment_id }}" \ --location "${{ steps.params.outputs.location }}" \ --template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \ --parameters @"${{ steps.params.outputs.deploy_dir }}/parameters.json" \ + --action-on-unmanage deleteAll \ + --deny-settings-mode none \ --output json 2>&1) || true # Guard against non-JSON output (e.g. auth/CLI errors) — jq exits non-zero @@ -388,6 +393,11 @@ jobs: id: whatif if: steps.validate.outputs.validation_status == 'passed' run: | + # NOTE: Deployment Stacks don't yet support what-if + # (see https://learn.microsoft.com/azure/azure-resource-manager/bicep/deployment-stacks#known-issues). 
+ # We fall back to `az deployment sub what-if` against the underlying + # ARM template — this accurately previews resource changes even though + # it doesn't model the stack wrapper itself. WHATIF_OUTPUT=$(az deployment sub what-if \ --location "${{ steps.params.outputs.location }}" \ --template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \ diff --git a/docs/DEPLOYMENT_STATE.md b/docs/DEPLOYMENT_STATE.md index 8f3db7b..11d92ed 100644 --- a/docs/DEPLOYMENT_STATE.md +++ b/docs/DEPLOYMENT_STATE.md @@ -75,11 +75,55 @@ Contains deployment tracking information: - `gathering-requirements` - Collecting user input - `generating-template` - Creating ARM template - `awaiting-confirmation` - Waiting for user approval -- `deploying` - Deployment in progress +- `deploying` - Deployment in progress (stack create/update) - `testing` - Running integration tests -- `succeeded` - Completed successfully +- `succeeded` - Stack deployed successfully - `failed` - Deployment failed -- `rolled-back` - Resources removed after failure +- `destroy-requested` - Teardown requested (triggers `git-ape-destroy.yml`) +- `destroyed` - Deployment stack deleted with `--action-on-unmanage deleteAll` +- `already-destroyed` - Destroy ran but the stack was already gone (idempotent path) +- `destroy-failed` - Destroy workflow errored + +### state.json (Deployment Stack manifest) + +Written by `git-ape-deploy.yml` after each deploy. This is the **single source of truth** that `git-ape-destroy.yml` reads to tear down a deployment. + +Every Git-Ape deployment is an [Azure Deployment Stack](https://learn.microsoft.com/azure/azure-resource-manager/bicep/deployment-stacks) at subscription scope, created with `--action-on-unmanage deleteAll`. The stack owns every resource across every scope (RG, multiple RGs, sub-scope role/policy assignments, …), so destroy is one idempotent call regardless of how the template evolves. 
+ +```json +{ + "schemaVersion": "1.0", + "deploymentId": "deploy-20260218-143022", + "timestamp": "2026-02-18T14:33:00Z", + "status": "succeeded", + "duration": "216s", + "subscription": "ece04c6f-d78a-4c30-b05e-fd68b5733289", + "location": "francecentral", + "project": "arna1", + "environment": "dev", + "resourceGroup": "rg-arna1-dev-francecentral", + "stackId": "/subscriptions/ece.../providers/Microsoft.Resources/deploymentStacks/deploy-20260218-143022", + "managedResources": [ + { "id": "/subscriptions/.../resourceGroups/rg-arna1-dev-francecentral", "status": "succeeded" }, + { "id": "/subscriptions/.../resourceGroups/.../Microsoft.KeyVault/vaults/kv-arna1-dev-frc", "status": "succeeded" }, + { "id": "/subscriptions/.../resourceGroups/.../Microsoft.Network/virtualNetworks/vnet-arna1-dev-francecentral", "status": "succeeded" } + ], + "triggeredBy": "arnaudlh", + "triggerEvent": "issue_comment", + "runId": "1234567890", + "runUrl": "https://github.com/Azure/git-ape/actions/runs/1234567890" +} +``` + +| Field | Purpose | +|-------|---------| +| `schemaVersion` | State file format version (`1.0` = Stacks-based) | +| `stackId` | Resource id of the deployment stack — **destroy reads this** | +| `managedResources[]` | Snapshot of what the stack owns at deploy time (human-readable manifest) | +| `resourceGroup` | Primary RG name (for integration tests / portal links) | +| `subscription`, `location`, `project`, `environment` | Deployment context | + +**Why stacks and not just `az deployment sub create`?** See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30). TL;DR: plain subscription deployments leave orphans behind when a deployment spans multiple RGs or creates sub-scope resources via nested templates. Stacks handle multi-scope destruction natively with a single `az stack sub delete --action-on-unmanage deleteAll`. ### requirements.json