40 changes: 26 additions & 14 deletions .github/agents/azure-resource-deployer.agent.md
Original file line number Diff line number Diff line change
@@ -108,21 +108,33 @@ Use mcp_azure_mcp_search with "deploy" intent to execute template deployment

**Option B: Azure CLI (Fallback)**

**Always use subscription-level deployment** — the ARM template includes resource group creation, so we deploy at subscription scope:
**Always deploy as an Azure Deployment Stack at subscription scope** — the ARM template includes resource group creation, and Stacks give us idempotent multi-scope lifecycle management with a single destroy call:

```bash
# Subscription-level deployment (creates RG + all resources atomically)
az deployment sub create \
# Subscription-scope Deployment Stack (creates RG + all resources atomically,
# tracked as a single lifecycle unit).
az stack sub create \
--name "{deployment-id}" \
--location {location} \
--template-file {template.json} \
--parameters @{parameters.json} \
--action-on-unmanage deleteAll \
--deny-settings-mode none \
--description "Git-Ape deployment {deployment-id}" \
--tags "managedBy=git-ape" "deploymentId={deployment-id}" \
--yes \
--output json
```

**DO NOT use `az deployment group create`** — our templates always include the resource group as a resource. Subscription-level deployment handles everything in one command.
**Why Stacks (and not `az deployment sub create`):**
- The stack is the single unit of lifecycle — one create, one update, one destroy.
- `--action-on-unmanage deleteAll` guarantees destroy removes every managed resource across every scope (subscription, multiple RGs, sub-scope role/policy assignments) in one synchronous call.
- No orphans, idempotent re-runs, and no soft-deleted surprises left behind in the subscription after an RG delete.
- See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30) for the rationale.

Capture the deployment operation ID for tracking.
**DO NOT use `az deployment group create` or `az deployment sub create`** — always go through the stack.

Capture the `stackId` from the response — it becomes the single source of truth stored in `state.json` for the destroy workflow.
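
A minimal sketch of capturing the `stackId` once the stack exists, assuming the stack's resource ID is exposed through `az stack sub show --query id` (the deploy workflow reads the same top-level `id` field from the create output). The helper name `get_stack_id` is hypothetical and not part of Git-Ape.

```bash
# Hypothetical helper: print the resource ID of a subscription-scope stack.
# Assumes the stack JSON has a top-level `id` field.
# Returns non-zero when the lookup fails or the ID is empty, so an empty
# stackId is never written to state.json.
get_stack_id() {
  local name="$1"
  local stack_id

  stack_id=$(az stack sub show --name "$name" --query id --output tsv) || return 1

  if [ -z "$stack_id" ]; then
    echo "no stack id returned for stack '$name'" >&2
    return 1
  fi

  printf '%s\n' "$stack_id"
}
```

Usage would look like `STACK_ID=$(get_stack_id "{deployment-id}")`, with the result stored under `stackId` in `state.json`.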

### 3. Monitor Progress

@@ -330,16 +342,16 @@ if [[ "$USER_CHOICE" == "A" ]]; then
read CONFIRMATION

if [[ "$CONFIRMATION" == "confirm rollback" ]]; then
# Delete resources
az resource delete --ids {resource-id-1} {resource-id-2}

# If RG was created new, delete it
if [[ "$RG_NEW" == "true" ]]; then
az group delete --name {rg-name} --yes --no-wait
fi
# Delete the deployment stack — this removes every managed resource
# across all scopes (RGs, sub-scope role assignments, etc.) in one call.
az stack sub delete \
--name "{deployment-id}" \
--action-on-unmanage deleteAll \
--bypass-stack-out-of-sync-error true \
--yes

# Log rollback
echo "Rollback completed" >> .azure/deployments/{deployment-id}/deployment.log
echo "Rollback completed (stack deleted)" >> .azure/deployments/{deployment-id}/deployment.log
fi
fi
```
21 changes: 14 additions & 7 deletions .github/agents/azure-template-generator.agent.md
@@ -135,7 +135,7 @@ see [git-ape.agent.md](git-ape.agent.md).
- Resource Group is a `Microsoft.Resources/resourceGroups` resource inside the template
- Other resources go inside a nested `Microsoft.Resources/deployments` with `"resourceGroup"` property
- Use `subscriptionResourceId()` for RG references, regular `resourceId()` inside nested
- Deploy with `az deployment sub create` (not `az deployment group create`)
- Deploy with `az stack sub create --action-on-unmanage deleteAll` (not `az deployment group create` or `az deployment sub create`)
- `uniqueString()` uses `subscription().subscriptionId` instead of `resourceGroup().id`

**Nested Template Requirements:**
@@ -660,25 +660,32 @@ After showing the preview, provide the complete ARM template:

## Deployment Commands

**Azure CLI (Subscription-level deployment):**
**Always deploy as an Azure Deployment Stack at subscription scope.** Stacks track every resource the template creates (across every scope) as a single lifecycle unit, so destroy is one idempotent call. See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30).

**Azure CLI (Deployment Stack at subscription scope):**
```bash
az deployment sub create \
az stack sub create \
--name {deployment-id} \
--location {location} \
--template-file template.json \
--parameters @parameters.json
--parameters @parameters.json \
--action-on-unmanage deleteAll \
--deny-settings-mode none \
--yes
```

**PowerShell:**
```powershell
New-AzSubscriptionDeployment `
New-AzSubscriptionDeploymentStack `
-Name {deployment-id} `
-Location {location} `
-TemplateFile template.json `
-TemplateParameterFile parameters.json
-TemplateParameterFile parameters.json `
-ActionOnUnmanage deleteAll `
-DenySettingsMode none
```

**Note:** We use subscription-level deployments so the resource group is created as part of the template. No need to create the RG separately.
**Note:** We deploy as a **subscription-scope Deployment Stack** so the resource group is created as part of the template and the whole deployment becomes a single lifecycle unit. Destroy with `az stack sub delete --action-on-unmanage deleteAll`.
````

## Constraints
6 changes: 3 additions & 3 deletions .github/agents/git-ape.agent.md
@@ -354,12 +354,12 @@ The deployment plan MUST start with a clear "Target Environment" table:
**Delegate to:** `azure-resource-deployer`

The deployer will:
- Execute the ARM template as a **subscription-level deployment** (`az deployment sub create`)
- The ARM template includes resource group creation — everything deploys atomically
- Execute the ARM template as a **subscription-scope Azure Deployment Stack** (`az stack sub create --action-on-unmanage deleteAll`)
- The ARM template includes resource group creation — everything deploys atomically, tracked as a single lifecycle unit
- Monitor deployment progress in real-time
- Handle any deployment failures
- Verify resource creation via Azure Resource Graph
- Capture deployment outputs (resource IDs, endpoints, etc.)
- Capture deployment outputs (resource IDs, endpoints, etc.) and the `stackId` for the destroy workflow

**Deployment Monitoring:** Always poll deployment state every **30 seconds** using `sleep 30` between checks. No exponential backoff — use a fixed 30-second interval for all resources regardless of type or expected duration. Check both the top-level deployment and nested deployment statuses on every poll.
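
The fixed-interval polling rule can be sketched as below. This is an illustration under stated assumptions, not the deployer's actual implementation: it assumes `az stack sub show` reports a top-level `provisioningState`, it polls only the stack itself (checking nested deployment statuses would need additional calls), and the helper name `wait_for_stack` and the `max_polls` safety cap are hypothetical.

```bash
# Hypothetical helper: poll a subscription-scope stack at a fixed interval
# (30 seconds by default, no backoff) until it reaches a terminal state.
# Assumes the stack JSON exposes a top-level `provisioningState`.
wait_for_stack() {
  local name="$1"
  local interval="${2:-30}"    # fixed interval in seconds
  local max_polls="${3:-120}"  # safety cap; an assumption, not in the original
  local state
  local polls=0

  while [ "$polls" -lt "$max_polls" ]; do
    state=$(az stack sub show --name "$name" \
      --query provisioningState --output tsv 2>/dev/null)

    # Compare case-insensitively; treat anything non-terminal as "keep waiting".
    case "$(printf '%s' "$state" | tr '[:upper:]' '[:lower:]')" in
      succeeded)
        echo "stack $name: succeeded"
        return 0
        ;;
      failed|canceled)
        echo "stack $name: $state" >&2
        return 1
        ;;
    esac

    polls=$((polls + 1))
    sleep "$interval"
  done

  echo "stack $name: timed out after $max_polls polls" >&2
  return 2
}
```

Calling `wait_for_stack "{deployment-id}"` uses the 30-second default; the second argument exists mainly so the interval can be shortened in tests.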

31 changes: 22 additions & 9 deletions .github/copilot-instructions.md
@@ -171,7 +171,7 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`:
**What it does:**
1. Detects which deployment directories changed in the PR
2. Logs into Azure via OIDC
3. Validates each ARM template (`az deployment sub validate`)
3. Validates each ARM template (`az stack sub validate`)
4. Runs what-if analysis (`az deployment sub what-if`)
5. Reads the architecture diagram from the deployment directory
6. Posts a detailed plan as a **PR comment** (validation result + what-if + architecture)
@@ -192,12 +192,17 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`:
**What it does:**
1. Detects deployment directories to execute
2. Logs into Azure via OIDC
3. Validates the template one more time
4. Runs `az deployment sub create` to deploy
3. Validates the template one more time (`az stack sub validate`)
4. Deploys as an **Azure Deployment Stack** (`az stack sub create --action-on-unmanage deleteAll`)
5. Runs integration tests (lists deployed resources, tests HTTP endpoints)
6. Commits `state.json` with deployment result back to the repo
6. Commits `state.json` (including `stackId` and `managedResources[]`) back to the repo
7. Posts deployment result as a PR comment (on `/deploy` trigger)

**Why Deployment Stacks:**
- The stack is the single unit of lifecycle — create, update, and destroy operate on it, not on the underlying RGs.
- `deleteAll` on unmanage guarantees destruction cleans up every managed resource across every scope (subscription, multiple RGs, role/policy assignments at sub scope) in one call. No orphans, idempotent re-runs.
- See [Azure/git-ape#30](https://github.com/Azure/git-ape/issues/30) for the rationale.

**Requires:** GitHub environment `azure-deploy` (for environment protection rules)

**Safety:**
@@ -213,11 +218,15 @@ Git-Ape provides three GitHub Actions workflows under `.github/workflows/`:

**What it does:**
1. Detects deployments where `metadata.json` status changed to `destroy-requested`
2. Reads `state.json` to find the resource group name
3. Inventories all resources in the resource group
4. Deletes the resource group (`az group delete` — synchronous, waits for completion)
2. Reads `state.json` to find the deployment stack name (`deploymentId`) and `stackId`
3. Calls `az stack sub show` to inventory the stack's managed resources
4. Calls `az stack sub delete --action-on-unmanage deleteAll` — removes every resource the stack manages, across all scopes, in one synchronous call
5. Updates `state.json` and `metadata.json` with `destroyed` status and commits to repo

**Idempotency:**
- If the stack is already gone, the workflow records `already-destroyed` and succeeds cleanly.
- No RG-delete fallback path, no subscription-scope resource sweep — Stacks handle multi-scope destruction natively.
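
The idempotent destroy behaviour described above might look roughly like the following. This is a sketch only: the helper name `destroy_stack` is hypothetical, and it assumes a failing `az stack sub show` means the stack no longer exists, which is a simplification.

```bash
# Hypothetical helper: delete a subscription-scope stack idempotently.
# Prints "already-destroyed" when the stack is gone, "destroyed" after a
# successful delete, and returns non-zero if the delete itself fails.
destroy_stack() {
  local name="$1"

  # Assumption: any failure of `show` means "stack not found". A real
  # workflow should distinguish this from auth or network errors.
  if ! az stack sub show --name "$name" --output none 2>/dev/null; then
    echo "already-destroyed"
    return 0
  fi

  az stack sub delete \
    --name "$name" \
    --action-on-unmanage deleteAll \
    --yes \
    --output none || return 1

  echo "destroyed"
}
```

The printed status could then be written to `state.json` and `metadata.json`, so a re-run against an already deleted stack still succeeds cleanly.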

**Destroy flow:**
1. Agent or user creates a PR that sets `metadata.json` status to `destroy-requested`
2. PR is reviewed and approved (human gate for destructive action)
@@ -413,10 +422,14 @@ jobs:
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Deploy
run: |
az deployment sub create \
az stack sub create \
--name ${{ env.DEPLOYMENT_ID }} \
--location ${{ env.LOCATION }} \
--template-file .azure/deployments/${{ env.DEPLOYMENT_ID }}/template.json \
--parameters @.azure/deployments/${{ env.DEPLOYMENT_ID }}/parameters.json
--parameters @.azure/deployments/${{ env.DEPLOYMENT_ID }}/parameters.json \
--action-on-unmanage deleteAll \
--deny-settings-mode none \
--yes
```

**Transitioning from Service Principal secrets to OIDC:**
111 changes: 80 additions & 31 deletions .github/workflows/git-ape-deploy.exampleyml
@@ -195,12 +195,15 @@ jobs:
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

- name: Validate before deploy
- name: Validate before deploy (stack)
run: |
az deployment sub validate \
az stack sub validate \
--name "${{ matrix.deployment_id }}" \
--location "${{ steps.params.outputs.location }}" \
--template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \
--parameters @"${{ steps.params.outputs.deploy_dir }}/parameters.json" \
--action-on-unmanage deleteAll \
--deny-settings-mode none \
--output json

- name: Run Microsoft Defender for DevOps template analyzer
@@ -234,17 +237,26 @@ jobs:
echo "Security scan passed — no errors found"
fi

- name: Deploy to Azure
- name: Deploy to Azure (Deployment Stack)
id: deploy
run: |
echo "🚀 Starting deployment: ${{ matrix.deployment_id }}"
STACK_NAME="${{ matrix.deployment_id }}"
echo "🚀 Starting stack deployment: $STACK_NAME"
START_TIME=$(date +%s)

DEPLOY_OUTPUT=$(az deployment sub create \
--name "${{ matrix.deployment_id }}" \
# Create/update the subscription-scope Deployment Stack.
# --action-on-unmanage deleteAll binds the whole stack (RG + contents)
# to a single lifecycle so destroy is idempotent across all scopes.
DEPLOY_OUTPUT=$(az stack sub create \
--name "$STACK_NAME" \
--location "${{ steps.params.outputs.location }}" \
--template-file "${{ steps.params.outputs.deploy_dir }}/template.json" \
--parameters @"${{ steps.params.outputs.deploy_dir }}/parameters.json" \
--action-on-unmanage deleteAll \
--deny-settings-mode none \
--description "Git-Ape deployment $STACK_NAME" \
--tags "managedBy=git-ape" "deploymentId=$STACK_NAME" \
--yes \
--output json 2>&1)

EXIT_CODE=$?
@@ -260,27 +272,40 @@ jobs:
echo "EOF" >> "$GITHUB_OUTPUT"
echo ""
echo "=========================================="
echo "❌ DEPLOYMENT FAILED"
echo "❌ STACK DEPLOYMENT FAILED"
echo "=========================================="
echo "$DEPLOY_OUTPUT"
echo "=========================================="
echo "::error::Deployment failed — see output above for details"
echo "::error::Stack deployment failed — see output above for details"
exit 1
fi

echo "deploy_status=succeeded" >> "$GITHUB_OUTPUT"

# Extract outputs
OUTPUTS=$(echo "$DEPLOY_OUTPUT" | jq -r '.properties.outputs // {}')
# Capture the stack resource id — this is the single source of truth
# for destroy. Stored in state.json as `stackId`.
STACK_ID=$(echo "$DEPLOY_OUTPUT" | jq -r '.id // empty')
echo "stack_id=$STACK_ID" >> "$GITHUB_OUTPUT"

# Extract template outputs from the stack
OUTPUTS=$(echo "$DEPLOY_OUTPUT" | jq -r '.outputs // .properties.outputs // {}')
echo "deploy_outputs<<EOF" >> "$GITHUB_OUTPUT"
echo "$OUTPUTS" >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"

# Extract resource group name
# Extract resource group name (for integration tests)
RG_NAME=$(echo "$OUTPUTS" | jq -r '.resourceGroupName.value // empty')
echo "resource_group=$RG_NAME" >> "$GITHUB_OUTPUT"

echo "✅ Deployment succeeded in ${DURATION}s"
# Capture the list of managed resources from the stack — this is the
# authoritative manifest for everything the stack will delete on destroy.
MANAGED=$(echo "$DEPLOY_OUTPUT" | jq -c '[(.resources // .properties.resources // [])[] | {id: .id, status: .status}]')
echo "managed_resources<<EOF" >> "$GITHUB_OUTPUT"
echo "$MANAGED" >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"

echo "✅ Stack deployed in ${DURATION}s — stackId: $STACK_ID"
echo " Managed resources: $(echo "$MANAGED" | jq 'length')"

- name: Run integration tests
id: tests
@@ -349,25 +374,49 @@ jobs:
DEPLOY_DIR="${{ steps.params.outputs.deploy_dir }}"
STATUS="${{ steps.deploy.outputs.deploy_status || 'failed' }}"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Create/update state.json
cat > "$DEPLOY_DIR/state.json" <<EOF
{
"deploymentId": "${{ matrix.deployment_id }}",
"timestamp": "$TIMESTAMP",
"status": "$STATUS",
"duration": "${{ steps.deploy.outputs.deploy_duration }}",
"subscription": "${{ secrets.AZURE_SUBSCRIPTION_ID }}",
"location": "${{ steps.params.outputs.location }}",
"project": "${{ steps.params.outputs.project }}",
"environment": "${{ steps.params.outputs.environment }}",
"resourceGroup": "${{ steps.deploy.outputs.resource_group }}",
"triggeredBy": "${{ github.actor }}",
"triggerEvent": "${{ github.event_name }}",
"runId": "${{ github.run_id }}",
"runUrl": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
}
EOF
STACK_ID='${{ steps.deploy.outputs.stack_id }}'
MANAGED='${{ steps.deploy.outputs.managed_resources }}'
MANAGED=${MANAGED:-[]}

# state.json schema v1 — Deployment Stacks edition.
# `stackId` is the single source of truth for destroy.
# `managedResources` is a snapshot captured at deploy time so the
# repo retains a human-readable manifest of what the stack owns.
jq -n \
--arg schemaVersion "1.0" \
--arg deploymentId "${{ matrix.deployment_id }}" \
--arg timestamp "$TIMESTAMP" \
--arg status "$STATUS" \
--arg duration "${{ steps.deploy.outputs.deploy_duration }}" \
--arg subscription "${{ secrets.AZURE_SUBSCRIPTION_ID }}" \
--arg location "${{ steps.params.outputs.location }}" \
--arg project "${{ steps.params.outputs.project }}" \
--arg environment "${{ steps.params.outputs.environment }}" \
--arg resourceGroup "${{ steps.deploy.outputs.resource_group }}" \
--arg stackId "$STACK_ID" \
--argjson managedResources "$MANAGED" \
--arg triggeredBy "${{ github.actor }}" \
--arg triggerEvent "${{ github.event_name }}" \
--arg runId "${{ github.run_id }}" \
--arg runUrl "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}" \
'{
schemaVersion: $schemaVersion,
deploymentId: $deploymentId,
timestamp: $timestamp,
status: $status,
duration: $duration,
subscription: $subscription,
location: $location,
project: $project,
environment: $environment,
resourceGroup: $resourceGroup,
stackId: (if $stackId == "" then null else $stackId end),
managedResources: $managedResources,
triggeredBy: $triggeredBy,
triggerEvent: $triggerEvent,
runId: $runId,
runUrl: $runUrl
}' > "$DEPLOY_DIR/state.json"

- name: Commit deployment state
if: always()