Add Azure Batch scalability example #82
Draft
pstreef wants to merge 14 commits into
Adds GCP Batch and Azure Batch alongside the existing AWS Batch example, all VM-based to avoid known Kubernetes issues with resource-intensive LST builds.
Address issues from spec review: correct GCP Terraform resource name (google_batch_job), add Cloud Workflows as scheduling intermediary, fix Azure task submission pattern, add missing files to directory structure (TROUBLESHOOTING.md, terraform.tfvars.example), document CSV download and container registry per platform, note existing chunk.sh bug to fix during migration.
9-task plan covering AWS migration, GCP Batch, Azure Batch, top-level README, and root README updates. Reviewed and fixed: GCP chunk logic moved to Workflow, Azure task submission loop, pool image SKU match, timestamp lifecycle, NSG variable.
Restructure for multi-platform support. Fixes in chunk.sh:
- Use $local_csv_file instead of $csv_file for wc -l (fails on S3 URLs)
- Fix off-by-one in seq step ($chunk_size, not $chunk_size + 1)
- Add set -euo pipefail

Also add terraform.tfvars.example and update relative paths in README.
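The three chunk.sh fixes in this commit can be illustrated with a minimal sketch. It is not the repository's script: the function name `chunk_ranges`, the 1-based inclusive ranges, and the assumption that the CSV has no header row are all hypothetical choices made for illustration. It counts lines in a local copy of the file, steps through it by exactly the chunk size, and runs under strict mode.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not the PR's chunk.sh): print "start end" line ranges,
# 1-based and inclusive, for each chunk of a local CSV file.
# Assumes the CSV has no header row.
chunk_ranges() {
  local local_csv_file="$1"
  local chunk_size="$2"
  local total_lines start end

  # Count lines in the downloaded local copy; wc cannot read a remote URL.
  total_lines=$(wc -l < "$local_csv_file")
  total_lines=$((total_lines))  # normalize padding that some wc builds emit

  # Nothing to do for an empty file.
  (( total_lines > 0 )) || return 0

  # Step by exactly chunk_size so chunks neither overlap nor skip a row.
  for start in $(seq 1 "$chunk_size" "$total_lines"); do
    end=$((start + chunk_size - 1))
    if (( end > total_lines )); then
      end=$total_lines
    fi
    echo "$start $end"
  done
}
```

For a five-line file and a chunk size of 2, `chunk_ranges file.csv 2` prints the ranges `1 2`, `3 4`, and `5 5`. Stepping by `chunk_size + 1` instead would start the second chunk at line 4 and silently drop line 3.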
Explains the shared architecture pattern across AWS/GCP/Azure batch services and documents why VM-based approaches are recommended over Kubernetes based on real customer experiences.
Terraform IaC with Cloud Workflows for job orchestration, Cloud Scheduler for cron triggers, Secret Manager for credentials, and service accounts with least-privilege IAM. Uses Compute Engine VMs (n2-standard-4) with auto-scaling to zero.
Terraform IaC with Azure Automation for scheduling, Key Vault for secrets, managed identity for passwordless auth, and auto-scaling pool with container support. Uses Standard_D4s_v5 VMs.
Reflect the new AWS/GCP/Azure batch options in the scalability stage description, directory tree, and comparison table.
Remove docs/superpowers/ directory containing internal AI workflow references that shouldn't be in a public repo. Soften the K8s cost inefficiency claim to cite industry reports and use "order of magnitude" instead of specific multiplier.
- Azure: runbook now fetches secrets from Key Vault and passes them as env vars to chunk task, which forwards them to processor tasks
- Azure: chunk.sh passes --account-endpoint for az batch auth
- Azure: add acr_resource_group_name variable for cross-RG ACR
- Azure: remove unused disk_size_gb variable
- GCP: fix workflow service_account to use .email not .id
- AWS: add empty CSV guard for consistency with Azure/GCP
- GCP: replace math.ceil (not available in Cloud Workflows) with integer ceiling division
- GCP: move jobId computation to init step (yamlencode doesn't evaluate expressions in connector args)
- GCP: add roles/artifactregistry.reader for batch task SA to pull container images
- AWS: fix reference to nonexistent ingest_job_definition (should be processor_job_definition)
- Azure: remove unsupported tags attribute on azurerm_batch_pool
- AWS: fix chunk.sh to use $local_csv_file instead of $csv_file for line count
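The integer ceiling division mentioned in the first GCP item relies on a standard identity: for non-negative `total` and positive `size`, `(total + size - 1) / size` with truncating division equals the ceiling of `total / size`. The sketch below shows that identity in shell arithmetic only; it is not the Cloud Workflows expression used in the PR, and the function name `ceil_div` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: number of chunks needed to cover `total` rows when each
# chunk holds at most `size` rows, using integer arithmetic only.
# Assumes total >= 0 and size > 0.
ceil_div() {
  local total="$1"
  local size="$2"
  echo $(( (total + size - 1) / size ))
}
```

For example, `ceil_div 10 3` gives 4 chunks, `ceil_div 9 3` gives 3, and `ceil_div 0 3` gives 0.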
Prevents the azurerm provider from trying to register resource providers at the subscription level, which fails for users without subscription-level permissions.
The Azure Batch example needs integration testing, which requires elevated Azure RBAC permissions. Splitting it out to unblock the GCP example, which is tested and ready.
Azure Batch example with full Terraform IaC:
- Azure Batch pool with auto-scaling and container support
- Automation Account + Runbook for job orchestration
- Key Vault integration for secrets
- Managed Identity for authentication

Terraform validates but not yet integration-tested (requires Contributor RBAC on the target resource group).
Problem
Solution
Add an Azure Batch example with full Terraform IaC.
Follows the same chunk/processor fan-out pattern as AWS and GCP.
Pending
Not yet integration-tested: this requires the Contributor role on an Azure resource group to create the Batch Account, Automation Account, and Key Vault access policies.