feat: Add multi-cloud FOCUS test data generator for FinOps Hub #2006

Open
FallenHoot wants to merge 1 commit into microsoft:dev from FallenHoot:feature/multi-cloud-test-data-generator
Add multi-cloud FOCUS test data generator for FinOps Hub

Description

Adds Generate-MultiCloudTestData.ps1 — a PowerShell script that generates synthetic, multi-cloud, FOCUS-compliant cost data for testing and validating FinOps Hub deployments end-to-end.

Closes #2005

What's Included

  • Generate-MultiCloudTestData.ps1 (~1,430 lines) — Self-contained script that generates FOCUS 1.0–1.3 synthetic cost data for Azure, AWS, GCP, and DataCenter providers

Why This Script Is Needed

Testing a FinOps Hub deployment today requires real Cost Management export data. This script removes that requirement by generating realistic synthetic data. It:

  1. Covers all four supported providers (Azure, AWS, GCP, DataCenter) using provider-specific conventions (Azure resource IDs, AWS ARNs, GCP resource paths)
  2. Populates every column referenced by FinOps Hub dashboard KQL queries
  3. Simulates real-world patterns: commitment discounts (Reservations + Savings Plans), Azure Hybrid Benefit, spot/dynamic pricing, marketplace purchases, negotiated discounts, and tag coverage variation
  4. Writes Cost Management manifest.json files alongside the data so the ingestion pipeline can pick it up
  5. Optionally uploads the output to Azure Storage and manages ADF triggers
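
For reviewers unfamiliar with the three identifier conventions named in item 1, here is a minimal Python sketch of their general shapes. It is not taken from Generate-MultiCloudTestData.ps1: the helper names and sample values are hypothetical, and the GCP form assumes a Compute Engine instance, so the script's actual output may differ.

```python
# Hypothetical helpers illustrating the public identifier formats only;
# they are not functions from the PR's script.

def azure_resource_id(subscription_id, resource_group, provider_ns, resource_type, name):
    """Azure Resource Manager ID: /subscriptions/.../resourceGroups/.../providers/..."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{provider_ns}/{resource_type}/{name}"
    )


def aws_arn(service, region, account_id, resource):
    """AWS ARN in the standard 'aws' partition: arn:aws:service:region:account:resource."""
    return f"arn:aws:{service}:{region}:{account_id}:{resource}"


def gcp_instance_path(project, zone, instance):
    """GCP full resource name for a Compute Engine instance (assumed resource type)."""
    return f"//compute.googleapis.com/projects/{project}/zones/{zone}/instances/{instance}"


# Sample values are made up for illustration.
azure_id = azure_resource_id(
    "00000000-0000-0000-0000-000000000000",
    "rg-demo", "Microsoft.Compute", "virtualMachines", "vm-01",
)
arn = aws_arn("ec2", "us-east-1", "123456789012", "instance/i-0abc123")
gcp_path = gcp_instance_path("demo-project", "us-central1-a", "vm-01")
```

Keeping one small builder per provider makes it straightforward to reuse the same identifier for a resource on every generated day, which is what the "persistent identities" feature requires.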

Key Features

| Feature | Details |
| --- | --- |
| FOCUS compliance | All mandatory + conditional FOCUS columns (v1.0–1.3) |
| Persistent identities | Resources, billing accounts, subscriptions consistent across days |
| Budget scaling | Costs scaled to target budget via Python/pandas |
| Memory-safe | Streams rows daily to CSV, avoids OOM on 500K+ row datasets |
| Output formats | Parquet (pyarrow), CSV, or both |
| Upload support | Uploads to msexports + ingestion containers with proper blob paths |
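
The budget-scaling feature can be pictured as multiplying every cost column by a single factor so that the dataset total matches the target. The sketch below is an assumption about the approach, not the script's code: the PR says the real step uses pandas, whereas this version uses plain Python to stay dependency-free, and the function name and the choice of BilledCost as the scaling basis are hypothetical. The column names are standard FOCUS cost columns.

```python
# Assumed approach: one uniform factor applied to all FOCUS cost columns.
COST_COLUMNS = ("BilledCost", "EffectiveCost", "ListCost", "ContractedCost")


def scale_rows_to_budget(rows, target_budget, basis="BilledCost"):
    """Return copies of rows with cost columns scaled so sum(basis) == target_budget."""
    total = sum(row[basis] for row in rows)
    if total <= 0:
        raise ValueError("cannot scale: total cost must be positive")
    factor = target_budget / total
    return [
        # Only columns present in the row are scaled; other fields pass through.
        {**row, **{col: row[col] * factor for col in COST_COLUMNS if col in row}}
        for row in rows
    ]


# Made-up sample rows for illustration.
rows = [
    {"ResourceName": "vm-01", "BilledCost": 30.0, "ListCost": 40.0},
    {"ResourceName": "db-01", "BilledCost": 10.0, "ListCost": 10.0},
]
scaled = scale_rows_to_budget(rows, target_budget=500_000)
```

A uniform factor preserves the ratios between list, contracted, and billed cost, so discount percentages in the synthetic data stay unchanged after scaling.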

Testing

Tested with:

  • Default settings (500K rows, 6 months, all providers, $500K budget)
  • Single provider mode (Azure-only, 200K rows)
  • Full pipeline (generate → upload → ADF trigger → ADX ingestion → dashboard validation)
  • FOCUS versions 1.0, 1.2, and 1.3

Prerequisites

  • PowerShell 7+
  • Python 3 with pandas and pyarrow (for Parquet conversion)
  • Azure CLI (for upload functionality)
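
A quick way to confirm these prerequisites before running the generator is sketched below. It is a hypothetical helper, not part of the PR. It assumes the PowerShell 7 executable is named `pwsh` and the Azure CLI is `az`, and it checks presence only, not versions.

```python
import importlib.util
import shutil


def missing_prerequisites(commands=("pwsh", "az"), modules=("pandas", "pyarrow")):
    """Return the names of commands not on PATH and Python modules not importable."""
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    gaps = missing_prerequisites()
    if gaps:
        print("Missing prerequisites: " + ", ".join(gaps))
    else:
        print("All prerequisites found.")
```

Because the Azure CLI is only needed for upload, a caller that generates data locally could pass `commands=("pwsh",)` to skip that check.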

Checklist

  • Script follows FOCUS specification conventions
  • Microsoft copyright header included
  • Comment-based help with SYNOPSIS, DESCRIPTION, PARAMETERS, EXAMPLES
  • No hardcoded paths or environment-specific references
  • Tested with 498K+ rows successfully ingested into FinOps Hub

Add Generate-MultiCloudTestData.ps1 that generates synthetic, multi-cloud
FOCUS-compliant cost data (v1.0-1.3) for testing FinOps Hub deployments.

Supports Azure, AWS, GCP, and DataCenter providers with realistic data
including commitment discounts, Azure Hybrid Benefit, spot pricing,
marketplace purchases, tag coverage variation, and budget scaling.

Generates datasets of 500K+ rows with Parquet/CSV output and optional
Azure Storage upload with ADF trigger management.
Labels

  • Needs: Review 👀 (PR that is ready to be reviewed)
  • Tool: FinOps hubs (Data pipeline solution)
  • Type: Feature 💎 (Idea to improve the product)

Development

Successfully merging this pull request may close these issues.

Feature: Multi-Cloud FOCUS Test Data Generator Script