A Meltano utility extension that triggers and monitors Power BI semantic-model refreshes as a step in a Meltano pipeline. Built on the Meltano EDK and azure-identity.

Forked from lgrosjean/powerbi-ext and extended with async refresh handling, polling, status lookup, and history listing; see the list of fixes relative to upstream near the end.

Many teams rely on Power BI Service's scheduled-refresh UI, which is decoupled from data-pipeline completion, so refresh timing drifts from when data lands. This utility moves the refresh trigger into the Meltano pipeline itself, so the orchestrated path is extract → load → transform → refresh, and a refresh failure fails the pipeline.

Add the utility to your Meltano project, pinned to a release tag:

    # meltano.yml
    plugins:
      utilities:
        - name: powerbi
          variant: matatika
          pip_url: git+https://github.com/Matatika/utility-powerbi.git@v0.1.0

Then:

    meltano install utility powerbi

The utility authenticates non-interactively using a Microsoft Entra (Azure AD) service principal. Set this up once per tenant:

1. Register an application in Azure AD. Portal → Azure Active Directory → App registrations → New registration. Capture the resulting Tenant ID and Client (Application) ID.
2. Create a client secret. App registration → Certificates & secrets → New client secret. Capture the Client Secret value; it is shown only at creation.
3. Enable service-principal API access in Power BI. Power BI Admin Portal → Tenant settings → "Service principals can use Power BI APIs". Either enable for the whole tenant or for a specific security group containing the principal.
4. Grant the principal access to your workspaces. In each Power BI workspace whose datasets you intend to refresh, add the service principal as a Member (or higher).
5. Capture workspace + dataset IDs. Open the dataset in Power BI Service; the URL contains both:

       https://app.powerbi.com/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}/...
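
If you would rather not copy the IDs by hand, the URL shape above can be parsed with a few lines of Python. This is a minimal sketch, not part of the utility: `parse_powerbi_url` is a hypothetical helper name, and it assumes the IDs appear exactly as `/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}` in the path.

```python
import re

# Hypothetical helper (not shipped with the utility): pull the workspace and
# dataset IDs out of a Power BI Service dataset URL.
_URL_PATTERN = re.compile(
    r"/groups/(?P<workspace_id>[^/?#]+)/datasets/(?P<dataset_id>[^/?#]+)"
)


def parse_powerbi_url(url: str) -> tuple[str, str]:
    """Return (workspace_id, dataset_id) found in a dataset URL."""
    match = _URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No workspace/dataset IDs found in URL: {url!r}")
    return match.group("workspace_id"), match.group("dataset_id")
```

The two returned values map directly onto the workspace_id and dataset_id settings.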

| Setting | Env var | Required | Description |
|---|---|---|---|
| tenant_id | POWERBI_TENANT_ID | yes | Azure AD tenant ID. |
| client_id | POWERBI_CLIENT_ID | yes | Azure AD application (client) ID. |
| client_secret | POWERBI_CLIENT_SECRET | yes | Azure AD application client secret. |
| workspace_id | POWERBI_WORKSPACE_ID | yes | Power BI workspace (group) ID. |
| dataset_id | POWERBI_DATASET_ID | yes | Power BI dataset (semantic model) ID. |
| api_url | POWERBI_API_URL | no | Override API base URL (default https://api.powerbi.com/v1.0/myorg). Useful for sovereign clouds (e.g. https://api.powerbigov.us/v1.0/myorg). |
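
To make the required/optional split in the table concrete, here is a rough sketch of how the settings could be read from the environment and validated. It is illustrative only: `PowerBISettings` and `load_settings` are hypothetical names, and the utility's real loading code may differ.

```python
import os
from dataclasses import dataclass

DEFAULT_API_URL = "https://api.powerbi.com/v1.0/myorg"
# The five settings marked "yes" in the Required column.
REQUIRED = ("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET", "WORKSPACE_ID", "DATASET_ID")


@dataclass(frozen=True)
class PowerBISettings:  # hypothetical container, for illustration only
    tenant_id: str
    client_id: str
    client_secret: str
    workspace_id: str
    dataset_id: str
    api_url: str = DEFAULT_API_URL


def load_settings(env=None) -> PowerBISettings:
    """Read POWERBI_* variables, failing fast if a required one is missing."""
    env = os.environ if env is None else env
    missing = [f"POWERBI_{name}" for name in REQUIRED if not env.get(f"POWERBI_{name}")]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    values = {name.lower(): env[f"POWERBI_{name}"] for name in REQUIRED}
    # api_url is optional and falls back to the public-cloud default.
    api_url = env.get("POWERBI_API_URL") or DEFAULT_API_URL
    return PowerBISettings(api_url=api_url.rstrip("/"), **values)
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real environment.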

Configure interactively:

    meltano config powerbi set --interactive

Or via environment variables in your shell / .env:

    export POWERBI_TENANT_ID=...
    export POWERBI_CLIENT_ID=...
    export POWERBI_CLIENT_SECRET=...
    export POWERBI_WORKSPACE_ID=...
    export POWERBI_DATASET_ID=...

Trigger a refresh:

    meltano invoke powerbi:refresh [--wait/--no-wait] [--poll-interval=30] [--timeout=3600] [--notify=NoNotification]

| Flag | Default | Description |
|---|---|---|
| --wait / --no-wait | --wait | Block until the refresh reaches a terminal status. |
| --poll-interval | 30 | Seconds between status polls when waiting. |
| --timeout | 3600 | Max seconds to wait before exiting with timeout. |
| --notify | NoNotification | Power BI notifyOption: one of NoNotification, MailOnCompletion, MailOnFailure. |

The request ID is always echoed to stdout on trigger, so it can be
captured by an outer script even when --no-wait is used.

Exit codes:
| Code | Meaning |
|---|---|
| 0 | Refresh completed successfully |
| 1 | Refresh ended in Failed or Disabled terminal state |
| 2 | Refresh did not reach a terminal state within --timeout |
| 3 | Auth or HTTP error (credentials, permissions, network, 4xx/5xx) |
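
The relationship between --wait, --poll-interval, --timeout, and exit codes 0 to 2 can be sketched as a simple polling loop. This is an illustration under stated assumptions, not the utility's actual code: `wait_for_refresh` and `get_status` are hypothetical names, and the status strings ("Completed", "Failed", "Disabled", with anything else treated as still running) are assumed to match what Power BI reports in refresh history.

```python
import time
from typing import Callable

EXIT_OK, EXIT_FAILED, EXIT_TIMEOUT = 0, 1, 2

# Assumed terminal statuses; any other value is treated as "still running".
TERMINAL_EXIT_CODES = {
    "Completed": EXIT_OK,
    "Failed": EXIT_FAILED,
    "Disabled": EXIT_FAILED,
}


def wait_for_refresh(
    get_status: Callable[[], str],
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until a terminal status or the deadline; return an exit code."""
    deadline = clock() + timeout
    while True:
        status = get_status()  # an auth/HTTP error here would map to exit 3
        if status in TERMINAL_EXIT_CODES:
            return TERMINAL_EXIT_CODES[status]
        if clock() >= deadline:
            return EXIT_TIMEOUT
        sleep(poll_interval)
```

Injecting `sleep` and `clock` is a design choice for testability; it lets the loop run against a fake clock instead of waiting in real time.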

    meltano invoke powerbi:status [--request-id=<id>]

Without --request-id, returns the most recent refresh from history.
Output is JSON containing requestId, status, startTime, endTime,
refreshType, and serviceExceptionJson when applicable.
Exit codes mirror refresh (0 / 1 / 3). Empty history returns exit 3.
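
An outer script can turn the JSON output into a one-line summary. The sketch below assumes the output is a single JSON object with the fields listed above, and that serviceExceptionJson is either a JSON-encoded string or an already-decoded object carrying an errorCode key; both points are assumptions about the payload, and `summarize_status` is a hypothetical helper.

```python
import json


def _error_code(service_exception) -> str:
    """Best-effort extraction of an error code from serviceExceptionJson."""
    if isinstance(service_exception, str):
        try:
            service_exception = json.loads(service_exception)
        except ValueError:
            return service_exception  # not JSON: show the raw text
    if isinstance(service_exception, dict):
        return str(service_exception.get("errorCode", service_exception))
    return str(service_exception)


def summarize_status(raw_output: str) -> str:
    """Build 'requestId: status (errorCode)' from the status command's JSON."""
    entry = json.loads(raw_output)
    summary = f"{entry.get('requestId')}: {entry.get('status')}"
    service_exception = entry.get("serviceExceptionJson")
    if service_exception:
        summary += f" ({_error_code(service_exception)})"
    return summary
```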

    meltano invoke powerbi:history [--top=10]

Returns a JSON array of recent refreshes for the configured dataset.
--top caps the count (Power BI accepts up to 200).
Exit 0 on success, 3 on auth/HTTP error.
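
For orientation, the history listing corresponds to the Power BI REST endpoint for a dataset's refreshes within a group, with `$top` as a query parameter. The sketch below only builds the URL; `history_url` is a hypothetical helper, and the 1 to 200 bound simply mirrors the limit stated above.

```python
def history_url(api_url: str, workspace_id: str, dataset_id: str, top: int = 10) -> str:
    """Build the refresh-history URL for a dataset in a workspace."""
    if not 1 <= top <= 200:
        raise ValueError("top must be between 1 and 200")
    base = f"{api_url.rstrip('/')}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    return f"{base}?$top={top}"
```

With the default api_url this yields a URL under https://api.powerbi.com/v1.0/myorg; a sovereign-cloud override changes only the prefix.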

Append powerbi:refresh to the tasks of any Meltano job that
runs after data has landed:

    # meltano.yml
    schedules:
      - name: daily_refresh
        interval: '@daily'
        job: ingest_and_refresh
    jobs:
      - name: ingest_and_refresh
        tasks:
          - tap-postgres target-snowflake
          - dbt-snowflake:run
          - powerbi:refresh

A non-zero exit from powerbi:refresh fails the Meltano job.

The refresh trigger (POST) and status polls (GET) are retried automatically on
transient failures:

| Condition | Retried? |
|---|---|
| HTTP 429 (rate limit) | yes |
| HTTP 500 / 502 / 503 / 504 | yes |
| requests.ConnectionError (DNS, TCP, TLS handshake) | yes |
| requests.Timeout (read / connect timeout) | yes |
| HTTP 4xx (other): auth, not found, bad request | no |
| Any other exception | no |

Strategy: up to 3 attempts per call, exponential backoff with jitter (initial 2s, max 30s). Other 4xx failures are surfaced immediately because they indicate a config or permissions problem, so retrying would only delay the inevitable failure.

history (operator command) is not auto-retried; re-run it by hand if it
flakes.

A successful retry adds wall-clock time to refresh (up to ~14s for two
backoffs). Account for this when sizing --timeout.
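
The retry policy described above can be sketched as follows. This is not the utility's implementation: `call_with_retry` and `backoff_delays` are hypothetical names, the built-in ConnectionError and TimeoutError stand in for the requests exceptions so the block stays dependency-free, and the jitter formula (a random factor between 0.5 and 1.0 of the capped delay) is an assumption, since the text does not say which jitter scheme is used.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(attempts=3, initial=2.0, maximum=30.0, rng=random.random):
    """Yield one delay per retry: exponential, capped, with assumed jitter."""
    for retry in range(attempts - 1):
        base = min(maximum, initial * 2 ** retry)
        yield base * (0.5 + rng() / 2)  # jitter factor in [0.5, 1.0]


def call_with_retry(send, attempts=3, sleep=time.sleep, rng=random.random):
    """Call send() -> (status, body), retrying only transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = list(backoff_delays(attempts, rng=rng))
    for attempt in range(attempts):
        last_try = attempt == attempts - 1
        try:
            status, body = send()
        except (ConnectionError, TimeoutError):
            if last_try:
                raise  # out of attempts: surface the network error
        else:
            # Non-retryable statuses (including other 4xx) return immediately.
            if status not in RETRYABLE_STATUS or last_try:
                return status, body
        sleep(delays[attempt])
```

Here `send` is any zero-argument callable that performs one HTTP request and returns a `(status, body)` pair, which keeps the retry logic independent of the HTTP client.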

| Symptom | Cause |
|---|---|
| Exit 3 with 401 Unauthorized | Service principal not enabled in Power BI Admin Portal, or not added to the workspace. Re-check prerequisites 3 + 4. |
| Exit 3 with 403 Forbidden on refresh | Principal has read access but not write. Make it a Member, not Viewer. |
| Exit 3 with 404 Not Found | Wrong workspace_id or dataset_id, or the dataset is in "My workspace" (personal, not API-accessible). |
| 400 Bad Request with Operation in progress | Power BI allows only one refresh in progress per dataset at a time. Wait for the running refresh to finish or check powerbi:history. |
| Exit 3 with 429 Too Many Requests | Shared-capacity datasets are limited to 8 refreshes/day. Either move to Premium capacity or reduce schedule frequency. |
| Exit 2 (timeout) | Refresh is still running but exceeded --timeout. Bump --timeout, or use --no-wait and poll with powerbi:status. |

    poetry install
    poetry run pytest

    # Verify the CLI loads:
    poetry run powerbi-extension --help
    poetry run powerbi-extension describe --format=yaml

    # Invoke against a real tenant (sandbox recommended):
    export POWERBI_TENANT_ID=... POWERBI_CLIENT_ID=... POWERBI_CLIENT_SECRET=...
    export POWERBI_WORKSPACE_ID=... POWERBI_DATASET_ID=...
    poetry run powerbi-extension refresh --no-wait
    poetry run powerbi-extension status
    poetry run powerbi-extension history --top 5

The upstream has been dormant since Oct 2023 and never tagged a release. This fork corrects the following functional issues:

- refresh raised on every success. Upstream checked status_code != 200; Power BI's enhanced refresh API returns 202 Accepted.
- Wrong header for requestId. Upstream read res.headers["RequestId"], which Power BI does not emit. Fixed to parse the Location header path tail with x-ms-request-id as fallback.
- Settings were declared but never read. workspace_id and dataset_id were declared in meltano.yml but the code only accepted them as CLI args. Fixed to read from POWERBI_WORKSPACE_ID / POWERBI_DATASET_ID env vars (which Meltano populates from settings).
- Mail spam by default. Upstream defaulted notifyOption to MailOnCompletion. Changed to NoNotification for unattended ETL; callers can opt in via --notify.
- describe reported a phantom command. Fixed to list the three real commands.
- No status / history / polling. All three added as part of the trigger-and-wait flow required for pipeline integration.

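
The request-ID fix in the list above (Location path tail first, x-ms-request-id as fallback) could look roughly like this. It is a sketch rather than the fork's actual code: `extract_request_id` is a hypothetical name, and it assumes the Location header ends with the request ID as its final path segment.

```python
from urllib.parse import urlparse


def extract_request_id(headers) -> str:
    """Return the refresh request ID from response headers."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    location = lowered.get("location", "")
    tail = urlparse(location).path.rstrip("/").rsplit("/", 1)[-1]
    if tail:
        return tail
    fallback = lowered.get("x-ms-request-id")
    if fallback:
        return fallback
    raise ValueError("Response carried neither Location nor x-ms-request-id")
```

Raising when both headers are absent makes a malformed response fail loudly instead of continuing with an empty ID.
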
Apache-2.0. Copyright 2023 Leo Grosjean, 2026 Matatika Ltd. See LICENSE.