Batch import Airflow settings to speed up dev start #2078
Conversation
Previously, `AddVariables`, `AddConnections`, and `AddPools` each ran a
separate `docker exec` (or `bash -c` for standalone mode) per object.
Every call paid the full Airflow CLI startup cost (~2-5s), so a project
with 10 connections, 5 variables, and 2 pools would spend 1-2+ minutes
just on settings injection.
Replace the per-item loops with Airflow's native batch import commands
(`airflow variables import`, `airflow connections import --overwrite`,
`airflow pools import`) for Airflow 2+. A single JSON file is written
into the container/venv via a heredoc and imported in one CLI invocation,
reducing O(n) exec calls to O(1) per object type.
Key changes:
- Variables: marshal to `{"name":"value",...}` JSON, import in one call
- Connections: marshal to `{"conn_id":{...}}` JSON (or URI string),
import with `--overwrite` (eliminates the list+delete+add cycle)
- Pools: marshal to `{"pool_name":{"slots":N,"description":"..."}}` JSON,
import in one call
- Airflow 1 codepaths are preserved unchanged as legacy fallbacks
- Exit code from the import command is captured and propagated even
after temp file cleanup
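A minimal sketch of the JSON documents the batch commands consume, assuming illustrative struct shapes (the real types live in `settings/settings.go`):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative settings shapes, not the CLI's actual types.
type Variable struct{ Name, Value string }
type Pool struct {
	Name        string
	Slots       int
	Description string
}

// variablesImportJSON builds the flat {"name":"value"} map that
// `airflow variables import` expects, skipping unnamed entries.
func variablesImportJSON(vars []Variable) (string, error) {
	m := make(map[string]string, len(vars))
	for _, v := range vars {
		if v.Name == "" {
			continue // incomplete entry: warn-and-skip, as before
		}
		m[v.Name] = v.Value
	}
	b, err := json.Marshal(m)
	return string(b), err
}

// poolsImportJSON builds the {"pool_name":{"slots":N,"description":"..."}}
// document that `airflow pools import` expects.
func poolsImportJSON(pools []Pool) (string, error) {
	m := make(map[string]map[string]interface{}, len(pools))
	for _, p := range pools {
		if p.Name == "" {
			continue
		}
		m[p.Name] = map[string]interface{}{"slots": p.Slots, "description": p.Description}
	}
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	v, _ := variablesImportJSON([]Variable{{Name: "env", Value: "dev"}})
	p, _ := poolsImportJSON([]Pool{{Name: "etl", Slots: 5, Description: "ETL jobs"}})
	fmt.Println(v) // {"env":"dev"}
	fmt.Println(p) // {"etl":{"description":"ETL jobs","slots":5}}
}
```

Each document is then written into the container once and imported with a single CLI call, instead of one `docker exec` per item.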
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Coverage Report for CI Build 78
Coverage increased (+0.02%) to 39.362%
Coverage Regressions: 23 previously-covered lines in 4 files lost coverage.
💛 - Coveralls
settings/settings.go (outdated)

```go
func AddVariables(id string, version uint64) error {
	variables := settings.Airflow.Variables

	if version >= AirflowVersionTwo {
```
We've long moved on from Airflow 1 so can we use this refactoring opportunity to rip all that out?
done, ripped out all the airflow 1 codepaths - removed addVariablesLegacy, addConnectionsLegacy, addPoolsLegacy, prepareAirflowConnectionAddCommand, and the version parameter from all the Add*/ConfigSettings signatures + callers
settings/settings.go (outdated)

```go
// connectionImportObject builds the JSON value for a single connection in the
// airflow connections import format. Returns nil if the connection should be skipped.
func connectionImportObject(conn *Connection) interface{} {
```
iirc a lot of the below translation logic already exists in the CLI?
yeah, connectionImportObject does similar field-by-field mapping as the old prepareAirflowConnectionAddCommand, but the output format is different enough that i didn't see a clean way to share it - the old code built CLI arg strings while this builds a json dict for airflow connections import. the old code is now deleted anyway with the airflow 1 removal so it's no longer duplicated
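The field-by-field mapping under discussion can be sketched roughly like this, using a hypothetical stand-in `Connection` type (the real function takes the CLI's own `*Connection`):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Connection is an illustrative stand-in for the CLI's settings type.
type Connection struct {
	ID, Type, Host, Login, Password, Schema, Extra, URI string
	Port                                                int
}

// connectionImportObject builds the JSON value for one connection in the
// `airflow connections import` format: either a serialized URI string or a
// field dict. Returns nil when the connection should be skipped.
func connectionImportObject(conn Connection) interface{} {
	if conn.URI != "" {
		return conn.URI // the import format also accepts a URI string
	}
	if conn.Type == "" {
		return nil // nothing usable to import; caller warns and skips
	}
	obj := map[string]interface{}{"conn_type": conn.Type}
	if conn.Host != "" {
		obj["host"] = conn.Host
	}
	if conn.Login != "" {
		obj["login"] = conn.Login
	}
	if conn.Password != "" {
		obj["password"] = conn.Password
	}
	if conn.Schema != "" {
		obj["schema"] = conn.Schema
	}
	if conn.Port != 0 {
		obj["port"] = conn.Port
	}
	if conn.Extra != "" {
		obj["extra"] = conn.Extra
	}
	return obj
}

func main() {
	// Assemble the full {"conn_id": {...}} import document.
	doc := map[string]interface{}{}
	for _, c := range []Connection{{ID: "pg", Type: "postgres", Host: "db", Port: 5432}} {
		if obj := connectionImportObject(c); obj != nil {
			doc[c.ID] = obj
		}
	}
	b, _ := json.Marshal(doc)
	fmt.Println(string(b)) // {"pg":{"conn_type":"postgres","host":"db","port":5432}}
}
```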
Per review feedback, removes all Airflow 1 legacy code since it's no longer supported:
- Remove `addVariablesLegacy`, `addConnectionsLegacy`, `addPoolsLegacy`
- Remove `prepareAirflowConnectionAddCommand` (only used by the legacy path)
- Remove the `version` parameter from `AddVariables`, `AddConnections`, `AddPools`, and `ConfigSettings`
- Update all callers in docker.go, standalone.go, and tests
- Remove Airflow 1 unit tests

Also fixes a staticcheck SA4023 lint error (nil comparison on an interface that never returns nil).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
settings/settings.go (outdated)

```go
// buildBatchImportCommand builds a compound shell command that writes jsonContent
// to tmpFile via heredoc, runs importCmd against that file, then cleans up.
func buildBatchImportCommand(tmpFile, importCmd, jsonContent string) string {
	return fmt.Sprintf("cat > %s <<'__ASTRO_CLI_EOF__'\n%s\n__ASTRO_CLI_EOF__\n%s %s; _ret=$?; rm -f %s; exit $_ret",
```
It seems like this will get in the way of us trying to support standalone mode for Windows. Is it possible to do it just with Airflow API calls?
Since ImportSettings no longer needs the Airflow version (settings import is version-agnostic now), the checkAirflowVersion call was removed. Update the "list labels import error" test to test a Ps failure instead, and clean up unused imageHandler mocks from import tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the heredoc/bash approach with direct HTTP calls to the Airflow REST API. This is cross-platform (no bash dependency), faster (no CLI startup overhead), and works identically for Docker and standalone modes.

For each variable, connection, and pool: POST to create; if 409 (already exists), then PATCH to update. This matches the previous overwrite behavior.

Key changes:
- ConfigSettings now takes airflowURL + authHeader instead of a container ID
- AddVariables/AddConnections/AddPools make HTTP calls to the REST API
- Airflow 2: Basic Auth (admin:admin) with /api/v1 endpoints
- Airflow 3: JWT auth via /auth/token with /api/v2 endpoints
- URI-only connections are parsed into individual fields (conn_type, host, login, password, port, schema) since the REST API doesn't accept a uri field directly
- printStatus/printProxyStatus no longer need the container ID or Ps calls
- Tests use httptest.NewServer to mock the Airflow API
- Removed poolImportEntry type (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Handle default_pool specially: Airflow only allows updating slots and include_deferred on the default pool, so use the update_mask query param when PATCHing (matches existing airflow-client behavior)
- Parse URI-only connections into individual fields since the REST API doesn't accept a uri field directly
- Add JWT auth for Airflow 3: fetch a token from the /auth/token endpoint (SimpleAuthManager with ALL_ADMINS=True accepts any credentials)
- Add include_deferred field to the pool creation payload (required by the Airflow 3 API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
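The default_pool PATCH URL with its update_mask can be built with `net/url` (a hypothetical helper; the endpoint path shown assumes the Airflow 2 API):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDefaultPoolPatchURL appends the update_mask Airflow requires when
// patching default_pool, where only slots and include_deferred may change.
func buildDefaultPoolPatchURL(base string) string {
	q := url.Values{}
	q.Add("update_mask", "slots")
	q.Add("update_mask", "include_deferred")
	return base + "?" + q.Encode()
}

func main() {
	fmt.Println(buildDefaultPoolPatchURL("http://localhost:8080/api/v1/pools/default_pool"))
	// http://localhost:8080/api/v1/pools/default_pool?update_mask=slots&update_mask=include_deferred
}
```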
- airflowAPIURL and airflowAuthHeader now accept *PortOverrides to use the correct port when proxy mode allocates random ports
- printProxyStatus passes its portOvr through to the API URL/auth helpers
- Restore include_deferred in the pool JSON body: required by Airflow 3's PATCH endpoint, safely ignored by Airflow 2's Marshmallow schemas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker mode: query the running container's published port via getWebServerPublishedPort() so ImportSettings works even when proxy mode allocated a random port (not the config default).

Standalone mode:
- Persist the allocated port to .astro/standalone/port when starting
- ImportSettings reads the persisted port to know where to connect
- Both applySettings and ImportSettings now fetch a JWT token via /auth/token for Airflow 3's SimpleAuthManager (empty auth doesn't work)
- Clean up the port file on stop

Refactored fetchLocalAirflowToken into fetchAirflowJWTToken, which takes a base URL so it can be reused by both docker and standalone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
`astro dev start` and `astro dev object import` were very slow because they loaded each variable, connection, and pool from `airflow_settings.yaml` by shelling into the Airflow container and running a separate `airflow variables set` / `airflow connections add` / `airflow pools set` command per item. Every command paid the full Airflow CLI startup cost (~2-5 seconds), so a project with 10 connections, 5 variables, and 2 pools would spend 1-2+ minutes just on settings injection.

This PR replaces the per-item CLI approach with direct calls to the Airflow REST API. For each object we POST to create, and if the object already exists (409 Conflict) we PATCH to update, matching the previous overwrite behavior without needing to list + delete + recreate.
What changes
Settings import (`settings/settings.go`)
- `ConfigSettings`, `AddVariables`, `AddConnections`, `AddPools` now take an Airflow API URL + optional auth header instead of a container ID
- Requests go to `/api/v1/{resource}` (Airflow 2) or `/api/v2/{resource}` (Airflow 3)
- URI-only connections (`conn_uri: postgres://user:pass@host:5432/db`) are parsed into individual fields since the REST API doesn't accept a `uri` field
- `default_pool` uses `?update_mask=slots&update_mask=include_deferred` on PATCH since Airflow only allows those two fields to be updated on the default pool
- Tests use `httptest.NewServer` to mock the Airflow API
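The URI-to-fields parsing can be sketched with `net/url` (a hypothetical helper; the exact field mapping in the PR may differ):

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// parseConnURI splits a conn_uri like postgres://user:pass@host:5432/db into
// the individual fields the REST API accepts, since it has no uri field.
func parseConnURI(raw string) (map[string]interface{}, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	fields := map[string]interface{}{
		"conn_type": u.Scheme,
		"host":      u.Hostname(),
	}
	if u.User != nil {
		fields["login"] = u.User.Username()
		if pw, ok := u.User.Password(); ok {
			fields["password"] = pw
		}
	}
	if p := u.Port(); p != "" {
		port, err := strconv.Atoi(p)
		if err != nil {
			return nil, err
		}
		fields["port"] = port
	}
	if len(u.Path) > 1 {
		fields["schema"] = u.Path[1:] // strip the leading slash
	}
	return fields, nil
}

func main() {
	f, _ := parseConnURI("postgres://user:pass@host:5432/db")
	fmt.Println(f["conn_type"], f["host"], f["port"], f["schema"])
	// postgres host 5432 db
}
```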
Caller updates (`airflow/docker.go`, `airflow/standalone.go`)
- Builds `http://localhost:<port>/api/v<version>` and fetches auth: Basic Auth (admin:admin) for Airflow 2, a JWT token via `/auth/token` for Airflow 3's SimpleAuthManager
- Docker mode gets the published port via `composeService.Ps()` so `object import` finds the right port; `printStatus`/`printProxyStatus` get the port from the overrides they already receive
- Standalone mode persists its port to `.astro/standalone/port` on start and reads it back in `object import`, and fetches a JWT token the same way Docker mode does

Performance impact
Previously: O(n) Airflow CLI invocations per object type, each ~2-5s overhead.
Now: O(n) HTTP requests with ~10-50ms overhead each. A typical settings file goes from >1 minute to a few seconds.
Test plan
Unit tests:
- `go test ./settings/... ./airflow/...` passes (36 settings tests including new API mocking tests; airflow tests updated)
- `golangci-lint run ./settings/... ./airflow/...` is clean

E2E tests (done manually with real Airflow containers/venvs):
- `dev start`, `dev object import`, `dev restart`: verified every value round-trips through the REST API
- `dev start`, `dev object import`, `dev restart`: verified via `/api/v1` with Basic Auth
- `dev start --standalone`, `dev object import --standalone`, `dev restart --standalone`: verified JWT auth + port persistence
- Special characters in passwords (`p@ss'w0rd&special=chars!@#$%`), JSON values in variables, Unicode, URI-only connections parsed correctly, invalid items skipped with existing warning messages, empty settings file, upsert on restart

Out of scope
There's a pre-existing crash in `settings.ExportConnections` when running against Airflow 3: it parses the YAML output of `airflow connections list -o yaml`, which has a different format in Airflow 3. Not related to this PR; filed separately.