AIR CLI Integration: air run end to end command
#5710
Merged
10 commits
226d41a experimental/air: add run config schema and structural validation (riddhibhagwat-db)
f757f6b experimental/air: add run config launch accessors (riddhibhagwat-db)
2e7cf85 experimental/air: wire run command for load, validate, dry-run (riddhibhagwat-db)
f46d5e0 experimental/air: add run pre-submit resolution helpers (riddhibhagwat-db)
185f533 experimental/air: upload run launch artifacts (riddhibhagwat-db)
a5d851b experimental/air: assemble and submit a training run (riddhibhagwat-db)
c992624 experimental/air: add run config schema and structural validation (riddhibhagwat-db)
a687311 Merge branch 'air-integration-m2-2' into air-integration-m2-3 (riddhibhagwat-db)
55e9619 experimental/air: fix run submission edge cases (riddhibhagwat-db)
08d1e3d Merge branch 'air-cli' into air-integration-m2-3 (riddhibhagwat-db)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| experiment_name: bad.name | ||
| command: x | ||
| compute: | ||
| accelerator_type: GPU_8xH100 | ||
| num_accelerators: 3 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
|
|
||
| === dry-run (text) | ||
| >>> [CLI] experimental air run -f valid.yaml --dry-run | ||
| Dry run: configuration for "smoke-test" is valid; not submitting. | ||
|
|
||
| === dry-run (json) | ||
| >>> [CLI] experimental air run -f valid.yaml --dry-run -o json | ||
| { | ||
| "v": 1, | ||
| "ts": "[TIMESTAMP]", | ||
| "data": { | ||
| "status": "DRY_RUN_OK", | ||
| "dry_run": true | ||
| } | ||
| } | ||
|
|
||
| === override not yet supported | ||
| >>> [CLI] experimental air run -f valid.yaml --dry-run --override a=b | ||
| Error: --override is not yet supported | ||
|
|
||
| Exit code: 1 | ||
|
|
||
| === watch not yet supported | ||
| >>> [CLI] experimental air run -f valid.yaml --dry-run --watch | ||
| Error: --watch is not yet supported | ||
|
|
||
| Exit code: 1 | ||
|
|
||
| === invalid config is rejected | ||
| >>> [CLI] experimental air run -f invalid.yaml --dry-run | ||
| Error: invalid experiment_name "bad.name": only alphanumeric characters, hyphens (-), and underscores (_) are allowed | ||
|
|
||
| Exit code: 1 | ||
|
|
||
| === missing --file | ||
| >>> [CLI] experimental air run --dry-run | ||
| Error: required flag(s) "file" not set | ||
|
|
||
| Exit code: 1 |
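The "invalid config is rejected" case above depends on a character rule for `experiment_name`. The following is a minimal sketch of such a check, not the code from this PR: the helper name `validateExperimentName`, the regular expression, and the restriction to ASCII letters and digits are assumptions inferred only from the error text in the expected output.

```go
package main

import (
	"fmt"
	"regexp"
)

// experimentNamePattern is an assumed rule derived from the error message:
// one or more ASCII letters, digits, hyphens, or underscores.
var experimentNamePattern = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validateExperimentName is a hypothetical helper; the real validation in the
// PR may be implemented differently.
func validateExperimentName(name string) error {
	if !experimentNamePattern.MatchString(name) {
		return fmt.Errorf("invalid experiment_name %q: only alphanumeric characters, hyphens (-), and underscores (_) are allowed", name)
	}
	return nil
}

func main() {
	// The two names come from the valid and invalid fixtures in this diff.
	for _, name := range []string{"smoke-test", "bad.name"} {
		if err := validateExperimentName(name); err != nil {
			fmt.Println("Error:", err)
			continue
		}
		fmt.Printf("%s: ok\n", name)
	}
}
```

Under this sketch an empty name is also rejected, although the message would not describe that case well; whether the real command reports a separate "required field" error is not shown in this diff.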
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| title "dry-run (text)" | ||
| trace $CLI experimental air run -f valid.yaml --dry-run | ||
|
|
||
| title "dry-run (json)" | ||
| trace $CLI experimental air run -f valid.yaml --dry-run -o json | ||
|
|
||
| title "override not yet supported" | ||
| errcode trace $CLI experimental air run -f valid.yaml --dry-run --override a=b | ||
|
|
||
| title "watch not yet supported" | ||
| errcode trace $CLI experimental air run -f valid.yaml --dry-run --watch | ||
|
|
||
| title "invalid config is rejected" | ||
| errcode trace $CLI experimental air run -f invalid.yaml --dry-run | ||
|
|
||
| title "missing --file" | ||
| errcode trace $CLI experimental air run --dry-run |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| # `air run --dry-run` validates the config locally and makes no workspace calls, | ||
| # so no engine matrix or server stubs are needed. | ||
| [EnvMatrix] | ||
| DATABRICKS_BUNDLE_ENGINE = [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| experiment_name: smoke-test | ||
| command: python train.py | ||
| compute: | ||
| accelerator_type: GPU_1xH100 | ||
| num_accelerators: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| package aircmd | ||
|
|
||
| // This file flattens the validated runConfig schema into the derived values the | ||
| // launch path consumes, replacing the Python CLI's _convert_to_run_config step. | ||
| // There is no separate internal config type: handle_run reads runConfig directly, | ||
| // using these accessors for the values that need computing rather than a plain | ||
| // field read. | ||
|
|
||
| const defaultMaxRetries = 3 | ||
|
|
||
| // timeoutSeconds converts timeout_minutes to seconds. Zero means the user set no | ||
| // timeout and the backend default applies. | ||
| func (c *runConfig) timeoutSeconds() int { | ||
| if c.TimeoutMinutes == nil { | ||
| return 0 | ||
| } | ||
| return *c.TimeoutMinutes * 60 | ||
| } | ||
|
|
||
| // maxRetries returns the retry count, applying the schema default when unset. | ||
| func (c *runConfig) maxRetries() int { | ||
| if c.MaxRetries == nil { | ||
| return defaultMaxRetries | ||
| } | ||
| return *c.MaxRetries | ||
| } | ||
|
|
||
| // dockerImageURL returns the custom docker image URL, or "" when none is set. | ||
| // | ||
| // TODO: not wired into submission yet — the native ai_runtime_task carries no | ||
| // docker field, and full support needs image registration (pending the DCS work). | ||
| func (c *runConfig) dockerImageURL() string { | ||
| if c.Environment != nil && c.Environment.DockerImage != nil { | ||
| return c.Environment.DockerImage.URL | ||
| } | ||
| return "" | ||
| } | ||
|
|
||
| // requirementsFile returns the path to a requirements file when | ||
| // environment.dependencies is a string, and whether it was set. | ||
| func (c *runConfig) requirementsFile() (string, bool) { | ||
| if c.Environment == nil || !c.Environment.Dependencies.set || c.Environment.Dependencies.isList { | ||
| return "", false | ||
| } | ||
| return c.Environment.Dependencies.path, true | ||
| } | ||
|
|
||
| // inlineDependencies returns the inline package list when | ||
| // environment.dependencies is a list, and whether it was set. | ||
| func (c *runConfig) inlineDependencies() ([]string, bool) { | ||
| if c.Environment == nil || !c.Environment.Dependencies.set || !c.Environment.Dependencies.isList { | ||
| return nil, false | ||
| } | ||
| return c.Environment.Dependencies.list, true | ||
| } | ||
|
|
||
| // runtimeVersion returns the client image version from environment.version when | ||
| // set. For a requirements-file dependency set, the version lives in that file and | ||
| // is resolved at launch, not here. | ||
| func (c *runConfig) runtimeVersion() (string, bool) { | ||
| if c.Environment == nil || !c.Environment.Version.set { | ||
| return "", false | ||
| } | ||
| return c.Environment.Version.raw, true | ||
| } | ||
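The accessors above use pointer fields so that "unset" and "explicitly zero" stay distinguishable. The standalone sketch below illustrates that with a trimmed stand-in for `runConfig`: only the two fields read by `timeoutSeconds` and `maxRetries` are included, and the `intPtr` helper is hypothetical. The real struct is defined elsewhere in the PR and is not shown in this diff.

```go
package main

import "fmt"

const defaultMaxRetries = 3

// runConfig is a trimmed, hypothetical stand-in holding only the two optional
// fields used below.
type runConfig struct {
	TimeoutMinutes *int
	MaxRetries     *int
}

// timeoutSeconds returns 0 when no timeout was set, so the caller can leave
// the choice to the backend default.
func (c *runConfig) timeoutSeconds() int {
	if c.TimeoutMinutes == nil {
		return 0
	}
	return *c.TimeoutMinutes * 60
}

// maxRetries applies the default only when the field is nil; an explicit 0 is
// preserved and means "do not retry".
func (c *runConfig) maxRetries() int {
	if c.MaxRetries == nil {
		return defaultMaxRetries
	}
	return *c.MaxRetries
}

// intPtr is a hypothetical helper that returns a pointer to a copy of v.
func intPtr(v int) *int { return &v }

func main() {
	unset := &runConfig{}
	fmt.Println(unset.timeoutSeconds(), unset.maxRetries()) // prints "0 3"

	explicit := &runConfig{TimeoutMinutes: intPtr(2), MaxRetries: intPtr(0)}
	fmt.Println(explicit.timeoutSeconds(), explicit.maxRetries()) // prints "120 0"
}
```

With a plain `int` field, `max_retries: 0` could not be told apart from an omitted key, and the default of 3 would silently override the user's choice.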
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| package aircmd | ||
|
|
||
| import ( | ||
| "testing" | ||
|
|
||
| "github.com/stretchr/testify/assert" | ||
| ) | ||
|
|
||
| func TestRunConfigTimeoutSeconds(t *testing.T) { | ||
| c := &runConfig{} | ||
| assert.Equal(t, 0, c.timeoutSeconds()) | ||
|
|
||
| c.TimeoutMinutes = new(2) | ||
| assert.Equal(t, 120, c.timeoutSeconds()) | ||
| } | ||
|
|
||
| func TestRunConfigMaxRetries(t *testing.T) { | ||
| c := &runConfig{} | ||
| assert.Equal(t, defaultMaxRetries, c.maxRetries()) | ||
|
|
||
| c.MaxRetries = new(0) | ||
| assert.Equal(t, 0, c.maxRetries()) | ||
|
|
||
| c.MaxRetries = new(7) | ||
| assert.Equal(t, 7, c.maxRetries()) | ||
| } | ||
|
|
||
| func TestRunConfigDockerImageURL(t *testing.T) { | ||
| c := &runConfig{} | ||
| assert.Empty(t, c.dockerImageURL()) | ||
|
|
||
| c.Environment = &environmentConfig{} | ||
| assert.Empty(t, c.dockerImageURL()) | ||
|
|
||
| c.Environment.DockerImage = &dockerImageConfig{URL: "org/repo:tag"} | ||
| assert.Equal(t, "org/repo:tag", c.dockerImageURL()) | ||
| } | ||
|
|
||
| func TestRunConfigDependencies(t *testing.T) { | ||
| t.Run("unset", func(t *testing.T) { | ||
| c := &runConfig{} | ||
| _, ok := c.requirementsFile() | ||
| assert.False(t, ok) | ||
| _, ok = c.inlineDependencies() | ||
| assert.False(t, ok) | ||
| }) | ||
|
|
||
| t.Run("file path", func(t *testing.T) { | ||
| c := &runConfig{Environment: &environmentConfig{ | ||
| Dependencies: dependencies{set: true, isList: false, path: "req.yaml"}, | ||
| }} | ||
| path, ok := c.requirementsFile() | ||
| assert.True(t, ok) | ||
| assert.Equal(t, "req.yaml", path) | ||
| _, ok = c.inlineDependencies() | ||
| assert.False(t, ok) | ||
| }) | ||
|
|
||
| t.Run("inline list", func(t *testing.T) { | ||
| c := &runConfig{Environment: &environmentConfig{ | ||
| Dependencies: dependencies{set: true, isList: true, list: []string{"torch", "numpy"}}, | ||
| }} | ||
| list, ok := c.inlineDependencies() | ||
| assert.True(t, ok) | ||
| assert.Equal(t, []string{"torch", "numpy"}, list) | ||
| _, ok = c.requirementsFile() | ||
| assert.False(t, ok) | ||
| }) | ||
| } | ||
|
|
||
| func TestRunConfigRuntimeVersion(t *testing.T) { | ||
| c := &runConfig{} | ||
| _, ok := c.runtimeVersion() | ||
| assert.False(t, ok) | ||
|
|
||
| c.Environment = &environmentConfig{Version: stringOrInt{set: true, raw: "5"}} | ||
| v, ok := c.runtimeVersion() | ||
| assert.True(t, ok) | ||
| assert.Equal(t, "5", v) | ||
| } |
You should add a TODO comment, and maybe a Jira ticket, to track that this function is not used yet and that you plan to wire it in later.