feat: Propagate Kubernetes launcher errors to API #137

Draft
morgan-wowk wants to merge 1 commit into master from propagate-launcher-errors

@morgan-wowk morgan-wowk commented Mar 4, 2026

TL;DR

Enhanced Kubernetes error handling to extract and display meaningful error messages from API exceptions instead of dumping raw pod/job specifications.

What changed?

Added specific handling for k8s_client_lib.ApiException in both pod and job creation methods within the Kubernetes launchers. The new error handling extracts human-readable messages from the Kubernetes API response body and stores orchestration error messages in execution extra data for better debugging visibility.
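
As a rough illustration of the extraction described above, the sketch below parses a Kubernetes API error body and returns its `message` field. It assumes the body is a JSON-encoded `Status` object, which is what the API server normally returns for failures. `extract_status_message` is a hypothetical helper name; it is not the PR's `_extract_kubernetes_error_message`, whose implementation is not shown on this page.

```python
import json
from typing import Optional, Union


def extract_status_message(body: Union[str, bytes, None]) -> Optional[str]:
    """Return the human-readable `message` from a Kubernetes Status body.

    Hypothetical helper for illustration only. Returns None when the body
    is empty, is not valid JSON, or carries no usable message.
    """
    if not body:
        return None
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (for example a plain-text proxy error): nothing to extract.
        return None
    if not isinstance(parsed, dict):
        return None
    message = parsed.get("message")
    return message if isinstance(message, str) and message else None
```

In a launcher, the body would presumably come from the caught exception's `body` attribute (`ApiException` exposes `status`, `reason`, and `body`), with the caller falling back to a generic message when this returns `None`.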

How to test?

  1. Trigger a Kubernetes pod or job creation that would fail due to admission webhook rejection or other API validation errors
  2. Verify that the error message displays the actual Kubernetes API error reason instead of a serialized pod/job specification
  3. Check that the orchestration error message is stored in the execution's extra data for system errors

Why make this change?

The Kubernetes API returns structured JSON error responses with actionable messages (such as admission webhook rejection reasons), but the previous implementation dumped the entire pod/job specification instead. This change surfaces the actual root cause of the failure, which makes debugging easier.

@morgan-wowk morgan-wowk self-assigned this Mar 4, 2026
bts.ContainerExecutionStatus.SYSTEM_ERROR
)
record_system_error_exception(execution=execution, exception=ex)
if execution.extra_data is None:
Contributor

Hmm. Why do you want to put this information in another field? The error message (brief and full) is already stored in extra_data.
P.S. There is a _record_orchestration_error_message function that does slightly more.

except k8s_client_lib.ApiException as ex:
    k8s_message = _extract_kubernetes_error_message(ex)
    raise interfaces.LauncherError(
        k8s_message or f"Failed to create pod: Kubernetes API error ({ex.status} {ex.reason})"
Contributor

Should we include the Pod that we were trying to create (as the current message does)? I think it's very important for debugging pod creation issues.
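
One possible middle ground, sketched under assumptions rather than taken from the PR: keep the Kubernetes API message as the headline, but prefix it with a compact pod identifier so the failing object can still be located without serializing the whole spec. The function name `build_pod_creation_error` and its parameters are hypothetical; only the fallback wording mirrors the snippet under review.

```python
from typing import Optional


def build_pod_creation_error(
    k8s_message: Optional[str],
    status: int,
    reason: str,
    pod_name: str,
    namespace: str,
) -> str:
    """Compose an error string that names the pod and the API-level cause.

    Hypothetical helper: uses the extracted Kubernetes message when one is
    available, otherwise falls back to the HTTP status and reason.
    """
    headline = k8s_message or f"Kubernetes API error ({status} {reason})"
    return f"Failed to create pod {namespace}/{pod_name}: {headline}"
```

Whether the namespace and name are enough, or the full pod spec should still be attached somewhere (for example in logs), is a judgment call for the reviewers.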
