FEAT: Propagate error messages from simulation methods to database #107

Open
mberz wants to merge 3 commits into dev from feat/propagate_error_messages

Conversation

mberz commented Jun 16, 2026

Contributor

Proposed changes

  • Add support for different StatusCodes and error messages to the CloudExecutor
  • Handle and propagate errors in the simulation task
    • Errors explicitly raised by simulation methods are propagated with their specific error message
    • Unexpected errors are propagated with a generic error message
  • Set up models and schema for error messages.
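
As a rough illustration of the last bullet, the sketch below shows one way a run record could carry an optional error message and expose it as `errorMessage` when serialized. It uses plain dataclasses rather than the project's ORM models and Marshmallow schemas, and `Status`, `SimulationRunRecord`, and `dump_run` are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical status values; the project's real names may differ."""
    COMPLETED = "Completed"
    ERROR = "Error"


@dataclass
class SimulationRunRecord:
    """Stand-in for a persisted run row (not the project's ORM model)."""
    id: int
    status: Status
    error_message: Optional[str] = None  # stays None unless the run failed


def dump_run(run: SimulationRunRecord) -> dict:
    """Serialize a run for an API response, exposing the error as `errorMessage`."""
    return {
        "id": run.id,
        "status": run.status.value,
        "errorMessage": run.error_message,
    }


failed = SimulationRunRecord(id=7, status=Status.ERROR, error_message="Mesh file is empty")
print(dump_run(failed))
```

Keeping the field nullable means successful runs serialize with `errorMessage` set to `None`, so a client can test for its presence without inspecting the status first.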

Requires choras-org/simulation-backend#18, which introduces custom error classes for the simulation methods.

mberz added 3 commits June 16, 2026 13:29
- Catch errors explicitly raised by the simulation methods and
propagate them into the database
- Catch unexpected errors and add them to the log, propagate general
error message ext
The model and schema for the error messages propagated from the
simulation method
mberz added the enhancement (New feature or request) label Jun 16, 2026
mberz requested a review from Copilot June 16, 2026 12:54
mberz moved this from Backlog to Implementation in progress in CHORAS planning Jun 16, 2026

Copilot AI left a comment

Pull request overview

This PR adds end-to-end propagation of simulation error messages (including non-zero container exit codes and structured JSON errors) into persisted Simulation/SimulationRun records so the frontend can display meaningful failure reasons.

Changes:

  • Detect non-zero simulation container exit codes and attempt to extract structured error messages from the result JSON.
  • Persist errorMessage on Simulation and SimulationRun, and expose it via Marshmallow schemas.
  • Extend CloudExecutor’s completion stub to carry an exit code and return more informative logs.
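
The first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name, the fallback wording, and the assumption that the result JSON holds a top-level `"error"` entry (either a string or an object with a `"message"` field) are all hypothetical.

```python
import json
from typing import Optional

# Assumed fallback wording; the real message text is not shown in this PR page.
GENERIC_ERROR = "The simulation failed unexpectedly."


def extract_error_message(exit_code: int, result_json_text: str) -> Optional[str]:
    """Return None on success, otherwise the most specific message available."""
    if exit_code == 0:
        return None
    fallback = f"{GENERIC_ERROR} (exit code {exit_code})"
    try:
        payload = json.loads(result_json_text)
    except json.JSONDecodeError:
        return fallback  # result file is missing or not valid JSON
    # Assumed layout: {"error": "..."} or {"error": {"message": "..."}}
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict):
        error = error.get("message")
    if isinstance(error, str) and error.strip():
        return error.strip()
    return fallback


print(extract_error_message(1, '{"error": "Invalid geometry"}'))
```

Falling back to a generic message keeps the persisted field populated even when the container exits before it can write a structured error.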

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| app/services/simulation_service.py | Adds exit-code checking, structured error extraction, and persists error messages on failures. |
| app/services/executors/cloud_executor.py | Tracks simulation failure via JSON "error" and surfaces an exit code through _CompletedJob. |
| app/schemas/simulation_schema.py | Exposes errorMessage in API responses for simulations and runs. |
| app/models/SimulationRun.py | Adds errorMessage column to persist run-level failures. |
| app/models/Simulation.py | Adds errorMessage column to persist simulation-level failures. |

Comment thread app/services/simulation_service.py
Comment thread app/services/simulation_service.py
Comment thread app/schemas/simulation_schema.py
Comment thread app/schemas/simulation_schema.py
Comment on lines 447 to +455
     def _poll_until_complete(
         self,
         remote_json_path: str,
         local_uploads_dir: str,
         remote_app_dir: str,
         remote_sandbox_path: str,
         remote_tar_path: Optional[str] = None,
-    ) -> bool:
-        """Adaptively poll the remote job until all results reach 100 % progress.
+    ) -> tuple[bool, int]:
+        """Adaptively poll the remote job until all results reach 100 % progress or an error occurs.
Comment on lines 564 to +571
         success = self._collect_outputs_and_cleanup(
             remote_app_dir = remote_app_dir,
             local_uploads_dir = local_uploads_dir,
             remote_sandbox_path = remote_sandbox_path,
             remote_tar_path = remote_tar_path,
         )

-        return success
+        return (success, 0)  # Exit code 0 for success
Comment on lines +811 to +827
+        success, exit_code = self._poll_until_complete(
             remote_json_path = f"{remote_app_dir_path}/{json_filename}",
             local_uploads_dir = local_uploads_dir,
             remote_app_dir = remote_app_dir_path,
             remote_sandbox_path = f"{self.remote_work_dir}/{sandbox_name}",
             remote_tar_path = f"{self.remote_work_dir}/{tar_image_name}",
         )

-        return _CompletedJob()
+        # Build log message
+        if exit_code != 0:
+            logs_output = f"Cloud job completed with error (exit code {exit_code})"
+        elif not success:
+            logs_output = "Cloud job completed but cleanup failed"
+        else:
+            logs_output = "Cloud job completed successfully"
+
+        return _CompletedJob(exit_code=exit_code, logs_output=logs_output)
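
To make the branch order in the snippet above easier to reason about in isolation, here is a small standalone sketch. `CompletedJob` is a hypothetical dataclass standing in for the executor's `_CompletedJob` stub (its real definition is not shown on this page), and `build_completed_job` is a helper name invented for the example; the three log strings are copied from the diff.

```python
from dataclasses import dataclass


@dataclass
class CompletedJob:
    """Hypothetical stand-in for the executor's `_CompletedJob` stub."""
    exit_code: int = 0
    logs_output: str = ""


def build_completed_job(success: bool, exit_code: int) -> CompletedJob:
    """Map the (success, exit_code) pair to a job stub with a readable log line."""
    if exit_code != 0:
        # A simulation error takes precedence over the cleanup result.
        logs_output = f"Cloud job completed with error (exit code {exit_code})"
    elif not success:
        logs_output = "Cloud job completed but cleanup failed"
    else:
        logs_output = "Cloud job completed successfully"
    return CompletedJob(exit_code=exit_code, logs_output=logs_output)


print(build_completed_job(success=False, exit_code=0))
```

One thing the sketch makes visible: a cleanup failure still carries exit code 0, so a caller that inspects only `exit_code` would see it as a success and would need the log text to tell the two apart.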
Comment on lines 482 to 486
         Returns:
-            bool: ``True`` if outputs were collected and the remote workspace
-                was cleaned up successfully; ``False`` if an error occurred
-                during :meth:`_collect_outputs_and_cleanup`.
+            tuple[bool, int]: A tuple of (success, exit_code) where:
+                - success: ``True`` if outputs were collected and cleaned up successfully
+                - exit_code: 0 for success, 1 for error in simulation
         """
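
The `(success, exit_code)` contract described in this docstring could look roughly like the loop below. It is a simplified sketch under stated assumptions: the status dictionary layout (`"error"` and `"progress"` keys), the injected `read_status` and `collect_outputs` callables, and the fixed polling interval are all hypothetical, and the adaptive back-off of the real method is deliberately left out.

```python
import time
from typing import Callable


def poll_until_complete(
    read_status: Callable[[], dict],
    collect_outputs: Callable[[], bool],
    interval_s: float = 0.0,
    max_polls: int = 100,
) -> tuple[bool, int]:
    """Poll until every result reports 100 % progress or an error is reported.

    Returns (success, exit_code): exit_code is 1 when the status carries an
    error, otherwise 0 with success reflecting the output collection step.
    """
    for _ in range(max_polls):
        status = read_status()
        if status.get("error"):
            return (False, 1)  # simulation reported an error
        progress = status.get("progress", [])
        if progress and all(p >= 100 for p in progress):
            return (collect_outputs(), 0)
        time.sleep(interval_s)
    raise TimeoutError(f"job did not finish within {max_polls} polls")


states = iter([{"progress": [10, 50]}, {"progress": [100, 100]}])
print(poll_until_complete(lambda: next(states), lambda: True))
```

Returning two values keeps "the simulation failed" separate from "the simulation finished but collecting outputs failed", which a single boolean could not express.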
Comment on lines +550 to +554
+    except RuntimeError as ex:
+        # These are errors explicitly raised in the simulation-method
+        # including a meaningful error message.
+        # Propagate error messages to the database (frontend).
+        error_msg = str(ex)
Comment on lines 561 to +565
     except Exception as ex:
+        # Unexpected errors - log full details but show generic message
+        import traceback
+
+        error_details = traceback.format_exc()
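
Taken together, the two handlers quoted above form a two-tier pattern: explicit errors keep their message, everything else is logged in full and replaced by a generic message. The sketch below is a minimal standalone version of that idea; `run_and_capture_error`, `GENERIC_MESSAGE`, and the exact wording are assumptions made for illustration, not code from this PR.

```python
import logging
import traceback
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Assumed wording; the PR's actual generic message is not shown here.
GENERIC_MESSAGE = "An unexpected error occurred during the simulation."


def run_and_capture_error(simulate: Callable[[], None]) -> tuple[bool, Optional[str]]:
    """Run `simulate()` and return (ok, message_to_store_for_the_user)."""
    try:
        simulate()
    except RuntimeError as ex:
        # Explicitly raised by the simulation method: safe to show as-is.
        return (False, str(ex))
    except Exception:
        # Unexpected: keep the traceback in the log, show only a generic text.
        logger.error("Unexpected simulation failure:\n%s", traceback.format_exc())
        return (False, GENERIC_MESSAGE)
    return (True, None)


def bad_input() -> None:
    raise RuntimeError("Receiver is outside the room")


print(run_and_capture_error(bad_input))
```

A caveat worth weighing with this pattern: built-in exceptions such as `NotImplementedError` and `RecursionError` subclass `RuntimeError`, so they would also take the first branch and have their message shown to the user. Catching a dedicated base class from the simulation backend instead would narrow that branch.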

Labels

enhancement New feature or request

Projects

Status: Implementation in progress
