Refactor archive ingestion parsing and surface HPC provenance #142
Pull request overview
Refactors ingestion metadata parsing to derive simulation timeline fields from
CaseDocs/env_run.xml.* plus timing-derived run metadata, removes the
CaseStatus parser, and significantly expands tests for the NERSC archive
ingestor and parser utilities.
Changes:
- Replace CaseStatus-based ingestion metadata with env_run.xml.* + e3sm_timing.* parsing, and add a run-artifact status helper.
- Tighten ingestion required files (env_run.xml.*, e3sm_timing.*, etc.) and make parser metadata assembly more tolerant to malformed parser output.
- Add/expand tests covering NERSC ingestor behavior, parser file requirements, and parser utility helpers.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| backend/app/features/ingestion/parsers/parser.py | Updates FILE_SPECS to require env_run + timing, removes CaseStatus wiring, and merges run-artifact status into parsed metadata. |
| backend/app/features/ingestion/parsers/case_docs.py | Expands env_case/env_build parsing and adds env_run parsing + artifact-derived status helper. |
| backend/app/features/ingestion/parsers/e3sm_timing.py | Narrows timing parsing to execution/run timing metadata and computes run start/end timestamps. |
| backend/app/features/ingestion/parsers/case_status.py | Removes CaseStatus parser implementation. |
| backend/tests/features/ingestion/test_nersc_archive_ingestor.py | Adds broad unit coverage for config parsing, state handling, request failures, logging, and __main__. |
| backend/tests/features/ingestion/parsers/test_parser.py | Updates parser integration tests to reflect required env_run/timing and removed CaseStatus. |
| backend/tests/features/ingestion/parsers/test_case_docs.py | Adds tests for new env_run parsing, campaign/experiment derivation, and run-artifact status helper. |
| backend/tests/features/ingestion/parsers/test_e3sm_timing.py | Reworks timing parser tests around execution_id and run start/end derivation behavior. |
| backend/tests/features/ingestion/parsers/test_case_status.py | Removes CaseStatus parser tests. |
| backend/tests/features/ingestion/parsers/test_utils.py | Adds tests for _open_text / _get_open_func gzip/plain handling. |
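For context on the gzip/plain handling that test_utils.py exercises, here is a minimal sketch of what such helpers might look like. The names get_open_func and open_text are hypothetical stand-ins, not the repository's _get_open_func / _open_text, and the ".gz" suffix check is an assumption about how compression is detected.

```python
import gzip
from pathlib import Path
from typing import IO, Callable, Union

PathLike = Union[str, Path]


def get_open_func(path: PathLike) -> Callable[..., IO]:
    """Pick gzip.open for ".gz" paths and the builtin open otherwise.

    Hypothetical helper: detection by file suffix is an assumption.
    """
    return gzip.open if str(path).endswith(".gz") else open


def open_text(path: PathLike, encoding: str = "utf-8") -> IO[str]:
    """Open a plain or gzip-compressed file in text mode."""
    # Both gzip.open and open accept mode "rt" and an encoding keyword.
    return get_open_func(path)(path, "rt", encoding=encoding)
```

With helpers of this shape, a parser can read e3sm_timing.* or env_run.xml.* files without caring whether the archive compressed them.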
Comments suppressed due to low confidence (1)
backend/app/features/ingestion/parsers/parser.py:413
_parse_all_files signature now takes exec_dir, but the docstring parameter list still only documents files. Update the docstring to include/describe exec_dir.
def _parse_all_files(exec_dir: str, files: dict[str, str | None]) -> SimulationMetadata:
    """Pass discovered files to their respective parser functions.

    Parameters
    ----------
    files : dict[str, str | None]
        Dictionary of file paths for each file type.

    Returns
    -------
    SimulationMetadata
        Dictionary with parsed results from each file type.
    """
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Refactors archive-ingestion metadata parsing to remove CaseStatus usage, derive
timeline metadata from env_run.xml.* + e3sm_timing.*, and expands test
coverage (notably for the NERSC archive ingestor).
Changes:
- Replace CaseStatus parsing with env_run.xml.*-derived simulation dates and e3sm_timing.*-derived run metadata (start/end).
- Introduce typed parser output (ParsedSimulation) and normalize canonical config delta comparison via SimulationConfigSnapshot.
- Expand ingestion-related test coverage and remove case_status tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/types/simulation.ts | Adds hpcUsername + created/updated user preview typing. |
| frontend/src/features/simulations/components/SimulationDetailsView.tsx | Displays hpcUsername on the simulation details page. |
| frontend/src/features/browse/components/BrowseFiltersSidePanel.tsx | Adds creator label options + HPC username filter UI. |
| frontend/src/features/browse/BrowsePage.tsx | Adds hpcUsername filter state and creator display labels. |
| backend/app/features/ingestion/parsers/case_docs.py | Adds env_run.xml parsing + run-artifact status helper. |
| backend/app/features/ingestion/parsers/e3sm_timing.py | Narrows timing parsing to execution/run timing fields only. |
| backend/app/features/ingestion/parsers/parser.py | Removes case_status wiring; requires env_run + returns ParsedSimulation. |
| backend/app/features/ingestion/parsers/types.py | New typed parser output dataclass. |
| backend/app/features/ingestion/ingest.py | Consumes typed parser output; refactors canonical-delta calculation. |
| backend/app/features/simulation/config_delta.py | New normalized config snapshot + diff helper. |
| backend/app/features/simulation/models.py | Uses SimulationConfigSnapshot as the source of delta field names. |
| backend/app/features/simulation/schemas.py | Documentation updates; exposes hpc_username on SimulationOut. |
| backend/tests/features/ingestion/parsers/test_case_docs.py | Adds env_run + run-artifact helper coverage. |
| backend/tests/features/ingestion/parsers/test_e3sm_timing.py | Updates tests to new timing-derived run metadata behavior. |
| backend/tests/features/ingestion/parsers/test_parser.py | Updates integration tests for new required files + typed output. |
| backend/tests/features/ingestion/parsers/test_utils.py | Adds tests for text/gzip open helpers. |
| backend/tests/features/ingestion/parsers/test_case_status.py | Removes CaseStatus parser tests. |
| backend/app/features/ingestion/parsers/case_status.py | Removes CaseStatus parser implementation. |
| backend/tests/features/ingestion/test_ingest.py | Updates ingestion tests for typed parser output + snapshot/delta semantics. |
| backend/tests/features/ingestion/test_nersc_archive_ingestor.py | Expands unit coverage across config/state/retry/logging/main guard. |
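To illustrate the "normalized config snapshot + diff helper" idea listed for config_delta.py, here is a rough sketch. It is not the PR's SimulationConfigSnapshot: the class name ConfigSnapshot, the chosen fields, and the whitespace-stripping normalization are all assumptions made for illustration. The point it shows is that the dataclass fields double as the single source of delta field names.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class ConfigSnapshot:
    """Hypothetical snapshot; the field selection is an assumption."""

    compset_alias: str | None = None
    grid_resolution: str | None = None
    machine: str | None = None
    initialization_type: str | None = None


def _normalize(value: Any) -> Any:
    """Treat blank strings as missing and ignore surrounding whitespace."""
    if isinstance(value, str):
        value = value.strip()
        return value or None
    return value


def diff_snapshots(baseline: ConfigSnapshot, candidate: ConfigSnapshot) -> dict[str, tuple]:
    """Return {field_name: (old, new)} for fields that differ after normalization."""
    delta = {}
    for field in fields(ConfigSnapshot):  # field names come from the dataclass itself
        old = _normalize(getattr(baseline, field.name))
        new = _normalize(getattr(candidate, field.name))
        if old != new:
            delta[field.name] = (old, new)
    return delta
```

Because comparison happens on normalized values, cosmetic differences such as trailing whitespace do not show up as deltas.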
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb245caec2
Summary
This PR refactors archive ingestion parsing to use the correct source files for execution metadata, restores CaseStatus-based run status parsing, centralizes canonical config snapshot comparison, and surfaces HPC provenance more clearly in API responses and the frontend.
What changed
Backend ingestion and parsing
- Return typed ParsedSimulation records instead of loose metadata dicts.
- Make env_run.xml a required CaseDocs input and use it as the source of:
  - initialization_type
  - simulation_start_date
  - simulation_end_date
- Expand env_case.xml parsing to extract:
  - case_name
  - case_group
  - machine
  - REALUSER as HPC username
  - compset_alias
  - campaign / experiment_type
- Expand env_build.xml parsing to also extract grid_resolution.
- Narrow e3sm_timing parsing so it now focuses on:
  - LID as execution_id
  - run_start_date
  - run_end_date
- Restore CaseStatus parsing as an optional artifact and use it to:
  - completed / failed / running / unknown
  - case.run attempt LID
  - LID does not match the directory name, so distinct runs are still preserved
Backend ingest flow and canonical deltas
- SimulationCreate.
- Add SimulationConfigSnapshot to centralize canonical delta comparison.
- execution_id is documented as coming from the timing-file LID.
- Add hpc_username to SimulationOut.
Frontend
- Add createdByUser / lastUpdatedByUser support to frontend simulation types.
- createdByUser.email when available.
Tests
- Coverage for env_case, env_build, env_run, CaseStatus, e3sm_timing, parser merge behavior, incomplete-run skipping, and timing LID mismatch handling.
User-facing impact
Testing
uv run pytest tests/features/ingestion/parsers/test_case_docs.py tests/features/ingestion/parsers/test_case_status.py tests/features/ingestion/parsers/test_e3sm_timing.py tests/features/ingestion/parsers/test_parser.py tests/features/ingestion/test_ingest.py tests/features/ingestion/test_nersc_archive_ingestor.py
173 passed
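As a rough illustration of the completed / failed / running / unknown classification mentioned for CaseStatus parsing, the sketch below maps the last case.run event in the file to a status. The function name classify_run_status is hypothetical, and the assumed line shape ("<timestamp>: case.run starting|success|error ...") is an assumption about the CaseStatus format, not something taken from this PR.

```python
from __future__ import annotations

import re

# Assumed event vocabulary for case.run lines; adjust if the real file differs.
_CASE_RUN_EVENT = re.compile(r"case\.run\s+(starting|success|error)\b")

_EVENT_TO_STATUS = {
    "success": "completed",
    "error": "failed",
    "starting": "running",
}


def classify_run_status(case_status_text: str | None) -> str:
    """Return completed, failed, running, or unknown from CaseStatus text."""
    if not case_status_text:
        # CaseStatus is optional, so a missing file yields "unknown".
        return "unknown"
    last_event = None
    for line in case_status_text.splitlines():
        match = _CASE_RUN_EVENT.search(line)
        if match:
            last_event = match.group(1)  # later lines override earlier ones
    return _EVENT_TO_STATUS.get(last_event, "unknown")
```

Taking the last matching event means a resubmitted run that eventually succeeds is reported as completed even if an earlier attempt errored.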