batch_pipeline.py does not convert HDF5→MS for hourly tiles; conversion is implicit only for the cal MS #71

@jakobtfaber

Problem

scripts/batch_pipeline.py discovers tile MSs by globbing existing YYYY-MM-DDT*.ms under MS_DIR. It does not invoke HDF5→MS conversion for missing hourly tiles. Conversion is only triggered as a side-effect inside ensure_bandpass() for the calibrator MS specifically.

Result: if a date+hour has incomplete pre-converted tile MSs on disk, the orchestrator silently runs on a partial-tile set (or zero tiles) instead of converting from the available HDF5 inventory. This is a hidden coupling between "pipeline run" and "an unspecified prior conversion step."
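The failure mode can be reproduced with a minimal sketch, assuming discovery amounts to a glob over existing MS directories. The function name `discover_tiles` and the file names below are hypothetical placeholders, not the actual code or naming scheme in scripts/batch_pipeline.py.

```python
import tempfile
from pathlib import Path


def discover_tiles(ms_dir: Path, date: str) -> list[Path]:
    """Hypothetical stand-in for tile discovery: only existing MSs are seen."""
    return sorted(ms_dir.glob(f"{date}T*.ms"))


with tempfile.TemporaryDirectory() as tmp:
    ms_dir = Path(tmp)
    # One tile was pre-converted (file names are illustrative only).
    (ms_dir / "2026-01-25T04-03-00.ms").mkdir()
    # A later observation exists only as HDF5 and was never converted.
    (ms_dir / "2026-01-25T04-35-00.hdf5").touch()

    tile_names = [p.name for p in discover_tiles(ms_dir, "2026-01-25")]
    # A date with nothing converted yields an empty list, with no error.
    empty = discover_tiles(ms_dir, "2026-01-26")

print(tile_names)
print(empty)
```

Nothing in this path compares the glob result against the HDF5 inventory, so a partial or empty tile set is indistinguishable from a complete one.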

Discovered while

Codex read-only review of the 3C138 fundamentals-mosaic plan on 2026-05-05. Concrete instance: hour 04 of 2026-01-25 has 6 MSs covering 04:03–04:29; the 3C138 transit at 04:57:50 falls in an unconverted gap (HDF5 files exist for the gap timestamps; nothing has converted them).

See: outputs/3c138_smoke_2026-05-05/disk_ms_inventory.md, _codex_review.log.

Acceptance

Two acceptable resolutions:

(a) Document and CI-gate the prerequisite. Add a help-text note that batch_pipeline.py requires pre-converted hourly tile MSs; add a pre-flight check that fails loudly if the requested --start-hour..--end-hour window is incomplete relative to the indexed HDF5 inventory. Existing dsa110 convert workflow remains the producer.

(b) Auto-convert. Have batch_pipeline.py call the conversion path before tile processing if MSs are missing for the requested window. More invasive; better UX.

Either resolution satisfies the issue. Pre-flight check (a) is cheaper and safer for production runs that should not silently expand scope.
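A minimal sketch of the pre-flight check in (a), under stated assumptions: the HDF5 inventory and the on-disk MSs can each be reduced to a list of observation timestamps, one MS is expected per HDF5 timestamp, and --end-hour is inclusive. The names `missing_tiles`, `preflight`, and `IncompleteTileWindow` are hypothetical, and the timestamps are illustrative.

```python
from datetime import date, datetime


class IncompleteTileWindow(RuntimeError):
    """Raised when the requested hour window lacks converted tile MSs."""


def missing_tiles(hdf5_times, ms_times, day, start_hour, end_hour):
    """Return inventory timestamps in the window that have no matching MS."""
    have = set(ms_times)
    return sorted(
        t
        for t in set(hdf5_times)
        if t.date() == day and start_hour <= t.hour <= end_hour and t not in have
    )


def preflight(hdf5_times, ms_times, day, start_hour, end_hour):
    """Fail loudly instead of running on a partial tile set."""
    missing = missing_tiles(hdf5_times, ms_times, day, start_hour, end_hour)
    if missing:
        listing = ", ".join(t.isoformat() for t in missing)
        raise IncompleteTileWindow(
            f"{len(missing)} unconverted tile(s) in requested window: {listing}"
        )


# Illustrative inventory: three HDF5 observations, only one converted.
day = date(2026, 1, 25)
hdf5 = [
    datetime(2026, 1, 25, 4, 3),
    datetime(2026, 1, 25, 4, 35),
    datetime(2026, 1, 25, 5, 10),
]
ms = [datetime(2026, 1, 25, 4, 3)]

gap = missing_tiles(hdf5, ms, day, 4, 4)
print(gap)
```

For resolution (b), the same list of missing timestamps could be handed to the conversion path instead of raising, which keeps the window comparison identical between the two options.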

Workaround currently in use

For the 3C138 demo, manually invoke dsa110 convert to fill the 04:30–05:00 gap of 2026-01-25 before running the orchestrator.

Out of scope

  • Changing the conversion algorithm itself.
  • Re-architecting how the cal MS is constructed inside ensure_bandpass.

References

  • scripts/batch_pipeline.py (tile globbing logic; precise line range in the codex review log).
  • dsa110_continuum/calibration/ensure.py ensure_bandpass() (cal-MS conversion side effect).
  • outputs/3c138_smoke_2026-05-05/disk_ms_inventory.md
  • Codex review finding #2.

Labels

needs-triage: A maintainer needs to evaluate the issue.
