
Resume with --web shows empty dashboard: prior agent context is not visible after checkpoint resume #167

@jrob5756

Summary

When resuming a workflow with the web dashboard (conductor resume <workflow.yaml> --web or --web-bg), the dashboard starts fresh: the timeline / agent panels / activity stream show nothing about agents that completed before the checkpoint. The user effectively loses all visual context of what the workflow already did.

The execution itself is correct — WorkflowContext (agent outputs, execution history) is restored in the engine and downstream agents do receive prior outputs — but the dashboard UI is blank for everything that ran before the checkpoint.

Reproduction

  1. conductor run examples/some-workflow.yaml --web and let several agents complete.
  2. Trigger a failure (or stop mid-run) so a checkpoint is saved.
  3. conductor resume examples/some-workflow.yaml --web.
  4. Open the dashboard.

Expected

The dashboard should show prior completed agents (status, outputs, timestamps, messages) so the user has the full visual context of the run, with execution continuing live from the resumed agent.

Actual

The dashboard shows the static workflow graph, but every node/panel for previously completed agents is empty. Only events from the resumed agent forward appear.

Root cause

WebDashboard accumulates its state purely from live events on the WorkflowEventEmitter (see src/conductor/web/server.py):

self._event_history: list[dict[str, Any]] = []
...
self._emitter.subscribe(self._on_event)

/api/state and the late-joiner WebSocket replay both serve from self._event_history. When the dashboard is started during resume_workflow_async (src/conductor/cli/run.py), no historical events are ever fed into it.
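The failure mode can be reproduced in miniature. The sketch below is not the project's code: `Emitter` and `Dashboard` are hypothetical stand-ins for WorkflowEventEmitter and WebDashboard, and the event shapes (`type`, `agent`) are assumed for illustration. It only shows that a dashboard whose history comes solely from live subscription has nothing to serve for events emitted in an earlier process.

```python
from typing import Any, Callable

Event = dict[str, Any]


class Emitter:
    """Hypothetical stand-in for WorkflowEventEmitter: fans events out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, event: Event) -> None:
        for callback in self._subscribers:
            callback(event)


class Dashboard:
    """Hypothetical stand-in for WebDashboard: history is built only from live events."""

    def __init__(self, emitter: Emitter) -> None:
        self._event_history: list[Event] = []
        emitter.subscribe(self._on_event)

    def _on_event(self, event: Event) -> None:
        self._event_history.append(event)

    def state(self) -> list[Event]:
        # Roughly what a late joiner would be served.
        return list(self._event_history)


# Original process: two agents complete, then the run stops at a checkpoint.
original_emitter = Emitter()
original_dashboard = Dashboard(original_emitter)
original_emitter.emit({"type": "agent_completed", "agent": "planner"})
original_emitter.emit({"type": "agent_completed", "agent": "researcher"})

# Resumed process: a fresh emitter and dashboard, with no replay step.
resumed_emitter = Emitter()
resumed_dashboard = Dashboard(resumed_emitter)
resumed_emitter.emit({"type": "agent_started", "agent": "writer"})

print(len(original_dashboard.state()))  # 2
print(len(resumed_dashboard.state()))   # 1
```

The resumed dashboard sees only the `writer` event, which matches the behaviour described under "Actual".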

The behaviour is acknowledged in AGENTS.md:

Note: on resume, the dashboard only shows events from the resumed agent forward — events from agents that completed before the checkpoint were emitted in the original process and are not replayed.

…and in the docstring of resume_workflow_async:

the dashboard only shows events from the resumed agent forward; agent runs that completed before the checkpoint are not replayed.

The checkpoint does fully preserve workflow state (context.agent_outputs, context.execution_history, copilot_session_ids) but does not record the original run_id or JSONL event log path, so today there isn't even a way to find the original event log to replay from.

Suggested fix

Two reasonable options, not mutually exclusive:

Option A — replay events from the original JSONL log (preferred when available)

  1. Add event_log_path (and run_id) to CheckpointData / save_checkpoint so the checkpoint knows where the original *.events.jsonl lives. (The path is already available via EventLogSubscriber.path at the time the checkpoint is written.)
  2. On resume, before subscribing the dashboard to the emitter, read the JSONL line-by-line and call dashboard._on_event (or expose a replay_events() method) for each event. New live events from the resumed run are appended after the historical ones.
  3. Keep the workflow-level run_id stable across resumes so timeline / log-correlation tools see one continuous run rather than a fresh one.

This gives the user the full original timeline (messages, tool calls, reasoning, etc.) — not just status.
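Step 1 could look roughly like the following. This is a sketch under stated assumptions, not the real CheckpointData: `CheckpointSketch`, its field types, and the `to_dict` / `from_dict` helpers are hypothetical, and only the two proposed fields (`run_id`, `event_log_path`) come from this issue. The point it illustrates is that making both fields optional keeps checkpoints written before the change loadable.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class CheckpointSketch:
    """Hypothetical subset of a checkpoint schema, for illustration only."""

    agent_outputs: dict[str, Any] = field(default_factory=dict)
    execution_history: list[str] = field(default_factory=list)
    # Proposed additions; None means "unknown" (for example an older checkpoint).
    run_id: str | None = None
    event_log_path: str | None = None

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> CheckpointSketch:
        # .get() keeps checkpoints written before the new fields loadable.
        return cls(
            agent_outputs=raw.get("agent_outputs", {}),
            execution_history=raw.get("execution_history", []),
            run_id=raw.get("run_id"),
            event_log_path=raw.get("event_log_path"),
        )


# An older checkpoint without the new keys still loads.
old = CheckpointSketch.from_dict({"agent_outputs": {"planner": "ok"}})

# A new checkpoint round-trips the run id and log path.
new = CheckpointSketch.from_dict(
    CheckpointSketch(run_id="run-123", event_log_path="/tmp/x.events.jsonl").to_dict()
)
```

Reusing the stored `run_id` on resume, rather than generating a new one, is what would keep the run continuous for step 3.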

Option B — synthesize summary events from the restored context (fallback)

If no event log file is available (older checkpoints, log file deleted, etc.), synthesize minimal events from restored_context.execution_history + restored_context.agent_outputs so each prior agent at least appears as completed in the dashboard with its final output, even without the intermediate streaming events.
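A minimal sketch of the fallback, assuming execution_history is an ordered list of agent names and agent_outputs maps an agent name to its final output. Both shapes, the `agent_completed` event type, and the `synthetic` flag are assumptions made for illustration; the real context and event schema may differ.

```python
from typing import Any


def synthesize_events(
    execution_history: list[str],
    agent_outputs: dict[str, Any],
) -> list[dict[str, Any]]:
    """Build one minimal 'completed' event per agent that already ran.

    Order follows execution_history so the timeline reads in run order.
    """
    events: list[dict[str, Any]] = []
    for agent_name in execution_history:
        events.append(
            {
                "type": "agent_completed",  # hypothetical event type
                "agent": agent_name,
                "output": agent_outputs.get(agent_name),  # None if not recorded
                "synthetic": True,  # lets the UI mark these as reconstructed
            }
        )
    return events


events = synthesize_events(
    ["planner", "researcher"],
    {"planner": {"plan": "outline"}},
)
```

Flagging the events as synthetic would let the frontend distinguish them from replayed originals, since they carry no timestamps or intermediate messages.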

Implementation sketch:

# In resume_workflow_async, after creating the dashboard but before
# the engine starts emitting new events:
if dashboard is not None:
    if cp.event_log_path and Path(cp.event_log_path).exists():
        dashboard.replay_events_from_jsonl(Path(cp.event_log_path))
    else:
        dashboard.replay_synthetic_from_context(restored_context, config)

replay_events_from_jsonl would iterate over the JSON lines and call the same _on_event handler the emitter uses today, so every existing rendering path on the frontend works unchanged: the replayed history lands in self._event_history, which /api/state and the late-joiner WebSocket replay already serve.
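One possible shape for the replay helper, written as a free function so it can be tried in isolation. The signature and the skip-on-error policy are assumptions, not an existing API; in the dashboard it would presumably be a method that passes `self._on_event` as the callback.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def replay_events_from_jsonl(
    path: Path,
    on_event: Callable[[dict[str, Any]], None],
) -> int:
    """Feed each JSON object in a JSONL file to on_event; return the count.

    Blank lines and lines that fail to parse (for example a final line
    truncated by a crash) are skipped rather than aborting the replay.
    """
    replayed = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(event, dict):
                on_event(event)
                replayed += 1
    return replayed


# Demo with a throwaway log: two valid events, a blank line, a truncated line.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "demo.events.jsonl"
    log_path.write_text(
        '{"type": "agent_started", "agent": "planner"}\n'
        "\n"
        '{"type": "agent_completed", "agent": "planner"}\n'
        '{"type": "agent_sta',
        encoding="utf-8",
    )
    history: list[dict[str, Any]] = []
    count = replay_events_from_jsonl(log_path, history.append)

print(count)  # 2
```

Tolerating a truncated last line seems worthwhile here because checkpoints are typically written after a failure, which is exactly when the log may end mid-write.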

Additional polish

  • Drop a banner in the dashboard header indicating "Resumed from checkpoint at " so users understand why earlier events have a gap or differ in style.
  • Make the same data available on the run side too. --web-bg clients reconnecting to a still-running process already get history via /api/state; this brings parity for the resume case.

Workaround

For now, users can open the original JSONL event log directly (look in $TMPDIR/conductor/conductor-<workflow>-<timestamp>.events.jsonl), but there is no automatic correlation between a checkpoint and its source log file.
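Until a fix lands, a small script can locate candidate logs and summarise them. This is a sketch that assumes the file-name pattern quoted above, that `<workflow>` is the workflow name, and that each line is a JSON object with a `type` key; the helper names are hypothetical. Because nothing links a checkpoint to its log, picking the newest matching file is only a heuristic.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path


def find_event_logs(log_dir: Path, workflow: str) -> list[Path]:
    """Return matching *.events.jsonl files, newest first by modification time."""
    pattern = f"conductor-{workflow}-*.events.jsonl"
    return sorted(
        log_dir.glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )


def summarize(log_path: Path) -> Counter:
    """Count events per 'type' key (assumed field name); skip unparsable lines."""
    counts: Counter = Counter()
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(event, dict):
                counts[str(event.get("type", "unknown"))] += 1
    return counts


if __name__ == "__main__":
    # Fall back to the platform temp dir when TMPDIR is not set.
    base = Path(os.environ.get("TMPDIR", tempfile.gettempdir())) / "conductor"
    for log in find_event_logs(base, "some-workflow")[:1]:
        print(log, dict(summarize(log)))
```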

Related

  • src/conductor/cli/run.py: resume_workflow_async (dashboard init)
  • src/conductor/web/server.py: WebDashboard._event_history / /api/state
  • src/conductor/engine/checkpoint.py: CheckpointData schema
  • src/conductor/engine/event_log.py: EventLogSubscriber (source of replayable events)
  • AGENTS.md: "Run / Resume Parity" section (last bullet documents the gap)
