Feature: compare a failed Browser Use run with the last successful run #369

A comment in the Browser Use community discussion on BrowserTrace suggested that the most useful trace viewer would not only replay a failed run but also compare it with the last successful run and show where the two paths diverged:
https://github.com/browser-use/browser-use/discussions/4816

## First useful version

- keep one stable `run_id` for each browser task
- store enough version/config metadata to compare runs fairly: Browser Use version, BrowserTrace version, model/provider, prompt/template version, and relevant adapter config
- compute a step-level comparison between a failed run and a selected successful run
- highlight the first divergent step by URL/title/action/target element summary/model output/error
- link retry or repair attempts back to the failed step they attempted to fix
- show a diffable final result summary when extracted/clicked/submitted/changed output is available
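
To make the step-level comparison concrete, here is a minimal sketch of finding the first divergent step. `StepSummary`, `differing_fields`, and `first_divergence` are hypothetical names for illustration, not existing BrowserTrace or Browser Use APIs, and the field set simply mirrors the list above. It compares fields by exact equality, which is likely too strict for model output in practice.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class StepSummary:
    """Hypothetical per-step record; every field is optional."""
    url: Optional[str] = None
    title: Optional[str] = None
    action: Optional[str] = None
    target: Optional[str] = None
    model_output: Optional[str] = None
    error: Optional[str] = None


def differing_fields(a: StepSummary, b: StepSummary) -> List[str]:
    """Return the names of the fields whose values differ between two steps."""
    return [f.name for f in fields(StepSummary)
            if getattr(a, f.name) != getattr(b, f.name)]


def first_divergence(failed: List[StepSummary],
                     successful: List[StepSummary]) -> Optional[dict]:
    """Return the index and differing fields of the first divergent step."""
    for index, (f_step, s_step) in enumerate(zip(failed, successful)):
        changed = differing_fields(f_step, s_step)
        if changed:
            return {"step": index, "fields": changed}
    # All shared steps match; one run may simply have stopped earlier.
    if len(failed) != len(successful):
        return {"step": min(len(failed), len(successful)), "fields": ["length"]}
    return None
```

The return value is a plain dict so that a JSON-first command could pass it straight to `json.dumps`.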

## Suggested PR slices

1. Metadata slice: record comparable run metadata without changing the UI. Include fixture coverage for missing or partial metadata.
2. CLI prototype slice: add a JSON-first comparison command or helper that compares two selected runs and returns the first divergent step.
3. Browser Use adapter slice: capture the Browser Use fields that make comparison useful, such as task text, model/provider, action summary, URL/title, extracted content, and error boundary when exposed by the callback or run-hook surface.
4. UI slice: show a compact failed-vs-successful comparison view after the CLI/data shape is stable.
5. Retry-link slice: link retry/repair attempts back to the failed step they attempted to fix.

A good first PR here should probably target slice 1 or a small part of slice 2. Please keep the first change narrow and include a short fixture or regression test.
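
As a starting point for slice 1, the sketch below shows one possible shape for a metadata record that tolerates missing or partial fields. `RunMetadata` and its field names are assumptions chosen for illustration; the actual schema is up to the implementing PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class RunMetadata:
    """Hypothetical comparable-run metadata; only run_id is required."""
    run_id: str
    browser_use_version: Optional[str] = None
    browsertrace_version: Optional[str] = None
    model: Optional[str] = None
    provider: Optional[str] = None
    prompt_version: Optional[str] = None

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "RunMetadata":
        """Build a record from a loose mapping, ignoring unknown keys."""
        known = {name: raw.get(name) for name in cls.__dataclass_fields__}
        if not known["run_id"]:
            raise ValueError("run_id is required")
        return cls(**known)

    def missing_fields(self) -> List[str]:
        """List the fields that were not recorded for this run."""
        return [name for name, value in asdict(self).items() if value is None]
```

A fixture for partial metadata could then assert on `missing_fields()` rather than on the absence of an exception.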

## Non-goals for the first version

- no hosted telemetry
- no automatic sharing
- no assumption that two runs are comparable unless their metadata makes that explicit
- no full DOM/session replay requirement for the first iteration
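
The comparability non-goal can be read as "refuse to compare by default". A minimal sketch of that check follows, assuming run metadata is available as plain dicts. `COMPARABLE_KEYS` and `comparability_problems` are hypothetical names, and the key list is an assumption based on the metadata named earlier in this issue.

```python
from typing import Any, List, Mapping

# Hypothetical keys that must be present and equal before two runs are compared.
COMPARABLE_KEYS = (
    "browser_use_version",
    "browsertrace_version",
    "model",
    "provider",
    "prompt_version",
)


def comparability_problems(a: Mapping[str, Any], b: Mapping[str, Any]) -> List[str]:
    """Return reasons two runs should not be treated as comparable."""
    problems = []
    for key in COMPARABLE_KEYS:
        left, right = a.get(key), b.get(key)
        if left is None or right is None:
            # Missing metadata counts as "not comparable", never as a match.
            problems.append(f"{key}: missing")
        elif left != right:
            problems.append(f"{key}: {left} != {right}")
    return problems
```

An empty list means the metadata explicitly supports the comparison; anything else can be surfaced as a warning next to the diff.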

## Useful feedback

If you have a real Browser Use failure where a previous successful run would have helped, please share which fields would make the divergence obvious. Public examples should avoid private trace data, tokens, cookies, customer data, and screenshots with sensitive content.
