A Browser Use discussion comment suggested that the most useful trace viewer would not only replay a failed run, but also compare it with the last successful run and show where the path diverged.
This came from Browser Use community feedback on the BrowserTrace discussion:
browser-use/browser-use#4816
First useful version
- keep one stable run_id for each browser task
- store enough version/config metadata to compare runs fairly: Browser Use version, BrowserTrace version, model/provider, prompt/template version, and relevant adapter config
- compute a step-level comparison between a failed run and a selected successful run
- highlight the first divergent step by URL/title/action/target element summary/model output/error
- link retry or repair attempts back to the failed step they attempted to fix
- show a diffable final result summary when extracted/clicked/submitted/changed output is available
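As a rough illustration of the step-level comparison described in the list above, here is a minimal Python sketch. The `StepSummary` record, its field names, and the `first_divergence` helper are all hypothetical; they are not part of Browser Use or BrowserTrace. The sketch also assumes steps can be aligned by index, which may not hold once retries are involved.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class StepSummary:
    """Hypothetical per-step record; field names are illustrative only."""
    url: Optional[str] = None
    title: Optional[str] = None
    action: Optional[str] = None
    target: Optional[str] = None        # target element summary
    model_output: Optional[str] = None
    error: Optional[str] = None


def first_divergence(failed, successful):
    """Return the first step index where the runs differ, or None if identical."""
    for index, (f_step, s_step) in enumerate(zip(failed, successful)):
        differing = [
            f.name for f in fields(StepSummary)
            if getattr(f_step, f.name) != getattr(s_step, f.name)
        ]
        if differing:
            return {"step": index, "fields": differing}
    if len(failed) != len(successful):
        # One run ended early; report the first step only one run has.
        return {"step": min(len(failed), len(successful)), "fields": ["length"]}
    return None


ok_run = [
    StepSummary(url="https://example.com/login", action="click", target="button#submit"),
    StepSummary(url="https://example.com/home", action="extract", target="h1"),
]
failed_run = [
    StepSummary(url="https://example.com/login", action="click", target="button#submit"),
    StepSummary(url="https://example.com/login", action="extract", target="h1",
                error="element not found"),
]

print(first_divergence(failed_run, ok_run))
```

Reporting the differing field names alongside the step index keeps the result small enough to show in a CLI or a compact UI row.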
Suggested PR slices
- Metadata slice: record comparable run metadata without changing the UI. Include fixture coverage for missing or partial metadata.
- CLI prototype slice: add a JSON-first comparison command or helper that compares two selected runs and returns the first divergent step.
- Browser Use adapter slice: capture the Browser Use fields that make comparison useful, such as task text, model/provider, action summary, URL/title, extracted content, and error boundary when exposed by the callback or run-hook surface.
- UI slice: show a compact failed-vs-successful comparison view after the CLI/data shape is stable.
- Retry-link slice: link retry/repair attempts back to the failed step they attempted to fix.
A good first PR here should probably target the Metadata slice or a small part of the CLI prototype slice. Please keep the first change narrow and include a short fixture or regression test.
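To make the Metadata slice more concrete, here is one possible shape for a run metadata record that tolerates missing or partial input. The `RunMetadata` class and its field names are assumptions for illustration, not an existing BrowserTrace type; adapter config is left out to keep the sketch short.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RunMetadata:
    """Hypothetical comparable-run metadata; only run_id is required."""
    run_id: str
    browser_use_version: Optional[str] = None
    browsertrace_version: Optional[str] = None
    model: Optional[str] = None
    provider: Optional[str] = None
    prompt_version: Optional[str] = None

    @classmethod
    def from_dict(cls, raw):
        """Build a record from a dict, ignoring keys this sketch does not know."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})

    def missing_fields(self):
        """List the fields that were not recorded for this run."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


# Partial metadata fixture: only run_id and model were recorded.
meta = RunMetadata.from_dict({"run_id": "run-001", "model": "example-model", "extra": 1})
print(meta.missing_fields())
print(meta.to_json())
```

A fixture like the one at the bottom could double as the regression test for partial metadata: the record loads, and the gaps are reported instead of silently defaulted.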
Non-goals for the first version
- no hosted telemetry
- no automatic sharing
- no assumption that two runs are comparable unless their metadata makes that explicit
- no full DOM/session replay requirement for the first iteration
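The comparability non-goal above could be enforced with an explicit check before any step comparison runs. The sketch below is only one way to do it: the `comparability` helper, the `REQUIRED_FIELDS` tuple, and the metadata keys are hypothetical names, and it treats a missing value as "unknown" instead of assuming a match.

```python
# Hypothetical set of metadata keys that must match for a fair comparison.
REQUIRED_FIELDS = (
    "browser_use_version",
    "browsertrace_version",
    "model",
    "provider",
    "prompt_version",
)


def comparability(meta_a, meta_b, required=REQUIRED_FIELDS):
    """Report whether two runs' metadata explicitly supports comparing them."""
    differing, unknown = [], []
    for name in required:
        a, b = meta_a.get(name), meta_b.get(name)
        if a is None or b is None:
            # Missing metadata never counts as a match.
            unknown.append(name)
        elif a != b:
            differing.append(name)
    return {
        "comparable": not differing and not unknown,
        "differing": differing,
        "unknown": unknown,
    }


failed_meta = {
    "browser_use_version": "0.0.0-example",
    "browsertrace_version": "0.0.0-example",
    "model": "example-model",
    "provider": "example-provider",
    "prompt_version": "v2",
}
successful_meta = dict(failed_meta, prompt_version="v1")
del successful_meta["provider"]

print(comparability(failed_meta, successful_meta))
```

Returning the differing and unknown fields separately lets a viewer still show the comparison while warning about which differences might explain the divergence.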
Useful feedback
If you have a real Browser Use failure where a previous successful run would have helped, please share which fields would make the divergence obvious. Public examples should avoid private trace data, tokens, cookies, customer data, and screenshots with sensitive content.