
Conversation

@WaelDLZ commented on Feb 3, 2026

… of a Sum, it makes way more sense now

@WaelDLZ marked this pull request as ready for review on February 3, 2026, 17:08

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

Changed the collision and offroad count metrics from a sum to a mean across rollouts, yielding normalized collision rates instead of total counts. This makes the metrics comparable across runs with different numbers of rollouts.

Key Changes:

  • evaluator.py: Replaced np.sum() with np.mean() for the collision and offroad calculations over the rollout axis (axis=1)
  • visual_sanity_check.py: Added missing load_model_path configuration and removed reference to non-existent wosac_num_agents config key

Impact:

  • Collision/offroad metrics now return values in the range [0.0, 1.0], representing the average collision rate per agent across rollouts (see the sketch after this list)
  • Column names (num_collisions_sim, num_collisions_ref) still suggest integer counts but now contain float rates
  • Fixes a KeyError in visual_sanity_check.py that would have occurred when accessing config["eval"]["wosac_num_agents"]
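
To make the aggregation change concrete, here is a minimal sketch with toy shapes; the array names follow the sequence diagram below, and the actual evaluator code may differ:

```python
import numpy as np

# Toy data: per-step collision flags of shape (n_agents, n_rollouts, n_steps).
rng = np.random.default_rng(0)
collision_per_step = rng.random((8, 32, 91)) < 0.01

# Did each (agent, rollout) pair collide at any step?
collision_indication = np.any(collision_per_step, axis=2)  # (n_agents, n_rollouts)

# OLD: total number of colliding rollouts per agent (integer count per agent).
num_collisions_old = np.sum(collision_indication, axis=1)

# NEW: fraction of rollouts in which each agent collided, a rate in [0.0, 1.0].
num_collisions_new = np.mean(collision_indication, axis=1)  # shape (n_agents,)
```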

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations for naming consistency
  • The core logic change (sum to mean) is mathematically sound and semantically more meaningful. The visual_sanity_check.py fixes resolve actual bugs (non-existent config key). Minor concerns: (1) column names don't reflect the change from counts to rates, (2) hardcoded model path reduces flexibility but doesn't break functionality.
  • No files require special attention - changes are straightforward and improve metric calculation

Important Files Changed

| Filename | Overview |
| --- | --- |
| pufferlib/ocean/benchmark/evaluator.py | Changed collision and offroad metrics from sum to mean over the rollout axis for normalized comparison |
| pufferlib/ocean/benchmark/visual_sanity_check.py | Added a hardcoded model path and removed a non-existent config key reference |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant WOSACEvaluator
    participant Metrics
    participant DataFrameOutput

    Caller->>WOSACEvaluator: compute_metrics(trajectories)
    WOSACEvaluator->>Metrics: compute_interaction_features()
    Metrics-->>WOSACEvaluator: collision_per_step (n_agents, n_rollouts, n_steps)

    Note over WOSACEvaluator: Aggregate collisions over time
    WOSACEvaluator->>WOSACEvaluator: np.any(collision_per_step, axis=2)
    Note over WOSACEvaluator: Result: collision_indication (n_agents, n_rollouts)

    Note over WOSACEvaluator: OLD: Sum across rollouts<br/>NEW: Mean across rollouts
    WOSACEvaluator->>WOSACEvaluator: np.mean(collision_indication, axis=1)
    Note over WOSACEvaluator: Result: num_collisions (n_agents,)<br/>Now represents average collision rate<br/>instead of total count

    WOSACEvaluator->>DataFrameOutput: Create DataFrame with metrics
    Note over DataFrameOutput: num_collisions_sim and num_collisions_ref<br/>are now mean values (0.0-1.0)<br/>instead of integer counts
```

@greptile-apps bot left a comment

2 files reviewed, 2 comments

greptile-apps bot commented Feb 3, 2026

Additional Comments (1)

pufferlib/ocean/benchmark/evaluator.py
Column names still say num_collisions but values are now mean rates (0.0-1.0) instead of counts. Consider renaming to collision_rate_sim/collision_rate_ref and offroad_rate_sim/offroad_rate_ref for clarity.
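
A minimal sketch of the suggested rename; the DataFrame variable and its construction here are hypothetical, since only the num_collisions_* column names appear in the diff:

```python
import pandas as pd

# Toy frame standing in for the evaluator's per-agent metrics output.
metrics_df = pd.DataFrame({
    "num_collisions_sim": [0.25, 0.0],
    "num_collisions_ref": [0.125, 0.0625],
})

# Rename count-style columns to rate-style names, since the values are
# now mean rates in [0.0, 1.0] rather than integer counts.
metrics_df = metrics_df.rename(columns={
    "num_collisions_sim": "collision_rate_sim",
    "num_collisions_ref": "collision_rate_ref",
})
```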

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@daphne-cornelisse merged commit c9aaeb2 into 2.0 on Feb 3, 2026
14 checks passed
@daphne-cornelisse deleted the wbd/tiny_wosac_fix branch on February 3, 2026, 21:42