
Conversation

@WaelDLZ commented on Feb 3, 2026

… of a Sum, it makes way more sense now

@WaelDLZ marked this pull request as ready for review on February 3, 2026, 17:08

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

Changed the collision and offroad count metrics from a sum to a mean across rollouts, yielding normalized collision rates instead of total counts. This makes the metrics comparable across runs with different numbers of rollouts.

Key Changes:

  • evaluator.py: Replaced np.sum() with np.mean() for the collision and offroad calculations over the rollout axis (axis=1)
  • visual_sanity_check.py: Added missing load_model_path configuration and removed reference to non-existent wosac_num_agents config key

Impact:

  • Collision/offroad metrics now return values in the range [0.0, 1.0], representing the average collision rate per agent across rollouts (see the sketch after this list)
  • Column names (num_collisions_sim, num_collisions_ref) still suggest integer counts but now contain float rates
  • Fixes a KeyError in visual_sanity_check.py that would have occurred when accessing config["eval"]["wosac_num_agents"]
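
To make the aggregation change concrete, here is a minimal sketch with toy shapes; the array names follow the sequence diagram below, and the actual evaluator code may differ:

```python
import numpy as np

# Toy data: per-step collision flags of shape (n_agents, n_rollouts, n_steps).
rng = np.random.default_rng(0)
collision_per_step = rng.random((8, 32, 91)) < 0.01

# Did each (agent, rollout) pair collide at any step?
collision_indication = np.any(collision_per_step, axis=2)  # (n_agents, n_rollouts)

# OLD: total number of colliding rollouts per agent (integer count per agent).
num_collisions_old = np.sum(collision_indication, axis=1)

# NEW: fraction of rollouts in which each agent collided, a rate in [0.0, 1.0].
num_collisions_new = np.mean(collision_indication, axis=1)  # shape (n_agents,)
```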

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations for naming consistency
  • The core logic change (sum to mean) is mathematically sound and semantically more meaningful. The visual_sanity_check.py fixes resolve actual bugs (non-existent config key). Minor concerns: (1) column names don't reflect the change from counts to rates, (2) hardcoded model path reduces flexibility but doesn't break functionality.
  • No files require special attention - changes are straightforward and improve metric calculation

Important Files Changed

| Filename | Overview |
| --- | --- |
| pufferlib/ocean/benchmark/evaluator.py | Changed collision and offroad metrics from sum to mean over the rollout axis for normalized comparison |
| pufferlib/ocean/benchmark/visual_sanity_check.py | Added a hardcoded model path and removed a non-existent config key reference |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant WOSACEvaluator
    participant Metrics
    participant DataFrameOutput

    Caller->>WOSACEvaluator: compute_metrics(trajectories)
    WOSACEvaluator->>Metrics: compute_interaction_features()
    Metrics-->>WOSACEvaluator: collision_per_step (n_agents, n_rollouts, n_steps)

    Note over WOSACEvaluator: Aggregate collisions over time
    WOSACEvaluator->>WOSACEvaluator: np.any(collision_per_step, axis=2)
    Note over WOSACEvaluator: Result: collision_indication (n_agents, n_rollouts)

    Note over WOSACEvaluator: OLD: Sum across rollouts<br/>NEW: Mean across rollouts
    WOSACEvaluator->>WOSACEvaluator: np.mean(collision_indication, axis=1)
    Note over WOSACEvaluator: Result: num_collisions (n_agents,)<br/>Now represents average collision rate<br/>instead of total count

    WOSACEvaluator->>DataFrameOutput: Create DataFrame with metrics
    Note over DataFrameOutput: num_collisions_sim and num_collisions_ref<br/>are now mean values (0.0-1.0)<br/>instead of integer counts
```

@greptile-apps bot left a comment

2 files reviewed, 2 comments

greptile-apps bot commented Feb 3, 2026

Additional Comments (1)

pufferlib/ocean/benchmark/evaluator.py
Column names still say num_collisions but values are now mean rates (0.0-1.0) instead of counts. Consider renaming to collision_rate_sim/collision_rate_ref and offroad_rate_sim/offroad_rate_ref for clarity.
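
A minimal sketch of the suggested rename; the DataFrame variable and its construction here are hypothetical, since only the num_collisions_* column names appear in the diff:

```python
import pandas as pd

# Toy frame standing in for the evaluator's per-agent metrics output.
metrics_df = pd.DataFrame({
    "num_collisions_sim": [0.25, 0.0],
    "num_collisions_ref": [0.125, 0.0625],
})

# Rename count-style columns to rate-style names, since the values are
# now mean rates in [0.0, 1.0] rather than integer counts.
metrics_df = metrics_df.rename(columns={
    "num_collisions_sim": "collision_rate_sim",
    "num_collisions_ref": "collision_rate_ref",
})
```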

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@daphne-cornelisse merged commit c9aaeb2 into 2.0 on Feb 3, 2026
14 checks passed
@daphne-cornelisse deleted the wbd/tiny_wosac_fix branch on February 3, 2026, 21:42