Skip to content

Add event-level comparison tool for productions#2181

Open
GernotMaier wants to merge 32 commits into
mainfrom
event-level-comparision
Open

Add event-level comparison tool for productions#2181
GernotMaier wants to merge 32 commits into
mainfrom
event-level-comparision

Conversation

@GernotMaier
Copy link
Copy Markdown
Contributor

@GernotMaier GernotMaier commented May 11, 2026

Add a new application to compare multiple simulation productions on event level using reduced event-data inputs. Calculate also KS/Chi2 statistics to compare distributions.

This adds a lot of plots - attached some examples (gammas at 300 GeV)

Plotting adds a lot of code. Note sure how to make this more maintainable.

@GernotMaier GernotMaier self-assigned this May 11, 2026
@GernotMaier GernotMaier requested a review from Copilot May 13, 2026 10:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new CLI application to compare multiple simulation productions at the event level using reduced event-data inputs, plus supporting metric-collection utilities, plotting routines, documentation, and tests.

Changes:

  • Introduce simtools-compare-productions application that parses production descriptors and dispatches to an event-level comparison implementation.
  • Add event-level metric aggregation (production_comparison) and a new visualization module generating multi-production comparison plots.
  • Add unit/integration tests and documentation entries for the new application and modules.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/simtools/applications/compare_productions.py New CLI entry point for production comparisons; currently implements the events comparison level.
src/simtools/sim_events/production_comparison.py New utilities to parse production inputs and aggregate event-level metrics (including per-telescope-type subsets).
src/simtools/visualization/plot_event_level_production_comparison.py New plotting module producing event-level comparison figures across productions and per telescope type.
tests/unit_tests/applications/test_compare_productions_on_event_level.py Unit test for application orchestration from parsed args → metrics collection → plotting call.
tests/unit_tests/sim_events/test_production_comparison.py Unit tests for production argument parsing and metric aggregation.
tests/unit_tests/visualization/test_plot_event_level_production_comparison.py Unit tests covering plot outputs and several helper/branch behaviors.
tests/integration_tests/config/compare_productions_on_event_level_run.yml Integration workflow config to run the new application and assert selected plot outputs exist.
pyproject.toml Registers simtools-compare-productions console script entry point.
docs/source/user-guide/applications/simtools-compare-productions.rst User-guide page for the new application (autodoc).
docs/source/user-guide/applications.md Adds the new application to the user-guide toctree.
docs/source/api-reference/sim_events.md Adds API reference entry for production_comparison.
docs/source/api-reference/visualization.md Adds API reference entry for plot_event_level_production_comparison.
docs/source/conf.py Extends nitpick ignore list for collections.Counter references in autodoc.
docs/changes/2181.feature.md Changelog fragment for the new feature.
Comments suppressed due to low confidence (1)

src/simtools/visualization/plot_event_level_production_comparison.py:639

  • _plot_triggered_vs_quantity() is effectively unreachable in normal runs because _TRIGGERED_FRACTION_QUANTITIES is initialized as an empty set. If triggered-fraction-vs-quantity plots are intended outputs, populate this set (e.g., with the supported quantity names) and update tests/integration expectations; otherwise remove the dead branch and the test that relies on patching this constant.
_HISTOGRAM_STYLE_QUANTITIES = {"energy"}
_TRIGGERED_FRACTION_QUANTITIES = set()
_SPECIAL_TRIGGER_SUBSETS = {"single_telescope", "mixed_type"}

Comment thread src/simtools/visualization/plot_event_level_production_comparison.py Outdated
Comment thread docs/source/user-guide/applications.md Outdated
Comment thread tests/unit_tests/applications/test_compare_productions.py
Comment thread tests/integration_tests/config/compare_productions_run.yml
@orelgueta
Copy link
Copy Markdown
Contributor

@orelgueta look at the plots attached here and check what is missing or redundant.

@GernotMaier GernotMaier marked this pull request as ready for review May 18, 2026 09:55
@GernotMaier GernotMaier requested a review from tobiaskleiner May 18, 2026 14:14
Copy link
Copy Markdown
Collaborator

@tobiaskleiner tobiaskleiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GernotMaier, I have added a few comments.

if len(patterns) == 0:
raise ValueError(f"Production '{label}' has no event_data_file pattern.")

resolved_files = [str(path) for path in resolve_file_patterns(patterns)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a lot of (untested) code here in this application. Consider moving it to modules and add tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed.

@property
def trigger_fraction(self):
"""Return triggered/simulated fraction."""
if self.simulated_event_count <= 0:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can go to negative?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. But the difference of testing for zeros and <=0 is identical?

Comment thread src/simtools/statistics.py Outdated
total2 = float(np.sum(counts2))
if total1 <= 0 or total2 <= 0:
return None, None, False, "insufficient_data"
support_mask = counts1 > 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that this is intended. You drop all bins later for comparison where the baseline histogram is 0, which I think skews the chi^2. Maybe better to merge empty bins in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point! I've spent now almost two hours on it:

  • merging of empty bins introduce a lot of complicated code and makes things very hard to understand
  • considered replacing zero bins by almost-zero bins. But that would mess up the Chi2 statistics.

My conclusion was now to completely remove the Chi2 statistics as it is hard to interpret in these cases. We will have to discuss which methods are best for comparison and I've opened an issue on this: #2192

@ctao-sonarqube
Copy link
Copy Markdown

@GernotMaier
Copy link
Copy Markdown
Contributor Author

@tobiaskleiner - thanks a lot for the review! You are absolutely right with the Chi2 tests and that needs much more thought. I pushed it therefore into an issue. Let me know if you are fine with this or further changes are required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Event-level comparison tools for long-integration tests Long integration tests for array triggers

4 participants