Add event-level comparison tool for productions#2181
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new CLI application to compare multiple simulation productions at the event level using reduced event-data inputs, plus supporting metric-collection utilities, plotting routines, documentation, and tests.
Changes:
- Introduce
simtools-compare-productionsapplication that parses production descriptors and dispatches to an event-level comparison implementation. - Add event-level metric aggregation (
production_comparison) and a new visualization module generating multi-production comparison plots. - Add unit/integration tests and documentation entries for the new application and modules.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/simtools/applications/compare_productions.py |
New CLI entry point for production comparisons; currently implements the events comparison level. |
src/simtools/sim_events/production_comparison.py |
New utilities to parse production inputs and aggregate event-level metrics (including per-telescope-type subsets). |
src/simtools/visualization/plot_event_level_production_comparison.py |
New plotting module producing event-level comparison figures across productions and per telescope type. |
tests/unit_tests/applications/test_compare_productions_on_event_level.py |
Unit test for application orchestration from parsed args → metrics collection → plotting call. |
tests/unit_tests/sim_events/test_production_comparison.py |
Unit tests for production argument parsing and metric aggregation. |
tests/unit_tests/visualization/test_plot_event_level_production_comparison.py |
Unit tests covering plot outputs and several helper/branch behaviors. |
tests/integration_tests/config/compare_productions_on_event_level_run.yml |
Integration workflow config to run the new application and assert selected plot outputs exist. |
pyproject.toml |
Registers simtools-compare-productions console script entry point. |
docs/source/user-guide/applications/simtools-compare-productions.rst |
User-guide page for the new application (autodoc). |
docs/source/user-guide/applications.md |
Adds the new application to the user-guide toctree. |
docs/source/api-reference/sim_events.md |
Adds API reference entry for production_comparison. |
docs/source/api-reference/visualization.md |
Adds API reference entry for plot_event_level_production_comparison. |
docs/source/conf.py |
Extends nitpick ignore list for collections.Counter references in autodoc. |
docs/changes/2181.feature.md |
Changelog fragment for the new feature. |
Comments suppressed due to low confidence (1)
src/simtools/visualization/plot_event_level_production_comparison.py:639
_plot_triggered_vs_quantity()is effectively unreachable in normal runs because_TRIGGERED_FRACTION_QUANTITIESis initialized as an empty set. If triggered-fraction-vs-quantity plots are intended outputs, populate this set (e.g., with the supported quantity names) and update tests/integration expectations; otherwise remove the dead branch and the test that relies on patching this constant.
_HISTOGRAM_STYLE_QUANTITIES = {"energy"}
_TRIGGERED_FRACTION_QUANTITIES = set()
_SPECIAL_TRIGGER_SUBSETS = {"single_telescope", "mixed_type"}
|
@orelgueta look at the plots attached here and check what is missing or redundant. |
tobiaskleiner
left a comment
There was a problem hiding this comment.
Thanks @GernotMaier, I have added a few comments.
| if len(patterns) == 0: | ||
| raise ValueError(f"Production '{label}' has no event_data_file pattern.") | ||
|
|
||
| resolved_files = [str(path) for path in resolve_file_patterns(patterns)] |
There was a problem hiding this comment.
Quite a lot of (untested) code here in this application. Consider moving it to modules and add tests.
| @property | ||
| def trigger_fraction(self): | ||
| """Return triggered/simulated fraction.""" | ||
| if self.simulated_event_count <= 0: |
There was a problem hiding this comment.
it can go to negative?
There was a problem hiding this comment.
Probably not. But the difference of testing for zeros and <=0 is identical?
| total2 = float(np.sum(counts2)) | ||
| if total1 <= 0 or total2 <= 0: | ||
| return None, None, False, "insufficient_data" | ||
| support_mask = counts1 > 0 |
There was a problem hiding this comment.
Please check that this is intended. You drop all bins later for comparison where the baseline histogram is 0, which I think skews the chi^2. Maybe better to merge empty bins in this case?
There was a problem hiding this comment.
Very good point! I've spent now almost two hours on it:
- merging of empty bins introduce a lot of complicated code and makes things very hard to understand
- considered replacing zero bins by almost-zero bins. But that would mess up the Chi2 statistics.
My conclusion was now to completely remove the Chi2 statistics as it is hard to interpret in these cases. We will have to discuss which methods are best for comparison and I've opened an issue on this: #2192
|
|
@tobiaskleiner - thanks a lot for the review! You are absolutely right with the Chi2 tests and that needs much more thought. I pushed it therefore into an issue. Let me know if you are fine with this or further changes are required. |




Add a new application to compare multiple simulation productions on event level using reduced event-data inputs. Calculate also KS/Chi2 statistics to compare distributions.
This adds a lot of plots - attached some examples (gammas at 300 GeV)
Plotting adds a lot of code. Note sure how to make this more maintainable.