This repository provides the main analysis framework for processing WR→Nℓ→ℓℓjj events using the Coffea columnar analysis toolkit. It handles background, data, and signal samples to produce histograms for downstream limit-setting and plotting.
For first-time setup (cloning, creating the venv, Condor environment), see Getting Started.
Activate the virtual environment before running any commands:
source .venv/bin/activate
Tip: Add this line to your ~/.bashrc to activate automatically on login:
cd /path/to/WrCoffea && source .venv/bin/activate && cd -
- Quick Start – Run the analyzer
- Running on Condor – Scale out with HTCondor at LPC
- Skimming – Skim NanoAOD files for faster analysis
- Command Reference – Complete flag reference and examples
- Repository Structure – Overview of how the codebase is organized
- Testing – Running the automated test suite
- Additional Documentation – Links to detailed guides
Run the analyzer by specifying an era and composite mode:
python3 bin/run_analysis.py RunIII2024Summer24 all # everything, LO inclusive DY
python3 bin/run_analysis.py RunIII2024Summer24 all --dy nlo_inc # everything, NLO inclusive DY
python3 bin/run_analysis.py RunIII2024Summer24 mc # backgrounds + signal
python3 bin/run_analysis.py RunIII2024Summer24 bkg # backgrounds only
python3 bin/run_analysis.py RunIII2024Summer24 data # data only
python3 bin/run_analysis.py RunIII2024Summer24 signal # signal only
Composite modes process multiple samples sequentially (locally) or in parallel (on Condor with --condor). You can also run individual samples directly:
| Mode | Samples |
|---|---|
| all | EGamma, Muon, DYJets, tt_tW, Nonprompt, Other, Signal |
| data | EGamma, Muon |
| bkg | DYJets, tt_tW, Nonprompt, Other |
| signal | Signal (default subset of mass points) |
| mc | DYJets, tt_tW, Nonprompt, Other, Signal |
| Single sample | DYJets, tt_tW, Nonprompt, Other, EGamma, Muon, Signal (with --mass) |
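The mode table above can be read as a simple lookup from a mode name to a list of samples. The following is a minimal sketch of that expansion, not the code in run_analysis.py; the names COMPOSITE_MODES and expand_mode are hypothetical and exist only for illustration.

```python
# Hypothetical mapping that mirrors the mode table; the real driver
# script may store or derive this differently.
COMPOSITE_MODES = {
    "all": ["EGamma", "Muon", "DYJets", "tt_tW", "Nonprompt", "Other", "Signal"],
    "data": ["EGamma", "Muon"],
    "bkg": ["DYJets", "tt_tW", "Nonprompt", "Other"],
    "signal": ["Signal"],
    "mc": ["DYJets", "tt_tW", "Nonprompt", "Other", "Signal"],
}


def expand_mode(name):
    """Return the samples that a composite mode or single-sample name covers."""
    if name in COMPOSITE_MODES:
        return list(COMPOSITE_MODES[name])
    # Any sample that appears in some mode can also be run on its own.
    known = {s for samples in COMPOSITE_MODES.values() for s in samples}
    if name in known:
        return [name]
    raise ValueError(f"Unknown mode or sample: {name!r}")


print(expand_mode("bkg"))
print(expand_mode("DYJets"))
```

Note that, per the table, all is the union of data and mc, and mc is bkg plus Signal.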
Output ROOT histograms are saved to WR_Plotter/rootfiles/<Run>/<Year>/<Era>/.
Note: Filesets for existing eras are already included in the repository. To create filesets for a new era, see filesets.md.
See Running the Analyzer for full details: all samples, output customization, region selection, systematics, and batch processing.
Analysis jobs can run for a long time. Use tmux to keep your session alive after disconnecting from the LPC node. Note which node you are on (hostname), since tmux sessions are local to that node — you must SSH back to the same node to reattach.
# Check and note your hostname (e.g., cmslpc320.fnal.gov)
hostname
# Start a new named session
tmux new -s analysis
# Run your jobs as usual
python bin/run_analysis.py RunIII2024Summer24 all --dir 20260217_skimmed
You can then detach from the session with Ctrl-b then d (press Ctrl-b, release, then press d) and safely log out. To reattach later, SSH to the same node:
ssh cmslpc320.fnal.gov # replace with your node
tmux attach -t analysis
Other useful tmux commands:
- tmux ls: list active sessions
- Ctrl-b then d: detach from current session
- tmux kill-session -t analysis: kill a session
Scale out processing across many workers at FNAL LPC using HTCondor with the Dask executor. Requires the lpcjobqueue Apptainer environment.
./shell coffeateam/coffea-dask-almalinux8:2025.12.0-py3.12 # enter container
python bin/run_analysis.py RunIII2024Summer24 all --condor # everything on Condor
See Running on Condor for full documentation: setup, worker/chunksize defaults, and log locations.
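The --max-workers defaults listed in the Command Reference depend on whether the run is local or on Condor, single-sample or composite, and skimmed or unskimmed. As a reading aid, the sketch below restates those documented numbers as one decision function. The function name and its parameters are hypothetical; the actual logic in the repository may be organized differently.

```python
def default_max_workers(condor, composite=False, unskimmed=False):
    """Return the documented --max-workers default for a given run type.

    Hypothetical helper: only the numbers come from the Command Reference.
    """
    if not condor:
        return 3      # local runs
    if not composite:
        return 50     # single sample on Condor
    # Composite modes on Condor scale up, more so for unskimmed inputs.
    return 500 if unskimmed else 200


print(default_max_workers(condor=False))
print(default_max_workers(condor=True, composite=True))
```

Passing --max-workers explicitly, as in the Condor examples, overrides these defaults.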
The skimmer applies a loose event preselection to NanoAOD files, reducing file sizes for faster analysis iteration. It uses bin/skim.py with subcommands for the full workflow: skim locally or on Condor, check for failures, and merge outputs.
python3 bin/skim.py --cuts # show skim cuts
python3 bin/skim.py run /TTto2L2Nu_.../NANOAODSIM # submit all to Condor
python3 bin/skim.py check /TTto2L2Nu_.../NANOAODSIM # check for failures
python3 bin/skim.py merge /TTto2L2Nu_.../NANOAODSIM # extract + hadd + validate
See Skimming for full documentation: selection cuts, all subcommand flags, output layout, and architecture.
| Flag | Arguments | Description |
|---|---|---|
| era | <era_name> | Required positional. Campaign to analyze (e.g., RunIII2024Summer24) |
| sample | <sample_name> | Required positional. Sample to analyze (e.g., DYJets, Signal, EGamma) |
| --mass | <mass_point> | Signal mass point (e.g., WR4000_N2100). Required for Signal sample |
| --region | resolved\|boosted\|both | Analysis region to run (default: both) |
| --dy | VARIANT | DY sample variant (only valid for DYJets). Variants are per-era; see config.yaml |
| --dir | <directory> | Create output subdirectory under rootfiles path |
| --name | <suffix> | Append suffix to output ROOT filename |
| --debug | | Run without saving histograms (for testing) |
| --reweight | <json_file> | Path to DY reweight JSON file (DYJets only) |
| --unskimmed | | Use unskimmed filesets instead of default skimmed files |
| --condor | | Submit jobs to HTCondor at LPC (requires Apptainer shell, see Running on Condor) |
| --fileset | <path> | Override automatic fileset path with a custom fileset JSON |
| --max-workers | <int> | Number of Dask workers (local default: 3, condor single-sample: 50, condor composite skimmed: 200, condor composite unskimmed: 500) |
| --worker-wait-timeout | <int> | Seconds to wait for first Condor worker before failing (default: 1200) |
| --chunksize | <int> | Number of events per processing chunk (default: 250000) |
| --maxchunks | <int> | Max chunks per file (default: all). Use 1 for quick testing |
| --maxfiles | <int> | Max files per dataset (default: all). Use 1 for quick testing |
| --threads-per-worker | <int> | Threads per Dask worker for local runs |
| --systs | lumi pileup sf | Enable systematic variations (see Systematics) |
| --tf-study | | Add transfer factor study regions (no mass cut) to the output |
| --xrd-fallback | | Enable XRootD redirector fallback during unskimmed preprocess |
| --xrd-fallback-timeout | <int> | Seconds per fallback probe (default: 10) |
| --xrd-fallback-retries-per-redirector | <int> | Probe attempts per redirector during fallback (default: 10) |
| --xrd-fallback-sleep | <float> | Seconds between fallback retries (default: 10.0) |
| --list-eras | | Print available eras and exit |
| --list-samples | | Print available samples and exit |
| --list-masses | | Print available signal mass points and exit |
| --preflight-only | | Validate fileset and exit without processing |
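The --mass row implies two checks for single-sample runs: a Signal run needs a mass point, and the mass point must be well formed. The sketch below illustrates both under stated assumptions. The label pattern is inferred only from the example WR4000_N2100, and the helper names are hypothetical; the repository's own validation (see cli_utils.py) may differ, and --list-masses remains the authoritative list of available points.

```python
import re

# Assumed label format, inferred from the example "WR4000_N2100".
MASS_POINT = re.compile(r"^WR(\d+)_N(\d+)$")


def parse_mass_point(label):
    """Split a label such as 'WR4000_N2100' into its two integer values."""
    match = MASS_POINT.match(label)
    if match is None:
        raise ValueError(f"Malformed mass point: {label!r}")
    return int(match.group(1)), int(match.group(2))


def check_single_sample(sample, mass=None):
    """Return a list of problems for a single-sample invocation (empty if none)."""
    problems = []
    if sample == "Signal" and mass is None:
        problems.append("--mass is required for the Signal sample")
    if mass is not None:
        try:
            parse_mass_point(mass)
        except ValueError as err:
            problems.append(str(err))
    return problems


print(parse_mass_point("WR4000_N2100"))  # (4000, 2100)
print(check_single_sample("Signal"))
```

This check applies only to the single-sample Signal case; the composite signal mode uses a default subset of mass points and does not take --mass.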
# Composite modes (run locally by default, sequential)
python3 bin/run_analysis.py RunIII2024Summer24 all # everything
python3 bin/run_analysis.py RunIII2024Summer24 bkg # all backgrounds
python3 bin/run_analysis.py RunIII2024Summer24 data # all data
python3 bin/run_analysis.py RunIII2024Summer24 mc # backgrounds + signal
python3 bin/run_analysis.py RunIII2024Summer24 signal # signal only
# Composite mode with custom directory and systematics
python3 bin/run_analysis.py RunIII2024Summer24 bkg --dir my_study --name test
python3 bin/run_analysis.py RunIII2024Summer24 all --systs lumi pileup sf
# Composite modes on Condor (parallel, must be inside Apptainer shell)
python3 bin/run_analysis.py RunIII2024Summer24 all --condor --systs lumi pileup sf
python3 bin/run_analysis.py RunIII2024Summer24 bkg --condor
# Single sample
python3 bin/run_analysis.py RunIII2024Summer24 DYJets
python3 bin/run_analysis.py RunIII2024Summer24 Signal --mass WR4000_N2100
# Use NLO inclusive DY for all samples
python3 bin/run_analysis.py RunIII2024Summer24 all --dy nlo_inc
# Single sample on Condor
python3 bin/run_analysis.py RunIII2024Summer24 DYJets --condor
python3 bin/run_analysis.py RunIII2024Summer24 DYJets --condor --max-workers 100
# Custom output directory and filename
python3 bin/run_analysis.py Run3Summer22EE DYJets --dir my_study --name test
# Only process resolved region
python3 bin/run_analysis.py RunIII2024Summer24 DYJets --region resolved
# Validate fileset without processing
python3 bin/run_analysis.py RunIII2024Summer24 Signal --mass WR4000_N2100 --preflight-only
The repository separates executable scripts, core analysis logic, configuration, and documentation:
WR_Plotter/ # Submodule for plotting ROOT histograms
bin/ # User-facing CLI scripts (production workflows)
wrcoffea/ # Installable Python package (analysis code, utilities, config)
data/ # Configuration files (JSON, CSV) and metadata
docs/ # Documentation (markdown guides)
scripts/ # Helper scripts for setup and post-processing
tests/ # Automated test suite (pytest)
test/ # Development and validation scripts
bin/ - Production Scripts
- run_analysis.py - Main analysis driver script (single samples and composite modes)
- skim.py - Skimming pipeline (run, check, merge subcommands)
- skim_job.sh - Condor worker shell script for skimming
wrcoffea/ - Installable Python Package
- analyzer.py - Main Coffea processor implementing WR→Nℓ→ℓℓjj analysis (object selection, resolved/boosted regions, histogram filling, cutflows)
- histograms.py - Histogram specification, creation, and filling
- scale_factors.py - Lepton scale factor evaluation (correctionlib)
- analysis_config.py - Centralized configuration (luminosities, correction paths, selection names, cuts)
- cli_utils.py - CLI plumbing: fileset loading, sample validation, mass point handling
- era_utils.py - Era/year/run mapping and JSON I/O
- fileset_utils.py - Fileset path construction, config parsing, JSON writing
- fileset_validation.py - Schema and selection validation for filesets
- save_hists.py - ROOT histogram serialization
- skimmer.py - Skim selection, Runs tree handling, single-file skimming
- skim_merge.py - Post-skim merging, HLT grouping, hadd, validation
- das_utils.py - DAS dataset path validation, dasgoclient queries, XRootD URL construction
- xrootd_fallback.py - XRootD redirector fallback for unskimmed file preprocessing
data/ - Configuration and Metadata
- configs/ - Per-era dataset configurations (JSON format, input to fileset scripts)
- filesets/ - Per-era NanoAOD file lists (JSON format, output of fileset scripts)
- signal_points/ - Available signal mass points per era (CSV format)
- lumis/ - Golden JSON lumi masks for data
- jsonpog/ - Correction payloads for scale factors (correctionlib)
WR_Plotter/ - Plotting Submodule
- Separate repository for ROOT histogram plotting
- Output histograms from this analyzer are saved here
- See WR_Plotter/README.md for plotting documentation
scripts/ - Helper Scripts
- Fileset creation, preprocessing, and validation tools
- Post-processing and analysis utilities
docs/ - Documentation
- getting_started.md - Installation, environment setup, grid proxy
- run_analysis.md - Detailed analysis options and workflows
- filesets.md - Fileset creation instructions
- skimming.md - Skimming pipeline: cuts, subcommands, Condor jobs, output layout
- condor.md - HTCondor setup, worker defaults, tmux, logs
- run_combine.md - Limit-setting with Combine framework
tests/ - Automated Test Suite
- Unit tests for utilities, config consistency, and validation logic
- Run with pytest (see Testing)
test/ - Development Scripts
- Analysis validation and optimization studies
- Debugging and testing utilities
Run the automated test suite with pytest:
python -m pytest tests/ -v
The tests cover utility functions, configuration consistency, fileset validation, histogram creation/filling, and processor selection logic. They run quickly (no data or correctionlib files needed) and are useful for catching regressions when modifying analysis configuration or utility code.
To quickly test the full analysis chain on a small slice of data:
python3 bin/run_analysis.py RunIII2024Summer24 DYJets --maxchunks 1 --maxfiles 1 --chunksize 1000
This processes a single file with one small chunk, which is useful for verifying that code changes don't break the processing pipeline before submitting large Condor jobs.
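To see why this combination of flags keeps the test small, the sketch below computes an upper bound on the number of events processed for a given --chunksize, --maxchunks, and --maxfiles. It assumes simple fixed-size chunking, which may not match Coffea's exact chunk boundaries, and the per-file event counts are made-up numbers; the function name is hypothetical.

```python
def max_events_processed(events_per_file, chunksize=250_000,
                         maxchunks=None, maxfiles=None):
    """Upper bound on events read, assuming fixed-size chunks per file."""
    # --maxfiles keeps only the first N files of the dataset.
    files = events_per_file if maxfiles is None else events_per_file[:maxfiles]
    total = 0
    for n_events in files:
        if maxchunks is None:
            total += n_events  # no chunk limit: the whole file is read
        else:
            # --maxchunks caps how many chunks are taken from each file.
            total += min(n_events, maxchunks * chunksize)
    return total


# Made-up event counts for two files of one dataset.
example_files = [1_200_000, 800_000]

print(max_events_processed(example_files))
print(max_events_processed(example_files, chunksize=1000, maxchunks=1, maxfiles=1))
```

With the quick-test flags, the bound drops from the full dataset to at most one chunk of 1000 events.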
To install the test dependency:
pip install -e ".[test]"
- Getting Started - Installation, environment setup, and grid proxy
- Running the Analyzer - Detailed analysis options and workflows
- Creating Filesets - Instructions for generating NanoAOD file lists
- Skimming - Skimming pipeline: selection cuts, CLI reference, Condor job details
- Running on Condor - HTCondor setup, worker defaults, tmux tips, log locations
- Expected Limits - Limit-setting with Higgs Combine framework
- WR Plotter - Plotting ROOT histograms and making stackplots
To update the WR_Plotter submodule to the latest commit:
cd WR_Plotter
git switch main
git pull
cd ..
git commit -am "Update WR_Plotter submodule"
git push
To work on a new feature branch in the submodule:
cd WR_Plotter
git checkout -b my_feature_branch
git push -u origin my_feature_branch
cd ..