Add scripts to export fitted models to SBML and to generate model documentation #72

Closed
bibymaths wants to merge 3 commits into global from codex/add-scripts-to-export-models-and-documentation

@bibymaths

Owner

Motivation

  • Provide standalone, repository-local tools to export fitted model results into validated SBML and to produce comprehensive human-readable documentation without modifying existing model code or results.
  • Support the main model families (protwise, KinOpt, TFOpt, networkmodel) and make exports robust to different result file formats and legacy result layouts.

Description

  • Add scripts/export_model_to_sbml.py, a CLI tool that discovers fitted-result files, extracts numeric fitted parameters, and writes SBML Level 3 Version 2. Time, substance, and parameter units are dimensionless, and explanatory notes state that the model is calibrated to fold-change (unitless) measurements. Parameter values are preserved exactly. The script uses a libSBML-backed writer and falls back to a plain XML writer when python-libsbml is unavailable.
  • The SBML exporter implements robust discovery and loading for common table/serialization formats (.csv, .tsv, .json, .pkl, .npz, .npy, .parquet, .xlsx, .xls), infers model family by path hints, exposes CLI flags --results-dir, --output-dir, --model-family, --run-id, --validate, --overwrite, and --verbose, and reports validation errors/warnings (or an installation hint when libSBML is missing).
  • Add scripts/export_model_documentation.py, a CLI tool that reuses the same discovery logic, inspects suggested source files, reconstructs or summarizes model equations, renders parameter tables and term explanations, writes model_documentation.md and model_documentation.tex, and optionally compiles a PDF via tectonic or pdflatex when available; CLI flags include --results-dir, --output-dir, --model-family, --run-id, --overwrite, --verbose, and --no-pdf.
  • Both scripts skip missing model families with clear warnings instead of failing, document fold-change/unitless measurement assumptions explicitly in outputs, and emit a concise summary of discovered families and produced files at the end.
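As a rough illustration of the fallback path described above, the following is a minimal sketch of writing dimensionless SBML Level 3 Version 2 parameters with only the standard library. It is not the code from scripts/export_model_to_sbml.py: the function name build_sbml and the example parameter IDs are assumptions, and the explanatory notes element is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Core namespace for SBML Level 3 Version 2.
SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"


def build_sbml(model_id, parameters):
    """Return an SBML L3V2 document string with dimensionless parameters.

    Hypothetical helper for illustration; `parameters` maps SBML ids to floats.
    """
    ET.register_namespace("", SBML_NS)  # serialize SBML as the default namespace
    root = ET.Element(f"{{{SBML_NS}}}sbml", {"level": "3", "version": "2"})
    model = ET.SubElement(
        root,
        f"{{{SBML_NS}}}model",
        {
            "id": model_id,
            "timeUnits": "dimensionless",
            "substanceUnits": "dimensionless",
        },
    )
    plist = ET.SubElement(model, f"{{{SBML_NS}}}listOfParameters")
    for pid, value in parameters.items():
        ET.SubElement(
            plist,
            f"{{{SBML_NS}}}parameter",
            {
                "id": pid,
                # repr() of a float round-trips exactly, so the fitted value is preserved.
                "value": repr(float(value)),
                "units": "dimensionless",
                "constant": "true",
            },
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_sbml("kinopt_model", {"alpha_1": 0.1, "beta_1": 2.5}))
```

A document produced this way is only well-formed XML; checking it against the SBML rules would still need python-libsbml or another validator.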

Testing

  • Static syntax check: python -m py_compile scripts/export_model_to_sbml.py scripts/export_model_documentation.py — succeeded.
  • Dry run against repository example outputs: python scripts/export_model_to_sbml.py --results-dir notebooks/outputs --output-dir /tmp/phos_sbml --validate --verbose --overwrite — ran and reported no fitted files found in the checked example outputs (expected behavior when results are absent).
  • Documentation generation smoke: python scripts/export_model_documentation.py --results-dir notebooks/outputs --output-dir /tmp/phos_docs --no-pdf --overwrite --verbose — produced model_documentation.md and model_documentation.tex successfully (PDF compilation skipped as requested).
  • Synthetic KinOpt smoke test: created a temporary results/kinopt/tables/kinopt_results.csv and ran the exporter and documentation scripts; exporter wrote /tmp/.../sbml/kinopt_model.xml and documentation wrote /tmp/.../docs/model_documentation.md and .tex; SBML validation returned a friendly message that python-libsbml is not installed (fallback writing used) — behavior is correct and reported.

If a LaTeX engine or python-libsbml is not present, the scripts still write .tex and .xml outputs and print clear instructions explaining why PDF compilation or SBML validation was skipped.
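One way to detect these optional tools before deciding what to skip is sketched below. The helper names are assumptions made for this example and are not taken from the scripts; only shutil.which and importlib.util.find_spec from the standard library are used, and libsbml is the import name of the python-libsbml package.

```python
import importlib.util
import shutil


def find_latex_engine(candidates=("tectonic", "pdflatex")):
    """Return the path of the first LaTeX engine found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def libsbml_available():
    """True when the libsbml module can be imported (without importing it)."""
    return importlib.util.find_spec("libsbml") is not None


def describe_optional_tools():
    """Build the messages a script could print for skipped steps."""
    messages = []
    if find_latex_engine() is None:
        messages.append("PDF compilation skipped: no tectonic or pdflatex on PATH.")
    if not libsbml_available():
        messages.append(
            "SBML validation skipped: python-libsbml is not installed "
            "(try: pip install python-libsbml)."
        )
    return messages


if __name__ == "__main__":
    for line in describe_optional_tools():
        print(line)
```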


Codex Task

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a107feb287

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/export_model_to_sbml.py Outdated
Comment on lines +136 to +138
fam = infer_model_family(p)
if model_family and (fam or "unknown").lower() != model_family.lower(): continue
if fam or any(h in p.name.lower() for h in PARAM_HINTS): out.append(load_model_result(p))

P2: Use --model-family to tag unlabeled result files

When a user points this at a standard single-run directory such as results/<run_id>/tables/alpha_values.csv and supplies --model-family kinopt, infer_model_family() returns None because neither the run path nor filename contains kinopt; this branch then skips the file before it can be loaded. Since unknown-family rows are also dropped during merge, the advertised --model-family flag cannot export or document otherwise valid alpha/beta tables unless the directory or filename already embeds the family name.
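A possible shape for the fix is sketched below: when no family can be inferred from the path but the file looks like a parameter table, the explicit --model-family value is used as the tag instead of dropping the file. The function names, the family list, and the hint tuple are assumptions for illustration and do not reproduce the script's infer_model_family() or PARAM_HINTS.

```python
from pathlib import Path

# Hypothetical stand-ins for the script's family list and filename hints.
KNOWN_FAMILIES = ("protwise", "kinopt", "tfopt", "networkmodel")
PARAM_HINTS = ("alpha", "beta", "param")


def infer_family_from_path(path):
    """Return the first known family named anywhere in the path, else None."""
    text = str(path).lower()
    for family in KNOWN_FAMILIES:
        if family in text:
            return family
    return None


def select_files(paths, model_family=None):
    """Return (path, family) pairs, tagging unlabeled parameter files."""
    selected = []
    for p in paths:
        inferred = infer_family_from_path(p)
        looks_like_params = any(h in p.name.lower() for h in PARAM_HINTS)
        if inferred is None and model_family and looks_like_params:
            # Unlabeled file: trust the user's explicit --model-family.
            family = model_family.lower()
        else:
            family = inferred
        if model_family and family != model_family.lower():
            continue
        if family or looks_like_params:
            selected.append((p, family))
    return selected


if __name__ == "__main__":
    files = [
        Path("results/run1/tables/alpha_values.csv"),
        Path("results/tfopt/tables/beta_values.csv"),
    ]
    print(select_files(files, model_family="kinopt"))
```

With this ordering, a file whose path names a different family is still filtered out, so the flag only fills in missing labels and never overrides an inferred one.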

Comment thread scripts/export_model_to_sbml.py Outdated
Comment on lines +100 to +102
for col in val_cols:
    if any(h in col.lower() for h in PARAM_HINTS) or any(h in path.stem.lower() for h in PARAM_HINTS):
        params[_safe_id("_".join(name_bits + [col]) or f"row{n}_{col}")] = float(row[col])

P2: Restrict result workbooks to fitted parameter columns

For any standard legacy file named like kinopt_results.* or tfopt_results.*, the path.stem check matches results, so every numeric column in every sheet/table is exported as a fitted parameter, including observed/estimated trajectories, residuals, metrics, and IDs. That produces SBML and documentation parameter tables that are polluted with non-parameter measurements whenever users pass the repository’s expected result workbooks, rather than preserving only alpha/beta or model parameter values.
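One way to avoid the filename-level match is to decide per column, as in the sketch below: a column is exported only if its own name looks like a parameter and it is not on a deny list of trajectory, residual, metric, or ID columns. The set and tuple contents and the function name are assumptions chosen for this example, not values taken from the script.

```python
# Hypothetical deny list and prefixes; real result files may need other entries.
NON_PARAMETER_COLUMNS = {"observed", "estimated", "residual", "mse", "rmse", "time", "id"}
PARAMETER_PREFIXES = ("alpha", "beta")


def _norm(name):
    """Normalize a column header for comparison."""
    return name.strip().lower().replace(" ", "_")


def parameter_columns(columns):
    """Keep only columns whose own name marks them as fitted parameters."""
    keep = []
    for col in columns:
        normalized = _norm(col)
        if normalized in NON_PARAMETER_COLUMNS:
            continue
        # The decision depends on the column name, never on the file stem.
        if normalized.startswith(PARAMETER_PREFIXES):
            keep.append(col)
    return keep


if __name__ == "__main__":
    header = ["GeneID", "Alpha", "beta_1", "Observed", "Residual", "MSE"]
    print(parameter_columns(header))
```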

@bibymaths added the documentation, enhancement, dependencies, python, and techstack labels Jun 18, 2026
@bibymaths self-assigned this Jun 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75b0ee665a

cn = _norm(c)
if cn in NEVER_PARAMETER_COLUMNS:
    continue
if any(cn.startswith(_norm(p)) for p in WIDE_PARAMETER_PREFIXES) and df[c].map(_as_float).notna().any():

P2: Parse standard networkmodel parameter CSVs

When results come from networkmodel/export.py::export_results, the standard parameter files are model_parameters_genes.csv, model_parameters_genes_psites.csv, and model_parameters_kinases.csv, with columns such as Synthesis_A, mRNA_Degradation_B, Phospho_Degradation_Dp, and Activity_Scale_ck. This fallback only accepts numeric columns whose normalized names start with alpha_, beta_, theta_, gamma_, param_, coef_, weight_, or k_, so those standard networkmodel CSV exports are treated as unrecognized and no SBML/docs are produced for them even though fitted parameters exist.
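A minimal sketch of one possible remedy follows: keep the generic prefix check, and additionally accept an explicit set of networkmodel column names when the family is networkmodel. The function name is hypothetical, lowercasing is assumed as the normalization, and the column set lists only the four examples named in this comment, so it is almost certainly incomplete.

```python
# Prefixes quoted in the review comment above.
GENERIC_PREFIXES = ("alpha_", "beta_", "theta_", "gamma_", "param_", "coef_", "weight_", "k_")

# Only the example columns named in the comment; assumed incomplete.
NETWORKMODEL_COLUMNS = {
    "synthesis_a",
    "mrna_degradation_b",
    "phospho_degradation_dp",
    "activity_scale_ck",
}


def _norm(name):
    """Assumed normalization: trim and lowercase."""
    return name.strip().lower()


def is_parameter_column(name, family=None):
    """Accept generic prefixes, plus named networkmodel columns for that family."""
    normalized = _norm(name)
    if family == "networkmodel" and normalized in NETWORKMODEL_COLUMNS:
        return True
    return normalized.startswith(GENERIC_PREFIXES)


if __name__ == "__main__":
    for col in ("Synthesis_A", "Activity_Scale_ck", "alpha_1", "Observed"):
        print(col, is_parameter_column(col, family="networkmodel"))
```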

Comment on lines +406 to +407
m.parameters.update(r.parameters)
m.parameter_records.extend(r.parameter_records)

P2: Disambiguate records when merging multiple runs

When --results-dir points at a parent directory containing multiple runs for the same family, these lines merge every run into one ModelResult without adding any run/source qualifier to parameter IDs. Two runs with the same alpha/beta tuple produce identical parameter_ids; the documentation table silently keeps whichever value wins the dict update, while parameter_records still contains duplicate IDs that are emitted as duplicate SBML parameters, yielding invalid or ambiguous exports. This occurs for common layouts like results/run1/tables/alpha_values.csv and results/run2/tables/alpha_values.csv unless users always remember to pass --run-id.
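The collision can be avoided by qualifying each parameter ID with its run before merging, roughly as sketched below. The function names and the double-underscore separator are assumptions for illustration; the sanitizer only enforces the SBML SId shape (a letter or underscore followed by letters, digits, or underscores).

```python
import re


def safe_id(text):
    """Coerce text into an SBML SId-shaped identifier."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", text)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def merge_runs(per_run):
    """Merge {run_id: {param_id: value}} into one dict with run-qualified ids."""
    merged = {}
    for run_id, params in per_run.items():
        for param_id, value in params.items():
            qualified = safe_id(f"{run_id}__{param_id}")
            if qualified in merged:
                # Fail loudly instead of letting a later run overwrite an earlier one.
                raise ValueError(f"duplicate parameter id after merge: {qualified}")
            merged[qualified] = value
    return merged


if __name__ == "__main__":
    runs = {
        "run1": {"alpha_A_B": 0.2},
        "run2": {"alpha_A_B": 0.7},
    }
    print(merge_runs(runs))
```

An alternative design is to refuse to merge at all and write one SBML file per run; which is preferable depends on how the exports are consumed.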

@bibymaths added the wontfix label Jun 18, 2026
@bibymaths closed this Jun 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcd116fec5

# - networkmodel/runner.py writes fitted_params_picked.json and model parameter CSVs.
# - networkmodel/export.py::export_pareto_front_to_excel writes equivalent named physical parameters.
if family == "networkmodel" and stem_norm in {"model_parameters_genes", "model_parameters_genes_psites", "model_parameters_kinases"}:
    id_cols = [c for c in cols if _norm(c) in {"protein", "gene", "geneid", "psite", "phosphosite", "kinase", "param", "parameter", "name", "sol_id"}]

P2: Preserve Protein_Gene in networkmodel parameter IDs

In this new standard-CSV branch, the identifier whitelist omits normalized protein_gene, but the standard networkmodel exports use Protein_Gene in both model_parameters_genes.csv and model_parameters_genes_psites.csv (networkmodel/export.py:733, networkmodel/export.py:768). For multi-protein runs, _records_from_value_columns therefore builds the same IDs for each protein-level column (and only psite for site rows), so parameters keeps the last protein's value while SBML gets duplicate parameter IDs; include Protein_Gene as an ID column before recording these rows.
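The effect of adding protein_gene to the whitelist can be seen in the small sketch below, which builds one ID per row and value column from whichever identifier columns are present. The function name is hypothetical, it does not reproduce _records_from_value_columns, and the gene names and values in the usage example are made up.

```python
# Whitelist from the excerpt above, with the missing protein_gene entry added.
ID_COLUMNS = {
    "protein", "gene", "geneid", "psite", "phosphosite", "kinase",
    "param", "parameter", "name", "sol_id", "protein_gene",
}


def _norm(name):
    """Assumed normalization: trim and lowercase."""
    return name.strip().lower()


def parameter_ids(rows, value_columns):
    """Build one id per (row, value column) from the row's identifier columns."""
    ids = []
    for row in rows:
        id_bits = [str(v) for k, v in row.items() if _norm(k) in ID_COLUMNS]
        for col in value_columns:
            ids.append("_".join(id_bits + [col]))
    return ids


if __name__ == "__main__":
    # Illustrative rows only; not taken from any real export.
    rows = [
        {"Protein_Gene": "EGFR", "Synthesis_A": 1.0},
        {"Protein_Gene": "AKT1", "Synthesis_A": 2.0},
    ]
    print(parameter_ids(rows, ["Synthesis_A"]))
```

Without the protein_gene entry, both rows in this example would yield the bare ID Synthesis_A, which is the duplicate described above.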

records.append(ParameterRecord(_safe_id(prefix), f, str(path), None, "networkmodel fitted_params_picked.json", {"path": prefix}))
walk("", payload)
else:
records = _numeric_records_from_json_like(payload, path, family)

P2: Skip generic metadata JSON when harvesting parameters

When a standard result directory includes metadata.json (required by docs/result_directory_contract.md:26 and allowed to contain a parameters object at docs/result_directory_contract.md:56), this generic JSON harvest treats entries such as parameters.seed or parameters.n_gen as fitted parameters because their prefix contains parameters. Since rglob visits every .json, provenance/settings values are exported into r.parameters and SBML alongside real model parameters; restrict this path to known parameter artifacts or explicitly skip metadata/config JSON files.
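A simple form of the suggested restriction is a name-based filter applied before any JSON is parsed, as sketched below. The function name is hypothetical, and config.json in the skip set is an assumption; only metadata.json and fitted_params_picked.json are named in this thread.

```python
from pathlib import Path

# metadata.json comes from the result directory contract; config.json is assumed.
SKIPPED_JSON_NAMES = {"metadata.json", "config.json"}


def parameter_json_files(paths):
    """Keep .json files that are not known provenance or settings files."""
    kept = []
    for p in paths:
        if p.suffix.lower() != ".json":
            continue
        if p.name.lower() in SKIPPED_JSON_NAMES:
            continue
        kept.append(p)
    return kept


if __name__ == "__main__":
    candidates = [
        Path("results/run1/metadata.json"),
        Path("results/run1/fitted_params_picked.json"),
        Path("results/run1/tables/alpha_values.csv"),
    ]
    print(parameter_json_files(candidates))
```

In a discovery loop this filter could wrap something like Path(results_dir).rglob("*.json"). The stricter alternative is an allow list of known parameter artifacts, which avoids harvesting any unexpected JSON at the cost of missing new file names.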

Labels

  • codex
  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • python: Pull requests that update python code
  • techstack: Modification to techstack
  • wontfix: This will not be worked on

1 participant