Add scripts to export fitted models to SBML and to generate model documentation #72
bibymaths wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a107feb287
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
fam = infer_model_family(p)
if model_family and (fam or "unknown").lower() != model_family.lower(): continue
if fam or any(h in p.name.lower() for h in PARAM_HINTS): out.append(load_model_result(p))
```
Use `--model-family` to tag unlabeled result files
When a user points the tool at a standard single-run directory such as `results/<run_id>/tables/alpha_values.csv` and supplies `--model-family kinopt`, `infer_model_family()` returns `None` because neither the run path nor the filename contains `kinopt`; this branch then skips the file before it can be loaded. Because unknown-family rows are also dropped during the merge, the advertised `--model-family` flag cannot export or document otherwise valid alpha/beta tables unless the directory or filename already embeds the family name.
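A minimal sketch of the suggested behavior: when inference fails, fall back to the `--model-family` value instead of skipping the file. It uses a simplified, hypothetical stand-in for `infer_model_family()`; the `KNOWN_FAMILIES` tuple and the `resolve_family()` helper are illustrative, not the PR's code.

```python
from pathlib import Path
from typing import Optional

# Hypothetical family list, taken from the families named in the PR description.
KNOWN_FAMILIES = ("protwise", "kinopt", "tfopt", "networkmodel")


def infer_model_family(path: Path) -> Optional[str]:
    """Hypothetical stand-in: look for a family name anywhere in the path."""
    lowered = str(path).lower()
    for fam in KNOWN_FAMILIES:
        if fam in lowered:
            return fam
    return None


def resolve_family(path: Path, model_family: Optional[str]) -> Optional[str]:
    """Return the family to tag this file with, or None to skip it."""
    inferred = infer_model_family(path)
    if model_family is None:
        return inferred
    if inferred is None:
        # Unlabeled file: trust the user's --model-family value.
        return model_family.lower()
    # Labeled file: keep it only if it matches the requested family.
    return inferred if inferred == model_family.lower() else None
```

With this rule, `results/run1/tables/alpha_values.csv` is tagged `kinopt` when `--model-family kinopt` is given, while a file explicitly labeled with a different family is still excluded.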
```python
for col in val_cols:
    if any(h in col.lower() for h in PARAM_HINTS) or any(h in path.stem.lower() for h in PARAM_HINTS):
        params[_safe_id("_".join(name_bits + [col]) or f"row{n}_{col}")] = float(row[col])
```
Restrict result workbooks to fitted parameter columns
For any standard legacy file named like `kinopt_results.*` or `tfopt_results.*`, the `path.stem` check matches on `results`, so every numeric column in every sheet or table is exported as a fitted parameter, including observed/estimated trajectories, residuals, metrics, and IDs. Whenever users pass the repository's expected result workbooks, the SBML output and the documentation parameter tables are therefore polluted with non-parameter measurements instead of containing only the alpha/beta or model parameter values.
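A sketch of a per-column test that ignores the file stem entirely, assuming rows arrive as `{column: value}` dicts (as `csv.DictReader` produces). `PARAM_HINTS` and `NEVER_PARAMETER_COLUMNS` below are hypothetical stand-ins with illustrative values, not the PR's actual lists.

```python
# Hypothetical stand-ins; the PR's real PARAM_HINTS / NEVER_PARAMETER_COLUMNS may differ.
PARAM_HINTS = ("alpha", "beta")
NEVER_PARAMETER_COLUMNS = {"observed", "estimated", "residual", "rmse", "r2"}


def _is_float(text):
    """True if a cell parses as a number."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def parameter_columns(rows):
    """Return column names that look like fitted parameters.

    The decision uses only the column's own name and values; the file stem
    (e.g. "kinopt_results") never promotes a column to a parameter.
    """
    if not rows:
        return []
    picked = []
    for col in rows[0]:
        name = col.strip().lower()
        if name in NEVER_PARAMETER_COLUMNS:
            continue
        if not any(h in name for h in PARAM_HINTS):
            continue
        if any(_is_float(row[col]) for row in rows):
            picked.append(col)
    return picked
```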
💡 Codex Review
Reviewed commit: 75b0ee665a
```python
cn = _norm(c)
if cn in NEVER_PARAMETER_COLUMNS:
    continue
if any(cn.startswith(_norm(p)) for p in WIDE_PARAMETER_PREFIXES) and df[c].map(_as_float).notna().any():
```
Parse standard networkmodel parameter CSVs
When results come from `networkmodel/export.py::export_results`, the standard parameter files are `model_parameters_genes.csv`, `model_parameters_genes_psites.csv`, and `model_parameters_kinases.csv`, with columns such as `Synthesis_A`, `mRNA_Degradation_B`, `Phospho_Degradation_Dp`, and `Activity_Scale_ck`. This fallback only accepts numeric columns whose normalized names start with `alpha_`, `beta_`, `theta_`, `gamma_`, `param_`, `coef_`, `weight_`, or `k_`, so these standard networkmodel CSV exports are treated as unrecognized and no SBML or documentation is produced for them, even though they contain fitted parameters.
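One possible fix, sketched under the assumption that the fallback keeps its prefix test: accept the networkmodel column stems alongside the existing prefixes. `NETWORKMODEL_PARAMETER_PREFIXES` and this `_norm()` are hypothetical; the PR's own `_norm()` may normalize differently (which is why the snippet above also normalizes the prefixes).

```python
import re

# Prefixes the fallback already accepts (as listed in the review comment).
WIDE_PARAMETER_PREFIXES = ("alpha_", "beta_", "theta_", "gamma_", "param_", "coef_", "weight_", "k_")
# Hypothetical additions covering networkmodel/export.py::export_results columns.
NETWORKMODEL_PARAMETER_PREFIXES = (
    "synthesis_", "mrna_degradation_", "phospho_degradation_", "activity_scale_",
)


def _norm(name):
    """Hypothetical normalizer: lower-case, collapse non-alphanumerics to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def is_parameter_column(name):
    """True if a column name looks like a fitted parameter in either naming scheme."""
    return _norm(name).startswith(WIDE_PARAMETER_PREFIXES + NETWORKMODEL_PARAMETER_PREFIXES)
```

`str.startswith` accepts a tuple, so one call checks every prefix; identifier columns such as `Protein_Gene` or `Psite` still fall through.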
```python
m.parameters.update(r.parameters)
m.parameter_records.extend(r.parameter_records)
```
Disambiguate records when merging multiple runs
When `--results-dir` points at a parent directory containing multiple runs of the same family, these lines merge every run into one `ModelResult` without adding any run or source qualifier to the parameter IDs. Two runs with the same alpha/beta tuple therefore produce identical parameter IDs: the documentation table silently keeps whichever value wins the `dict.update`, while `parameter_records` still holds the duplicate IDs, which are emitted as duplicate SBML parameters and yield invalid or ambiguous exports. This happens for common layouts such as `results/run1/tables/alpha_values.csv` and `results/run2/tables/alpha_values.csv` unless users always remember to pass `--run-id`.
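A minimal sketch of run-qualified merging, using reduced, hypothetical versions of `ModelResult` and `ParameterRecord` (the PR's classes carry more fields). Each parameter ID is prefixed with its run ID, and any remaining collision raises instead of being silently overwritten.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ParameterRecord:  # hypothetical, reduced to three fields
    parameter_id: str
    value: float
    source: str


@dataclass
class ModelResult:  # hypothetical, reduced
    family: str
    run_id: str
    parameters: dict = field(default_factory=dict)
    parameter_records: list = field(default_factory=list)


def merge_runs(family, results):
    """Merge per-run results, prefixing every parameter ID with its run ID."""
    merged = ModelResult(family=family, run_id="merged")
    for r in results:
        for rec in r.parameter_records:
            # e.g. "run1__alpha_A_B"; a run ID starting with a digit would still
            # need sanitizing (e.g. via _safe_id) to form a valid SBML SId.
            qualified = f"{r.run_id}__{rec.parameter_id}"
            if qualified in merged.parameters:
                raise ValueError(f"duplicate parameter id {qualified!r}")
            merged.parameters[qualified] = rec.value
            merged.parameter_records.append(replace(rec, parameter_id=qualified))
    return merged
```

Raising on a remaining collision, rather than letting `dict.update` pick a winner, keeps the documentation table and the SBML parameter list in agreement.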
💡 Codex Review
Reviewed commit: dcd116fec5
```python
# - networkmodel/runner.py writes fitted_params_picked.json and model parameter CSVs.
# - networkmodel/export.py::export_pareto_front_to_excel writes equivalent named physical parameters.
if family == "networkmodel" and stem_norm in {"model_parameters_genes", "model_parameters_genes_psites", "model_parameters_kinases"}:
    id_cols = [c for c in cols if _norm(c) in {"protein", "gene", "geneid", "psite", "phosphosite", "kinase", "param", "parameter", "name", "sol_id"}]
```
Preserve `Protein_Gene` in networkmodel parameter IDs
In this new standard-CSV branch, the identifier whitelist omits the normalized name `protein_gene`, but the standard networkmodel exports use `Protein_Gene` in both `model_parameters_genes.csv` and `model_parameters_genes_psites.csv` (`networkmodel/export.py:733`, `networkmodel/export.py:768`). For multi-protein runs, `_records_from_value_columns` therefore builds identical IDs for every protein's protein-level columns (and keys site rows by `psite` alone), so `parameters` keeps only the last protein's value while SBML receives duplicate parameter IDs. Include `Protein_Gene` as an ID column before recording these rows.
```python
        records.append(ParameterRecord(_safe_id(prefix), f, str(path), None, "networkmodel fitted_params_picked.json", {"path": prefix}))
    walk("", payload)
else:
    records = _numeric_records_from_json_like(payload, path, family)
```
Skip generic metadata JSON when harvesting parameters
When a standard result directory includes `metadata.json` (required by `docs/result_directory_contract.md:26` and allowed to contain a `parameters` object per `docs/result_directory_contract.md:56`), this generic JSON harvest treats entries such as `parameters.seed` or `parameters.n_gen` as fitted parameters because their prefix contains `parameters`. Since `rglob` visits every `.json` file, provenance and settings values are exported into `r.parameters` and SBML alongside the real model parameters. Restrict this path to known parameter artifacts or explicitly skip metadata/config JSON files.
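A sketch of the suggested restriction, assuming the harvest starts from `rglob("*.json")`. The allow and skip lists are hypothetical and would need to match the repository's real artifact names; `fitted_params_picked.json` is the networkmodel artifact named in the snippet above.

```python
from pathlib import Path

# Hypothetical lists; adjust to the repository's actual artifact names.
KNOWN_PARAMETER_JSON = {"fitted_params_picked.json"}
SKIP_JSON_STEMS = {"metadata", "config", "settings"}


def json_parameter_files(results_dir, allow_unknown=False):
    """Select JSON files to harvest for fitted parameters.

    Metadata/config files are always skipped; other unknown JSON files are
    harvested only when allow_unknown is True.
    """
    selected = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        if path.stem.lower() in SKIP_JSON_STEMS:
            continue  # provenance/settings such as parameters.seed or parameters.n_gen
        if path.name.lower() in KNOWN_PARAMETER_JSON or allow_unknown:
            selected.append(path)
    return selected
```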
Motivation
- `protwise`, `KinOpt`, `TFOpt`, `networkmodel`) and make exports robust to different result file formats and legacy result layouts.
Description
- `scripts/export_model_to_sbml.py`, a CLI tool that discovers fitted-result files, extracts numeric fitted parameters, and writes SBML Level 3 Version 2 with dimensionless time, substance, and parameter units, plus explanatory notes stating that the model is calibrated to fold-change (unitless) measurements. Parameter values are preserved exactly, and the script supports a libSBML-backed writer with a fallback XML writer when `python-libsbml` is unavailable.
- `.csv`, `.tsv`, `.json`, `.pkl`, `.npz`, `.npy`, `.parquet`, `.xlsx`, `.xls`), infers the model family from path hints, exposes the CLI flags `--results-dir`, `--output-dir`, `--model-family`, `--run-id`, `--validate`, `--overwrite`, and `--verbose`, and reports validation errors and warnings (or an installation hint when libSBML is missing).
- `scripts/export_model_documentation.py`, a CLI tool that reuses the same discovery logic, inspects suggested source files, reconstructs or summarizes model equations, renders parameter tables and term explanations, writes `model_documentation.md` and `model_documentation.tex`, and optionally compiles a PDF via `tectonic` or `pdflatex` when available; its CLI flags include `--results-dir`, `--output-dir`, `--model-family`, `--run-id`, `--overwrite`, `--verbose`, and `--no-pdf`.
Testing
- `python -m py_compile scripts/export_model_to_sbml.py scripts/export_model_documentation.py`: succeeded.
- `python scripts/export_model_to_sbml.py --results-dir notebooks/outputs --output-dir /tmp/phos_sbml --validate --verbose --overwrite`: ran and reported that no fitted files were found in the checked example outputs (expected behavior when results are absent).
- `python scripts/export_model_documentation.py --results-dir notebooks/outputs --output-dir /tmp/phos_docs --no-pdf --overwrite --verbose`: produced `model_documentation.md` and `model_documentation.tex` successfully (PDF compilation skipped as requested).
- `results/kinopt/tables/kinopt_results.csv` and ran the exporter and documentation scripts; the exporter wrote `/tmp/.../sbml/kinopt_model.xml` and the documentation script wrote `/tmp/.../docs/model_documentation.md` and `.tex`. SBML validation returned a friendly message that `python-libsbml` is not installed (the fallback writer was used), which is the correct, reported behavior.

If a LaTeX engine or `python-libsbml` is not present, the scripts still write the `.tex` and `.xml` outputs and print clear instructions explaining why PDF compilation or SBML validation was skipped.
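For the fallback path (writing `.xml` when `python-libsbml` is absent), a minimal standard-library sketch with `xml.etree.ElementTree` could look like the following. It assumes parameters arrive as a plain `{id: value}` dict of already SBML-safe IDs; `write_minimal_sbml()` is a hypothetical name, the explanatory SBML notes and any species or reactions the real exporter may emit are omitted, and the output has not been checked with libSBML validation.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"
# Serialize the SBML namespace as the default xmlns (note: this registry is process-global).
ET.register_namespace("", SBML_NS)


def _q(tag):
    """Qualify a tag with the SBML L3V2 core namespace (Clark notation)."""
    return f"{{{SBML_NS}}}{tag}"


def write_minimal_sbml(model_id, parameters):
    """Return a minimal SBML L3V2 string holding constant, dimensionless parameters."""
    sbml = ET.Element(_q("sbml"), {"level": "3", "version": "2"})
    # Fold-change data are unitless, so declare model-wide time/substance units dimensionless.
    model = ET.SubElement(sbml, _q("model"), {
        "id": model_id,
        "timeUnits": "dimensionless",
        "substanceUnits": "dimensionless",
    })
    plist = ET.SubElement(model, _q("listOfParameters"))
    for pid, value in parameters.items():
        ET.SubElement(plist, _q("parameter"), {
            "id": pid,
            "value": repr(float(value)),  # repr() round-trips a Python float exactly
            "units": "dimensionless",
            "constant": "true",
        })
    return ET.tostring(sbml, encoding="unicode")
```

Using `repr(float(value))` rather than a fixed-precision format keeps the exported values identical to the fitted ones, in line with the goal of preserving parameters exactly.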