diff --git a/README.md b/README.md index b04a90c..aae1f72 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,8 @@ **ipcr** is a fast, streaming, IUPAC-aware in-silico PCR toolkit for large (including .gzipped) references. It finds amplicons from primer pairs under a mismatch model with a **3′ terminal window**, supports **internal probes**, **nested PCR**, **multiplex panels**, circular templates, and emits **TSV**, **FASTA**, **JSON**, or **JSONL**. +`ipcr-thermo` provides nearest-neighbor-informed ranking with explicit approximation metadata. It is a thermodynamically informed in-silico PCR ranker, not a complete PCR kinetics simulator. See [Thermodynamic models, score profiles, and release claims](./docs/THERMO_MODELS.md). + --- - **Fast & parallel**: multi-threaded seeded scanner with per-hit verification. @@ -17,6 +19,7 @@ It finds amplicons from primer pairs under a mismatch model with a **3′ termin - **Pretty mode**: readable ASCII alignment blocks. - **Deterministic**: `--sort` gives stable order; JSON/JSONL use versioned, stable wire schemas. - **Good UX**: cancelable I/O (Ctrl-C → exit 130), consistent warnings (gated by `--quiet`), clear validation errors. +- **Transparent thermo modes**: NN models, score profiles, salt/dNTP conditions, IUPAC expansion, structure, and probe terms are exposed as output metadata when enabled. 
--- @@ -28,7 +31,7 @@ It finds amplicons from primer pairs under a mismatch model with a **3′ termin | `ipcr-probe` | ipcr + **internal probe** annotation & filtering | qPCR/TaqMan-style assays | | `ipcr-nested` | **Nested PCR**: outer amplicon + inner scan | Two-round/nested assays | | `ipcr-multiplex` | Panels from TSV or **pooled inline** primers | Screens / large panels | -| `ipcr-thermo` | Thermodynamically-informed scoring & ranking | Ranking / assay robustness | +| `ipcr-thermo` | Thermodynamically informed scoring & ranking | Ranking / assay robustness | --- @@ -137,7 +140,7 @@ ipcr-multiplex \ } ``` -### Thermodynamically-informed : +### Thermodynamically informed ranking: ```bash # With multiplex primer pool described by Xiong (2017); DOI: 10.3389/fmicb.2017.00420 @@ -158,6 +161,25 @@ Salmonella-Enteritidis NZ_CP025559.1 O1+O2 1853303 1854185 882 revcomp 0 0 -137. --- +## Thermodynamic scoring scope + +`ipcr-thermo` has multiple thermodynamic implementation modes and empirical score profiles. Use the mode/profile labels in output metadata rather than treating all scores as one universal scale. + +Common modes and profiles: + +| Setting | Meaning | +| ------------------ | -------------------------------------------------------- | +| `legacy-heuristic` | Historical compatibility path. | +| `nn-duplex-v1` | Nearest-neighbor primer-template duplex scoring. | +| `nn-structure-v1` | NN duplex scoring plus primer hairpin/dimer competition. | +| `binding` | Primer-template binding rank. | +| `pcr` | Binding plus extension and length proxy. | +| `gel` | PCR proxy plus band-mass proxy. | + +The `pcr` and `gel` profiles are useful empirical rankers, but they are not full polymerase kinetics or quantitative gel-intensity models. Modified probes such as MGB probes are not fully calibrated; use `--probe-score-mode annotate` or `--probe-thermo=false` unless a calibrated modifier model is available. 
+ +See [docs/THERMO_MODELS.md](./docs/THERMO_MODELS.md) for the model matrix and fallback labels, and [docs/THERMO_RELEASE_CHECKLIST.md](./docs/THERMO_RELEASE_CHECKLIST.md) for release/smoke-test guidance. + ## Inputs - **Inline primers**: `-f/--forward`, `-r/--reverse` (5′→3′; IUPAC allowed). diff --git a/core/engine/product.go b/core/engine/product.go index e8bfdbc..c00d9a7 100644 --- a/core/engine/product.go +++ b/core/engine/product.go @@ -24,8 +24,196 @@ type Product struct { // Optional amplicon sequence Seq string `json:"seq,omitempty"` - // NEW: optional score (thermo / realistic mode) + // Optional score (thermo / realistic mode). Higher is better. The numeric + // meaning depends on the selected thermo model; see Thermo for components. Score float64 `json:"score,omitempty"` + // Optional thermodynamic score components. Populated by ipcr-thermo NN modes. + Thermo *ThermoDetails `json:"thermo,omitempty"` + SourceFile string `json:"source_file"` } + +// ThermoDetails contains interpretable thermodynamic score components for a +// product. It is intentionally model-labelled because legacy heuristic scores +// and NN-derived scores are not numerically comparable. 
+type ThermoDetails struct { + Model string `json:"model"` + SaltModel string `json:"salt_model"` + NaM float64 `json:"na_m,omitempty"` + MgM float64 `json:"mg_m,omitempty"` + DntpM float64 `json:"dntp_m,omitempty"` + EffectiveNaM float64 `json:"effective_na_m,omitempty"` + FreeMgM float64 `json:"free_mg_m,omitempty"` + AnnealTempC float64 `json:"anneal_temp_c"` + IUPACPolicy string `json:"iupac_policy"` + IUPACThermoPolicy string `json:"iupac_thermo_policy,omitempty"` + IUPACExpansionCount int `json:"iupac_expansion_count,omitempty"` + IUPACExpansionCapped bool `json:"iupac_expansion_capped,omitempty"` + IUPACEffectiveVariant string `json:"iupac_effective_variant,omitempty"` + IUPACVariants []ThermoVariant `json:"iupac_variants,omitempty"` + MismatchPolicy string `json:"mismatch_policy"` + StructurePolicy string `json:"structure_policy,omitempty"` + ScoreProfile string `json:"score_profile,omitempty"` + ScoreC float64 `json:"score_c"` + BaseScoreC float64 `json:"base_score_c,omitempty"` + AmpliconAdjustmentC float64 `json:"amplicon_adjustment_c,omitempty"` + ExtensionLogit float64 `json:"extension_logit,omitempty"` + ExtensionBonusC float64 `json:"extension_bonus_c,omitempty"` + LengthPenaltyC float64 `json:"length_penalty_c,omitempty"` + BandMassBonusC float64 `json:"band_mass_bonus_c,omitempty"` + StructurePenaltyC float64 `json:"structure_penalty_c,omitempty"` + LimitingSide string `json:"limiting_side"` + Fwd ThermoEndpoint `json:"fwd"` + Rev ThermoEndpoint `json:"rev"` + Probe *ProbeThermoDetails `json:"probe,omitempty"` + WorstHairpin *ThermoStructure `json:"worst_hairpin,omitempty"` + WorstSelfDimer *ThermoStructure `json:"worst_self_dimer,omitempty"` + CrossDimer *ThermoStructure `json:"cross_dimer,omitempty"` + PanelCrossDimer *ThermoStructure `json:"panel_cross_dimer,omitempty"` + PanelCrossDimerPenaltyC float64 `json:"panel_cross_dimer_penalty_c,omitempty"` + PanelCrossDimerBurdenC float64 `json:"panel_cross_dimer_burden_c,omitempty"` + 
PanelCrossDimerCount int `json:"panel_cross_dimer_count,omitempty"` +} + +// ThermoVariant summarizes one concrete A/C/G/T expansion of a degenerate +// primer pair under an IUPAC thermodynamics policy. It is populated only for +// enumerate mode to keep ordinary JSON output compact. +type ThermoVariant struct { + FwdPrimer string `json:"fwd_primer"` + RevPrimer string `json:"rev_primer"` + ScoreC float64 `json:"score_c"` + BaseScoreC float64 `json:"base_score_c,omitempty"` + StructurePenaltyC float64 `json:"structure_penalty_c,omitempty"` + LimitingSide string `json:"limiting_side,omitempty"` + FwdTmC float64 `json:"fwd_tm_c,omitempty"` + RevTmC float64 `json:"rev_tm_c,omitempty"` + FwdMarginC float64 `json:"fwd_margin_c,omitempty"` + RevMarginC float64 `json:"rev_margin_c,omitempty"` +} + +// ProbeThermoDetails contains internal-probe annotation plus NN probe-target +// thermodynamics. It is populated by ipcr-thermo when --probe is supplied and +// probe thermodynamics are enabled. +type ProbeThermoDetails struct { + Name string `json:"name"` + Seq string `json:"seq"` + Found bool `json:"found"` + Strand string `json:"strand,omitempty"` + Pos int `json:"pos,omitempty"` + MM int `json:"mm,omitempty"` + Site string `json:"site,omitempty"` + ScoreMode string `json:"score_mode"` + MinMarginC float64 `json:"min_margin_c,omitempty"` + ScoreContributionC float64 `json:"score_contribution_c,omitempty"` + GatePenaltyC float64 `json:"gate_penalty_c,omitempty"` + IUPACThermoPolicy string `json:"iupac_thermo_policy,omitempty"` + IUPACExpansionCount int `json:"iupac_expansion_count,omitempty"` + IUPACExpansionCapped bool `json:"iupac_expansion_capped,omitempty"` + IUPACEffectiveVariant string `json:"iupac_effective_variant,omitempty"` + TmC float64 `json:"tm_c,omitempty"` + AnnealMarginC float64 `json:"anneal_margin_c,omitempty"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal,omitempty"` + MismatchPenaltyC float64 `json:"mismatch_penalty_c,omitempty"` + 
MismatchDeltaGKcal float64 `json:"mismatch_delta_g_kcal,omitempty"` + MismatchCount int `json:"mismatch_count,omitempty"` + MismatchFallbackCount int `json:"mismatch_fallback_count,omitempty"` + MismatchTripletCount int `json:"mismatch_triplet_count,omitempty"` + MismatchCuratedPairCount int `json:"mismatch_curated_pair_count,omitempty"` + MismatchSources []string `json:"mismatch_sources,omitempty"` + MismatchParameterSets []string `json:"mismatch_parameter_sets,omitempty"` + MismatchCitations []string `json:"mismatch_citations,omitempty"` + MismatchParameterNotes []string `json:"mismatch_parameter_notes,omitempty"` + TerminalMismatchPenaltyC float64 `json:"terminal_mismatch_penalty_c,omitempty"` + TerminalMismatchDeltaGKcal float64 `json:"terminal_mismatch_delta_g_kcal,omitempty"` + TerminalMismatchCount int `json:"terminal_mismatch_count,omitempty"` + FivePrimeTerminalMismatchCount int `json:"five_prime_terminal_mismatch_count,omitempty"` + ThreePrimeTerminalMismatchCount int `json:"three_prime_terminal_mismatch_count,omitempty"` + TerminalMismatchSources []string `json:"terminal_mismatch_sources,omitempty"` + TerminalMismatchParameterSets []string `json:"terminal_mismatch_parameter_sets,omitempty"` + TerminalMismatchCitations []string `json:"terminal_mismatch_citations,omitempty"` + TerminalMismatchParameterNotes []string `json:"terminal_mismatch_parameter_notes,omitempty"` + MismatchPolicy string `json:"mismatch_policy,omitempty"` + HasNonWatsonCrick bool `json:"has_non_watson_crick,omitempty"` + UsedHeuristicAdjust bool `json:"used_heuristic_adjust,omitempty"` +} + +// ThermoEndpoint describes one primer-template endpoint in 5'→3' primer +// coordinates. DeltaGAtAnnealKcal is an effective two-state binding term at the +// configured annealing temperature; negative values are favorable. 
+type ThermoEndpoint struct { + Side string `json:"side"` + TmC float64 `json:"tm_c"` + AnnealMarginC float64 `json:"anneal_margin_c"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal"` + MismatchPenaltyC float64 `json:"mismatch_penalty_c"` + MismatchDeltaGKcal float64 `json:"mismatch_delta_g_kcal,omitempty"` + TerminalMismatchPenaltyC float64 `json:"terminal_mismatch_penalty_c,omitempty"` + TerminalMismatchDeltaGKcal float64 `json:"terminal_mismatch_delta_g_kcal,omitempty"` + DanglingEndAdjustmentC float64 `json:"dangling_end_adjustment_c,omitempty"` + DanglingEndDeltaGKcal float64 `json:"dangling_end_delta_g_kcal,omitempty"` + DanglingEndCount int `json:"dangling_end_count,omitempty"` + MismatchCount int `json:"mismatch_count,omitempty"` + FivePrimeMismatchCount int `json:"five_prime_mismatch_count,omitempty"` + ThreePrimeMismatchCount int `json:"three_prime_mismatch_count,omitempty"` + FivePrimeTerminalMismatchCount int `json:"five_prime_terminal_mismatch_count,omitempty"` + ThreePrimeTerminalMismatchCount int `json:"three_prime_terminal_mismatch_count,omitempty"` + TerminalMismatchCount int `json:"terminal_mismatch_count,omitempty"` + FivePrimeTerminalMismatchPenaltyC float64 `json:"five_prime_terminal_mismatch_penalty_c,omitempty"` + ThreePrimeTerminalMismatchPenaltyC float64 `json:"three_prime_terminal_mismatch_penalty_c,omitempty"` + MismatchFallbackCount int `json:"mismatch_fallback_count,omitempty"` + MismatchTripletCount int `json:"mismatch_triplet_count,omitempty"` + MismatchCuratedPairCount int `json:"mismatch_curated_pair_count,omitempty"` + MismatchSources []string `json:"mismatch_sources,omitempty"` + MismatchParameterSets []string `json:"mismatch_parameter_sets,omitempty"` + MismatchCitations []string `json:"mismatch_citations,omitempty"` + MismatchParameterNotes []string `json:"mismatch_parameter_notes,omitempty"` + TerminalMismatchSources []string `json:"terminal_mismatch_sources,omitempty"` + TerminalMismatchParameterSets []string 
`json:"terminal_mismatch_parameter_sets,omitempty"` + TerminalMismatchCitations []string `json:"terminal_mismatch_citations,omitempty"` + TerminalMismatchParameterNotes []string `json:"terminal_mismatch_parameter_notes,omitempty"` + EffectiveDenomCalK float64 `json:"effective_denom_cal_per_k_mol"` + MismatchPolicy string `json:"mismatch_policy"` + EndEffectPolicy string `json:"end_effect_policy,omitempty"` + HasNonWatsonCrick bool `json:"has_non_watson_crick"` + UsedHeuristicAdjust bool `json:"used_heuristic_adjust"` +} + +// ThermoStructure describes a primer secondary-structure candidate used by +// nn-structure-v1. PenaltyC is the °C-equivalent competition penalty applied to +// the final score. +type ThermoStructure struct { + Kind string `json:"kind"` + Model string `json:"model,omitempty"` + QueryA string `json:"query_a,omitempty"` + QueryB string `json:"query_b,omitempty"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal"` + TmC float64 `json:"tm_c"` + AnnealMarginC float64 `json:"anneal_margin_c"` + StemLen int `json:"stem_len"` + LoopLen int `json:"loop_len,omitempty"` + AStart int `json:"a_start"` + AEnd int `json:"a_end"` + BStart int `json:"b_start"` + BEnd int `json:"b_end"` + ThreePrimeAnchored bool `json:"three_prime_anchored"` + BothThreePrimeAnchor bool `json:"both_three_prime_anchor,omitempty"` + SegmentCount int `json:"segment_count,omitempty"` + BulgeCount int `json:"bulge_count,omitempty"` + InternalLoopCount int `json:"internal_loop_count,omitempty"` + DanglingEndCount int `json:"dangling_end_count,omitempty"` + LoopPenaltyKcal float64 `json:"loop_penalty_kcal,omitempty"` + BulgePenaltyKcal float64 `json:"bulge_penalty_kcal,omitempty"` + InternalLoopPenaltyKcal float64 `json:"internal_loop_penalty_kcal,omitempty"` + StructureDanglingDeltaGKcal float64 `json:"structure_dangling_delta_g_kcal,omitempty"` + EnsembleDeltaGAtAnnealKcal float64 `json:"ensemble_delta_g_at_anneal_kcal,omitempty"` + PartitionFunction float64 
`json:"partition_function,omitempty"` + EnsembleWeight float64 `json:"ensemble_weight,omitempty"` + EnsembleCandidateCount int `json:"ensemble_candidate_count,omitempty"` + DPCellCount int `json:"dp_cell_count,omitempty"` + DPStateCount int `json:"dp_state_count,omitempty"` + DPExpectedPairs float64 `json:"dp_expected_pairs,omitempty"` + DPMFEDeltaGAtAnnealKcal float64 `json:"dp_mfe_delta_g_at_anneal_kcal,omitempty"` + DPEnsembleDeltaGAtAnnealKcal float64 `json:"dp_ensemble_delta_g_at_anneal_kcal,omitempty"` + PenaltyC float64 `json:"penalty_c,omitempty"` +} diff --git a/core/oligo/oligo.go b/core/oligo/oligo.go index 0154c9c..0831b1f 100644 --- a/core/oligo/oligo.go +++ b/core/oligo/oligo.go @@ -22,8 +22,9 @@ func BestHit(amplicon, probe string, maxMM int) Hit { prbB := []byte(prb) rcB := primer.RevComp(prbB) - // Exact match fast-path - if maxMM == 0 { + // Exact match fast-path. Keep it only for strict A/C/G/T probes; degenerate + // probes must go through the IUPAC-aware matcher even when maxMM is zero. 
+ if maxMM == 0 && isStrictACGT(prb) { if i := strings.Index(amp, prb); i >= 0 { return Hit{Found: true, Strand: "+", Pos: i, MM: 0, Site: amp[i : i+len(prb)]} } @@ -68,3 +69,17 @@ func BestHit(amplicon, probe string, maxMM int) Hit { } return best } + +func isStrictACGT(s string) bool { + if s == "" { + return false + } + for i := 0; i < len(s); i++ { + switch s[i] { + case 'A', 'C', 'G', 'T': + default: + return false + } + } + return true +} diff --git a/core/oligo/oligo_test.go b/core/oligo/oligo_test.go index 1f209a3..e0b4aa9 100644 --- a/core/oligo/oligo_test.go +++ b/core/oligo/oligo_test.go @@ -12,3 +12,10 @@ func TestBestHit(t *testing.T) { t.Fatalf("expected a hit on RC with mismatches") } } + +func TestBestHitDegenerateProbeExactUsesIUPACMatcher(t *testing.T) { + h := BestHit("AAAGACCC", "GAY", 0) + if !h.Found || h.Strand != "+" || h.Pos != 3 || h.Site != "GAC" { + t.Fatalf("unexpected degenerate hit: %+v", h) + } +} diff --git a/core/thermo/conditions.go b/core/thermo/conditions.go new file mode 100644 index 0000000..ee6019c --- /dev/null +++ b/core/thermo/conditions.go @@ -0,0 +1,190 @@ +package thermo + +import ( + "fmt" + "math" + "strings" +) + +// SaltModel identifies how solution ions are reduced to the monovalent value +// consumed by the current nearest-neighbor entropy correction. +type SaltModel string + +const ( + // SaltModelMonovalent uses only the supplied monovalent cation concentration. + SaltModelMonovalent SaltModel = "monovalent" + + // SaltModelOwczarzyLite applies the historical Mg-to-Na-equivalent heuristic: + // Na_eff = Na + 3.8*sqrt(free Mg), where free Mg is Mg after dNTP chelation. This is an approximation, not a full mixed-salt model. + SaltModelOwczarzyLite SaltModel = "owczarzy-lite" + + // SaltModelOwczarzy08 applies the mixed monovalent/divalent salt correction + // form from Owczarzy et al. 2008 using free Mg after dNTP chelation. + SaltModelOwczarzy08 SaltModel = "owczarzy08" +) + +// Conditions collects wet-lab inputs used by thermodynamic calculations. 
+type Conditions struct { + AnnealC float64 + NaM float64 + MgM float64 + DntpM float64 + PrimerTotalM float64 + SaltModel SaltModel + SelfComplementary bool +} + +// DefaultConditions returns the ipcr-thermo CLI defaults in mol/L. +func DefaultConditions() Conditions { + return Conditions{ + AnnealC: 60, + NaM: 0.05, + MgM: 0.003, + DntpM: 0, + PrimerTotalM: 2.5e-7, + SaltModel: SaltModelMonovalent, + } +} + +func (m SaltModel) String() string { + if m == "" { + return string(SaltModelMonovalent) + } + return string(m) +} + +// ParseSaltModel validates and normalizes a salt model name. +func ParseSaltModel(raw string) (SaltModel, error) { + s := strings.TrimSpace(strings.ToLower(raw)) + if s == "" { + return SaltModelMonovalent, nil + } + switch SaltModel(s) { + case SaltModelMonovalent, SaltModelOwczarzyLite, SaltModelOwczarzy08: + return SaltModel(s), nil + default: + return "", fmt.Errorf("unknown salt model %q; expected one of: %s", raw, KnownSaltModels()) + } +} + +// KnownSaltModels returns CLI help text for salt model choices. +func KnownSaltModels() string { + return strings.Join([]string{SaltModelMonovalent.String(), SaltModelOwczarzyLite.String(), SaltModelOwczarzy08.String()}, " | ") +} + +// WithDefaults fills missing condition fields with CLI defaults. MgM is not +// defaulted here because zero magnesium is a valid explicit experimental state; +// callers that want the CLI default should start from DefaultConditions(). +func (c Conditions) WithDefaults() Conditions { + d := DefaultConditions() + if c.AnnealC == 0 { + c.AnnealC = d.AnnealC + } + if c.NaM == 0 { + c.NaM = d.NaM + } + if c.PrimerTotalM == 0 { + c.PrimerTotalM = d.PrimerTotalM + } + if c.SaltModel == "" { + c.SaltModel = d.SaltModel + } + return c +} + +// EffectiveNaM returns the monovalent concentration consumed by the current NN +// salt correction under the selected salt model. 
+func (c Conditions) EffectiveNaM() float64 { + c = c.WithDefaults() + return EffectiveMonovalent(c.NaM, c.MgM, c.DntpM, c.SaltModel) +} + +// FreeMgM returns free Mg2+ after a simple dNTP chelation approximation. The +// input DntpM is interpreted as total dNTP concentration in mol/L. +func (c Conditions) FreeMgM() float64 { + c = c.WithDefaults() + return FreeMagnesium(c.MgM, c.DntpM) +} + +// TmInput builds the core nearest-neighbor Tm input from these conditions. +func (c Conditions) TmInput() TmInput { + c = c.WithDefaults() + x := 4 + if c.SelfComplementary { + x = 1 + } + return TmInput{ + CT: c.PrimerTotalM, + Na: c.EffectiveNaM(), + Mg: c.MgM, + Dntp: c.DntpM, + SaltModel: c.SaltModel, + X: x, + } +} + +// ParseConc parses common molar strings such as "50mM", "250nM", "3uM", +// "3µM" (micro sign), and "3μM" (Greek mu) into mol/L. +func ParseConc(s string) (float64, error) { + raw := strings.TrimSpace(s) + norm := strings.ToLower(raw) + norm = strings.ReplaceAll(norm, "µ", "u") + norm = strings.ReplaceAll(norm, "μ", "u") + + unit := "" + val := 0.0 + _, err := fmt.Sscanf(norm, "%f%s", &val, &unit) + if err != nil { + return 0, fmt.Errorf("invalid conc %q: %w", raw, err) + } + if val < 0 { + return 0, fmt.Errorf("invalid conc %q: concentration must be non-negative", raw) + } + switch unit { + case "m", "": + return val, nil + case "mm": + return val * 1e-3, nil + case "um": + return val * 1e-6, nil + case "nm": + return val * 1e-9, nil + default: + return 0, fmt.Errorf("unknown unit %q in %q", unit, raw) + } +} + +// EffectiveMonovalent returns the effective monovalent concentration under an +// explicit salt model. +func EffectiveMonovalent(naM, mgM, dntpM float64, model SaltModel) float64 { + if model == "" { + model = SaltModelMonovalent + } + if model == SaltModelOwczarzyLite && mgM > 0 { + return naM + 3.8*math.Sqrt(FreeMagnesium(mgM, dntpM)) + } + return naM +} + +// FreeMagnesium returns free Mg2+ after dNTP chelation. 
The equilibrium form +// follows the common Owczarzy/Primer3-style Mg:dNTP association approximation +// with Ka=3e4 M^-1. +func FreeMagnesium(mgM, dntpM float64) float64 { + if mgM <= 0 { + return 0 + } + if dntpM <= 0 { + return mgM + } + const ka = 3e4 + b := ka*dntpM - ka*mgM + 1.0 + disc := b*b + 4.0*ka*mgM + if disc < 0 || math.IsNaN(disc) || math.IsInf(disc, 0) { + return 0 + } + free := (-b + math.Sqrt(disc)) / (2.0 * ka) + if free < 0 || math.IsNaN(free) || math.IsInf(free, 0) { + return 0 + } + return free +} diff --git a/core/thermo/conditions_test.go b/core/thermo/conditions_test.go new file mode 100644 index 0000000..6c3d269 --- /dev/null +++ b/core/thermo/conditions_test.go @@ -0,0 +1,81 @@ +package thermo + +import ( + "math" + "testing" +) + +func TestParseConcAcceptsMicroVariants(t *testing.T) { + for _, s := range []string{"3uM", "3µM", "3μM"} { + got, err := ParseConc(s) + if err != nil { + t.Fatalf("ParseConc(%q): %v", s, err) + } + if math.Abs(got-3e-6) > 1e-15 { + t.Fatalf("ParseConc(%q)=%g, want 3e-6", s, got) + } + } +} + +func TestParseSaltModel(t *testing.T) { + for _, raw := range []string{"", "monovalent", "owczarzy-lite", "owczarzy08"} { + if _, err := ParseSaltModel(raw); err != nil { + t.Fatalf("ParseSaltModel(%q): %v", raw, err) + } + } + if _, err := ParseSaltModel("hidden-env"); err == nil { + t.Fatal("expected unknown salt model error") + } +} + +func TestEffectiveMonovalentSaltModels(t *testing.T) { + na := 0.05 + mg := 0.003 + mono := EffectiveMonovalent(na, mg, 0, SaltModelMonovalent) + if mono != na { + t.Fatalf("monovalent model changed Na: got %g want %g", mono, na) + } + lite := EffectiveMonovalent(na, mg, 0, SaltModelOwczarzyLite) + if !(lite > na) { + t.Fatalf("owczarzy-lite should increase effective Na with Mg: got %g <= %g", lite, na) + } +} + +func TestConditionsTmInputUsesEffectiveSaltAndSelfFactor(t *testing.T) { + c := Conditions{ + AnnealC: 55, + NaM: 0.05, + MgM: 0.003, + PrimerTotalM: 2.5e-7, + SaltModel: 
SaltModelOwczarzyLite, + SelfComplementary: true, + } + in := c.TmInput() + if in.CT != c.PrimerTotalM || in.X != 1 { + t.Fatalf("bad TmInput concentration/factor: %+v", in) + } + if !(in.Na > c.NaM) { + t.Fatalf("expected effective Na > raw Na under owczarzy-lite: %+v", in) + } +} + +func TestFreeMagnesiumSubtractsDNTPSafely(t *testing.T) { + got := FreeMagnesium(0.003, 0.0008) + if !(got > 0.002 && got < 0.003) { + t.Fatalf("FreeMagnesium got %g, want positive chelated Mg below total Mg", got) + } + if got := FreeMagnesium(0.001, 0.002); !(got >= 0 && got < 0.001) { + t.Fatalf("FreeMagnesium should remain bounded below total Mg, got %g", got) + } +} + +func TestOwczarzy08TmInputPreservesRawNaAndFreeMg(t *testing.T) { + c := Conditions{NaM: 0.05, MgM: 0.003, DntpM: 0.0008, PrimerTotalM: 2.5e-7, SaltModel: SaltModelOwczarzy08} + in := c.TmInput() + if in.SaltModel != SaltModelOwczarzy08 || in.Na != c.NaM || in.Mg != c.MgM || in.Dntp != c.DntpM { + t.Fatalf("bad owczarzy08 TmInput: %+v", in) + } + if !(c.FreeMgM() > 0.002 && c.FreeMgM() < c.MgM) { + t.Fatalf("bad free Mg: %g", c.FreeMgM()) + } +} diff --git a/core/thermo/dangling_params.go b/core/thermo/dangling_params.go new file mode 100644 index 0000000..26ae055 --- /dev/null +++ b/core/thermo/dangling_params.go @@ -0,0 +1,202 @@ +package thermo + +const ( + // DanglingEndParameterSetSantaLuciaHicks2004V1 identifies the DNA/DNA + // terminal dangling-end nearest-neighbor table from SantaLucia & Hicks 2004, + // Table 3, in 1 M NaCl. + DanglingEndParameterSetSantaLuciaHicks2004V1 = "santalucia-hicks-2004-dna-dangling-ends-v1" + + // DanglingEndStrand5Prime means the unpaired base is a 5' dangling end on the + // strand carrying the dangling base, as in 5'-XA-3'/3'-T-5'. + DanglingEndStrand5Prime byte = '5' + + // DanglingEndStrand3Prime means the unpaired base is a 3' dangling end on the + // strand carrying the dangling base, as in 5'-AX-3'/3'-T-5'. 
+ DanglingEndStrand3Prime byte = '3' + + // DanglingEndSideTemplate5Prime means the unpaired target/template base is on + // the target strand's 5' side. In primer-aligned coordinates this is adjacent + // to the primer 3' end. + DanglingEndSideTemplate5Prime byte = DanglingEndStrand5Prime + + // DanglingEndSideTemplate3Prime means the unpaired target/template base is on + // the target strand's 3' side. In primer-aligned coordinates this is adjacent + // to the primer 5' end. + DanglingEndSideTemplate3Prime byte = DanglingEndStrand3Prime +) + +const ( + danglingEndSourceSantaLuciaHicks2004Table3 = "santalucia-hicks-2004-table-3" + danglingEndCitationSantaLuciaHicks2004 = "SantaLucia J Jr, Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004;33:415-440. Table 3. doi:10.1146/annurev.biophys.32.110601.141800" + danglingEndNoteSantaLuciaHicks2004 = "Terminal dangling-end nearest-neighbor increment next to a Watson-Crick DNA pair in 1 M NaCl; Table 3 reports ΔH° and ΔG°37. ΔS° is computed from ΔH° and ΔG°37 at 310.15 K." +) + +// DanglingEndKey identifies one terminal dangling-end nearest-neighbor term in +// the orientation of the strand carrying the dangling base. PairedBase is the +// Watson-Crick base adjacent to the dangling base on that same strand; +// OppositeBase is the base on the opposite strand. +type DanglingEndKey struct { + StrandEnd byte + DanglingBase byte + PairedBase byte + OppositeBase byte +} + +// DanglingEndParameter stores one SantaLucia-Hicks terminal dangling-end term. +// DeltaG37kcal is an additive endpoint increment, not a penalty; it may be +// favorable (negative) or unfavorable (positive). 
+type DanglingEndParameter struct { + Key DanglingEndKey + Motif string + DeltaHkcal float64 + DeltaScalK float64 + DeltaG37kcal float64 + Source string + ParameterSet string + Citation string + Note string +} + +var danglingEndParametersByKey = map[DanglingEndKey]DanglingEndParameter{} + +// CuratedDanglingEnds contains the complete SantaLucia-Hicks 2004 Table 3 DNA/DNA +// terminal dangling-end ΔH° and ΔG°37 table, in the orientation of the dangling +// strand. +var CuratedDanglingEnds = []DanglingEndParameter{ + // 5' dangling ends: 5'-XA-3'/3'-T-5'. + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'A', 'A', 'T', 0.2, -0.51), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'C', 'A', 'T', 0.6, -0.42), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'G', 'A', 'T', -1.1, -0.62), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'T', 'A', 'T', -6.9, -0.71), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'A', 'C', 'G', -6.3, -0.96), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'C', 'C', 'G', -4.4, -0.52), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'G', 'C', 'G', -5.1, -0.72), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'T', 'C', 'G', -4.0, -0.58), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'A', 'G', 'C', -3.7, -0.58), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'C', 'G', 'C', -4.0, -0.34), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'G', 'G', 'C', -3.9, -0.56), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'T', 'G', 'C', -4.9, -0.61), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'A', 'T', 'A', -2.9, -0.50), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'C', 'T', 'A', -4.1, -0.02), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'G', 'T', 'A', -4.2, 0.48), + santaLuciaHicksDanglingEnd(DanglingEndStrand5Prime, 'T', 'T', 'A', -0.2, -0.10), + + // 3' dangling ends: 5'-AX-3'/3'-T-5'. 
+ santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'A', 'A', 'T', -0.5, -0.12), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'C', 'A', 'T', 4.7, 0.28), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'G', 'A', 'T', -4.1, -0.01), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'T', 'A', 'T', -3.8, 0.13), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'A', 'C', 'G', -5.9, -0.82), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'C', 'C', 'G', -2.6, -0.31), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'G', 'C', 'G', -3.2, -0.01), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'T', 'C', 'G', -5.2, -0.52), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'A', 'G', 'C', -2.1, -0.92), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'C', 'G', 'C', -0.2, -0.23), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'G', 'G', 'C', -3.9, -0.44), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'T', 'G', 'C', -4.4, -0.35), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'A', 'T', 'A', -0.7, -0.48), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'C', 'T', 'A', 4.4, -0.19), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'G', 'T', 'A', -1.6, -0.50), + santaLuciaHicksDanglingEnd(DanglingEndStrand3Prime, 'T', 'T', 'A', 2.9, -0.29), +} + +// CuratedDanglingEndParameters is retained as a descriptive alias for tests and +// callers that want the whole table. 
+var CuratedDanglingEndParameters = CuratedDanglingEnds + +func init() { + for _, p := range CuratedDanglingEnds { + danglingEndParametersByKey[p.Key] = p + } +} + +func santaLuciaHicksDanglingEnd(strandEnd, danglingBase, pairedBase, oppositeBase byte, deltaH, deltaG37 float64) DanglingEndParameter { + key := DanglingEndKey{ + StrandEnd: normalizeDanglingEndSide(strandEnd), + DanglingBase: normalizeBase(danglingBase), + PairedBase: normalizeBase(pairedBase), + OppositeBase: normalizeBase(oppositeBase), + } + return DanglingEndParameter{ + Key: key, + Motif: danglingEndMotif(key), + DeltaHkcal: deltaH, + DeltaScalK: (deltaH - deltaG37) * 1000.0 / 310.15, + DeltaG37kcal: deltaG37, + Source: danglingEndSourceSantaLuciaHicks2004Table3, + ParameterSet: DanglingEndParameterSetSantaLuciaHicks2004V1, + Citation: danglingEndCitationSantaLuciaHicks2004, + Note: danglingEndNoteSantaLuciaHicks2004, + } +} + +func danglingEndMotif(key DanglingEndKey) string { + key.StrandEnd = normalizeDanglingEndSide(key.StrandEnd) + key.DanglingBase = normalizeBase(key.DanglingBase) + key.PairedBase = normalizeBase(key.PairedBase) + key.OppositeBase = normalizeBase(key.OppositeBase) + switch key.StrandEnd { + case DanglingEndStrand5Prime: + return string([]byte{key.DanglingBase, key.PairedBase}) + "/" + string([]byte{key.OppositeBase}) + case DanglingEndStrand3Prime: + return string([]byte{key.PairedBase, key.DanglingBase}) + "/" + string([]byte{key.OppositeBase}) + default: + return "" + } +} + +// LookupDanglingEndParameter returns a terminal dangling-end parameter keyed in +// the orientation of the dangling strand. 
+func LookupDanglingEndParameter(key DanglingEndKey) (DanglingEndParameter, bool) { + key.StrandEnd = normalizeDanglingEndSide(key.StrandEnd) + key.DanglingBase = normalizeBase(key.DanglingBase) + key.PairedBase = normalizeBase(key.PairedBase) + key.OppositeBase = normalizeBase(key.OppositeBase) + p, ok := danglingEndParametersByKey[key] + return p, ok +} + +// LookupTemplateDanglingEnd returns the table-backed terminal dangling-end term +// for a target/template dangling base next to a Watson-Crick closing pair. The +// side argument is the target/template strand side ('5' or '3'). +func LookupTemplateDanglingEnd(side, x, primerBase, targetBase byte) (DanglingEndParameter, bool) { + key := DanglingEndKey{ + StrandEnd: normalizeDanglingEndSide(side), + DanglingBase: normalizeBase(x), + PairedBase: normalizeBase(targetBase), + OppositeBase: normalizeBase(primerBase), + } + if key.StrandEnd == 0 || key.DanglingBase == 'N' || key.PairedBase == 'N' || key.OppositeBase == 'N' { + return DanglingEndParameter{}, false + } + if !wc(key.OppositeBase, key.PairedBase) { + return DanglingEndParameter{}, false + } + return LookupDanglingEndParameter(key) +} + +// LookupTemplateDanglingEndParameter maps a primer-side label to the target +// dangling-end table. A target base next to the primer 5' end is a target 3' +// dangling end; a target base next to the primer 3' end is a target 5' dangling +// end. 
+func LookupTemplateDanglingEndParameter(side string, danglingBase, terminalPrimerBase, terminalTargetBase byte) (DanglingEndParameter, bool) { + switch side { + case "primer-5p": + return LookupTemplateDanglingEnd(DanglingEndSideTemplate3Prime, danglingBase, terminalPrimerBase, terminalTargetBase) + case "primer-3p": + return LookupTemplateDanglingEnd(DanglingEndSideTemplate5Prime, danglingBase, terminalPrimerBase, terminalTargetBase) + default: + return DanglingEndParameter{}, false + } +} + +func normalizeDanglingEndSide(side byte) byte { + switch side { + case DanglingEndStrand5Prime: + return DanglingEndStrand5Prime + case DanglingEndStrand3Prime: + return DanglingEndStrand3Prime + default: + return 0 + } +} diff --git a/core/thermo/goldens_test.go b/core/thermo/goldens_test.go new file mode 100644 index 0000000..fa742e3 --- /dev/null +++ b/core/thermo/goldens_test.go @@ -0,0 +1,362 @@ +package thermo + +import ( + "encoding/csv" + "math" + "os" + "strconv" + "testing" +) + +func readGoldenTSV(t *testing.T, path string) []map[string]string { + t.Helper() + f, err := os.Open(path) + if err != nil { + t.Fatalf("open %s: %v", path, err) + } + defer func() { _ = f.Close() }() + + r := csv.NewReader(f) + r.Comma = '\t' + r.Comment = '#' + r.FieldsPerRecord = -1 + rows, err := r.ReadAll() + if err != nil { + t.Fatalf("read %s: %v", path, err) + } + if len(rows) < 2 { + t.Fatalf("%s: expected header and at least one row", path) + } + header := rows[0] + out := make([]map[string]string, 0, len(rows)-1) + for _, row := range rows[1:] { + m := make(map[string]string, len(header)) + for i, h := range header { + if i < len(row) { + m[h] = row[i] + } + } + out = append(out, m) + } + return out +} + +func goldenFloat(t *testing.T, row map[string]string, key string) float64 { + t.Helper() + v, err := strconv.ParseFloat(row[key], 64) + if err != nil { + t.Fatalf("parse %s=%q: %v", key, row[key], err) + } + return v +} + +func goldenInt(t *testing.T, row map[string]string, key 
string) int { + t.Helper() + v, err := strconv.Atoi(row[key]) + if err != nil { + t.Fatalf("parse %s=%q: %v", key, row[key], err) + } + return v +} + +func goldenBase(t *testing.T, row map[string]string, key string) byte { + t.Helper() + v := row[key] + if len(v) != 1 { + t.Fatalf("%s: expected one base, got %q", key, v) + } + return v[0] +} + +func goldenOptionalBase(row map[string]string, key string) byte { + v := row[key] + if v == "" { + return 0 + } + return []byte(v)[0] +} + +func goldenConditions(t *testing.T, row map[string]string) Conditions { + t.Helper() + model, err := ParseSaltModel(row["salt_model"]) + if err != nil { + t.Fatalf("ParseSaltModel(%q): %v", row["salt_model"], err) + } + return Conditions{ + AnnealC: goldenFloat(t, row, "anneal_c"), + NaM: goldenFloat(t, row, "na_m"), + MgM: goldenFloat(t, row, "mg_m"), + DntpM: goldenFloat(t, row, "dntp_m"), + PrimerTotalM: goldenFloat(t, row, "primer_total_m"), + SaltModel: model, + } +} + +func assertNearGolden(t *testing.T, label string, got, want, tol float64) { + t.Helper() + if math.Abs(got-want) > tol { + t.Fatalf("%s: got %.15g want %.15g tolerance %.3g", label, got, want, tol) + } +} + +func TestGoldenPerfectDuplexes(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/perfect_duplex_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + got, err := PerfectDuplex(row["seq"], row["target3to5"], goldenConditions(t, row)) + if err != nil { + t.Fatalf("PerfectDuplex: %v", err) + } + tol := goldenFloat(t, row, "tolerance") + assertNearGolden(t, "tm_c", got.TmC, goldenFloat(t, row, "tm_c"), tol) + assertNearGolden(t, "margin_c", got.AnnealMarginC, goldenFloat(t, row, "margin_c"), tol) + assertNearGolden(t, "dg_kcal", got.DeltaGAtAnnealKcal, goldenFloat(t, row, "dg_kcal"), tol) + }) + } +} + +func TestGoldenSaltModels(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/salt_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + cond := 
goldenConditions(t, row) + got, err := PerfectDuplex(row["seq"], row["target3to5"], cond) + if err != nil { + t.Fatalf("PerfectDuplex: %v", err) + } + tol := goldenFloat(t, row, "tolerance") + assertNearGolden(t, "effective_na_m", cond.EffectiveNaM(), goldenFloat(t, row, "effective_na_m"), tol) + assertNearGolden(t, "free_mg_m", cond.FreeMgM(), goldenFloat(t, row, "free_mg_m"), tol) + assertNearGolden(t, "tm_c", got.TmC, goldenFloat(t, row, "tm_c"), tol) + assertNearGolden(t, "margin_c", got.AnnealMarginC, goldenFloat(t, row, "margin_c"), tol) + assertNearGolden(t, "dg_kcal", got.DeltaGAtAnnealKcal, goldenFloat(t, row, "dg_kcal"), tol) + }) + } +} + +func TestGoldenImperfectDuplexes(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/mismatch_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + got, err := ImperfectDuplex(row["primer"], row["target3to5"], goldenConditions(t, row)) + if err != nil { + t.Fatalf("ImperfectDuplex: %v", err) + } + tol := goldenFloat(t, row, "tolerance") + assertNearGolden(t, "tm_c", got.TmC, goldenFloat(t, row, "tm_c"), tol) + assertNearGolden(t, "mismatch_penalty_c", got.MismatchPenaltyC, goldenFloat(t, row, "mismatch_penalty_c"), tol) + assertNearGolden(t, "dg_penalty_kcal", got.DeltaGPenaltyKcal, goldenFloat(t, row, "dg_penalty_kcal"), tol) + if got.MismatchCount != goldenInt(t, row, "mismatch_count") || + got.FivePrimeMismatchCount != goldenInt(t, row, "five_prime_count") || + got.ThreePrimeMismatchCount != goldenInt(t, row, "three_prime_count") || + got.TerminalMismatchCount != goldenInt(t, row, "terminal_count") || + got.HeuristicFallbackCount+got.DefaultFallbackCount != goldenInt(t, row, "fallback_count") || + got.TripletTmCount+got.TripletDeltaGCount != goldenInt(t, row, "triplet_count") { + t.Fatalf("mismatch counts changed: got %+v row %+v", got, row) + } + if got.MismatchPolicy != row["policy"] { + t.Fatalf("policy: got %q want %q", got.MismatchPolicy, row["policy"]) + } + }) + } +} + +func 
TestGoldenMismatchTriplets(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/mismatch_triplet_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + model, err := ParseSaltModel(row["salt_model"]) + if err != nil { + t.Fatalf("ParseSaltModel(%q): %v", row["salt_model"], err) + } + cond := Conditions{ + AnnealC: goldenFloat(t, row, "anneal_c"), + NaM: goldenFloat(t, row, "na_m"), + MgM: goldenFloat(t, row, "mg_m"), + DntpM: goldenFloat(t, row, "dntp_m"), + PrimerTotalM: DefaultConditions().PrimerTotalM, + SaltModel: model, + } + + got, err := ImperfectDuplex(row["primer"], row["target"], cond) + if err != nil { + t.Fatalf("ImperfectDuplex: %v", err) + } + tol := goldenFloat(t, row, "tolerance_delta_g") + assertNearGolden(t, "delta_delta_g", got.DeltaGPenaltyKcal, goldenFloat(t, row, "expected_delta_delta_g_kcal"), tol) + + if got.MismatchCount != goldenInt(t, row, "expected_mismatch_count") { + t.Fatalf("mismatch count: got %d want %s", got.MismatchCount, row["expected_mismatch_count"]) + } + tripletCount := got.TripletTmCount + got.TripletDeltaGCount + if tripletCount != goldenInt(t, row, "expected_triplet_count") { + t.Fatalf("triplet count: got %d want %s; result=%+v", tripletCount, row["expected_triplet_count"], got) + } + fallbackCount := got.HeuristicFallbackCount + got.DefaultFallbackCount + if fallbackCount != goldenInt(t, row, "expected_fallback_count") { + t.Fatalf("fallback count: got %d want %s; result=%+v", fallbackCount, row["expected_fallback_count"], got) + } + if got.MismatchPolicy != MismatchPolicyImperfectTriplet { + t.Fatalf("policy: got %q want %q", got.MismatchPolicy, MismatchPolicyImperfectTriplet) + } + + perfectTarget, ok := compStrict(row["primer"]) + if !ok { + t.Fatalf("compStrict failed for %q", row["primer"]) + } + perfect, err := PerfectDuplex(row["primer"], perfectTarget, cond) + if err != nil { + t.Fatalf("PerfectDuplex: %v", err) + } + switch row["expected_tm_direction"] { + case "decrease": + if 
!(got.TmC < perfect.TmC) { + t.Fatalf("expected mismatch to decrease Tm: perfect=%g got=%g", perfect.TmC, got.TmC) + } + case "increase": + if !(got.TmC > perfect.TmC) { + t.Fatalf("expected mismatch to increase Tm: perfect=%g got=%g", perfect.TmC, got.TmC) + } + default: + t.Fatalf("unknown expected_tm_direction %q", row["expected_tm_direction"]) + } + + if len(got.Contributions) != 1 { + t.Fatalf("expected one mismatch contribution, got %d: %+v", len(got.Contributions), got.Contributions) + } + c := got.Contributions[0] + key := MismatchKey{P5: c.P5, P: c.PrimerBase, P3: c.P3, T5: c.T5, T: c.TargetBase, T3: c.T3} + param, ok := LookupMismatchParameterInfo(key) + if !ok { + t.Fatalf("missing parameter info for %+v", key) + } + if param.Source != MismatchSourceTripletDeltaG { + t.Fatalf("source: got %q want %q for %+v", param.Source, MismatchSourceTripletDeltaG, key) + } + if param.ParameterSet != row["expected_parameter_set"] { + t.Fatalf("parameter set: got %q want %q", param.ParameterSet, row["expected_parameter_set"]) + } + }) + } +} + +func TestGoldenDanglingEnds(t *testing.T) { + rows := readGoldenTSV(t, "testdata/dangling_end_goldens.golden") + if len(rows) != len(CuratedDanglingEndParameters) { + t.Fatalf("dangling-end golden count: got %d want %d", len(rows), len(CuratedDanglingEndParameters)) + } + for _, row := range rows { + row := row + t.Run(row["id"], func(t *testing.T) { + var strandEnd byte + switch row["template_end"] { + case "5p": + strandEnd = DanglingEndStrand5Prime + case "3p": + strandEnd = DanglingEndStrand3Prime + default: + t.Fatalf("unknown template_end %q", row["template_end"]) + } + + param, ok := LookupDanglingEndParameter(DanglingEndKey{ + StrandEnd: strandEnd, + DanglingBase: goldenBase(t, row, "dangling_base"), + PairedBase: goldenBase(t, row, "terminal_target_base"), + OppositeBase: goldenBase(t, row, "terminal_primer_base"), + }) + if !ok { + t.Fatalf("missing dangling-end parameter for row %+v", row) + } + tol := goldenFloat(t, 
row, "tolerance") + assertNearGolden(t, "delta_h", param.DeltaHkcal, goldenFloat(t, row, "expected_delta_h_kcal"), tol) + assertNearGolden(t, "delta_s", param.DeltaScalK, goldenFloat(t, row, "expected_delta_s_cal_k"), tol) + assertNearGolden(t, "delta_g37", param.DeltaG37kcal, goldenFloat(t, row, "expected_delta_g37_kcal"), tol) + if param.ParameterSet != row["expected_parameter_set"] || param.Source != row["source_id"] || param.Citation == "" || param.Note == "" { + t.Fatalf("parameter provenance missing/changed: %+v", param) + } + }) + } +} + +func TestGoldenDanglingEndContexts(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/dangling_end_context_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + cond := goldenConditions(t, row) + base, err := ImperfectDuplex(row["primer"], row["target3to5"], cond) + if err != nil { + t.Fatalf("base ImperfectDuplex: %v", err) + } + got, err := ImperfectDuplexWithOptionsAndContext( + row["primer"], + row["target3to5"], + cond, + DefaultImperfectDuplexOptions(), + DanglingEndContext{ + FivePrimeBase: goldenOptionalBase(row, "five_prime_base"), + ThreePrimeBase: goldenOptionalBase(row, "three_prime_base"), + }, + ) + if err != nil { + t.Fatalf("dangling ImperfectDuplex: %v", err) + } + tol := goldenFloat(t, row, "tolerance_delta_g") + assertNearGolden(t, "dangling_delta_g", got.DanglingEndDeltaGKcal, goldenFloat(t, row, "expected_delta_g_kcal"), tol) + if got.DanglingEndCount != goldenInt(t, row, "expected_dangling_count") || len(got.DanglingContributions) != goldenInt(t, row, "expected_dangling_count") { + t.Fatalf("dangling count: got result=%+v row=%+v", got, row) + } + for _, c := range got.DanglingContributions { + if c.ParameterSet != row["expected_parameter_set"] || c.Source != row["source_id"] || c.Citation == "" || c.ParameterNote == "" { + t.Fatalf("dangling provenance missing/changed: %+v", c) + } + } + switch row["expected_tm_direction"] { + case "increase": + if !(got.TmC > 
base.TmC && got.DeltaGAtAnnealKcal < base.DeltaGAtAnnealKcal) { + t.Fatalf("expected dangling end to stabilize endpoint: base=%+v got=%+v", base.DuplexResult, got.DuplexResult) + } + case "decrease": + if !(got.TmC < base.TmC && got.DeltaGAtAnnealKcal > base.DeltaGAtAnnealKcal) { + t.Fatalf("expected dangling end to destabilize endpoint: base=%+v got=%+v", base.DuplexResult, got.DuplexResult) + } + default: + t.Fatalf("unknown expected_tm_direction %q", row["expected_tm_direction"]) + } + }) + } +} + +func TestGoldenStructures(t *testing.T) { + for _, row := range readGoldenTSV(t, "testdata/structure_goldens.golden") { + row := row + t.Run(row["id"], func(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + var got StructureResult + var ok bool + var err error + switch row["mode"] { + case "hairpin": + got, ok, err = BestHairpinV2(row["seq_a"], opts) + case "cross": + got, ok, err = BestCrossDimerV2(row["seq_a"], row["seq_b"], opts) + default: + t.Fatalf("unknown structure mode %q", row["mode"]) + } + if err != nil { + t.Fatalf("structure scoring: %v", err) + } + if !ok { + t.Fatalf("expected structure candidate") + } + tol := goldenFloat(t, row, "tolerance") + if got.Kind != row["kind"] { + t.Fatalf("kind: got %q want %q", got.Kind, row["kind"]) + } + assertNearGolden(t, "tm_c", got.TmC, goldenFloat(t, row, "tm_c"), tol) + assertNearGolden(t, "dg_kcal", got.DeltaGAtAnnealKcal, goldenFloat(t, row, "dg_kcal"), tol) + if got.StemLen != goldenInt(t, row, "stem_len") || got.LoopLen != goldenInt(t, row, "loop_len") || got.BulgeCount != goldenInt(t, row, "bulge_count") || got.InternalLoopCount != goldenInt(t, row, "internal_loop_count") { + t.Fatalf("structure counts changed: got %+v row %+v", got, row) + } + }) + } +} diff --git a/core/thermo/imperfect.go b/core/thermo/imperfect.go new file mode 100644 index 0000000..f753c26 --- /dev/null +++ b/core/thermo/imperfect.go @@ -0,0 +1,515 @@ +package thermo + +import ( + "errors" + "fmt" + "math" + 
"strings" +) + +const ( + // MismatchPolicyPerfect identifies an all-Watson-Crick duplex. + MismatchPolicyPerfect = "nn-perfect" + + // MismatchPolicyImperfectV1 identifies the first condition-aware imperfect + // duplex model. It anchors on the perfect primer/complement NN duplex, then + // applies context-aware mismatch terms at the configured annealing conditions. + MismatchPolicyImperfectV1 = "nn-imperfect-v1" + + // MismatchPolicyImperfectTriplet identifies an imperfect-duplex result + // scored with exact triplet-level mismatch ΔΔG or ΔTm parameters. + MismatchPolicyImperfectTriplet = "nn-imperfect-v1-with-triplet-ddg" + + // MismatchPolicyImperfectCuratedPair identifies an imperfect-duplex result + // scored with the curated pair-family mismatch parameter registry. + MismatchPolicyImperfectCuratedPair = "nn-imperfect-v1-with-curated-pair-ddg" + + // MismatchPolicyImperfectHeuristicFallback identifies an imperfect-duplex + // result that had to use the current pair/context ΔΔG fallback for at least + // one mismatch because no curated triplet/pair-family parameter was available. + MismatchPolicyImperfectHeuristicFallback = "nn-imperfect-v1-with-heuristic-ddg-fallback" + + // MismatchPolicyImperfectDefaultFallback identifies an imperfect-duplex result + // that encountered an unsupported mismatch context and used the conservative + // default ΔTm fallback. + MismatchPolicyImperfectDefaultFallback = "nn-imperfect-v1-with-default-fallback" + + // EndEffectPolicyNone identifies a duplex with no explicit terminal/dangling + // correction beyond the ordinary end-window mismatch multiplier. + EndEffectPolicyNone = "none" + + // EndEffectPolicyTerminalMismatchV1 identifies the exact-terminal mismatch + // correction layer applied after ordinary 5'/3' end-window weighting. 
+ EndEffectPolicyTerminalMismatchV1 = "nn-terminal-mismatch-v1" + + // EndEffectPolicyTemplateDanglingV1 identifies the SantaLucia-Hicks + // sequence-context model for a template base adjacent to the primer-template + // duplex. + EndEffectPolicyTemplateDanglingV1 = "nn-template-dangling-end-v1" + + // EndEffectPolicyTerminalAndDanglingV1 identifies rows where both v1 end-effect + // layers were applied. + EndEffectPolicyTerminalAndDanglingV1 = "nn-terminal-mismatch-template-dangling-v1" +) + +const ( + defaultFivePrimeMismatchWindow = 3 + defaultThreePrimeMismatchWindow = 3 + defaultFivePrimeMismatchWeight = 1.5 + defaultThreePrimeMismatchWeight = 2.0 + defaultMismatchDeltaTmC = 4.0 + defaultFivePrimeTerminalPenalty = 0.5 + defaultThreePrimeTerminalPenalty = 1.5 +) + +// ImperfectDuplexOptions controls the positional weighting used by the current +// imperfect-duplex model. Positions are primer 5'→3' indexes. +type ImperfectDuplexOptions struct { + FivePrimeWindow int + ThreePrimeWindow int + FivePrimeMultiplier float64 + ThreePrimeMultiplier float64 + DefaultMismatchDeltaTm float64 + + // Exact terminal mismatch penalties are added after the ordinary 5'/3' window + // multiplier. They are deliberately separate so diagnostics can distinguish a + // literal terminal-base mismatch from a broader end-window mismatch. + FivePrimeTerminalPenaltyC float64 + ThreePrimeTerminalPenaltyC float64 +} + +// DefaultImperfectDuplexOptions returns the positional weighting historically +// used by ipcr-thermo, with an explicit extra term for literal terminal bases. 
+func DefaultImperfectDuplexOptions() ImperfectDuplexOptions { + return ImperfectDuplexOptions{ + FivePrimeWindow: defaultFivePrimeMismatchWindow, + ThreePrimeWindow: defaultThreePrimeMismatchWindow, + FivePrimeMultiplier: defaultFivePrimeMismatchWeight, + ThreePrimeMultiplier: defaultThreePrimeMismatchWeight, + DefaultMismatchDeltaTm: defaultMismatchDeltaTmC, + FivePrimeTerminalPenaltyC: defaultFivePrimeTerminalPenalty, + ThreePrimeTerminalPenaltyC: defaultThreePrimeTerminalPenalty, + } +} + +// withDefaults fills unset (zero) fields with defaults and clamps negative +// windows and terminal penalties to zero, which disables them. It is not +// idempotent: a second call would turn a disabled (zero) field back into its +// default, so call it exactly once per options value. +func (o ImperfectDuplexOptions) withDefaults() ImperfectDuplexOptions { + d := DefaultImperfectDuplexOptions() + if o.FivePrimeWindow < 0 { + o.FivePrimeWindow = 0 + } else if o.FivePrimeWindow == 0 { + o.FivePrimeWindow = d.FivePrimeWindow + } + if o.ThreePrimeWindow < 0 { + o.ThreePrimeWindow = 0 + } else if o.ThreePrimeWindow == 0 { + o.ThreePrimeWindow = d.ThreePrimeWindow + } + if o.FivePrimeMultiplier == 0 { + o.FivePrimeMultiplier = d.FivePrimeMultiplier + } + if o.ThreePrimeMultiplier == 0 { + o.ThreePrimeMultiplier = d.ThreePrimeMultiplier + } + if o.DefaultMismatchDeltaTm == 0 { + o.DefaultMismatchDeltaTm = d.DefaultMismatchDeltaTm + } + if o.FivePrimeTerminalPenaltyC < 0 { + o.FivePrimeTerminalPenaltyC = 0 + } else if o.FivePrimeTerminalPenaltyC == 0 { + o.FivePrimeTerminalPenaltyC = d.FivePrimeTerminalPenaltyC + } + if o.ThreePrimeTerminalPenaltyC < 0 { + o.ThreePrimeTerminalPenaltyC = 0 + } else if o.ThreePrimeTerminalPenaltyC == 0 { + o.ThreePrimeTerminalPenaltyC = d.ThreePrimeTerminalPenaltyC + } + return o +} + +// posMultiplier returns the positional weight for primer index i of n. It +// expects options already normalized by withDefaults; normalizing again here +// would re-enable windows the caller disabled with a negative value. +func (o ImperfectDuplexOptions) posMultiplier(i, n int) float64 { + if o.ThreePrimeWindow > 0 && i >= n-o.ThreePrimeWindow { + return o.ThreePrimeMultiplier + } + if o.FivePrimeWindow > 0 && i < o.FivePrimeWindow { + return o.FivePrimeMultiplier + } + return 1 +} + +// MismatchContribution describes one non-Watson-Crick primer-template column. 
+type MismatchContribution struct { + Pos int + PrimerBase byte + TargetBase byte + P5 byte + P3 byte + T5 byte + T3 byte + Source MismatchLookupSource + ParameterSet string + Citation string + ParameterNote string + RawDeltaTmC float64 + WeightedDeltaTmC float64 + TerminalPenaltyC float64 + TerminalSource string + TerminalParameterSet string + TerminalCitation string + TerminalParameterNote string + DeltaGPenaltyKcal float64 + PositionMultiplier float64 + FivePrimeWindow bool + ThreePrimeWindow bool + FivePrimeTerminal bool + ThreePrimeTerminal bool +} + +// DanglingEndContext supplies target-strand bases adjacent to the duplex in +// primer-aligned coordinates. In PCR-product scoring, the primer 3' adjacent +// template base is usually available from the amplicon interior; the 5' outside +// flank generally is not carried in engine.Product. +type DanglingEndContext struct { + FivePrimeBase byte + ThreePrimeBase byte +} + +// DanglingEndContribution describes one table-backed template-adjacent dangling +// base correction. AdjustmentC is positive when the base stabilizes the endpoint, +// and negative for the few experimentally observed destabilizing dangling ends. +type DanglingEndContribution struct { + Side string + Base byte + TerminalPrimerBase byte + TerminalTargetBase byte + DanglingStrandSide byte + Motif string + DeltaHkcal float64 + DeltaScalK float64 + DeltaGKcal float64 + DeltaG37kcal float64 + AdjustmentC float64 + Source string + ParameterSet string + Citation string + ParameterNote string +} + +// ImperfectDuplexResult reports an approximate condition-aware imperfect +// primer-template duplex. The base nearest-neighbor terms come from the perfect +// primer/complement duplex; mismatch and end-effect terms adjust Tm, ΔG(Tanneal), +// and margin. 
+type ImperfectDuplexResult struct { + DuplexResult + MismatchPenaltyC float64 + DeltaGPenaltyKcal float64 + TerminalMismatchPenaltyC float64 + TerminalMismatchDeltaGKcal float64 + DanglingEndAdjustmentC float64 + DanglingEndDeltaGKcal float64 + DanglingEndCount int + MismatchCount int + FivePrimeMismatchCount int + ThreePrimeMismatchCount int + FivePrimeTerminalMismatchCount int + ThreePrimeTerminalMismatchCount int + TerminalMismatchCount int + FivePrimeTerminalMismatchPenaltyC float64 + ThreePrimeTerminalMismatchPenaltyC float64 + EndEffectPolicy string + TripletTmCount int + TripletDeltaGCount int + CuratedPairCount int + HeuristicFallbackCount int + DefaultFallbackCount int + HasNonWatsonCrick bool + UsedHeuristicAdjust bool + MismatchPolicy string + Contributions []MismatchContribution + DanglingContributions []DanglingEndContribution +} + +// ImperfectDuplex computes an imperfect primer-template duplex using default +// positional weighting. +func ImperfectDuplex(primer5to3, target3to5 string, cond Conditions) (ImperfectDuplexResult, error) { + return ImperfectDuplexWithOptions(primer5to3, target3to5, cond, DefaultImperfectDuplexOptions()) +} + +// ImperfectDuplexWithOptions computes condition-aware primer-template duplex +// quantities for equal-length A/C/G/T primers against A/C/G/T/N target sites. +// The target strand must be supplied 3'→5' in primer-aligned coordinates. +func ImperfectDuplexWithOptions(primer5to3, target3to5 string, cond Conditions, opts ImperfectDuplexOptions) (ImperfectDuplexResult, error) { + return ImperfectDuplexWithOptionsAndContext(primer5to3, target3to5, cond, opts, DanglingEndContext{}) +} + +// ImperfectDuplexWithOptionsAndContext computes condition-aware primer-template +// duplex quantities and applies optional template-adjacent dangling-end terms +// when flanking target bases are supplied. 
+func ImperfectDuplexWithOptionsAndContext(primer5to3, target3to5 string, cond Conditions, opts ImperfectDuplexOptions, ctx DanglingEndContext) (ImperfectDuplexResult, error) { + var out ImperfectDuplexResult + p := strings.ToUpper(strings.TrimSpace(primer5to3)) + t := strings.ToUpper(strings.TrimSpace(target3to5)) + if len(p) == 0 || len(t) == 0 || len(p) != len(t) { + return out, errors.New("ImperfectDuplex: sequences must be equal length and non-empty") + } + for i := 0; i < len(p); i++ { + if !isACGT(p[i]) { + return out, fmt.Errorf("ImperfectDuplex: non-ACGT base in primer at pos %d", i) + } + if !isNT(t[i]) { + return out, fmt.Errorf("ImperfectDuplex: unsupported target base at pos %d", i) + } + } + + perfectTarget, ok := compStrict(p) + if !ok { + return out, errors.New("ImperfectDuplex: non-ACGT base in primer") + } + base, err := PerfectDuplex(p, perfectTarget, cond) + if err != nil { + return out, err + } + + denom := math.Abs(base.EffectiveDenomCalK) + if math.IsNaN(denom) || math.IsInf(denom, 0) || denom == 0 { + denom = 200.0 + } + + n := len(p) + opts = opts.withDefaults() + penaltyC := 0.0 + for i := 0; i < n; i++ { + if wc(p[i], t[i]) { + continue + } + p5, pC, p3 := mismatchAt(p, i-1), p[i], mismatchAt(p, i+1) + t5, tC, t3 := mismatchAt(t, i-1), t[i], mismatchAt(t, i+1) + rawTm := 0.0 + source := MismatchSourceDefaultDeltaTm + deltaG := 0.0 + parameterSet := "" + citation := "" + parameterNote := "" + + if dTm, src, ok := LookupDeltaTmDetail(p5, pC, p3, t5, tC, t3); ok { + rawTm = dTm + source = src + out.TripletTmCount++ + } else if dG, src, ok := LookupDeltaGDetail(p5, pC, p3, t5, tC, t3); ok { + deltaG = dG + rawTm = DeltaGToDeltaTm(dG, denom) + source = src + if param, ok := LookupMismatchParameterInfoForContext(p5, pC, p3, t5, tC, t3); ok { + parameterSet = param.ParameterSet + citation = param.Citation + parameterNote = param.Note + } + switch src { + case MismatchSourceTripletDeltaG: + out.TripletDeltaGCount++ + case 
MismatchSourceCuratedPairDeltaG: + out.CuratedPairCount++ + case MismatchSourceHeuristicDeltaG: + out.HeuristicFallbackCount++ + } + } else { + rawTm = opts.DefaultMismatchDeltaTm + out.DefaultFallbackCount++ + } + + mult := opts.posMultiplier(i, n) + terminalPenalty := 0.0 + terminalSource := "" + terminalParameterSet := "" + terminalCitation := "" + terminalParameterNote := "" + if terminalKey, ok := TerminalMismatchKeyForPosition(p, t, i); ok { + if terminalParam, ok := LookupTerminalMismatchParameterWithFallback(terminalKey, opts); ok { + terminalSource = terminalParam.Source + terminalParameterSet = terminalParam.ParameterSet + terminalCitation = terminalParam.Citation + terminalParameterNote = terminalParam.Note + if terminalParam.HasDeltaTm { + terminalPenalty = terminalParam.DeltaTmC + } else if terminalParam.HasDeltaDeltaG37 { + terminalPenalty = DeltaGToDeltaTm(terminalParam.DeltaDeltaG37kcal, denom) + } + } + } + weighted := rawTm*mult + terminalPenalty + if weighted < 0 { + // Preserve the historical confidence cap: a mismatch should not make + // the imperfect duplex better than the perfect complement. 
+ weighted = 0 + } + penaltyC += weighted + + fiveWindow := opts.FivePrimeWindow > 0 && i < opts.FivePrimeWindow + threeWindow := opts.ThreePrimeWindow > 0 && i >= n-opts.ThreePrimeWindow + if fiveWindow { + out.FivePrimeMismatchCount++ + } + if threeWindow { + out.ThreePrimeMismatchCount++ + } + if i == 0 || i == n-1 { + out.TerminalMismatchCount++ + out.TerminalMismatchPenaltyC += terminalPenalty + } + if i == 0 { + out.FivePrimeTerminalMismatchCount++ + out.FivePrimeTerminalMismatchPenaltyC += terminalPenalty + } + if i == n-1 { + out.ThreePrimeTerminalMismatchCount++ + out.ThreePrimeTerminalMismatchPenaltyC += terminalPenalty + } + out.MismatchCount++ + out.Contributions = append(out.Contributions, MismatchContribution{ + Pos: i, + PrimerBase: pC, + TargetBase: tC, + P5: p5, + P3: p3, + T5: t5, + T3: t3, + Source: source, + ParameterSet: parameterSet, + Citation: citation, + ParameterNote: parameterNote, + RawDeltaTmC: rawTm, + WeightedDeltaTmC: weighted, + TerminalPenaltyC: terminalPenalty, + TerminalSource: terminalSource, + TerminalParameterSet: terminalParameterSet, + TerminalCitation: terminalCitation, + TerminalParameterNote: terminalParameterNote, + DeltaGPenaltyKcal: deltaG*mult + terminalPenalty*denom/1000.0, + PositionMultiplier: mult, + FivePrimeWindow: fiveWindow, + ThreePrimeWindow: threeWindow, + FivePrimeTerminal: i == 0, + ThreePrimeTerminal: i == n-1, + }) + } + + adjusted := base + if penaltyC < 0 { + penaltyC = 0 + } + deltaGPenalty := penaltyC * denom / 1000.0 + danglingAdjustmentC, danglingDeltaG, dangling := danglingEndAdjustment(ctx, p, t, denom) + adjusted.TmC = base.TmC - penaltyC + danglingAdjustmentC + adjusted.AnnealMarginC = adjusted.TmC - cond.WithDefaults().AnnealC + adjusted.DeltaGAtAnnealKcal = base.DeltaGAtAnnealKcal + deltaGPenalty + danglingDeltaG + adjusted.EffectiveDenomCalK = denom + + policy := MismatchPolicyPerfect + if out.MismatchCount > 0 { + switch { + case out.DefaultFallbackCount > 0: + policy = 
MismatchPolicyImperfectDefaultFallback + case out.HeuristicFallbackCount > 0: + policy = MismatchPolicyImperfectHeuristicFallback + case out.TripletTmCount+out.TripletDeltaGCount > 0: + policy = MismatchPolicyImperfectTriplet + case out.CuratedPairCount > 0: + policy = MismatchPolicyImperfectCuratedPair + default: + policy = MismatchPolicyImperfectV1 + } + } + + out.DuplexResult = adjusted + out.MismatchPenaltyC = penaltyC + out.DeltaGPenaltyKcal = deltaGPenalty + out.TerminalMismatchDeltaGKcal = out.TerminalMismatchPenaltyC * denom / 1000.0 + out.DanglingEndAdjustmentC = danglingAdjustmentC + out.DanglingEndDeltaGKcal = danglingDeltaG + out.DanglingEndCount = len(dangling) + out.DanglingContributions = dangling + out.EndEffectPolicy = endEffectPolicy(out.TerminalMismatchPenaltyC > 0, len(dangling) > 0) + out.HasNonWatsonCrick = out.MismatchCount > 0 + out.UsedHeuristicAdjust = out.HeuristicFallbackCount > 0 || out.DefaultFallbackCount > 0 + out.MismatchPolicy = policy + return out, nil +} + +// endEffectPolicy maps which end-effect layers were applied to the policy +// label reported in the result. +func endEffectPolicy(hasTerminalMismatch, hasDangling bool) string { + switch { + case hasTerminalMismatch && hasDangling: + return EndEffectPolicyTerminalAndDanglingV1 + case hasDangling: + return EndEffectPolicyTemplateDanglingV1 + case hasTerminalMismatch: + return EndEffectPolicyTerminalMismatchV1 + default: + return EndEffectPolicyNone + } +} + +// danglingEndAdjustment returns the summed Tm adjustment (°C), the summed ΔG +// (kcal/mol), and the per-end contributions for the supplied flanking bases. +// Flanks with no table entry, including unset bases, contribute nothing. +func danglingEndAdjustment(ctx DanglingEndContext, primer, target string, denom float64) (float64, float64, []DanglingEndContribution) { + if denom <= 0 || math.IsNaN(denom) || math.IsInf(denom, 0) || len(primer) == 0 || len(target) == 0 { + return 0, 0, nil + } + contribs := make([]DanglingEndContribution, 0, 2) + add := func(side string, base, terminalPrimer, terminalTarget byte) { + param, ok := LookupTemplateDanglingEndParameter(side, base, terminalPrimer, terminalTarget) + if !ok { + return + } + // The tabulated ΔG at 37 °C is used as-is; it is not re-evaluated at the + // annealing temperature. A negative (stabilizing) ΔG raises Tm. + dg := param.DeltaG37kcal + adjC := -dg * 1000.0 / denom + contribs = append(contribs, DanglingEndContribution{ + Side: side, 
DanglingStrandSide: param.Key.StrandEnd, + Base: param.Key.DanglingBase, + TerminalPrimerBase: param.Key.OppositeBase, + TerminalTargetBase: param.Key.PairedBase, + Motif: param.Motif, + DeltaHkcal: param.DeltaHkcal, + DeltaScalK: param.DeltaScalK, + DeltaGKcal: dg, + DeltaG37kcal: dg, + AdjustmentC: adjC, + Source: param.Source, + ParameterSet: param.ParameterSet, + Citation: param.Citation, + ParameterNote: param.Note, + }) + } + // Target/template 3' dangling bases sit beside the primer 5' end. + add("primer-5p", ctx.FivePrimeBase, primer[0], target[0]) + // Target/template 5' dangling bases sit beside the primer 3' end. + add("primer-3p", ctx.ThreePrimeBase, primer[len(primer)-1], target[len(target)-1]) + + adjustmentC := 0.0 + deltaG := 0.0 + for _, c := range contribs { + adjustmentC += c.AdjustmentC + deltaG += c.DeltaGKcal + } + return adjustmentC, deltaG, contribs +} + +func normalizeBase(b byte) byte { + switch b { + case 'a', 'A': + return 'A' + case 'c', 'C': + return 'C' + case 'g', 'G': + return 'G' + case 't', 'T': + return 'T' + default: + return 'N' + } +} + +func mismatchAt(s string, idx int) byte { + if idx < 0 || idx >= len(s) { + return 'N' + } + return s[idx] +} diff --git a/core/thermo/imperfect_test.go b/core/thermo/imperfect_test.go new file mode 100644 index 0000000..e881e0e --- /dev/null +++ b/core/thermo/imperfect_test.go @@ -0,0 +1,338 @@ +package thermo + +import ( + "math" + "testing" +) + +func TestImperfectDuplexPerfectMatchesPerfectDuplex(t *testing.T) { + primer := "ACGTACGTACGTACGT" + cond := DefaultConditions() + perfect, err := PerfectDuplex(primer, comp(primer), cond) + if err != nil { + t.Fatalf("PerfectDuplex: %v", err) + } + got, err := ImperfectDuplex(primer, comp(primer), cond) + if err != nil { + t.Fatalf("ImperfectDuplex: %v", err) + } + if got.MismatchCount != 0 || got.MismatchPenaltyC != 0 || got.MismatchPolicy != MismatchPolicyPerfect { + t.Fatalf("expected perfect mismatch summary, got %+v", got) + } + if 
math.Abs(got.TmC-perfect.TmC) > 1e-9 || math.Abs(got.DeltaGAtAnnealKcal-perfect.DeltaGAtAnnealKcal) > 1e-9 { + t.Fatalf("perfect/imperfect mismatch: perfect=%+v got=%+v", perfect, got.DuplexResult) + } +} + +func TestImperfectDuplexMismatchLowersTmAndReportsTripletMetadata(t *testing.T) { + primer := "ACGTACGTACGTACGT" + target := []byte(comp(primer)) + target[6] = 'A' + if target[6] == comp(primer)[6] { + target[6] = 'C' + } + + got, err := ImperfectDuplex(primer, string(target), DefaultConditions()) + if err != nil { + t.Fatalf("ImperfectDuplex: %v", err) + } + if got.MismatchCount != 1 || !got.HasNonWatsonCrick { + t.Fatalf("expected one mismatch, got %+v", got) + } + if got.MismatchPenaltyC <= 0 || got.DeltaGPenaltyKcal <= 0 { + t.Fatalf("expected positive mismatch penalty, got %+v", got) + } + if got.TripletDeltaGCount != 1 || got.CuratedPairCount != 0 || got.HeuristicFallbackCount != 0 || got.UsedHeuristicAdjust { + t.Fatalf("expected exact triplet metadata without heuristic fallback, got %+v", got) + } + if got.MismatchPolicy != MismatchPolicyImperfectTriplet { + t.Fatalf("expected triplet mismatch policy, got %+v", got) + } + if len(got.Contributions) != 1 { + t.Fatalf("expected one mismatch contribution, got %+v", got.Contributions) + } + contrib := got.Contributions[0] + if contrib.ParameterSet != MismatchParameterSetSantaLuciaHicks2004CompiledDimerGaugeV1 { + t.Fatalf("expected triplet parameter set, got %+v", contrib) + } + if contrib.Citation == "" || contrib.ParameterNote == "" { + t.Fatalf("expected citation and curation note, got %+v", contrib) + } + base, err := PerfectDuplex(primer, comp(primer), DefaultConditions()) + if err != nil { + t.Fatalf("PerfectDuplex: %v", err) + } + if !(got.TmC < base.TmC && got.AnnealMarginC < base.AnnealMarginC) { + t.Fatalf("expected mismatch to lower Tm/margin: base=%+v got=%+v", base, got.DuplexResult) + } +} + +func TestImperfectDuplexThreePrimeMismatchIsWeightedMoreStrongly(t *testing.T) { + primer := 
"GGGGGGGGGG" + perfect := []byte(comp(primer)) + + makeMismatch := func(pos int) string { + mutated := append([]byte(nil), perfect...) + mutated[pos] = 'A' // G·A mismatch at the requested position + return string(mutated) + } + internal, err := ImperfectDuplex(primer, makeMismatch(4), DefaultConditions()) + if err != nil { + t.Fatalf("internal mismatch: %v", err) + } + five, err := ImperfectDuplex(primer, makeMismatch(0), DefaultConditions()) + if err != nil { + t.Fatalf("5' mismatch: %v", err) + } + three, err := ImperfectDuplex(primer, makeMismatch(len(primer)-1), DefaultConditions()) + if err != nil { + t.Fatalf("3' mismatch: %v", err) + } + if !(three.MismatchPenaltyC > five.MismatchPenaltyC && internal.MismatchPenaltyC > 0) { + t.Fatalf("unexpected position penalties, got 3'=%g 5'=%g internal=%g", three.MismatchPenaltyC, five.MismatchPenaltyC, internal.MismatchPenaltyC) + } + if three.ThreePrimeMismatchCount != 1 || five.FivePrimeMismatchCount != 1 { + t.Fatalf("terminal window counts missing: 3'=%+v 5'=%+v", three, five) + } +} + +func TestImperfectDuplexReportsExplicitTerminalMismatchPenalty(t *testing.T) { + primer := "GGGGGGGGGG" + perfect := []byte(comp(primer)) + terminalTarget := append([]byte(nil), perfect...) 
+ terminalTarget[len(terminalTarget)-1] = 'A' + + got, err := ImperfectDuplex(primer, string(terminalTarget), DefaultConditions()) + if err != nil { + t.Fatalf("ImperfectDuplex: %v", err) + } + if got.ThreePrimeTerminalMismatchCount != 1 || got.TerminalMismatchCount != 1 { + t.Fatalf("expected one explicit 3' terminal mismatch, got %+v", got) + } + if got.ThreePrimeTerminalMismatchPenaltyC <= 0 || got.TerminalMismatchDeltaGKcal <= 0 { + t.Fatalf("expected explicit terminal penalty terms, got %+v", got) + } + if got.EndEffectPolicy != EndEffectPolicyTerminalMismatchV1 { + t.Fatalf("unexpected end-effect policy: %q", got.EndEffectPolicy) + } + if len(got.Contributions) != 1 || !got.Contributions[0].ThreePrimeTerminal || got.Contributions[0].TerminalPenaltyC <= 0 { + t.Fatalf("expected contribution-level terminal annotation, got %+v", got.Contributions) + } + contrib := got.Contributions[0] + if contrib.TerminalSource != TerminalMismatchSourceHeuristicPenalty || contrib.TerminalParameterSet != TerminalMismatchParameterSetHeuristicV1 { + t.Fatalf("expected named terminal mismatch fallback provenance, got %+v", contrib) + } + if contrib.TerminalCitation == "" || contrib.TerminalParameterNote == "" { + t.Fatalf("expected terminal mismatch citation and note, got %+v", contrib) + } + + internalTarget := append([]byte(nil), perfect...) 
+ internalTarget[4] = 'A' + internal, err := ImperfectDuplex(primer, string(internalTarget), DefaultConditions()) + if err != nil { + t.Fatalf("internal ImperfectDuplex: %v", err) + } + if internal.TerminalMismatchPenaltyC != 0 || internal.TerminalMismatchCount != 0 { + t.Fatalf("internal mismatch should not report terminal terms, got %+v", internal) + } +} + +func TestLookupTerminalMismatchHeuristicParameter(t *testing.T) { + key := TerminalMismatchKey{ + PrimerEnd: TerminalMismatchPrimer3Prime, + P: 'G', + T: 'A', + PNeighbor: 'G', + TNeighbor: 'C', + } + param, ok := LookupTerminalMismatchParameterWithFallback(key, DefaultImperfectDuplexOptions()) + if !ok { + t.Fatalf("expected terminal mismatch fallback parameter for %+v", key) + } + if param.Source != TerminalMismatchSourceHeuristicPenalty || param.ParameterSet != TerminalMismatchParameterSetHeuristicV1 { + t.Fatalf("unexpected terminal mismatch provenance: %+v", param) + } + if !param.HasDeltaTm || param.DeltaTmC != defaultThreePrimeTerminalPenalty || param.HasDeltaDeltaG37 { + t.Fatalf("unexpected terminal mismatch values: %+v", param) + } + if param.Citation == "" || param.Note == "" { + t.Fatalf("expected terminal mismatch citation/note: %+v", param) + } + + if _, ok := LookupTerminalMismatchParameter(key); ok { + t.Fatalf("did not expect a curated terminal mismatch table entry before literature-backed values are added") + } + + perfect := TerminalMismatchKey{PrimerEnd: TerminalMismatchPrimer3Prime, P: 'G', T: 'C', PNeighbor: 'G', TNeighbor: 'C'} + if _, ok := LookupTerminalMismatchParameterWithFallback(perfect, DefaultImperfectDuplexOptions()); ok { + t.Fatalf("Watson-Crick terminal pair should not produce a terminal mismatch parameter") + } + + unknownTarget := TerminalMismatchKey{PrimerEnd: TerminalMismatchPrimer5Prime, P: 'A', T: 'N', PNeighbor: 'C', TNeighbor: 'G'} + param, ok = LookupTerminalMismatchParameterWithFallback(unknownTarget, DefaultImperfectDuplexOptions()) + if !ok || param.DeltaTmC != 
defaultFivePrimeTerminalPenalty { + t.Fatalf("expected heuristic terminal fallback for target N, ok=%v param=%+v", ok, param) + } +} + +func TestTerminalMismatchKeyForPosition(t *testing.T) { + key, ok := TerminalMismatchKeyForPosition("ACGT", "TGCG", 3) + if !ok { + t.Fatal("expected 3' terminal mismatch key") + } + want := TerminalMismatchKey{PrimerEnd: TerminalMismatchPrimer3Prime, P: 'T', T: 'G', PNeighbor: 'G', TNeighbor: 'C'} + if key != want { + t.Fatalf("unexpected terminal key: got %+v want %+v", key, want) + } + + if _, ok := TerminalMismatchKeyForPosition("ACGT", "TGGA", 2); ok { + t.Fatal("internal mismatch/key position should not produce a terminal mismatch key") + } + if _, ok := TerminalMismatchKeyForPosition("ACGT", "TGCA", 3); ok { + t.Fatal("Watson-Crick terminal pair should not produce a terminal mismatch key") + } +} + +func TestImperfectDuplexDanglingEndContextRaisesMargin(t *testing.T) { + primer := "ACGTACGTACGTACGT" + target := comp(primer) + cond := DefaultConditions() + base, err := ImperfectDuplex(primer, target, cond) + if err != nil { + t.Fatalf("base ImperfectDuplex: %v", err) + } + got, err := ImperfectDuplexWithOptionsAndContext( + primer, + target, + cond, + DefaultImperfectDuplexOptions(), + DanglingEndContext{ThreePrimeBase: 'A'}, + ) + if err != nil { + t.Fatalf("dangling ImperfectDuplex: %v", err) + } + if got.DanglingEndCount != 1 || got.DanglingEndAdjustmentC <= 0 || got.DanglingEndDeltaGKcal >= 0 { + t.Fatalf("expected favorable dangling-end adjustment, got %+v", got) + } + if len(got.DanglingContributions) != 1 { + t.Fatalf("expected one dangling-end contribution, got %+v", got.DanglingContributions) + } + c := got.DanglingContributions[0] + if c.ParameterSet != DanglingEndParameterSetSantaLuciaHicks2004V1 || c.Source == "" || c.Citation == "" { + t.Fatalf("expected SantaLucia-Hicks dangling-end provenance, got %+v", c) + } + if c.Side != "primer-3p" || c.DanglingStrandSide != DanglingEndStrand5Prime || c.Base != 'A' || 
c.TerminalPrimerBase != 'T' || c.TerminalTargetBase != 'A' { + t.Fatalf("unexpected dangling-end key mapping: %+v", c) + } + if math.Abs(c.DeltaG37kcal-(-0.51)) > 1e-12 || math.Abs(c.DeltaHkcal-0.2) > 1e-12 { + t.Fatalf("unexpected dangling-end thermodynamics: %+v", c) + } + if got.EndEffectPolicy != EndEffectPolicyTemplateDanglingV1 { + t.Fatalf("unexpected end-effect policy: %q", got.EndEffectPolicy) + } + if !(got.AnnealMarginC > base.AnnealMarginC && got.DeltaGAtAnnealKcal < base.DeltaGAtAnnealKcal) { + t.Fatalf("expected dangling context to stabilize endpoint: base=%+v got=%+v", base.DuplexResult, got.DuplexResult) + } +} + +func TestImperfectDuplexDanglingEndCanBeDestabilizing(t *testing.T) { + primer := "ACGTACGTACGTACGA" + target := comp(primer) + cond := DefaultConditions() + base, err := ImperfectDuplex(primer, target, cond) + if err != nil { + t.Fatalf("base ImperfectDuplex: %v", err) + } + got, err := ImperfectDuplexWithOptionsAndContext( + primer, + target, + cond, + DefaultImperfectDuplexOptions(), + DanglingEndContext{ThreePrimeBase: 'G'}, + ) + if err != nil { + t.Fatalf("dangling ImperfectDuplex: %v", err) + } + if got.DanglingEndCount != 1 || len(got.DanglingContributions) != 1 { + t.Fatalf("expected one dangling-end contribution, got %+v", got) + } + c := got.DanglingContributions[0] + if c.Side != "primer-3p" || c.DanglingStrandSide != DanglingEndStrand5Prime || c.Base != 'G' || c.TerminalPrimerBase != 'A' || c.TerminalTargetBase != 'T' { + t.Fatalf("unexpected dangling-end key mapping: %+v", c) + } + if math.Abs(c.DeltaG37kcal-0.48) > 1e-12 || c.AdjustmentC >= 0 || got.DanglingEndDeltaGKcal <= 0 { + t.Fatalf("expected destabilizing dangling-end term, got contribution=%+v result=%+v", c, got) + } + if !(got.AnnealMarginC < base.AnnealMarginC && got.DeltaGAtAnnealKcal > base.DeltaGAtAnnealKcal) { + t.Fatalf("expected dangling context to destabilize endpoint: base=%+v got=%+v", base.DuplexResult, got.DuplexResult) + } +} + +func 
TestLookupDanglingEndParameterSantaLuciaHicksTable3(t *testing.T) { + if len(CuratedDanglingEnds) != 32 { + t.Fatalf("curated dangling-end count: got %d want 32", len(CuratedDanglingEnds)) + } + cases := []struct { + name string + key DanglingEndKey + wantDH float64 + wantDG37 float64 + wantMotif string + mappedSide string + }{ + { + name: "5p_XT_A_positive", + key: DanglingEndKey{StrandEnd: DanglingEndStrand5Prime, DanglingBase: 'G', PairedBase: 'T', OppositeBase: 'A'}, + wantDH: -4.2, + wantDG37: 0.48, + wantMotif: "GT/A", + }, + { + name: "3p_AX_T_positive", + key: DanglingEndKey{StrandEnd: DanglingEndStrand3Prime, DanglingBase: 'C', PairedBase: 'A', OppositeBase: 'T'}, + wantDH: 4.7, + wantDG37: 0.28, + wantMotif: "AC/T", + }, + { + name: "3p_GX_C_strong", + key: DanglingEndKey{StrandEnd: DanglingEndStrand3Prime, DanglingBase: 'A', PairedBase: 'G', OppositeBase: 'C'}, + wantDH: -2.1, + wantDG37: -0.92, + wantMotif: "GA/C", + }, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + got, ok := LookupDanglingEndParameter(tc.key) + if !ok { + t.Fatalf("missing dangling-end parameter for %+v", tc.key) + } + if got.ParameterSet != DanglingEndParameterSetSantaLuciaHicks2004V1 || got.Source == "" || got.Citation == "" { + t.Fatalf("missing dangling-end provenance: %+v", got) + } + if got.Motif != tc.wantMotif || math.Abs(got.DeltaHkcal-tc.wantDH) > 1e-12 || math.Abs(got.DeltaG37kcal-tc.wantDG37) > 1e-12 { + t.Fatalf("unexpected parameter: got %+v want DH=%g DG=%g motif=%s", got, tc.wantDH, tc.wantDG37, tc.wantMotif) + } + }) + } + + mapped, ok := LookupTemplateDanglingEndParameter("primer-3p", 'G', 'A', 'T') + if !ok || mapped.Key.StrandEnd != DanglingEndStrand5Prime || mapped.Motif != "GT/A" || math.Abs(mapped.DeltaG37kcal-0.48) > 1e-12 { + t.Fatalf("unexpected primer-3p target-dangling mapping: ok=%v param=%+v", ok, mapped) + } + mapped, ok = LookupTemplateDanglingEndParameter("primer-5p", 'C', 'T', 'A') + if !ok || mapped.Key.StrandEnd != 
DanglingEndStrand3Prime || mapped.Motif != "AC/T" || math.Abs(mapped.DeltaG37kcal-0.28) > 1e-12 { + t.Fatalf("unexpected primer-5p target-dangling mapping: ok=%v param=%+v", ok, mapped) + } +} + +func TestLookupMismatchParameterInfoCuratedPair(t *testing.T) { + param, ok := LookupMismatchParameterInfo(broadMismatchKey('G', 'T')) + if !ok { + t.Fatal("expected curated G/T pair-family parameter") + } + if param.Source != MismatchSourceCuratedPairDeltaG || param.ParameterSet != MismatchParameterSetPairFamilyV1 || param.DeltaDeltaGKcal <= 0 { + t.Fatalf("unexpected curated parameter: %+v", param) + } +} diff --git a/core/thermo/iupac.go b/core/thermo/iupac.go new file mode 100644 index 0000000..537f6e0 --- /dev/null +++ b/core/thermo/iupac.go @@ -0,0 +1,108 @@ +package thermo + +import ( + "fmt" + "strings" +) + +const ( + IUPACThermoPolicyStrict = "strict" + IUPACThermoPolicyWorst = "worst" + IUPACThermoPolicyBest = "best" + IUPACThermoPolicyMean = "mean" + IUPACThermoPolicyEnumerate = "enumerate" +) + +func ParseIUPACThermoPolicy(s string) (string, error) { + s = strings.ToLower(strings.TrimSpace(s)) + if s == "" { + return IUPACThermoPolicyWorst, nil + } + switch s { + case IUPACThermoPolicyStrict, IUPACThermoPolicyWorst, IUPACThermoPolicyBest, IUPACThermoPolicyMean, IUPACThermoPolicyEnumerate: + return s, nil + default: + return "", fmt.Errorf("unknown IUPAC thermo policy %q (expected strict | worst | best | mean | enumerate)", s) + } +} + +func IsStrictACGT(s string) bool { + if s == "" { + return false + } + for i := 0; i < len(s); i++ { + switch s[i] { + case 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't': + default: + return false + } + } + return true +} + +func ExpandIUPAC(seq string, max int) ([]string, bool, error) { + seq = strings.ToUpper(strings.TrimSpace(seq)) + if seq == "" { + return nil, false, fmt.Errorf("empty IUPAC sequence") + } + out := []string{""} + capped := false + for i := 0; i < len(seq); i++ { + bases, ok := iupacBases(seq[i]) + if !ok { + return 
nil, false, fmt.Errorf("unsupported IUPAC base %q at position %d", seq[i], i) + } + next := make([]string, 0, len(out)*len(bases)) + for _, prefix := range out { + for _, b := range bases { + if max > 0 && len(next) >= max { + capped = true + break + } + next = append(next, prefix+string(b)) + } + if capped && max > 0 && len(next) >= max { + break + } + } + out = next + } + return out, capped, nil +} + +func iupacBases(b byte) ([]byte, bool) { + switch b { + case 'A': + return []byte{'A'}, true + case 'C': + return []byte{'C'}, true + case 'G': + return []byte{'G'}, true + case 'T': + return []byte{'T'}, true + case 'R': + return []byte{'A', 'G'}, true + case 'Y': + return []byte{'C', 'T'}, true + case 'S': + return []byte{'C', 'G'}, true + case 'W': + return []byte{'A', 'T'}, true + case 'K': + return []byte{'G', 'T'}, true + case 'M': + return []byte{'A', 'C'}, true + case 'B': + return []byte{'C', 'G', 'T'}, true + case 'D': + return []byte{'A', 'G', 'T'}, true + case 'H': + return []byte{'A', 'C', 'T'}, true + case 'V': + return []byte{'A', 'C', 'G'}, true + case 'N': + return []byte{'A', 'C', 'G', 'T'}, true + default: + return nil, false + } +} diff --git a/core/thermo/iupac_test.go b/core/thermo/iupac_test.go new file mode 100644 index 0000000..5b69023 --- /dev/null +++ b/core/thermo/iupac_test.go @@ -0,0 +1,33 @@ +package thermo + +import "testing" + +func TestExpandIUPACStrictAndDegenerate(t *testing.T) { + got, capped, err := ExpandIUPAC("AR", 0) + if err != nil { + t.Fatalf("ExpandIUPAC: %v", err) + } + if capped || len(got) != 2 || got[0] != "AA" || got[1] != "AG" { + t.Fatalf("unexpected expansion: got=%v capped=%v", got, capped) + } +} + +func TestExpandIUPACCap(t *testing.T) { + got, capped, err := ExpandIUPAC("NNN", 5) + if err != nil { + t.Fatalf("ExpandIUPAC: %v", err) + } + if !capped || len(got) != 5 { + t.Fatalf("expected capped 5 expansions, got len=%d capped=%v", len(got), capped) + } +} + +func TestParseIUPACThermoPolicyDefaultWorst(t 
*testing.T) { + got, err := ParseIUPACThermoPolicy("") + if err != nil { + t.Fatalf("ParseIUPACThermoPolicy: %v", err) + } + if got != IUPACThermoPolicyWorst { + t.Fatalf("got %q, want %q", got, IUPACThermoPolicyWorst) + } +} diff --git a/core/thermo/mismatch.go b/core/thermo/mismatch.go index 2173bdd..c9f61f1 100644 --- a/core/thermo/mismatch.go +++ b/core/thermo/mismatch.go @@ -26,6 +26,17 @@ package thermo // - When triplet overrides are missing, we apply a pair‑chemistry fallback // with lightweight context adjustments (see LookupDeltaG). // + +// MismatchLookupSource labels where an individual mismatch term came from. +type MismatchLookupSource string + +const ( + MismatchSourceTripletDeltaTm MismatchLookupSource = "triplet-dtm" + MismatchSourceTripletDeltaG MismatchLookupSource = "triplet-ddg" + MismatchSourceHeuristicDeltaG MismatchLookupSource = "heuristic-ddg-fallback" + MismatchSourceDefaultDeltaTm MismatchLookupSource = "default-dtm-fallback" +) + // Triplet context (primer and target bases are given as runes A/C/G/T; target is 3'→5'). type MismatchKey struct { P5, P, P3 byte // primer context @@ -73,31 +84,43 @@ var pairDeltaG = map[[2]byte]float64{ // It ONLY returns true when a triplet override is available. If not, return false // so callers can fall back to ΔΔG via LookupDeltaG + DeltaGToDeltaTm. func LookupDeltaTm(p5, p, p3, t5, t, t3 byte) (float64, bool) { + d, _, ok := LookupDeltaTmDetail(p5, p, p3, t5, t, t3) + return d, ok +} + +// LookupDeltaTmDetail returns ΔTm plus a source label. It only succeeds when +// DeltaTmTriplet holds an exact or N-wildcard-flank entry for the context. 
+func LookupDeltaTmDetail(p5, p, p3, t5, t, t3 byte) (float64, MismatchLookupSource, bool) { if !isACGT(p) || !isNT(t) { - return 0, false + return 0, "", false } // 1) exact triplet override if available - if isACGT(p5) && isACGT(p3) && isNT(t5) && isNT(t3) { - if d, ok := DeltaTmTriplet[MismatchKey{P5: p5, P: p, P3: p3, T5: t5, T: t, T3: t3}]; ok { - return d, true - } + if d, ok := lookupDeltaTmTripletOverride(p5, p, p3, t5, t, t3); ok { + return d, MismatchSourceTripletDeltaTm, true } // No pair fallback — let caller use ΔΔG. - return 0, false + return 0, "", false } // LookupDeltaG returns ΔΔG (kcal/mol) using precedence: // 1) triplet context if available, else 2) pair-only with context tweaks. // A negative value is allowed (rare stabilizing contexts). func LookupDeltaG(p5, p, p3, t5, t, t3 byte) (float64, bool) { + dg, _, ok := LookupDeltaGDetail(p5, p, p3, t5, t, t3) + return dg, ok +} + +// LookupDeltaGDetail returns ΔΔG plus the source label used for diagnostics. +func LookupDeltaGDetail(p5, p, p3, t5, t, t3 byte) (float64, MismatchLookupSource, bool) { if !isACGT(p) || !isNT(t) { - return 0, false + return 0, "", false } - // 1) exact triplet override - if isACGT(p5) && isACGT(p3) && isNT(t5) && isNT(t3) { - if dg, ok := DeltaGTriplet[MismatchKey{P5: p5, P: p, P3: p3, T5: t5, T: t, T3: t3}]; ok { - return dg, true - } + // 1) exact or wildcard triplet override + if dg, src, ok := lookupDeltaGTripletOverride(p5, p, p3, t5, t, t3); ok { + return dg, src, true + } + if param, ok := lookupCuratedDeltaGTriplet(p5, p, p3, t5, t, t3); ok { + return param.DeltaDeltaGKcal, param.Source, true } // 2) pair-only baseline @@ -152,7 +175,7 @@ func LookupDeltaG(p5, p, p3, t5, t, t3 byte) (float64, bool) { if base < -0.10 { base = -0.10 } - return base, true + return base, MismatchSourceHeuristicDeltaG, true } // DeltaGToDeltaTm converts ΔΔG (kcal/mol) to ΔTm (°C) using an effective denominator D (cal/K/mol): diff --git a/core/thermo/mismatch_params.go 
b/core/thermo/mismatch_params.go new file mode 100644 index 0000000..69b7de3 --- /dev/null +++ b/core/thermo/mismatch_params.go @@ -0,0 +1,185 @@ +package thermo + +// Curated mismatch parameter registry. +// +// This v1 table intentionally starts with broad pair-family entries rather than +// pretending to be a complete nearest-neighbor mismatch parameterization. Exact +// triplet overrides can still be added to DeltaTmTriplet/DeltaGTriplet. The +// broad entries move ordinary A/C/G/T mismatches out of the generic fallback +// path and keep source metadata explicit. +const ( + MismatchSourceCuratedPairDeltaG MismatchLookupSource = "curated-pair-ddg-v1" + + MismatchParameterSetPairFamilyV1 = "ipcr-pair-family-v1" +) + +type MismatchParameterInfo struct { + DeltaDeltaGKcal float64 + Source MismatchLookupSource + ParameterSet string + Citation string + Note string +} + +// DeltaGTripletSource optionally overrides the source label for user/populated +// DeltaGTriplet entries. If absent, an entry in DeltaGTriplet is treated as an +// exact triplet ΔΔG parameter. +var DeltaGTripletSource = map[MismatchKey]MismatchLookupSource{} + +// DeltaGTripletParameterSet optionally labels user/populated DeltaGTriplet +// entries for diagnostics or tests. +var DeltaGTripletParameterSet = map[MismatchKey]string{} + +// DeltaGTripletCitation optionally records the source citation for triplet +// ΔΔG entries. It is surfaced in JSON/JSONL and --thermo-details output for +// auditability when a curated table entry is used. +var DeltaGTripletCitation = map[MismatchKey]string{} + +// DeltaGTripletNote optionally records a human-readable curation note for a +// triplet ΔΔG entry. +var DeltaGTripletNote = map[MismatchKey]string{} + +// curatedDeltaGTriplet contains broad pair-family ΔΔG entries keyed with N +// wildcards in the flanking positions. 
Values are deliberately conservative and +// preserve the historical ipcr ordering while moving common A/C/G/T mismatches +// out of the heuristic-fallback path. +var curatedDeltaGTriplet = map[MismatchKey]MismatchParameterInfo{ + broadMismatchKey('G', 'T'): {0.60, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "G/T wobble pair-family default"}, + broadMismatchKey('T', 'G'): {0.60, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "T/G wobble pair-family default"}, + + broadMismatchKey('A', 'G'): {0.85, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "A/G transition pair-family default"}, + broadMismatchKey('G', 'A'): {0.85, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "G/A transition pair-family default"}, + broadMismatchKey('C', 'T'): {0.85, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "C/T transition pair-family default"}, + broadMismatchKey('T', 'C'): {0.85, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "T/C transition pair-family default"}, + + broadMismatchKey('A', 'C'): {1.10, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "A/C transversion pair-family default"}, + broadMismatchKey('C', 'A'): {1.10, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "C/A transversion pair-family default"}, + broadMismatchKey('A', 'T'): {1.20, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "A/T transversion pair-family default"}, + broadMismatchKey('T', 'A'): {1.20, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "T/A transversion pair-family default"}, + broadMismatchKey('C', 'G'): {1.20, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "C/G transversion pair-family default"}, + broadMismatchKey('G', 'C'): {1.20, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "G/C transversion pair-family 
default"}, + + broadMismatchKey('A', 'A'): {1.40, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "A/A like-with-like pair-family default"}, + broadMismatchKey('T', 'T'): {1.40, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "T/T like-with-like pair-family default"}, + broadMismatchKey('G', 'G'): {1.40, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "G/G like-with-like pair-family default"}, + broadMismatchKey('C', 'C'): {1.40, MismatchSourceCuratedPairDeltaG, MismatchParameterSetPairFamilyV1, "", "C/C like-with-like pair-family default"}, +} + +func broadMismatchKey(p, t byte) MismatchKey { + return MismatchKey{P5: 'N', P: p, P3: 'N', T5: 'N', T: t, T3: 'N'} +} + +func mismatchCandidateKeys(p5, p, p3, t5, t, t3 byte) []MismatchKey { + keys := []MismatchKey{ + {P5: mismatchFlank(p5), P: p, P3: mismatchFlank(p3), T5: mismatchFlank(t5), T: t, T3: mismatchFlank(t3)}, + {P5: mismatchFlank(p5), P: p, P3: mismatchFlank(p3), T5: 'N', T: t, T3: 'N'}, + {P5: 'N', P: p, P3: 'N', T5: mismatchFlank(t5), T: t, T3: mismatchFlank(t3)}, + broadMismatchKey(p, t), + } + out := make([]MismatchKey, 0, len(keys)) + seen := map[MismatchKey]struct{}{} + for _, k := range keys { + if _, ok := seen[k]; ok { + continue + } + seen[k] = struct{}{} + out = append(out, k) + } + return out +} + +func mismatchFlank(b byte) byte { + if isACGT(b) { + return b + } + return 'N' +} + +func lookupDeltaTmTripletOverride(p5, p, p3, t5, t, t3 byte) (float64, bool) { + for _, key := range mismatchCandidateKeys(p5, p, p3, t5, t, t3) { + if d, ok := DeltaTmTriplet[key]; ok { + return d, true + } + } + return 0, false +} + +func lookupDeltaGTripletInfo(p5, p, p3, t5, t, t3 byte) (MismatchParameterInfo, bool) { + for _, key := range mismatchCandidateKeys(p5, p, p3, t5, t, t3) { + if dg, ok := DeltaGTriplet[key]; ok { + set := DeltaGTripletParameterSet[key] + if set == "" { + set = "user-triplet-ddg" + } + src := DeltaGTripletSource[key] + 
if src == "" { + src = MismatchSourceTripletDeltaG + } + return MismatchParameterInfo{ + DeltaDeltaGKcal: dg, + Source: src, + ParameterSet: set, + Citation: DeltaGTripletCitation[key], + Note: DeltaGTripletNote[key], + }, true + } + } + return MismatchParameterInfo{}, false +} + +func lookupDeltaGTripletOverride(p5, p, p3, t5, t, t3 byte) (float64, MismatchLookupSource, bool) { + param, ok := lookupDeltaGTripletInfo(p5, p, p3, t5, t, t3) + if !ok { + return 0, "", false + } + return param.DeltaDeltaGKcal, param.Source, true +} + +func lookupCuratedDeltaGTriplet(p5, p, p3, t5, t, t3 byte) (MismatchParameterInfo, bool) { + for _, key := range mismatchCandidateKeys(p5, p, p3, t5, t, t3) { + if param, ok := curatedDeltaGTriplet[key]; ok { + return param, true + } + } + return MismatchParameterInfo{}, false +} + +// LookupMismatchParameterInfo exposes metadata for an exact or wildcard +// mismatch key. It is used by tests and report writers. +func LookupMismatchParameterInfo(key MismatchKey) (MismatchParameterInfo, bool) { + if param, ok := curatedDeltaGTriplet[key]; ok { + return param, true + } + if dg, ok := DeltaGTriplet[key]; ok { + set := DeltaGTripletParameterSet[key] + if set == "" { + set = "user-triplet-ddg" + } + src := DeltaGTripletSource[key] + if src == "" { + src = MismatchSourceTripletDeltaG + } + return MismatchParameterInfo{ + DeltaDeltaGKcal: dg, + Source: src, + ParameterSet: set, + Citation: DeltaGTripletCitation[key], + Note: DeltaGTripletNote[key], + }, true + } + return MismatchParameterInfo{}, false +} + +// LookupMismatchParameterInfoForContext returns the metadata that would be used +// by LookupDeltaGDetail for the supplied local mismatch context. It follows the +// same exact/wildcard precedence but intentionally does not synthesize metadata +// for heuristic or default fallbacks. 
+func LookupMismatchParameterInfoForContext(p5, p, p3, t5, t, t3 byte) (MismatchParameterInfo, bool) { + if param, ok := lookupDeltaGTripletInfo(p5, p, p3, t5, t, t3); ok { + return param, true + } + if param, ok := lookupCuratedDeltaGTriplet(p5, p, p3, t5, t, t3); ok { + return param, true + } + return MismatchParameterInfo{}, false +} diff --git a/core/thermo/mismatch_triplet_params.go b/core/thermo/mismatch_triplet_params.go new file mode 100644 index 0000000..a075c9e --- /dev/null +++ b/core/thermo/mismatch_triplet_params.go @@ -0,0 +1,271 @@ +package thermo + +import "math" + +const MismatchParameterSetSantaLuciaHicks2004CompiledDimerGaugeV1 = "santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1" + +type MismatchTripletParameter struct { + Key MismatchKey + DeltaHkcal float64 + DeltaScalK float64 + DeltaG37kcal float64 + PerfectRefG37kcal float64 + DeltaDeltaG37kcal float64 + Source MismatchLookupSource + ParameterSet string + Citation string + Note string +} + +const santaLuciaHicksInternalMismatchCompiledGaugeCitation = "SantaLucia & Hicks 2004 Table 2 + unified Watson-Crick Table 1; primary mismatch papers: Allawi & SantaLucia 1997/1998 and Peyret et al. 1999" + +func santaLuciaHicksCompiledGaugeTriplet(p5, p, p3, t5, t, t3 byte, g37, refG37, ddg37 float64, note string) MismatchTripletParameter { + return MismatchTripletParameter{ + Key: MismatchKey{P5: p5, P: p, P3: p3, T5: t5, T: t, T3: t3}, + DeltaHkcal: math.NaN(), + DeltaScalK: math.NaN(), + DeltaG37kcal: g37, + PerfectRefG37kcal: refG37, + DeltaDeltaG37kcal: ddg37, + Source: MismatchSourceTripletDeltaG, + ParameterSet: MismatchParameterSetSantaLuciaHicks2004CompiledDimerGaugeV1, + Citation: santaLuciaHicksInternalMismatchCompiledGaugeCitation, + Note: "1 M NaCl, 37 °C; isolated internal " + note + ".", + } +} + +// CuratedMismatchTriplets contains isolated internal single-base DNA/DNA +// mismatch triplet penalties. Primer is 5'→3'; target is primer-aligned 3'→5'. 
+// +// This table expands SantaLucia-Hicks 2004 Table 2 to all 192 oriented +// non-Watson-Crick internal single-mismatch triplets by summing the two +// adjacent compiled-dimer mismatch increments and subtracting the matched +// Watson-Crick local reference from Table 1. These are local ΔΔG°37 scoring +// penalties, not unique measured physical trimers for every mismatch family. +var CuratedMismatchTriplets = []MismatchTripletParameter{ + // A/A mismatches. + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'A', 'T', 'A', 'T', 1.30, -2.00, 3.30, "A/A in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'C', 'T', 'A', 'G', 0.78, -2.44, 3.22, "A/A in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'G', 'T', 'A', 'C', 1.04, -2.28, 3.32, "A/A in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'T', 'T', 'A', 'A', 1.22, -1.88, 3.10, "A/A in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'A', 'G', 'A', 'T', 1.12, -2.45, 3.57, "A/A in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'C', 'G', 'A', 'G', 0.60, -2.89, 3.49, "A/A in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'G', 'G', 'A', 'C', 0.86, -2.73, 3.59, "A/A in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'T', 'G', 'A', 'A', 1.04, -2.33, 3.37, "A/A in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'A', 'C', 'A', 'T', 0.86, -2.30, 3.16, "A/A in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'C', 'C', 'A', 'G', 0.34, -2.74, 3.08, "A/A in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'G', 'C', 'A', 'C', 0.60, -2.58, 3.18, "A/A in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'T', 'C', 'A', 'A', 0.78, -2.18, 2.96, "A/A in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'A', 'A', 'A', 'T', 1.38, -1.58, 2.96, "A/A in T/A and A/T flanks"), + 
santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'C', 'A', 'A', 'G', 0.86, -2.02, 2.88, "A/A in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'G', 'A', 'A', 'C', 1.12, -1.86, 2.98, "A/A in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'T', 'A', 'A', 'A', 1.30, -1.46, 2.76, "A/A in T/A and T/A flanks"), + + // A/C mismatches. + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'A', 'T', 'C', 'T', 2.21, -2.00, 4.21, "A/C in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'C', 'T', 'C', 'G', 1.35, -2.44, 3.79, "A/C in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'G', 'T', 'C', 'C', 1.67, -2.28, 3.95, "A/C in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'T', 'T', 'C', 'A', 1.65, -1.88, 3.53, "A/C in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'A', 'G', 'C', 'T', 2.08, -2.45, 4.53, "A/C in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'C', 'G', 'C', 'G', 1.22, -2.89, 4.11, "A/C in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'G', 'G', 'C', 'C', 1.54, -2.73, 4.27, "A/C in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'T', 'G', 'C', 'A', 1.52, -2.33, 3.85, "A/C in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'A', 'C', 'C', 'T', 2.14, -2.30, 4.44, "A/C in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'C', 'C', 'C', 'G', 1.28, -2.74, 4.02, "A/C in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'G', 'C', 'C', 'C', 1.60, -2.58, 4.18, "A/C in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'T', 'C', 'C', 'A', 1.58, -2.18, 3.76, "A/C in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'A', 'A', 'C', 'T', 2.25, -1.58, 3.83, "A/C in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'C', 'A', 'C', 'G', 1.39, -2.02, 3.41, "A/C in T/A and C/G 
flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'G', 'A', 'C', 'C', 1.71, -1.86, 3.57, "A/C in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'T', 'A', 'C', 'A', 1.69, -1.46, 3.15, "A/C in T/A and T/A flanks"), + + // A/G mismatches. + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'A', 'T', 'G', 'T', 0.88, -2.00, 2.88, "A/G in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'C', 'T', 'G', 'G', -0.38, -2.44, 2.06, "A/G in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'G', 'T', 'G', 'C', 0.25, -2.28, 2.53, "A/G in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'A', 'T', 'T', 'G', 'A', 0.16, -1.88, 2.04, "A/G in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'A', 'G', 'G', 'T', 0.77, -2.45, 3.22, "A/G in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'C', 'G', 'G', 'G', -0.49, -2.89, 2.40, "A/G in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'G', 'G', 'G', 'C', 0.14, -2.73, 2.87, "A/G in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'A', 'T', 'G', 'G', 'A', 0.05, -2.33, 2.38, "A/G in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'A', 'C', 'G', 'T', 0.49, -2.30, 2.79, "A/G in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'C', 'C', 'G', 'G', -0.77, -2.74, 1.97, "A/G in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'G', 'C', 'G', 'C', -0.14, -2.58, 2.44, "A/G in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'A', 'T', 'C', 'G', 'A', -0.23, -2.18, 1.95, "A/G in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'A', 'A', 'G', 'T', 1.16, -1.58, 2.74, "A/G in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'C', 'A', 'G', 'G', -0.10, -2.02, 1.92, "A/G in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'G', 'A', 'G', 'C', 0.53, -1.86, 2.39, 
"A/G in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'A', 'T', 'A', 'G', 'A', 0.44, -1.46, 1.90, "A/G in T/A and T/A flanks"), + + // C/A mismatches. + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'A', 'T', 'A', 'T', 1.69, -2.89, 4.58, "C/A in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'C', 'T', 'A', 'G', 1.58, -3.28, 4.86, "C/A in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'G', 'T', 'A', 'C', 1.52, -3.61, 5.13, "C/A in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'T', 'T', 'A', 'A', 1.65, -2.72, 4.37, "C/A in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'A', 'G', 'A', 'T', 1.71, -3.29, 5.00, "C/A in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'C', 'G', 'A', 'G', 1.60, -3.68, 5.28, "C/A in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'G', 'G', 'A', 'C', 1.54, -4.01, 5.55, "C/A in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'T', 'G', 'A', 'A', 1.67, -3.12, 4.79, "C/A in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'A', 'C', 'A', 'T', 1.39, -3.69, 5.08, "C/A in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'C', 'C', 'A', 'G', 1.28, -4.08, 5.36, "C/A in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'G', 'C', 'A', 'C', 1.22, -4.41, 5.63, "C/A in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'T', 'C', 'A', 'A', 1.35, -3.52, 4.87, "C/A in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'A', 'A', 'A', 'T', 2.25, -2.75, 5.00, "C/A in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'C', 'A', 'A', 'G', 2.14, -3.14, 5.28, "C/A in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'G', 'A', 'A', 'C', 2.08, -3.47, 5.55, "C/A in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'T', 'A', 'A', 'A', 2.21, 
-2.58, 4.79, "C/A in T/A and T/A flanks"), + + // C/C mismatches. + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'A', 'T', 'C', 'T', 2.38, -2.89, 5.27, "C/C in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'C', 'T', 'C', 'G', 2.12, -3.28, 5.40, "C/C in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'G', 'T', 'C', 'C', 2.03, -3.61, 5.64, "C/C in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'T', 'T', 'C', 'A', 2.66, -2.72, 5.38, "C/C in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'A', 'G', 'C', 'T', 1.75, -3.29, 5.04, "C/C in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'C', 'G', 'C', 'G', 1.49, -3.68, 5.17, "C/C in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'G', 'G', 'C', 'C', 1.40, -4.01, 5.41, "C/C in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'T', 'G', 'C', 'A', 2.03, -3.12, 5.15, "C/C in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'A', 'C', 'C', 'T', 1.84, -3.69, 5.53, "C/C in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'C', 'C', 'C', 'G', 1.58, -4.08, 5.66, "C/C in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'G', 'C', 'C', 'C', 1.49, -4.41, 5.90, "C/C in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'T', 'C', 'C', 'A', 2.12, -3.52, 5.64, "C/C in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'A', 'A', 'C', 'T', 2.10, -2.75, 4.85, "C/C in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'C', 'A', 'C', 'G', 1.84, -3.14, 4.98, "C/C in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'G', 'A', 'C', 'C', 1.75, -3.47, 5.22, "C/C in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'T', 'A', 'C', 'A', 2.38, -2.58, 4.96, "C/C in T/A and T/A flanks"), + + // C/T mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'A', 'T', 'T', 'T', 1.39, -2.89, 4.28, "C/T in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'C', 'T', 'T', 'G', 1.62, -3.28, 4.90, "C/T in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'G', 'T', 'T', 'C', 1.04, -3.61, 4.65, "C/T in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'C', 'T', 'T', 'T', 'A', 1.37, -2.72, 4.09, "C/T in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'A', 'G', 'T', 'T', 1.37, -3.29, 4.66, "C/T in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'C', 'G', 'T', 'G', 1.60, -3.68, 5.28, "C/T in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'G', 'G', 'T', 'C', 1.02, -4.01, 5.03, "C/T in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'C', 'T', 'G', 'T', 'A', 1.35, -3.12, 4.47, "C/T in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'A', 'C', 'T', 'T', 1.37, -3.69, 5.06, "C/T in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'C', 'C', 'T', 'G', 1.60, -4.08, 5.68, "C/T in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'G', 'C', 'T', 'C', 1.02, -4.41, 5.43, "C/T in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'C', 'T', 'C', 'T', 'A', 1.35, -3.52, 4.87, "C/T in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'A', 'A', 'T', 'T', 1.72, -2.75, 4.47, "C/T in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'C', 'A', 'T', 'G', 1.95, -3.14, 5.09, "C/T in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'G', 'A', 'T', 'C', 1.37, -3.47, 4.84, "C/T in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'C', 'T', 'A', 'T', 'A', 1.70, -2.58, 4.28, "C/T in T/A and T/A flanks"), + + // G/A mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'A', 'T', 'A', 'T', 0.44, -2.58, 3.02, "G/A in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'C', 'T', 'A', 'G', -0.23, -3.52, 3.29, "G/A in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'G', 'T', 'A', 'C', 0.05, -3.12, 3.17, "G/A in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'T', 'T', 'A', 'A', 0.16, -2.72, 2.88, "G/A in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'A', 'G', 'A', 'T', 0.53, -3.47, 4.00, "G/A in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'C', 'G', 'A', 'G', -0.14, -4.41, 4.27, "G/A in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'G', 'G', 'A', 'C', 0.14, -4.01, 4.15, "G/A in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'T', 'G', 'A', 'A', 0.25, -3.61, 3.86, "G/A in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'A', 'C', 'A', 'T', -0.10, -3.14, 3.04, "G/A in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'C', 'C', 'A', 'G', -0.77, -4.08, 3.31, "G/A in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'G', 'C', 'A', 'C', -0.49, -3.68, 3.19, "G/A in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'T', 'C', 'A', 'A', -0.38, -3.28, 2.90, "G/A in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'A', 'A', 'A', 'T', 1.16, -2.75, 3.91, "G/A in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'C', 'A', 'A', 'G', 0.49, -3.69, 4.18, "G/A in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'G', 'A', 'A', 'C', 0.77, -3.29, 4.06, "G/A in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'T', 'A', 'A', 'A', 0.88, -2.89, 3.77, "G/A in T/A and T/A flanks"), + + // G/G mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'A', 'T', 'G', 'T', 0.31, -2.58, 2.89, "G/G in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'C', 'T', 'G', 'G', -1.24, -3.52, 2.28, "G/G in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'G', 'T', 'G', 'C', -0.24, -3.12, 2.88, "G/G in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'T', 'T', 'G', 'A', -0.26, -2.72, 2.46, "G/G in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'A', 'G', 'G', 'T', 0.33, -3.47, 3.80, "G/G in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'C', 'G', 'G', 'G', -1.22, -4.41, 3.19, "G/G in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'G', 'G', 'G', 'C', -0.22, -4.01, 3.79, "G/G in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'T', 'G', 'G', 'A', -0.24, -3.61, 3.37, "G/G in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'A', 'C', 'G', 'T', -0.67, -3.14, 2.47, "G/G in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'C', 'C', 'G', 'G', -2.22, -4.08, 1.86, "G/G in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'G', 'C', 'G', 'C', -1.22, -3.68, 2.46, "G/G in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'T', 'C', 'G', 'A', -1.24, -3.28, 2.04, "G/G in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'A', 'A', 'G', 'T', 0.88, -2.75, 3.63, "G/G in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'C', 'A', 'G', 'G', -0.67, -3.69, 3.02, "G/G in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'G', 'A', 'G', 'C', 0.33, -3.29, 3.62, "G/G in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'T', 'A', 'G', 'A', 0.31, -2.89, 3.20, "G/G in T/A and T/A flanks"), + + // G/T mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'A', 'T', 'T', 'T', 1.05, -2.58, 3.63, "G/T in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'C', 'T', 'T', 'G', 0.12, -3.52, 3.64, "G/T in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'G', 'T', 'T', 'C', 0.39, -3.12, 3.51, "G/T in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'G', 'T', 'T', 'T', 'A', 0.78, -2.72, 3.50, "G/T in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'A', 'G', 'T', 'T', -0.13, -3.47, 3.34, "G/T in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'C', 'G', 'T', 'G', -1.06, -4.41, 3.35, "G/T in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'G', 'G', 'T', 'C', -0.79, -4.01, 3.22, "G/T in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'G', 'T', 'G', 'T', 'A', -0.40, -3.61, 3.21, "G/T in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'A', 'C', 'T', 'T', 0.42, -3.14, 3.56, "G/T in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'C', 'C', 'T', 'G', -0.51, -4.08, 3.57, "G/T in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'G', 'C', 'T', 'C', -0.24, -3.68, 3.44, "G/T in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'G', 'T', 'C', 'T', 'A', 0.15, -3.28, 3.43, "G/T in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'A', 'A', 'T', 'T', 0.77, -2.75, 3.52, "G/T in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'C', 'A', 'T', 'G', -0.16, -3.69, 3.53, "G/T in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'G', 'A', 'T', 'C', 0.11, -3.29, 3.40, "G/T in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'G', 'T', 'A', 'T', 'A', 0.50, -2.89, 3.39, "G/T in T/A and T/A flanks"), + + // T/C mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'A', 'T', 'C', 'T', 1.70, -1.46, 3.16, "T/C in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'C', 'T', 'C', 'G', 1.35, -2.18, 3.53, "T/C in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'G', 'T', 'C', 'C', 1.35, -2.33, 3.68, "T/C in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'T', 'T', 'C', 'A', 1.37, -1.88, 3.25, "T/C in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'A', 'G', 'C', 'T', 1.37, -1.86, 3.23, "T/C in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'C', 'G', 'C', 'G', 1.02, -2.58, 3.60, "T/C in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'G', 'G', 'C', 'C', 1.02, -2.73, 3.75, "T/C in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'T', 'G', 'C', 'A', 1.04, -2.28, 3.32, "T/C in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'A', 'C', 'C', 'T', 1.95, -2.02, 3.97, "T/C in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'C', 'C', 'C', 'G', 1.60, -2.74, 4.34, "T/C in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'G', 'C', 'C', 'C', 1.60, -2.89, 4.49, "T/C in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'T', 'C', 'C', 'A', 1.62, -2.44, 4.06, "T/C in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'A', 'A', 'C', 'T', 1.72, -1.58, 3.30, "T/C in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'C', 'A', 'C', 'G', 1.37, -2.30, 3.67, "T/C in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'G', 'A', 'C', 'C', 1.37, -2.45, 3.82, "T/C in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'T', 'A', 'C', 'A', 1.39, -2.00, 3.39, "T/C in T/A and T/A flanks"), + + // T/G mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'A', 'T', 'G', 'T', 0.50, -1.46, 1.96, "T/G in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'C', 'T', 'G', 'G', 0.15, -2.18, 2.33, "T/G in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'G', 'T', 'G', 'C', -0.40, -2.33, 1.93, "T/G in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'T', 'T', 'G', 'A', 0.78, -1.88, 2.66, "T/G in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'A', 'G', 'G', 'T', 0.11, -1.86, 1.97, "T/G in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'C', 'G', 'G', 'G', -0.24, -2.58, 2.34, "T/G in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'G', 'G', 'G', 'C', -0.79, -2.73, 1.94, "T/G in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'T', 'G', 'G', 'A', 0.39, -2.28, 2.67, "T/G in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'A', 'C', 'G', 'T', -0.16, -2.02, 1.86, "T/G in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'C', 'C', 'G', 'G', -0.51, -2.74, 2.23, "T/G in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'G', 'C', 'G', 'C', -1.06, -2.89, 1.83, "T/G in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'T', 'C', 'G', 'A', 0.12, -2.44, 2.56, "T/G in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'A', 'A', 'G', 'T', 0.77, -1.58, 2.35, "T/G in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'C', 'A', 'G', 'G', 0.42, -2.30, 2.72, "T/G in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'G', 'A', 'G', 'C', -0.13, -2.45, 2.32, "T/G in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'T', 'A', 'G', 'A', 1.05, -2.00, 3.05, "T/G in T/A and T/A flanks"), + + // T/T mismatches. 
+ santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'A', 'T', 'T', 'T', 1.37, -1.46, 2.83, "T/T in A/T and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'C', 'T', 'T', 'G', 1.14, -2.18, 3.32, "T/T in A/T and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'G', 'T', 'T', 'C', 0.57, -2.33, 2.90, "T/T in A/T and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('A', 'T', 'T', 'T', 'T', 'A', 1.38, -1.88, 3.26, "T/T in A/T and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'A', 'G', 'T', 'T', 0.56, -1.86, 2.42, "T/T in C/G and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'C', 'G', 'T', 'G', 0.33, -2.58, 2.91, "T/T in C/G and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'G', 'G', 'T', 'C', -0.24, -2.73, 2.49, "T/T in C/G and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('C', 'T', 'T', 'G', 'T', 'A', 0.57, -2.28, 2.85, "T/T in C/G and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'A', 'C', 'T', 'T', 1.13, -2.02, 3.15, "T/T in G/C and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'C', 'C', 'T', 'G', 0.90, -2.74, 3.64, "T/T in G/C and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'G', 'C', 'T', 'C', 0.33, -2.89, 3.22, "T/T in G/C and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('G', 'T', 'T', 'C', 'T', 'A', 1.14, -2.44, 3.58, "T/T in G/C and T/A flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'A', 'A', 'T', 'T', 1.36, -1.58, 2.94, "T/T in T/A and A/T flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'C', 'A', 'T', 'G', 1.13, -2.30, 3.43, "T/T in T/A and C/G flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'G', 'A', 'T', 'C', 0.56, -2.45, 3.01, "T/T in T/A and G/C flanks"), + santaLuciaHicksCompiledGaugeTriplet('T', 'T', 'T', 'A', 'T', 'A', 1.37, -2.00, 3.37, "T/T in T/A and T/A flanks"), +} + +func init() { + for _, p := range CuratedMismatchTriplets { + DeltaGTriplet[p.Key] = p.DeltaDeltaG37kcal + 
DeltaGTripletSource[p.Key] = p.Source + DeltaGTripletParameterSet[p.Key] = p.ParameterSet + DeltaGTripletCitation[p.Key] = p.Citation + DeltaGTripletNote[p.Key] = p.Note + } +} diff --git a/core/thermo/nn.go b/core/thermo/nn.go index 86642ae..aabdbe7 100644 --- a/core/thermo/nn.go +++ b/core/thermo/nn.go @@ -33,7 +33,7 @@ type NNParams struct { // SantaLucia & Hicks (2004), Table 1. var dimerParams = map[string]NNParams{ // Canonical 10 - "AA/TT": {-7.6, -21.3}, + "AA/TT": {-7.9, -22.2}, "AT/TA": {-7.2, -20.4}, "TA/AT": {-7.2, -21.3}, "CA/GT": {-8.5, -22.7}, @@ -44,13 +44,13 @@ var dimerParams = map[string]NNParams{ "GC/CG": {-9.8, -24.4}, "GG/CC": {-8.0, -19.9}, - // Synonym keys (reverse-orientation / swapped forms) - "TT/AA": {-7.6, -21.3}, // same as AA/TT + // Synonym keys from strand-interchange representations. + "TT/AA": {-7.9, -22.2}, // same as AA/TT "CC/GG": {-8.0, -19.9}, // same as GG/CC - "AC/TG": {-8.5, -22.7}, // same as CA/GT - "TG/AC": {-8.4, -22.4}, // same as GT/CA - "AG/TC": {-8.2, -22.2}, // same as GA/CT - "TC/AG": {-7.8, -21.0}, // same as CT/GA + "AC/TG": {-8.4, -22.4}, // same as GT/CA + "TG/AC": {-8.5, -22.7}, // same as CA/GT + "AG/TC": {-7.8, -21.0}, // same as CT/GA + "TC/AG": {-8.2, -22.2}, // same as GA/CT } // Initiation / terminal / symmetry (1 M Na+). @@ -62,9 +62,12 @@ var ( // TmInput describes solution and concentration. type TmInput struct { - CT float64 // total strand conc (mol/L); non-self: formula uses ln(CT/x) - Na float64 // monovalent cations (mol/L), e.g. 0.05 for 50 mM - X int // duplex type: 4 (non-self, default) or 1 (self-compl) + CT float64 // total strand conc (mol/L); non-self: formula uses ln(CT/x) + Na float64 // monovalent cations (mol/L), e.g. 
0.05 for 50 mM + Mg float64 // total Mg2+ (mol/L) + Dntp float64 // total dNTP (mol/L), used to estimate free Mg2+ + SaltModel SaltModel // monovalent | owczarzy-lite | owczarzy08 + X int // duplex type: 4 (non-self, default) or 1 (self-compl) } // Result reports ΔH/ΔS (1M and salt-corrected) and Tm. @@ -75,6 +78,17 @@ type Result struct { TmC float64 // melting temperature (°C) } +// DuplexResult reports condition-aware thermodynamic quantities for a perfect +// Watson-Crick primer-template duplex at the configured annealing temperature. +type DuplexResult struct { + Result + AnnealC float64 + AnnealMarginC float64 + DeltaGAtAnnealKcal float64 + EffectiveDenomCalK float64 + SelfComplementary bool +} + // Tm computes Tm for primer (5'→3') vs target (3'→5') aligned WC. // Seqs must be equal length; only A/C/G/T bases supported. func Tm(primer5to3, target3to5 string, in TmInput) (Result, error) { @@ -88,8 +102,11 @@ func Tm(primer5to3, target3to5 string, in TmInput) (Result, error) { if in.CT <= 0 { return out, errors.New("Tm: CT must be > 0") } - if in.Na <= 0 { - return out, errors.New("Tm: [Na+] must be > 0") + if in.SaltModel == "" { + in.SaltModel = SaltModelMonovalent + } + if in.Na <= 0 && (in.SaltModel != SaltModelOwczarzy08 || FreeMagnesium(in.Mg, in.Dntp) <= 0) { + return out, errors.New("Tm: salt concentration must be > 0") } x := in.X if x != 1 && x != 4 { @@ -155,19 +172,134 @@ func Tm(primer5to3, target3to5 string, in TmInput) (Result, error) { DS += symmDS } - // 2) Salt correction: ΔS([Na+]) = ΔS(1M) + 0.368*(N/2)*ln[Na+]; N = 2*n − 2 phosphates. + // 2) Salt correction. Monovalent and owczarzy-lite use the historical entropy + // correction form. owczarzy08 applies the mixed monovalent/divalent inverse-Tm + // correction and back-computes an effective salt-corrected entropy so existing + // downstream denominator logic remains consistent. N := float64(2*n - 2) - DS_Na := DS + 0.368*(N/2.0)*math.Log(in.Na) - - // 3) Two-state Tm (K), then °C. 
ΔH in cal/mol. - tmK := (DH * 1000.0) / (DS_Na + Rcal*math.Log(in.CT/float64(x))) + logConc := Rcal * math.Log(in.CT/float64(x)) + DS_salt := DS + 0.368*(N/2.0)*math.Log(positiveSalt(in.Na)) + tmK := (DH * 1000.0) / (DS_salt + logConc) + if in.SaltModel == SaltModelOwczarzy08 { + tmK, DS_salt = owczarzy08TmK(DH, DS, logConc, p, in) + } out.DH_kcal = DH out.DS_cal = DS - out.DS_Na = DS_Na + out.DS_Na = DS_salt out.TmC = tmK - 273.15 return out, nil } +func positiveSalt(x float64) float64 { + if x <= 0 || math.IsNaN(x) || math.IsInf(x, 0) { + return 1e-9 + } + return x +} + +func gcFraction(s string) float64 { + if len(s) == 0 { + return 0 + } + gc := 0 + for i := 0; i < len(s); i++ { + switch s[i] { + case 'G', 'g', 'C', 'c': + gc++ + } + } + return float64(gc) / float64(len(s)) +} + +func owczarzy08TmK(dh, ds, logConc float64, primer string, in TmInput) (float64, float64) { + // Start from the 1 M salt two-state temperature, then apply the Owczarzy 2008 + // mixed-salt correction as an inverse-temperature offset. Concentrations are + // stored in mol/L internally; this formula uses mol/L for monovalent and free + // Mg terms after dNTP chelation. 
+ tm1M := (dh * 1000.0) / (ds + logConc) + freeMg := FreeMagnesium(in.Mg, in.Dntp) + mon := positiveSalt(in.Na) + if freeMg <= 0 { + corr := owczarzyMonovalentInverseTmCorrection(mon, primer) + tmK := 1.0 / (1.0/tm1M + corr) + return tmK, effectiveEntropyFromTm(dh, logConc, tmK, ds, primer, mon) + } + + gc := gcFraction(primer) + corr := 0.0 + if mon > 0 { + ratio := math.Sqrt(freeMg) / mon + if ratio < 0.22 { + corr = owczarzyMonovalentInverseTmCorrection(mon, primer) + tmK := 1.0 / (1.0/tm1M + corr) + return tmK, effectiveEntropyFromTm(dh, logConc, tmK, ds, primer, mon) + } + corr = owczarzyMagnesiumInverseTmCorrection(freeMg, mon, gc, len(primer), ratio) + } else { + corr = owczarzyMagnesiumInverseTmCorrection(freeMg, 0, gc, len(primer), math.Inf(1)) + } + tmK := 1.0 / (1.0/tm1M + corr) + return tmK, effectiveEntropyFromTm(dh, logConc, tmK, ds, primer, mon) +} + +func owczarzyMonovalentInverseTmCorrection(mon float64, primer string) float64 { + lnMon := math.Log(positiveSalt(mon)) + return (4.29*gcFraction(primer)-3.95)*1e-5*lnMon + 9.40e-6*lnMon*lnMon +} + +func owczarzyMagnesiumInverseTmCorrection(mg, mon, gc float64, n int, ratio float64) float64 { + lnMg := math.Log(positiveSalt(mg)) + a, b, c, d := 3.92, -0.911, 6.26, 1.42 + e, f, g := -48.2, 52.5, 8.31 + if mon > 0 && ratio < 6.0 { + lnMon := math.Log(positiveSalt(mon)) + sqrtMon := math.Sqrt(positiveSalt(mon)) + a = 3.92 * (0.843 - 0.352*sqrtMon*lnMon) + d = 1.42 * (1.279 - 4.03e-3*lnMon - 8.03e-3*lnMon*lnMon) + g = 8.31 * (0.486 - 0.258*lnMon + 5.25e-3*lnMon*lnMon*lnMon) + } + lengthTerm := 0.0 + if n > 1 { + lengthTerm = (e + f*lnMg + g*lnMg*lnMg) / (2.0 * float64(n-1)) + } + return (a + b*lnMg + gc*(c+d*lnMg) + lengthTerm) * 1e-5 +} + +func effectiveEntropyFromTm(dh, logConc, tmK, fallbackDS float64, primer string, mon float64) float64 { + if math.IsNaN(tmK) || math.IsInf(tmK, 0) || tmK <= 0 { + return fallbackDS + 0.368*(float64(2*len(primer)-2)/2.0)*math.Log(positiveSalt(mon)) + } + return 
(dh*1000.0)/tmK - logConc +} + +// PerfectDuplex computes nearest-neighbor duplex thermodynamics for a primer +// (5'→3') aligned to a target strand supplied 3'→5'. Only perfect Watson-Crick +// A/C/G/T duplexes are accepted. For mismatched duplexes, callers should use +// this function on the perfect complement and apply an explicit mismatch policy. +func PerfectDuplex(primer5to3, target3to5 string, cond Conditions) (DuplexResult, error) { + var out DuplexResult + p := strings.ToUpper(strings.TrimSpace(primer5to3)) + cond = cond.WithDefaults() + cond.SelfComplementary = isSelfCompl(p) + in := cond.TmInput() + + res, err := Tm(p, target3to5, in) + if err != nil { + return out, err + } + denom := res.DS_Na + Rcal*math.Log(in.CT/float64(in.X)) + deltaG := res.DH_kcal - (cond.AnnealC+273.15)*denom/1000.0 + out = DuplexResult{ + Result: res, + AnnealC: cond.AnnealC, + AnnealMarginC: res.TmC - cond.AnnealC, + DeltaGAtAnnealKcal: deltaG, + EffectiveDenomCalK: denom, + SelfComplementary: cond.SelfComplementary, + } + return out, nil +} + // ---------- helpers ---------- func wc(a, b byte) bool { diff --git a/core/thermo/nn_test.go b/core/thermo/nn_test.go index 2e14019..e0f1b38 100644 --- a/core/thermo/nn_test.go +++ b/core/thermo/nn_test.go @@ -82,12 +82,12 @@ func TestTm_InputValidation(t *testing.T) { t.Fatalf("expected CT>0 error, got: %v", err) } }) - t.Run("[Na+] must be > 0", func(t *testing.T) { + t.Run("salt must be > 0", func(t *testing.T) { in := newInp() in.Na = 0 _, err := Tm("AA", "TT", in) - if err == nil || !strings.Contains(err.Error(), "[Na+] must be > 0") { - t.Fatalf("expected [Na+]>0 error, got: %v", err) + if err == nil || !strings.Contains(err.Error(), "salt concentration must be > 0") { + t.Fatalf("expected salt concentration error, got: %v", err) } }) t.Run("non-ACGT target", func(t *testing.T) { @@ -213,3 +213,84 @@ func TestTm_Orientation_MustBe3to5WC(t *testing.T) { t.Fatalf("expected non-WC error for wrong orientation, got: %v", err) } } + 
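Reviewer note on the `Tm` and `PerfectDuplex` hunks above: the two-state relations they rely on can be sketched in isolation. This is a minimal illustration under stated assumptions, not the package's implementation. The function names, the value 1.987 cal/(mol·K) used for the gas constant (the diff's `Rcal` is not shown), and the ΔH/ΔS inputs are hypothetical. Only the monovalent entropy correction and the denominator form are taken from the diff.

```go
package main

import (
	"fmt"
	"math"
)

// gasConstantCal is R in cal/(mol·K). Assumed value; the diff's Rcal constant
// is defined outside the hunks shown here.
const gasConstantCal = 1.987

// saltCorrectedEntropy applies the monovalent correction quoted in the diff:
// ΔS([Na+]) = ΔS(1 M) + 0.368*(N/2)*ln[Na+], with N = 2n - 2 phosphates.
func saltCorrectedEntropy(dsCal float64, n int, naMolar float64) float64 {
	phosphates := float64(2*n - 2)
	return dsCal + 0.368*(phosphates/2.0)*math.Log(naMolar)
}

// twoStateDenominator returns ΔS_salt + R*ln(CT/x) in cal/(mol·K), the same
// quantity the diff stores as EffectiveDenomCalK.
func twoStateDenominator(dsSaltCal, ctMolar float64, x int) float64 {
	return dsSaltCal + gasConstantCal*math.Log(ctMolar/float64(x))
}

// meltingTempC converts ΔH (kcal/mol) and the denominator into Tm in °C.
func meltingTempC(dhKcal, denomCalK float64) float64 {
	return dhKcal*1000.0/denomCalK - 273.15
}

// deltaGAtKcal evaluates ΔG = ΔH - T*denom/1000 at tempC, in kcal/mol.
func deltaGAtKcal(dhKcal, denomCalK, tempC float64) float64 {
	return dhKcal - (tempC+273.15)*denomCalK/1000.0
}

func main() {
	// Illustrative inputs only; they are not taken from any parameter table.
	const dhKcal, dsCal = -150.0, -400.0
	denom := twoStateDenominator(saltCorrectedEntropy(dsCal, 20, 0.05), 2.5e-7, 4)
	tmC := meltingTempC(dhKcal, denom)
	fmt.Printf("Tm = %.2f °C\n", tmC)
	fmt.Printf("ΔG at 60 °C = %.3f kcal/mol\n", deltaGAtKcal(dhKcal, denom, 60))
	fmt.Printf("ΔG at Tm = %.3g kcal/mol\n", deltaGAtKcal(dhKcal, denom, tmC))
}
```

Because ΔG and Tm share one denominator, ΔG vanishes at Tm by construction, so the sign of `AnnealMarginC` and the sign of `DeltaGAtAnnealKcal` cannot disagree. That appears to be why `owczarzy08TmK` back-computes an effective entropy from the corrected Tm instead of keeping a separate ΔS.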
+func TestNNParams_AATTCanonicalSnapshot(t *testing.T) { + got, ok := dimerParams["AA/TT"] + if !ok { + t.Fatal("missing AA/TT nearest-neighbor parameter") + } + if got.DH != -7.9 || got.DS != -22.2 { + t.Fatalf("AA/TT parameters drifted: got ΔH=%g ΔS=%g, want -7.9/-22.2", got.DH, got.DS) + } + syn, ok := dimerParams["TT/AA"] + if !ok { + t.Fatal("missing TT/AA nearest-neighbor synonym") + } + if syn != got { + t.Fatalf("TT/AA synonym drifted: got %+v, want %+v", syn, got) + } +} + +func TestPerfectDuplexReportsAnnealMarginAndDeltaG(t *testing.T) { + primer := strings.ToUpper("ACGTACGTACGTACGTACGT") + cond := DefaultConditions() + cond.AnnealC = 60 + + res, err := PerfectDuplex(primer, comp(primer), cond) + if err != nil { + t.Fatalf("PerfectDuplex returned error: %v", err) + } + if math.Abs(res.AnnealMarginC-(res.TmC-cond.AnnealC)) > 1e-9 { + t.Fatalf("margin mismatch: got %g want %g", res.AnnealMarginC, res.TmC-cond.AnnealC) + } + if res.EffectiveDenomCalK == 0 || math.IsNaN(res.DeltaGAtAnnealKcal) { + t.Fatalf("invalid thermodynamic components: %+v", res) + } +} + +func TestPerfectDuplexRejectsMismatch(t *testing.T) { + primer := strings.ToUpper("ACGTACGTACGT") + target := []byte(comp(primer)) + target[3] = 'A' + if target[3] == comp(primer)[3] { + target[3] = 'C' + } + _, err := PerfectDuplex(primer, string(target), DefaultConditions()) + if err == nil || !strings.Contains(err.Error(), "non-WC pair") { + t.Fatalf("expected non-WC rejection, got %v", err) + } +} + +func TestTm_Owczarzy08MixedSaltAndDNTPEffect(t *testing.T) { + primer := strings.ToUpper("GGGGCCCCGGGGCCCCGGGGCCCC") + target3to5 := comp(primer) + + mono := newInp() + mono.Na = 0.05 + mono.Mg = 0.003 + mono.SaltModel = SaltModelMonovalent + resMono, err := Tm(primer, target3to5, mono) + if err != nil { + t.Fatalf("monovalent Tm: %v", err) + } + + mixed := mono + mixed.SaltModel = SaltModelOwczarzy08 + resMixed, err := Tm(primer, target3to5, mixed) + if err != nil { + t.Fatalf("owczarzy08 Tm: %v", 
err) + } + if math.Abs(resMixed.TmC-resMono.TmC) < 1e-9 { + t.Fatalf("expected owczarzy08 to differ from monovalent: mixed=%g mono=%g", resMixed.TmC, resMono.TmC) + } + + withDntp := mixed + withDntp.Dntp = 0.0025 + resDntp, err := Tm(primer, target3to5, withDntp) + if err != nil { + t.Fatalf("owczarzy08+dNTP Tm: %v", err) + } + if !(resDntp.TmC < resMixed.TmC) { + t.Fatalf("expected dNTP chelation to lower mixed-salt Tm: dntp=%g mixed=%g", resDntp.TmC, resMixed.TmC) + } +} diff --git a/core/thermo/structure.go b/core/thermo/structure.go new file mode 100644 index 0000000..023ae75 --- /dev/null +++ b/core/thermo/structure.go @@ -0,0 +1,942 @@ +package thermo + +import ( + "errors" + "math" + "strings" +) + +const ( + StructureHairpin = "hairpin" + StructureSelfDimer = "self-dimer" + StructureCrossDimer = "cross-dimer" + + StructureModelContiguousStemV1 = "nn-contiguous-stem-v1" + StructureModelStemLoopV2 = "nn-stem-loop-v2" + StructureModelPartitionV1 = "nn-structure-partition-v1" +) + +// StructureOptions configures the v1 secondary-structure evaluator. +type StructureOptions struct { + Conditions Conditions + MinStem int + MinLoop int + + // V2 permits one interruption inside an otherwise Watson-Crick stem. Bulges + // are one-sided interruptions; internal loops have unpaired bases on both + // strands. The defaults keep the search conservative for primer-scale oligos. + MaxBulge int + MaxInternalLoop int + + // PartitionMaxCandidates bounds the candidate ensemble used by the partition + // evaluator. Zero uses a conservative primer-scale default. + PartitionMaxCandidates int +} + +// StructureResult describes the strongest contiguous nearest-neighbor structure +// found for a primer or primer pair. Coordinates are 0-based in the submitted +// 5'→3' sequence(s). 
+type StructureResult struct { + Kind string + Model string + DeltaGAtAnnealKcal float64 + TmC float64 + AnnealMarginC float64 + StemLen int + LoopLen int + AStart int + AEnd int + BStart int + BEnd int + ThreePrimeAnchored bool + BothThreePrimeAnchor bool + + SegmentCount int + BulgeCount int + InternalLoopCount int + DanglingEndCount int + LoopPenaltyKcal float64 + BulgePenaltyKcal float64 + InternalLoopPenaltyKcal float64 + DanglingAdjustmentKcal float64 + + // Ensemble fields are populated by StructureModelPartitionV1. The coordinate + // fields continue to describe the best MFE candidate; these fields describe + // the bounded Boltzmann ensemble around that candidate. + EnsembleDeltaGAtAnnealKcal float64 + PartitionFunction float64 + EnsembleWeight float64 + EnsembleCandidateCount int + + // DP fields are populated by the non-crossing hairpin dynamic-programming + // partition layer. They are separate from candidate enumeration metadata. + DPCellCount int + DPStateCount int + DPExpectedPairs float64 + DPMFEDeltaGAtAnnealKcal float64 + DPEnsembleDeltaGAtAnnealKcal float64 +} + +func DefaultStructureOptions(cond Conditions) StructureOptions { + return StructureOptions{ + Conditions: cond.WithDefaults(), + MinStem: 4, + MinLoop: 3, + MaxBulge: 2, + MaxInternalLoop: 2, + PartitionMaxCandidates: 256, + } +} + +func BestHairpin(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + opts = normalizeStructureOptions(opts) + seq, ok := normalizeACGTStructure(seq5to3) + if !ok { + return StructureResult{}, false, errors.New("hairpin: sequence must be A/C/G/T") + } + if len(seq) < 2*opts.MinStem+opts.MinLoop { + return StructureResult{}, false, nil + } + + var best StructureResult + for i := 0; i <= len(seq)-opts.MinStem-opts.MinLoop-opts.MinStem; i++ { + for j := i + opts.MinStem + opts.MinLoop; j <= len(seq)-opts.MinStem; j++ { + maxStem := len(seq) - j + if byLoop := j - i - opts.MinLoop; byLoop < maxStem { + maxStem = byLoop + } + for stem := 
opts.MinStem; stem <= maxStem; stem++ { + if !hairpinStemWC(seq, i, j, stem) { + continue + } + loopLen := j - (i + stem) + top := seq[i : i+stem] + target3 := reverseStructure(seq[j : j+stem]) + cand, err := stemThermo(StructureHairpin, top, target3, opts.Conditions, loopLen) + if err != nil { + continue + } + cand.AStart, cand.AEnd = i, i+stem + cand.BStart, cand.BEnd = j, j+stem + cand.ThreePrimeAnchored = cand.AEnd == len(seq) || cand.BEnd == len(seq) + cand.BothThreePrimeAnchor = false + if betterStructureResult(cand, best) { + best = cand + } + } + } + } + return best, best.StemLen > 0, nil +} + +func BestSelfDimer(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimer(seq5to3, seq5to3, opts, StructureSelfDimer) +} + +func BestCrossDimer(a5to3, b5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimer(a5to3, b5to3, opts, StructureCrossDimer) +} + +func BestHairpinV2(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + v1, ok1, err := BestHairpin(seq5to3, opts) + if err != nil { + return StructureResult{}, false, err + } + if ok1 { + v1.Model = StructureModelContiguousStemV1 + } + v2, ok2, err := bestHairpinGapped(seq5to3, opts) + if err != nil { + return v1, ok1, err + } + if ok2 && betterStructureResult(v2, v1) { + return v2, true, nil + } + return v1, ok1, nil +} + +func BestSelfDimerV2(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimerV2(seq5to3, seq5to3, opts, StructureSelfDimer) +} + +func BestCrossDimerV2(a5to3, b5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimerV2(a5to3, b5to3, opts, StructureCrossDimer) +} + +func BestHairpinPartition(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + opts = normalizeStructureOptions(opts) + seq, ok := normalizeACGTStructure(seq5to3) + if !ok { + return StructureResult{}, false, errors.New("hairpin: sequence must be A/C/G/T") + } + 
candidates, err := hairpinPartitionCandidates(seq, opts) + if err != nil { + return StructureResult{}, false, err + } + best, ok, err := partitionStructureEnsemble(candidates, opts.Conditions) + if err != nil || !ok { + return best, ok, err + } + mergeHairpinDPEnsemble(&best, hairpinDPPartition(seq, opts)) + return best, true, nil +} + +func BestSelfDimerPartition(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimerPartition(seq5to3, seq5to3, opts, StructureSelfDimer) +} + +func BestCrossDimerPartition(a5to3, b5to3 string, opts StructureOptions) (StructureResult, bool, error) { + return bestDimerPartition(a5to3, b5to3, opts, StructureCrossDimer) +} + +func bestDimer(a5to3, b5to3 string, opts StructureOptions, kind string) (StructureResult, bool, error) { + opts = normalizeStructureOptions(opts) + a, okA := normalizeACGTStructure(a5to3) + b, okB := normalizeACGTStructure(b5to3) + if !okA || !okB { + return StructureResult{}, false, errors.New("dimer: sequences must be A/C/G/T") + } + if len(a) < opts.MinStem || len(b) < opts.MinStem { + return StructureResult{}, false, nil + } + + bTarget3 := reverseStructure(b) + var best StructureResult + for i := 0; i <= len(a)-opts.MinStem; i++ { + for j := 0; j <= len(bTarget3)-opts.MinStem; j++ { + run := 0 + for i+run < len(a) && j+run < len(bTarget3) && wc(a[i+run], bTarget3[j+run]) { + run++ + } + if run < opts.MinStem { + continue + } + cand, err := stemThermo(kind, a[i:i+run], bTarget3[j:j+run], opts.Conditions, 0) + if err != nil { + continue + } + cand.AStart, cand.AEnd = i, i+run + cand.BStart, cand.BEnd = len(b)-(j+run), len(b)-j + a3 := cand.AEnd == len(a) + b3 := cand.BEnd == len(b) + cand.ThreePrimeAnchored = a3 || b3 + cand.BothThreePrimeAnchor = a3 && b3 + if betterStructureResult(cand, best) { + best = cand + } + } + } + return best, best.StemLen > 0, nil +} + +func stemThermo(kind, top5to3, target3to5 string, cond Conditions, loopLen int) (StructureResult, error) { + if 
len(top5to3) != len(target3to5) || len(top5to3) == 0 { + return StructureResult{}, errors.New("structure stem: invalid stem") + } + cond = cond.WithDefaults() + local := cond + local.SelfComplementary = kind == StructureSelfDimer + res, err := Tm(top5to3, target3to5, local.TmInput()) + if err != nil { + return StructureResult{}, err + } + in := local.TmInput() + denom := res.DS_Na + Rcal*math.Log(in.CT/float64(in.X)) + if kind == StructureHairpin { + denom = res.DS_Na + } + loopPenalty := 0.0 + if kind == StructureHairpin { + loopPenalty = hairpinLoopPenaltyKcal(loopLen) + } + dg := res.DH_kcal - (cond.AnnealC+273.15)*denom/1000.0 + loopPenalty + tmC := res.TmC + if kind == StructureHairpin && denom != 0 { + tmC = ((res.DH_kcal + loopPenalty) * 1000.0 / denom) - 273.15 + } + return StructureResult{ + Kind: kind, + Model: StructureModelContiguousStemV1, + DeltaGAtAnnealKcal: dg, + TmC: tmC, + AnnealMarginC: tmC - cond.AnnealC, + StemLen: len(top5to3), + LoopLen: loopLen, + SegmentCount: 1, + LoopPenaltyKcal: loopPenalty, + }, nil +} + +type gappedStemCandidate struct { + aStart int + bStart int + stem1 int + gapA int + gapB int + stem2 int +} + +func (c gappedStemCandidate) aEnd() int { return c.aStart + c.stem1 + c.gapA + c.stem2 } +func (c gappedStemCandidate) bEnd() int { return c.bStart + c.stem1 + c.gapB + c.stem2 } +func (c gappedStemCandidate) stemLen() int { return c.stem1 + c.stem2 } + +func bestHairpinGapped(seq5to3 string, opts StructureOptions) (StructureResult, bool, error) { + opts = normalizeStructureOptions(opts) + seq, ok := normalizeACGTStructure(seq5to3) + if !ok { + return StructureResult{}, false, errors.New("hairpin: sequence must be A/C/G/T") + } + if len(seq) < 2*opts.MinStem+opts.MinLoop+1 { + return StructureResult{}, false, nil + } + + var best StructureResult + for i := 0; i < len(seq); i++ { + for j := i + opts.MinStem + opts.MinLoop; j < len(seq); j++ { + rightArm := seq[j:] + target3 := reverseStructure(rightArm) + for _, cand := range 
enumerateGappedStemCandidates(seq, target3, i, 0, opts) { + loopLen := j - cand.aEnd() + if loopLen < opts.MinLoop || cand.bEnd() > len(target3) { + continue + } + res, err := gappedStemThermo(StructureHairpin, seq, target3, cand, opts.Conditions, loopLen) + if err != nil { + continue + } + res.AStart, res.AEnd = cand.aStart, cand.aEnd() + // target3 is reverse(rightArm); convert back to original sequence coordinates. + res.BStart, res.BEnd = j+len(rightArm)-cand.bEnd(), j+len(rightArm)-cand.bStart + res.ThreePrimeAnchored = res.AEnd == len(seq) || res.BEnd == len(seq) + if betterStructureResult(res, best) { + best = res + } + } + } + } + return best, best.StemLen > 0, nil +} + +func bestDimerV2(a5to3, b5to3 string, opts StructureOptions, kind string) (StructureResult, bool, error) { + v1, ok1, err := bestDimer(a5to3, b5to3, opts, kind) + if err != nil { + return StructureResult{}, false, err + } + if ok1 { + v1.Model = StructureModelContiguousStemV1 + } + opts = normalizeStructureOptions(opts) + a, okA := normalizeACGTStructure(a5to3) + b, okB := normalizeACGTStructure(b5to3) + if !okA || !okB { + return StructureResult{}, false, errors.New("dimer: sequences must be A/C/G/T") + } + bTarget3 := reverseStructure(b) + + best := v1 + for i := 0; i < len(a); i++ { + for j := 0; j < len(bTarget3); j++ { + for _, cand := range enumerateGappedStemCandidates(a, bTarget3, i, j, opts) { + if cand.aEnd() > len(a) || cand.bEnd() > len(bTarget3) { + continue + } + res, err := gappedStemThermo(kind, a, bTarget3, cand, opts.Conditions, 0) + if err != nil { + continue + } + res.AStart, res.AEnd = cand.aStart, cand.aEnd() + res.BStart, res.BEnd = len(b)-cand.bEnd(), len(b)-cand.bStart + a3 := res.AEnd == len(a) + b3 := res.BEnd == len(b) + res.ThreePrimeAnchored = a3 || b3 + res.BothThreePrimeAnchor = a3 && b3 + if betterStructureResult(res, best) { + best = res + } + } + } + } + return best, best.StemLen > 0, nil +} + +func enumerateGappedStemCandidates(top, target3 string, 
aStart, bStart int, opts StructureOptions) []gappedStemCandidate { + minSeg := 2 + if opts.MinStem < 4 { + minSeg = 1 + } + out := make([]gappedStemCandidate, 0) + maxA := len(top) - aStart + maxB := len(target3) - bStart + for stem1 := minSeg; stem1 <= maxA && stem1 <= maxB; stem1++ { + if !segmentWC(top, target3, aStart, bStart, stem1) { + break + } + for gapA := 0; gapA <= opts.MaxBulge; gapA++ { + for gapB := 0; gapB <= opts.MaxInternalLoop; gapB++ { + if gapA == 0 && gapB == 0 { + continue + } + if gapA > 0 && gapB > 0 && (gapA > opts.MaxInternalLoop || gapB > opts.MaxInternalLoop) { + continue + } + a2 := aStart + stem1 + gapA + b2 := bStart + stem1 + gapB + if a2 >= len(top) || b2 >= len(target3) { + continue + } + for stem2 := minSeg; a2+stem2 <= len(top) && b2+stem2 <= len(target3); stem2++ { + if !segmentWC(top, target3, a2, b2, stem2) { + break + } + if stem1+stem2 < opts.MinStem { + continue + } + out = append(out, gappedStemCandidate{aStart: aStart, bStart: bStart, stem1: stem1, gapA: gapA, gapB: gapB, stem2: stem2}) + } + } + } + } + return out +} + +func segmentWC(top, target3 string, aStart, bStart, n int) bool { + if aStart < 0 || bStart < 0 || aStart+n > len(top) || bStart+n > len(target3) { + return false + } + for k := 0; k < n; k++ { + if !wc(top[aStart+k], target3[bStart+k]) { + return false + } + } + return true +} + +func gappedStemThermo(kind, top, target3 string, cand gappedStemCandidate, cond Conditions, loopLen int) (StructureResult, error) { + seg1Top := top[cand.aStart : cand.aStart+cand.stem1] + seg1Target := target3[cand.bStart : cand.bStart+cand.stem1] + seg2TopStart := cand.aStart + cand.stem1 + cand.gapA + seg2TargetStart := cand.bStart + cand.stem1 + cand.gapB + seg2Top := top[seg2TopStart : seg2TopStart+cand.stem2] + seg2Target := target3[seg2TargetStart : seg2TargetStart+cand.stem2] + + joinedTop := seg1Top + seg2Top + joinedTarget := seg1Target + seg2Target + res, err := stemThermo(kind, joinedTop, joinedTarget, cond, loopLen) 
+ if err != nil { + return StructureResult{}, err + } + res.Model = StructureModelStemLoopV2 + res.StemLen = cand.stemLen() + res.SegmentCount = 2 + res.LoopLen = loopLen + + gapPenalty := structureGapPenaltyKcal(cand.gapA, cand.gapB) + danglingAdjustment := structureDanglingAdjustmentKcal(top, target3, cand) + if cand.gapA > 0 && cand.gapB > 0 { + res.InternalLoopCount = 1 + res.InternalLoopPenaltyKcal = gapPenalty + } else { + res.BulgeCount = 1 + res.BulgePenaltyKcal = gapPenalty + } + res.DanglingAdjustmentKcal = danglingAdjustment + if danglingAdjustment != 0 { + res.DanglingEndCount = 1 + } + res.LoopPenaltyKcal += gapPenalty + res.DeltaGAtAnnealKcal += gapPenalty + danglingAdjustment + denom := structureDenomCalPerK(cond) + res.TmC -= (gapPenalty + danglingAdjustment) * 1000.0 / denom + res.AnnealMarginC = res.TmC - cond.WithDefaults().AnnealC + return res, nil +} + +func structureGapPenaltyKcal(gapA, gapB int) float64 { + gapTotal := gapA + gapB + if gapTotal <= 0 { + return 0 + } + if gapA > 0 && gapB > 0 { + asymmetry := math.Abs(float64(gapA - gapB)) + return 1.4 + 0.35*math.Log(float64(gapTotal+1)) + 0.25*asymmetry + } + return 0.8 + 0.45*math.Log(float64(gapTotal+1)) +} + +func structureDanglingAdjustmentKcal(top, target3 string, cand gappedStemCandidate) float64 { + adj := 0.0 + for _, b := range unpairedStructureBases(top, cand.aStart+cand.stem1, cand.gapA) { + adj += danglingBaseAdjustmentKcal(b) + } + for _, b := range unpairedStructureBases(target3, cand.bStart+cand.stem1, cand.gapB) { + adj += danglingBaseAdjustmentKcal(b) + } + // Bound the v2 dangling-end approximation so it cannot dominate the loop cost. 
+ if adj < -0.30 { + return -0.30 + } + return adj +} + +func unpairedStructureBases(s string, start, n int) []byte { + if n <= 0 || start < 0 || start >= len(s) { + return nil + } + end := start + n + if end > len(s) { + end = len(s) + } + return []byte(s[start:end]) +} + +func danglingBaseAdjustmentKcal(b byte) float64 { + switch b { + case 'G', 'C': + return -0.08 + case 'A', 'T': + return -0.05 + default: + return 0 + } +} + +func structureDenomCalPerK(cond Conditions) float64 { + cond = cond.WithDefaults() + denom := 200.0 + if tm, err := Tm("GCGC", "CGCG", cond.TmInput()); err == nil { + in := cond.TmInput() + candidate := tm.DS_Na + Rcal*math.Log(in.CT/float64(in.X)) + if candidate != 0 && !math.IsNaN(candidate) && !math.IsInf(candidate, 0) { + denom = math.Abs(candidate) + } + } + return denom +} + +func normalizeStructureOptions(opts StructureOptions) StructureOptions { + if opts.MinStem <= 0 { + opts.MinStem = 4 + } + if opts.MinLoop <= 0 { + opts.MinLoop = 3 + } + if opts.MaxBulge <= 0 { + opts.MaxBulge = 2 + } + if opts.MaxInternalLoop <= 0 { + opts.MaxInternalLoop = 2 + } + if opts.PartitionMaxCandidates <= 0 { + opts.PartitionMaxCandidates = 256 + } + opts.Conditions = opts.Conditions.WithDefaults() + return opts +} + +func normalizeACGTStructure(s string) (string, bool) { + out := strings.ToUpper(strings.TrimSpace(s)) + if out == "" { + return "", false + } + for i := 0; i < len(out); i++ { + switch out[i] { + case 'A', 'C', 'G', 'T': + default: + return "", false + } + } + return out, true +} + +func hairpinStemWC(seq string, leftStart, rightStart, stem int) bool { + for k := 0; k < stem; k++ { + if !wc(seq[leftStart+k], seq[rightStart+stem-1-k]) { + return false + } + } + return true +} + +func hairpinLoopPenaltyKcal(loopLen int) float64 { + if loopLen <= 0 { + return math.Inf(1) + } + return 3.0 + 0.2*math.Log(float64(loopLen)) +} + +func reverseStructure(s string) string { + b := []byte(s) + for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 { + 
b[i], b[j] = b[j], b[i] + } + return string(b) +} + +func betterStructureResult(cand, best StructureResult) bool { + if cand.StemLen == 0 { + return false + } + if best.StemLen == 0 { + return true + } + if cand.DeltaGAtAnnealKcal < best.DeltaGAtAnnealKcal-1e-9 { + return true + } + if math.Abs(cand.DeltaGAtAnnealKcal-best.DeltaGAtAnnealKcal) <= 1e-9 { + if cand.BothThreePrimeAnchor != best.BothThreePrimeAnchor { + return cand.BothThreePrimeAnchor + } + if cand.ThreePrimeAnchored != best.ThreePrimeAnchored { + return cand.ThreePrimeAnchored + } + return cand.StemLen > best.StemLen + } + return false +} + +func hairpinPartitionCandidates(seq5to3 string, opts StructureOptions) ([]StructureResult, error) { + opts = normalizeStructureOptions(opts) + seq, ok := normalizeACGTStructure(seq5to3) + if !ok { + return nil, errors.New("hairpin: sequence must be A/C/G/T") + } + out := make([]StructureResult, 0) + if len(seq) < 2*opts.MinStem+opts.MinLoop { + return out, nil + } + + for i := 0; i <= len(seq)-opts.MinStem-opts.MinLoop-opts.MinStem; i++ { + for j := i + opts.MinStem + opts.MinLoop; j <= len(seq)-opts.MinStem; j++ { + maxStem := len(seq) - j + if byLoop := j - i - opts.MinLoop; byLoop < maxStem { + maxStem = byLoop + } + for stem := opts.MinStem; stem <= maxStem; stem++ { + if !hairpinStemWC(seq, i, j, stem) { + continue + } + loopLen := j - (i + stem) + top := seq[i : i+stem] + target3 := reverseStructure(seq[j : j+stem]) + cand, err := stemThermo(StructureHairpin, top, target3, opts.Conditions, loopLen) + if err != nil { + continue + } + cand.AStart, cand.AEnd = i, i+stem + cand.BStart, cand.BEnd = j, j+stem + cand.ThreePrimeAnchored = cand.AEnd == len(seq) || cand.BEnd == len(seq) + out = appendPartitionCandidate(out, cand, opts) + } + } + } + + for i := 0; i < len(seq); i++ { + for j := i + opts.MinStem + opts.MinLoop; j < len(seq); j++ { + rightArm := seq[j:] + target3 := reverseStructure(rightArm) + for _, cand := range enumerateGappedStemCandidates(seq, 
target3, i, 0, opts) { + loopLen := j - cand.aEnd() + if loopLen < opts.MinLoop || cand.bEnd() > len(target3) { + continue + } + res, err := gappedStemThermo(StructureHairpin, seq, target3, cand, opts.Conditions, loopLen) + if err != nil { + continue + } + res.AStart, res.AEnd = cand.aStart, cand.aEnd() + res.BStart, res.BEnd = j+len(rightArm)-cand.bEnd(), j+len(rightArm)-cand.bStart + res.ThreePrimeAnchored = res.AEnd == len(seq) || res.BEnd == len(seq) + out = appendPartitionCandidate(out, res, opts) + } + } + } + return out, nil +} + +func bestDimerPartition(a5to3, b5to3 string, opts StructureOptions, kind string) (StructureResult, bool, error) { + candidates, err := dimerPartitionCandidates(a5to3, b5to3, opts, kind) + if err != nil { + return StructureResult{}, false, err + } + return partitionStructureEnsemble(candidates, normalizeStructureOptions(opts).Conditions) +} + +func dimerPartitionCandidates(a5to3, b5to3 string, opts StructureOptions, kind string) ([]StructureResult, error) { + opts = normalizeStructureOptions(opts) + a, okA := normalizeACGTStructure(a5to3) + b, okB := normalizeACGTStructure(b5to3) + if !okA || !okB { + return nil, errors.New("dimer: sequences must be A/C/G/T") + } + out := make([]StructureResult, 0) + if len(a) < opts.MinStem || len(b) < opts.MinStem { + return out, nil + } + + bTarget3 := reverseStructure(b) + for i := 0; i <= len(a)-opts.MinStem; i++ { + for j := 0; j <= len(bTarget3)-opts.MinStem; j++ { + run := 0 + for i+run < len(a) && j+run < len(bTarget3) && wc(a[i+run], bTarget3[j+run]) { + run++ + } + if run < opts.MinStem { + continue + } + for stem := opts.MinStem; stem <= run; stem++ { + cand, err := stemThermo(kind, a[i:i+stem], bTarget3[j:j+stem], opts.Conditions, 0) + if err != nil { + continue + } + cand.AStart, cand.AEnd = i, i+stem + cand.BStart, cand.BEnd = len(b)-(j+stem), len(b)-j + a3 := cand.AEnd == len(a) + b3 := cand.BEnd == len(b) + cand.ThreePrimeAnchored = a3 || b3 + cand.BothThreePrimeAnchor = a3 && b3 + 
out = appendPartitionCandidate(out, cand, opts) + } + } + } + + for i := 0; i < len(a); i++ { + for j := 0; j < len(bTarget3); j++ { + for _, cand := range enumerateGappedStemCandidates(a, bTarget3, i, j, opts) { + if cand.aEnd() > len(a) || cand.bEnd() > len(bTarget3) { + continue + } + res, err := gappedStemThermo(kind, a, bTarget3, cand, opts.Conditions, 0) + if err != nil { + continue + } + res.AStart, res.AEnd = cand.aStart, cand.aEnd() + res.BStart, res.BEnd = len(b)-cand.bEnd(), len(b)-cand.bStart + a3 := res.AEnd == len(a) + b3 := res.BEnd == len(b) + res.ThreePrimeAnchored = a3 || b3 + res.BothThreePrimeAnchor = a3 && b3 + out = appendPartitionCandidate(out, res, opts) + } + } + } + return out, nil +} + +func appendPartitionCandidate(out []StructureResult, cand StructureResult, opts StructureOptions) []StructureResult { + if cand.StemLen == 0 || math.IsNaN(cand.DeltaGAtAnnealKcal) || math.IsInf(cand.DeltaGAtAnnealKcal, 0) { + return out + } + out = append(out, cand) + if opts.PartitionMaxCandidates > 0 && len(out) >= opts.PartitionMaxCandidates { + return out[:opts.PartitionMaxCandidates] + } + return out +} + +func partitionStructureEnsemble(candidates []StructureResult, cond Conditions) (StructureResult, bool, error) { + if len(candidates) == 0 { + return StructureResult{}, false, nil + } + cond = cond.WithDefaults() + best := candidates[0] + minDG := candidates[0].DeltaGAtAnnealKcal + for _, cand := range candidates[1:] { + if betterStructureResult(cand, best) { + best = cand + } + if cand.DeltaGAtAnnealKcal < minDG { + minDG = cand.DeltaGAtAnnealKcal + } + } + + rt := (Rcal / 1000.0) * (cond.AnnealC + 273.15) + if rt <= 0 || math.IsNaN(rt) || math.IsInf(rt, 0) { + rt = (Rcal / 1000.0) * 333.15 + } + partition := 0.0 + bestWeightNumerator := 0.0 + for _, cand := range candidates { + w := math.Exp(-(cand.DeltaGAtAnnealKcal - minDG) / rt) + partition += w + if sameStructureCoordinates(cand, best) && cand.DeltaGAtAnnealKcal == best.DeltaGAtAnnealKcal { + 
bestWeightNumerator += w + } + } + if partition <= 0 || math.IsNaN(partition) || math.IsInf(partition, 0) { + return best, true, nil + } + ensembleDG := minDG - rt*math.Log(partition) + best.Model = StructureModelPartitionV1 + best.EnsembleDeltaGAtAnnealKcal = ensembleDG + best.PartitionFunction = partition + best.EnsembleWeight = bestWeightNumerator / partition + best.EnsembleCandidateCount = len(candidates) + best.DeltaGAtAnnealKcal = ensembleDG + best.AnnealMarginC = best.TmC - cond.AnnealC + return best, true, nil +} + +func sameStructureCoordinates(a, b StructureResult) bool { + return a.Kind == b.Kind && a.AStart == b.AStart && a.AEnd == b.AEnd && a.BStart == b.BStart && a.BEnd == b.BEnd && a.StemLen == b.StemLen && a.SegmentCount == b.SegmentCount +} + +type hairpinDPPartitionResult struct { + cellCount int + stateCount int + partitionFunction float64 + expectedPairs float64 + mfeDeltaGAtAnnealKcal float64 + ensembleDeltaGAtAnnealKcal float64 +} + +type hairpinDPCell struct { + z float64 + pairMass float64 + mfe float64 + states int +} + +func hairpinDPPartition(seq string, opts StructureOptions) hairpinDPPartitionResult { + n := len(seq) + if n == 0 { + return hairpinDPPartitionResult{} + } + cond := opts.Conditions.WithDefaults() + tempK := cond.AnnealC + 273.15 + rt := (Rcal / 1000.0) * tempK + if rt <= 0 || math.IsNaN(rt) || math.IsInf(rt, 0) { + rt = (Rcal / 1000.0) * 333.15 + } + + cells := make([][]hairpinDPCell, n) + for i := range cells { + cells[i] = make([]hairpinDPCell, n) + for j := range cells[i] { + cells[i][j] = hairpinDPCell{z: 1, mfe: 0} + } + } + + cellCount := 0 + for span := 1; span < n; span++ { + for i := 0; i+span < n; i++ { + j := i + span + cellCount++ + z := dpZ(cells, i, j-1) + pairMass := dpPairMass(cells, i, j-1) + mfe := dpMFE(cells, i, j-1) + states := dpStates(cells, i, j-1) + + if j-i > opts.MinLoop { + for k := i; k <= j-opts.MinLoop-1; k++ { + if !wc(seq[k], seq[j]) { + continue + } + bpDG := 
hairpinDPBasePairDeltaGKcal(seq[k], seq[j]) + w := structureBoltzmannWeight(bpDG, tempK) + leftZ := dpZ(cells, i, k-1) + insideZ := dpZ(cells, k+1, j-1) + termZ := w * leftZ * insideZ + if termZ <= 0 || math.IsNaN(termZ) || math.IsInf(termZ, 0) { + continue + } + z += termZ + pairMass += termZ * (1 + dpPairMass(cells, i, k-1)/leftZ + dpPairMass(cells, k+1, j-1)/insideZ) + candidateMFE := bpDG + dpMFE(cells, i, k-1) + dpMFE(cells, k+1, j-1) + if candidateMFE < mfe { + mfe = candidateMFE + } + states += 1 + dpStates(cells, i, k-1) + dpStates(cells, k+1, j-1) + } + } + cells[i][j] = hairpinDPCell{z: z, pairMass: pairMass, mfe: mfe, states: states} + } + } + + z := dpZ(cells, 0, n-1) + out := hairpinDPPartitionResult{ + cellCount: cellCount, + stateCount: dpStates(cells, 0, n-1), + partitionFunction: z, + expectedPairs: dpPairMass(cells, 0, n-1) / z, + mfeDeltaGAtAnnealKcal: dpMFE(cells, 0, n-1), + } + if z > 1 { + out.ensembleDeltaGAtAnnealKcal = -rt * math.Log(z) + } + return out +} + +func mergeHairpinDPEnsemble(dst *StructureResult, dp hairpinDPPartitionResult) { + if dst == nil || dp.partitionFunction <= 1 || dp.stateCount <= 0 { + return + } + dst.DPCellCount = dp.cellCount + dst.DPStateCount = dp.stateCount + dst.DPExpectedPairs = dp.expectedPairs + dst.DPMFEDeltaGAtAnnealKcal = dp.mfeDeltaGAtAnnealKcal + dst.DPEnsembleDeltaGAtAnnealKcal = dp.ensembleDeltaGAtAnnealKcal + if dp.ensembleDeltaGAtAnnealKcal != 0 && dp.ensembleDeltaGAtAnnealKcal < dst.EnsembleDeltaGAtAnnealKcal { + dst.EnsembleDeltaGAtAnnealKcal = dp.ensembleDeltaGAtAnnealKcal + dst.DeltaGAtAnnealKcal = dp.ensembleDeltaGAtAnnealKcal + } +} + +func dpZ(cells [][]hairpinDPCell, i, j int) float64 { + if i > j || i < 0 || j < 0 || i >= len(cells) || j >= len(cells) { + return 1 + } + z := cells[i][j].z + if z <= 0 || math.IsNaN(z) || math.IsInf(z, 0) { + return 1 + } + return z +} + +func dpPairMass(cells [][]hairpinDPCell, i, j int) float64 { + if i > j || i < 0 || j < 0 || i >= len(cells) || j >= 
len(cells) { + return 0 + } + return cells[i][j].pairMass +} + +func dpMFE(cells [][]hairpinDPCell, i, j int) float64 { + if i > j || i < 0 || j < 0 || i >= len(cells) || j >= len(cells) { + return 0 + } + return cells[i][j].mfe +} + +func dpStates(cells [][]hairpinDPCell, i, j int) int { + if i > j || i < 0 || j < 0 || i >= len(cells) || j >= len(cells) { + return 0 + } + return cells[i][j].states +} + +func structureBoltzmannWeight(deltaGKcal, tempK float64) float64 { + if tempK <= 0 || math.IsNaN(deltaGKcal) || math.IsInf(deltaGKcal, 0) { + return 0 + } + rt := (Rcal / 1000.0) * tempK + if rt <= 0 { + return 0 + } + x := -deltaGKcal / rt + if x > 700 { + x = 700 + } + if x < -700 { + x = -700 + } + return math.Exp(x) +} + +func hairpinDPBasePairDeltaGKcal(a, b byte) float64 { + if !wc(a, b) { + return math.Inf(1) + } + if (a == 'G' && b == 'C') || (a == 'C' && b == 'G') { + return -2.2 + } + return -1.1 +} diff --git a/core/thermo/structure_test.go b/core/thermo/structure_test.go new file mode 100644 index 0000000..c8c9ce5 --- /dev/null +++ b/core/thermo/structure_test.go @@ -0,0 +1,187 @@ +package thermo + +import "testing" + +func TestBestHairpinFindsNearestNeighborStem(t *testing.T) { + got, ok, err := BestHairpin("GCGCGCAAAAGCGCGC", DefaultStructureOptions(DefaultConditions())) + if err != nil { + t.Fatalf("BestHairpin error: %v", err) + } + if !ok { + t.Fatal("expected hairpin") + } + if got.Kind != StructureHairpin || got.StemLen < 4 || got.LoopLen < 3 { + t.Fatalf("unexpected hairpin result: %+v", got) + } +} + +func TestBestHairpinRejectsNoStemSequence(t *testing.T) { + got, ok, err := BestHairpin("AAAAAAAAAAAA", DefaultStructureOptions(DefaultConditions())) + if err != nil { + t.Fatalf("BestHairpin error: %v", err) + } + if ok { + t.Fatalf("unexpected hairpin: %+v", got) + } +} + +func TestBestSelfDimerReportsThreePrimeAnchoring(t *testing.T) { + got, ok, err := BestSelfDimer("AAAAAGCGC", DefaultStructureOptions(DefaultConditions())) + if err != nil { + 
t.Fatalf("BestSelfDimer error: %v", err) + } + if !ok { + t.Fatal("expected self-dimer") + } + if got.Kind != StructureSelfDimer || got.StemLen < 4 { + t.Fatalf("unexpected self-dimer: %+v", got) + } + if !got.ThreePrimeAnchored { + t.Fatalf("expected 3' anchoring, got %+v", got) + } +} + +func TestBestCrossDimerIsSymmetricInEnergy(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + ab, okAB, errAB := BestCrossDimer("TTTTTGCGC", "AAAAAGCGC", opts) + ba, okBA, errBA := BestCrossDimer("AAAAAGCGC", "TTTTTGCGC", opts) + if errAB != nil || errBA != nil { + t.Fatalf("unexpected errors: %v %v", errAB, errBA) + } + if !okAB || !okBA { + t.Fatalf("expected both orientations to find dimers: ab=%+v ba=%+v", ab, ba) + } + if diff := ab.DeltaGAtAnnealKcal - ba.DeltaGAtAnnealKcal; diff < -1e-9 || diff > 1e-9 { + t.Fatalf("expected symmetric dimer energy, got %g vs %g", ab.DeltaGAtAnnealKcal, ba.DeltaGAtAnnealKcal) + } +} + +func TestBestCrossDimerV2ReportsBulgeCandidate(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + got, ok, err := BestCrossDimerV2("GCGCAAGCGC", "GCGCGCGC", opts) + if err != nil { + t.Fatalf("BestCrossDimerV2 error: %v", err) + } + if !ok { + t.Fatal("expected gapped cross-dimer") + } + if got.Model != StructureModelStemLoopV2 { + t.Fatalf("expected v2 model, got %+v", got) + } + if got.BulgeCount != 1 || got.InternalLoopCount != 0 || got.SegmentCount != 2 { + t.Fatalf("expected one bulge in two-segment structure, got %+v", got) + } + if got.StemLen < 8 || got.BulgePenaltyKcal <= 0 { + t.Fatalf("unexpected gapped-stem details: %+v", got) + } +} + +func TestBestHairpinV2ReportsInternalLoopCandidate(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + got, ok, err := bestHairpinGapped("GCGCAAGCGCTTTTGCGCTTGCGC", opts) + if err != nil { + t.Fatalf("bestHairpinGapped error: %v", err) + } + if !ok { + t.Fatal("expected gapped hairpin") + } + if got.Model != StructureModelStemLoopV2 { + 
t.Fatalf("expected v2 model, got %+v", got) + } + if got.InternalLoopCount != 1 || got.SegmentCount != 2 { + t.Fatalf("expected one internal-loop two-segment hairpin, got %+v", got) + } + if got.InternalLoopPenaltyKcal <= 0 || got.LoopPenaltyKcal <= 0 { + t.Fatalf("expected loop penalties, got %+v", got) + } +} + +func TestBestHairpinV2FindsBulgedStem(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + got, ok, err := BestHairpinV2("GCGCGCAAAAGCGTCGC", opts) + if err != nil { + t.Fatalf("BestHairpinV2 error: %v", err) + } + if !ok { + t.Fatal("expected v2 bulged hairpin candidate") + } + if got.Model != StructureModelStemLoopV2 || got.BulgeCount+got.InternalLoopCount == 0 { + t.Fatalf("expected v2 gapped hairpin metadata, got %+v", got) + } +} + +func TestBestCrossDimerV2FindsBulgedDimer(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + got, ok, err := BestCrossDimerV2("TTTTTGCGCG", "AAAAAGCGTCG", opts) + if err != nil { + t.Fatalf("BestCrossDimerV2 error: %v", err) + } + if !ok { + t.Fatal("expected v2 bulged cross-dimer candidate") + } + if got.Model != StructureModelStemLoopV2 || got.StemLen < opts.MinStem || got.BulgeCount+got.InternalLoopCount == 0 { + t.Fatalf("unexpected v2 cross-dimer: %+v", got) + } +} + +func TestBestStructureV2PreservesContiguousResult(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + v1, ok1, err1 := BestCrossDimer("TTTTTGCGC", "AAAAAGCGC", opts) + v2, ok2, err2 := BestCrossDimerV2("TTTTTGCGC", "AAAAAGCGC", opts) + if err1 != nil || err2 != nil { + t.Fatalf("unexpected errors: %v %v", err1, err2) + } + if !ok1 || !ok2 { + t.Fatalf("expected both v1 and v2 candidates: v1=%+v v2=%+v", v1, v2) + } + if v2.DeltaGAtAnnealKcal > v1.DeltaGAtAnnealKcal+1e-9 { + t.Fatalf("v2 should preserve or improve v1 result: v1=%+v v2=%+v", v1, v2) + } +} + +func TestBestCrossDimerPartitionReportsEnsembleAndDPMetadata(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + 
got, ok, err := BestCrossDimerPartition("TTTTTGCGCG", "AAAAAGCGTCG", opts) + if err != nil { + t.Fatalf("BestCrossDimerPartition error: %v", err) + } + if !ok { + t.Fatal("expected partition cross-dimer candidate") + } + if got.Model != StructureModelPartitionV1 { + t.Fatalf("expected partition model, got %+v", got) + } + if got.EnsembleCandidateCount <= 0 || got.PartitionFunction <= 0 || got.EnsembleWeight <= 0 || got.EnsembleWeight > 1 { + t.Fatalf("invalid ensemble metadata: %+v", got) + } + if got.EnsembleDeltaGAtAnnealKcal == 0 || got.DeltaGAtAnnealKcal != got.EnsembleDeltaGAtAnnealKcal { + t.Fatalf("expected ensemble ΔG to be represented in result, got %+v", got) + } +} + +func TestBestHairpinPartitionAddsNonCrossingDPEnsemble(t *testing.T) { + opts := DefaultStructureOptions(DefaultConditions()) + v2, ok2, err2 := BestHairpinV2("GCGCGCAAAAGCGTCGC", opts) + part, okPart, errPart := BestHairpinPartition("GCGCGCAAAAGCGTCGC", opts) + if err2 != nil || errPart != nil { + t.Fatalf("unexpected errors: v2=%v partition=%v", err2, errPart) + } + if !ok2 || !okPart { + t.Fatalf("expected both v2 and partition candidates: v2=%+v part=%+v", v2, part) + } + if part.Model != StructureModelPartitionV1 { + t.Fatalf("expected partition model, got %+v", part) + } + if part.EnsembleCandidateCount <= 0 || part.PartitionFunction <= 0 || part.EnsembleWeight <= 0 { + t.Fatalf("missing candidate ensemble metadata: %+v", part) + } + if part.DPCellCount <= 0 || part.DPStateCount <= 0 || part.DPExpectedPairs <= 0 { + t.Fatalf("missing DP partition metadata: %+v", part) + } + if part.DPEnsembleDeltaGAtAnnealKcal >= 0 || part.DPMFEDeltaGAtAnnealKcal >= 0 { + t.Fatalf("expected favorable DP ensemble/MFE terms: %+v", part) + } + if part.DeltaGAtAnnealKcal > v2.DeltaGAtAnnealKcal+1e-9 { + t.Fatalf("partition ensemble should not be weaker than v2 candidate: v2=%+v part=%+v", v2, part) + } +} diff --git a/core/thermo/terminal_mismatch_params.go b/core/thermo/terminal_mismatch_params.go new 
file mode 100644 index 0000000..5047db7 --- /dev/null +++ b/core/thermo/terminal_mismatch_params.go @@ -0,0 +1,210 @@ +package thermo + +const ( + // TerminalMismatchParameterSetHeuristicV1 identifies the current ipcr terminal + // mismatch penalty model. It is an empirical fixed-ΔTm fallback, not a + // literature-backed nearest-neighbor thermodynamic table. + TerminalMismatchParameterSetHeuristicV1 = "ipcr-terminal-mismatch-heuristic-v1" + + // TerminalMismatchSourceHeuristicPenalty labels terminal mismatch terms that + // come from the built-in side-specific fallback penalties. + TerminalMismatchSourceHeuristicPenalty = "ipcr-terminal-mismatch-heuristic" + + // TerminalMismatchPrimer5Prime identifies a mismatch at the primer 5' base. + TerminalMismatchPrimer5Prime byte = '5' + + // TerminalMismatchPrimer3Prime identifies a mismatch at the primer 3' base. + TerminalMismatchPrimer3Prime byte = '3' +) + +const ( + terminalMismatchCitationHeuristicV1 = "ipcr internal terminal-mismatch heuristic; no peer-reviewed sequence-context terminal-mismatch thermodynamic table is applied" + terminalMismatchNoteHeuristicV1 = "Empirical fixed terminal mismatch ΔTm penalty preserved from the ipcr imperfect-duplex model; this is not a sequence-context nearest-neighbor thermodynamic parameter." +) + +// TerminalMismatchKey identifies a single primer-terminal mismatch plus the +// adjacent inward duplex column. Primer and target are in the same orientation +// used by ImperfectDuplex: primer is 5'→3' and target is 3'→5'. +// +// PrimerEnd is '5' or '3' relative to the primer. P and T are the terminal +// primer and target bases at the mismatch column. PNeighbor and TNeighbor are +// the inward adjacent bases, or 'N' when the context is unavailable. +type TerminalMismatchKey struct { + PrimerEnd byte + P byte + T byte + PNeighbor byte + TNeighbor byte +} + +// TerminalMismatchParameter stores one terminal mismatch scoring term. 
A +// parameter may be expressed as a ΔTm penalty, a ΔΔG37 penalty, or both. The +// current built-in fallback uses only DeltaTmC. +type TerminalMismatchParameter struct { + Key TerminalMismatchKey + DeltaTmC float64 + DeltaDeltaG37kcal float64 + HasDeltaTm bool + HasDeltaDeltaG37 bool + Source string + ParameterSet string + Citation string + Note string +} + +var terminalMismatchParametersByKey = map[TerminalMismatchKey]TerminalMismatchParameter{} + +// CuratedTerminalMismatchParameters is intentionally empty until a verified +// sequence-context terminal-mismatch thermodynamic table is added. The lookup +// layer is still exposed so scoring/reporting code can distinguish future +// table-backed terms from the current heuristic fallback. +var CuratedTerminalMismatchParameters = []TerminalMismatchParameter{} + +func init() { + for _, p := range CuratedTerminalMismatchParameters { + p.Key = normalizeTerminalMismatchKey(p.Key) + terminalMismatchParametersByKey[p.Key] = p + } +} + +// LookupTerminalMismatchParameter returns a curated terminal mismatch parameter +// for an exact key or for a key with wildcarded inward-neighbor context. 'N' is +// treated as a wildcard only for PNeighbor/TNeighbor, not for the central +// mismatch bases. +func LookupTerminalMismatchParameter(key TerminalMismatchKey) (TerminalMismatchParameter, bool) { + key = normalizeTerminalMismatchKey(key) + if !isTerminalMismatchKeyUsable(key, false) { + return TerminalMismatchParameter{}, false + } + for _, candidate := range terminalMismatchLookupCandidates(key) { + if p, ok := terminalMismatchParametersByKey[candidate]; ok { + return p, true + } + } + return TerminalMismatchParameter{}, false +} + +// LookupTerminalMismatchParameterWithFallback returns a curated parameter when +// available; otherwise it returns the named ipcr heuristic parameter that matches +// the side-specific terminal mismatch penalty in ImperfectDuplexOptions. 
+func LookupTerminalMismatchParameterWithFallback(key TerminalMismatchKey, opts ImperfectDuplexOptions) (TerminalMismatchParameter, bool) { + if p, ok := LookupTerminalMismatchParameter(key); ok { + return p, true + } + return LookupTerminalMismatchHeuristicParameter(key, opts) +} + +// LookupTerminalMismatchHeuristicParameter returns the current side-specific +// fixed-ΔTm terminal mismatch fallback as a named parameter. +func LookupTerminalMismatchHeuristicParameter(key TerminalMismatchKey, opts ImperfectDuplexOptions) (TerminalMismatchParameter, bool) { + key = normalizeTerminalMismatchKey(key) + if !isTerminalMismatchKeyUsable(key, true) { + return TerminalMismatchParameter{}, false + } + + opts = opts.withDefaults() + penalty := 0.0 + switch key.PrimerEnd { + case TerminalMismatchPrimer5Prime: + penalty = opts.FivePrimeTerminalPenaltyC + case TerminalMismatchPrimer3Prime: + penalty = opts.ThreePrimeTerminalPenaltyC + default: + return TerminalMismatchParameter{}, false + } + if penalty <= 0 { + return TerminalMismatchParameter{}, false + } + + return TerminalMismatchParameter{ + Key: key, + DeltaTmC: penalty, + HasDeltaTm: true, + Source: TerminalMismatchSourceHeuristicPenalty, + ParameterSet: TerminalMismatchParameterSetHeuristicV1, + Citation: terminalMismatchCitationHeuristicV1, + Note: terminalMismatchNoteHeuristicV1, + }, true +} + +// TerminalMismatchKeyForPosition builds a terminal mismatch key from primer- +// aligned sequences. It returns false unless pos is a terminal column containing +// a non-Watson-Crick primer/target pair. 
+func TerminalMismatchKeyForPosition(primer5to3, target3to5 string, pos int) (TerminalMismatchKey, bool) { + if len(primer5to3) == 0 || len(primer5to3) != len(target3to5) || pos < 0 || pos >= len(primer5to3) { + return TerminalMismatchKey{}, false + } + p := normalizeBase(primer5to3[pos]) + t := normalizeBase(target3to5[pos]) + if !isACGT(p) || !isNT(t) || wc(p, t) { + return TerminalMismatchKey{}, false + } + + switch pos { + case len(primer5to3) - 1: + return normalizeTerminalMismatchKey(TerminalMismatchKey{ + PrimerEnd: TerminalMismatchPrimer3Prime, + P: p, + T: t, + PNeighbor: mismatchAt(primer5to3, pos-1), + TNeighbor: mismatchAt(target3to5, pos-1), + }), true + case 0: + return normalizeTerminalMismatchKey(TerminalMismatchKey{ + PrimerEnd: TerminalMismatchPrimer5Prime, + P: p, + T: t, + PNeighbor: mismatchAt(primer5to3, 1), + TNeighbor: mismatchAt(target3to5, 1), + }), true + default: + return TerminalMismatchKey{}, false + } +} + +func terminalMismatchLookupCandidates(key TerminalMismatchKey) []TerminalMismatchKey { + key = normalizeTerminalMismatchKey(key) + return []TerminalMismatchKey{ + key, + {PrimerEnd: key.PrimerEnd, P: key.P, T: key.T, PNeighbor: key.PNeighbor, TNeighbor: 'N'}, + {PrimerEnd: key.PrimerEnd, P: key.P, T: key.T, PNeighbor: 'N', TNeighbor: key.TNeighbor}, + {PrimerEnd: key.PrimerEnd, P: key.P, T: key.T, PNeighbor: 'N', TNeighbor: 'N'}, + } +} + +func normalizeTerminalMismatchKey(key TerminalMismatchKey) TerminalMismatchKey { + key.PrimerEnd = normalizeTerminalMismatchEnd(key.PrimerEnd) + key.P = normalizeBase(key.P) + key.T = normalizeBase(key.T) + key.PNeighbor = normalizeBase(key.PNeighbor) + key.TNeighbor = normalizeBase(key.TNeighbor) + return key +} + +func normalizeTerminalMismatchEnd(end byte) byte { + switch end { + case TerminalMismatchPrimer5Prime: + return TerminalMismatchPrimer5Prime + case TerminalMismatchPrimer3Prime: + return TerminalMismatchPrimer3Prime + default: + return 0 + } +} + +func isTerminalMismatchKeyUsable(key 
TerminalMismatchKey, allowUnknownTarget bool) bool { + if key.PrimerEnd != TerminalMismatchPrimer5Prime && key.PrimerEnd != TerminalMismatchPrimer3Prime { + return false + } + if !isACGT(key.P) { + return false + } + if allowUnknownTarget { + if !isNT(key.T) { + return false + } + } else if !isACGT(key.T) { + return false + } + return !wc(key.P, key.T) +} diff --git a/core/thermo/testdata/dangling_end_context_goldens.golden b/core/thermo/testdata/dangling_end_context_goldens.golden new file mode 100644 index 0000000..fa6ae42 --- /dev/null +++ b/core/thermo/testdata/dangling_end_context_goldens.golden @@ -0,0 +1,6 @@ +id primer target3to5 five_prime_base three_prime_base anneal_c na_m mg_m dntp_m primer_total_m salt_model expected_dangling_count expected_delta_g_kcal tolerance_delta_g expected_tm_direction expected_parameter_set source_id note +threeprime_target5p_GA_T ACGTACGTACGTACGT TGCATGCATGCATGCA G 37 1.0 0.0 0.0 2.5e-7 monovalent 1 -0.62 1e-12 increase santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 Target 5' G dangling end next to terminal T/A pair; Table 3 motif GA/T +threeprime_target5p_GT_A_unfavorable ACGTACGTACGTACGA TGCATGCATGCATGCT G 37 1.0 0.0 0.0 2.5e-7 monovalent 1 0.48 1e-12 decrease santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 Target 5' G dangling end next to terminal A/T pair; Table 3 motif GT/A is unfavorable +fiveprime_target3p_TC_A ACGTACGTACGTACGT TGCATGCATGCATGCA C 37 1.0 0.0 0.0 2.5e-7 monovalent 1 -0.19 1e-12 increase santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 Target 3' C dangling end next to terminal A/T pair; Table 3 motif TC/A +both_target_dangles ACGTACGTACGTACGT TGCATGCATGCATGCA C G 37 1.0 0.0 0.0 2.5e-7 monovalent 2 -0.81 1e-12 increase santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 Both target/template dangling ends are summed +fiveprime_target3p_AC_T_unfavorable TCGTACGTACGTACGT AGCATGCATGCATGCA C 37 1.0 0.0 0.0 2.5e-7 
monovalent 1 0.28 1e-12 decrease santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 Target 3' C dangling end next to terminal T/A pair; Table 3 motif AC/T is unfavorable diff --git a/core/thermo/testdata/dangling_end_goldens.golden b/core/thermo/testdata/dangling_end_goldens.golden new file mode 100644 index 0000000..16fa6aa --- /dev/null +++ b/core/thermo/testdata/dangling_end_goldens.golden @@ -0,0 +1,33 @@ +id template_end dangling_base terminal_primer_base terminal_target_base expected_delta_h_kcal expected_delta_s_cal_k expected_delta_g37_kcal tolerance expected_parameter_set source_id note +template_5p_A_TA 5p A T A 0.20 2.289215 -0.51 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif AA/T +template_5p_C_TA 5p C T A 0.60 3.288731 -0.42 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif CA/T +template_5p_G_TA 5p G T A -1.10 -1.547638 -0.62 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif GA/T +template_5p_T_TA 5p T T A -6.90 -19.958085 -0.71 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif TA/T +template_5p_A_GC 5p A G C -6.30 -17.217475 -0.96 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif AC/G +template_5p_C_GC 5p C G C -4.40 -12.510076 -0.52 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif CC/G +template_5p_G_GC 5p G G C -5.10 -14.122199 -0.72 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif GC/G +template_5p_T_GC 5p T G C -4.00 -11.026922 -0.58 0.00001 
santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif TC/G +template_5p_A_CG 5p A C G -3.70 -10.059649 -0.58 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif AG/C +template_5p_C_CG 5p C C G -4.00 -11.800742 -0.34 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif CG/C +template_5p_G_CG 5p G C G -3.90 -10.768983 -0.56 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif GG/C +template_5p_T_CG 5p T C G -4.90 -13.832017 -0.61 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif TG/C +template_5p_A_AT 5p A A T -2.90 -7.738191 -0.50 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif AT/A +template_5p_C_AT 5p C A T -4.10 -13.154925 -0.02 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif CT/A +template_5p_G_AT 5p G A T -4.20 -15.089473 0.48 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif GT/A +template_5p_T_AT 5p T A T -0.20 -0.322425 -0.10 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 5p dangling motif TT/A +template_3p_A_TA 3p A T A -0.50 -1.225214 -0.12 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif AA/T +template_3p_C_TA 3p C T A 4.70 14.251169 0.28 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif AC/T +template_3p_G_TA 3p G T A -4.10 
-13.187167 -0.01 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif AG/T +template_3p_T_TA 3p T T A -3.80 -12.671288 0.13 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif AT/T +template_3p_A_GC 3p A G C -5.90 -16.379171 -0.82 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif CA/G +template_3p_C_GC 3p C G C -2.60 -7.383524 -0.31 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif CC/G +template_3p_G_GC 3p G G C -3.20 -10.285346 -0.01 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif CG/G +template_3p_T_GC 3p T G C -5.20 -15.089473 -0.52 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif CT/G +template_3p_A_CG 3p A C G -2.10 -3.804611 -0.92 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif GA/C +template_3p_C_CG 3p C C G -0.20 0.096727 -0.23 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif GC/C +template_3p_G_CG 3p G C G -3.90 -11.155892 -0.44 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif GG/C +template_3p_T_CG 3p T C G -4.40 -13.058198 -0.35 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif GT/C +template_3p_A_AT 3p A A T -0.70 -0.709334 -0.48 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif TA/A 
+template_3p_C_AT 3p C A T 4.40 14.799291 -0.19 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif TC/A +template_3p_G_AT 3p G A T -1.60 -3.546671 -0.50 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif TG/A +template_3p_T_AT 3p T A T 2.90 10.285346 -0.29 0.00001 santalucia-hicks-2004-dna-dangling-ends-v1 santalucia-hicks-2004-table-3 SantaLucia-Hicks 2004 Table 3 3p dangling motif TT/A diff --git a/core/thermo/testdata/mismatch_goldens.golden b/core/thermo/testdata/mismatch_goldens.golden new file mode 100644 index 0000000..f27511a --- /dev/null +++ b/core/thermo/testdata/mismatch_goldens.golden @@ -0,0 +1,2 @@ +id primer target3to5 anneal_c na_m mg_m dntp_m primer_total_m salt_model tm_c mismatch_penalty_c dg_penalty_kcal mismatch_count five_prime_count three_prime_count terminal_count fallback_count triplet_count policy tolerance +internal1 ACGTACGTACGTACGT TGCATGCCTGCATGCA 60 0.05 0.003 0 2.5e-7 monovalent 39.7251567119435 10.1289600599072 3.97 1 0 0 0 0 1 nn-imperfect-v1-with-triplet-ddg 1e-9 diff --git a/core/thermo/testdata/mismatch_triplet_goldens.golden b/core/thermo/testdata/mismatch_triplet_goldens.golden new file mode 100644 index 0000000..c8c6543 --- /dev/null +++ b/core/thermo/testdata/mismatch_triplet_goldens.golden @@ -0,0 +1,193 @@ +id primer target anneal_c na_m mg_m dntp_m salt_model expected_mismatch_count expected_triplet_count expected_fallback_count expected_parameter_set expected_delta_delta_g_kcal tolerance_delta_g expected_tm_direction source_id note +aa_aaa_tat CCAAACC GGTATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.30 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in A/T and A/T flanks +aa_aac_tag CCAACCC GGTAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.22 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in A/T and C/G flanks +aa_aag_tac CCAAGCC GGTACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.32 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in A/T and G/C flanks +aa_aat_taa CCAATCC GGTAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.10 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in A/T and T/A flanks +aa_caa_gat CCCAACC GGGATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.57 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in C/G and A/T flanks +aa_cac_gag CCCACCC GGGAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.49 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in C/G and C/G flanks +aa_cag_gac CCCAGCC GGGACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.59 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in C/G and G/C flanks +aa_cat_gaa CCCATCC GGGAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.37 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in C/G and T/A flanks +aa_gaa_cat CCGAACC GGCATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.16 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in G/C and A/T flanks +aa_gac_cag CCGACCC GGCAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.08 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in G/C and C/G flanks +aa_gag_cac CCGAGCC GGCACGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.18 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in G/C and G/C flanks +aa_gat_caa CCGATCC GGCAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.96 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in G/C and T/A flanks +aa_taa_aat CCTAACC GGAATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.96 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in T/A and A/T flanks +aa_tac_aag CCTACCC GGAAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.88 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in T/A and C/G flanks +aa_tag_aac CCTAGCC GGAACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.98 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in T/A and G/C flanks +aa_tat_aaa CCTATCC GGAAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.76 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/A in T/A and T/A flanks +ac_aaa_tct CCAAACC GGTCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.21 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in A/T and A/T flanks +ac_aac_tcg CCAACCC GGTCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.79 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in A/T and C/G flanks +ac_aag_tcc CCAAGCC GGTCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.95 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in A/T and G/C flanks +ac_aat_tca CCAATCC GGTCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in A/T and T/A flanks +ac_caa_gct CCCAACC GGGCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in C/G and A/T flanks +ac_cac_gcg CCCACCC GGGCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.11 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in C/G and C/G flanks +ac_cag_gcc CCCAGCC GGGCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.27 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in C/G and G/C flanks +ac_cat_gca CCCATCC GGGCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.85 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in C/G and T/A flanks +ac_gaa_cct CCGAACC GGCCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.44 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in G/C and A/T flanks +ac_gac_ccg CCGACCC GGCCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.02 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in G/C and C/G flanks +ac_gag_ccc CCGAGCC GGCCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.18 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in G/C and G/C flanks +ac_gat_cca CCGATCC GGCCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.76 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in G/C and T/A flanks +ac_taa_act CCTAACC GGACTGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.83 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in T/A and A/T flanks +ac_tac_acg CCTACCC GGACGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.41 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in T/A and C/G flanks +ac_tag_acc CCTAGCC GGACCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.57 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in T/A and G/C flanks +ac_tat_aca CCTATCC GGACAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.15 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/C in T/A and T/A flanks +ag_aaa_tgt CCAAACC GGTGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.88 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in A/T and A/T flanks +ag_aac_tgg CCAACCC GGTGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.06 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in A/T and C/G flanks +ag_aag_tgc CCAAGCC GGTGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in A/T and G/C flanks +ag_aat_tga CCAATCC GGTGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.04 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in A/T and T/A flanks +ag_caa_ggt CCCAACC GGGGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.22 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in C/G and A/T flanks +ag_cac_ggg CCCACCC GGGGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.40 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in C/G and C/G flanks +ag_cag_ggc CCCAGCC GGGGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.87 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in C/G and G/C flanks +ag_cat_gga CCCATCC GGGGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.38 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in C/G and T/A flanks +ag_gaa_cgt CCGAACC GGCGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.79 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in G/C and A/T flanks +ag_gac_cgg CCGACCC GGCGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.97 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in G/C and C/G flanks +ag_gag_cgc CCGAGCC GGCGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.44 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in G/C and G/C flanks +ag_gat_cga CCGATCC GGCGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.95 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in G/C and T/A flanks +ag_taa_agt CCTAACC GGAGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.74 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in T/A and A/T flanks +ag_tac_agg CCTACCC GGAGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.92 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in T/A and C/G flanks +ag_tag_agc CCTAGCC GGAGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.39 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in T/A and G/C flanks +ag_tat_aga CCTATCC GGAGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.90 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge A/G in T/A and T/A flanks +ca_aca_tat CCACACC GGTATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.58 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in A/T and A/T flanks +ca_acc_tag CCACCCC GGTAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.86 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in A/T and C/G flanks +ca_acg_tac CCACGCC GGTACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.13 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in A/T and G/C flanks +ca_act_taa CCACTCC GGTAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.37 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in A/T and T/A flanks +ca_cca_gat CCCCACC GGGATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.00 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in C/G and A/T flanks +ca_ccc_gag CCCCCCC GGGAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in C/G and C/G flanks +ca_ccg_gac CCCCGCC GGGACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.55 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in C/G and G/C flanks +ca_cct_gaa CCCCTCC GGGAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.79 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in C/G and T/A flanks +ca_gca_cat CCGCACC GGCATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.08 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in G/C and A/T flanks +ca_gcc_cag CCGCCCC GGCAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.36 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in G/C and C/G flanks +ca_gcg_cac CCGCGCC GGCACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.63 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in G/C and G/C flanks +ca_gct_caa CCGCTCC GGCAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.87 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in G/C and T/A flanks +ca_tca_aat CCTCACC GGAATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.00 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in T/A and A/T flanks +ca_tcc_aag CCTCCCC GGAAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in T/A and C/G flanks +ca_tcg_aac CCTCGCC GGAACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.55 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in T/A and G/C flanks +ca_tct_aaa CCTCTCC GGAAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.79 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/A in T/A and T/A flanks +cc_aca_tct CCACACC GGTCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.27 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in A/T and A/T flanks +cc_acc_tcg CCACCCC GGTCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.40 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in A/T and C/G flanks +cc_acg_tcc CCACGCC GGTCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.64 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in A/T and G/C flanks +cc_act_tca CCACTCC GGTCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.38 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in A/T and T/A flanks +cc_cca_gct CCCCACC GGGCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.04 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in C/G and A/T flanks +cc_ccc_gcg CCCCCCC GGGCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.17 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in C/G and C/G flanks +cc_ccg_gcc CCCCGCC GGGCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.41 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in C/G and G/C flanks +cc_cct_gca CCCCTCC GGGCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.15 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in C/G and T/A flanks +cc_gca_cct CCGCACC GGCCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in G/C and A/T flanks +cc_gcc_ccg CCGCCCC GGCCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.66 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in G/C and C/G flanks +cc_gcg_ccc CCGCGCC GGCCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.90 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in G/C and G/C flanks +cc_gct_cca CCGCTCC GGCCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.64 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in G/C and T/A flanks +cc_tca_act CCTCACC GGACTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.85 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in T/A and A/T flanks +cc_tcc_acg CCTCCCC GGACGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.98 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in T/A and C/G flanks +cc_tcg_acc CCTCGCC GGACCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.22 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in T/A and G/C flanks +cc_tct_aca CCTCTCC GGACAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.96 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/C in T/A and T/A flanks +ct_aca_ttt CCACACC GGTTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in A/T and A/T flanks +ct_acc_ttg CCACCCC GGTTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.90 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in A/T and C/G flanks +ct_acg_ttc CCACGCC GGTTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.65 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in A/T and G/C flanks +ct_act_tta CCACTCC GGTTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.09 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in A/T and T/A flanks +ct_cca_gtt CCCCACC GGGTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.66 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in C/G and A/T flanks +ct_ccc_gtg CCCCCCC GGGTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in C/G and C/G flanks +ct_ccg_gtc CCCCGCC GGGTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.03 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in C/G and G/C flanks +ct_cct_gta CCCCTCC GGGTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.47 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in C/G and T/A flanks +ct_gca_ctt CCGCACC GGCTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.06 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in G/C and A/T flanks +ct_gcc_ctg CCGCCCC GGCTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.68 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in G/C and C/G flanks +ct_gcg_ctc CCGCGCC GGCTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.43 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in G/C and G/C flanks +ct_gct_cta CCGCTCC GGCTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.87 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in G/C and T/A flanks +ct_tca_att CCTCACC GGATTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.47 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in T/A and A/T flanks +ct_tcc_atg CCTCCCC GGATGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 5.09 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in T/A and C/G flanks +ct_tcg_atc CCTCGCC GGATCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.84 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in T/A and G/C flanks +ct_tct_ata CCTCTCC GGATAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge C/T in T/A and T/A flanks +ga_aga_tat CCAGACC GGTATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.02 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in A/T and A/T flanks +ga_agc_tag CCAGCCC GGTAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.29 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in A/T and C/G flanks +ga_agg_tac CCAGGCC GGTACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.17 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in A/T and G/C flanks +ga_agt_taa CCAGTCC GGTAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.88 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in A/T and T/A flanks +ga_cga_gat CCCGACC GGGATGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.00 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in C/G and A/T flanks +ga_cgc_gag CCCGCCC GGGAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.27 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in C/G and C/G flanks +ga_cgg_gac CCCGGCC GGGACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.15 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in C/G and G/C flanks +ga_cgt_gaa CCCGTCC GGGAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.86 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in C/G and T/A flanks +ga_gga_cat CCGGACC GGCATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.04 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in G/C and A/T flanks +ga_ggc_cag CCGGCCC GGCAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.31 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in G/C and C/G flanks +ga_ggg_cac CCGGGCC GGCACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.19 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in G/C and G/C flanks +ga_ggt_caa CCGGTCC GGCAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.90 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in G/C and T/A flanks +ga_tga_aat CCTGACC GGAATGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.91 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in T/A and A/T flanks +ga_tgc_aag CCTGCCC GGAAGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.18 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in T/A and C/G flanks +ga_tgg_aac CCTGGCC GGAACGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.06 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in T/A and G/C flanks +ga_tgt_aaa CCTGTCC GGAAAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.77 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/A in T/A and T/A flanks +gg_aga_tgt CCAGACC GGTGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.89 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in A/T and A/T flanks +gg_agc_tgg CCAGCCC GGTGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.28 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in A/T and C/G flanks +gg_agg_tgc CCAGGCC GGTGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.88 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in A/T and G/C flanks +gg_agt_tga CCAGTCC GGTGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.46 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in A/T and T/A flanks +gg_cga_ggt CCCGACC GGGGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.80 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in C/G and A/T flanks +gg_cgc_ggg CCCGCCC GGGGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.19 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in C/G and C/G flanks +gg_cgg_ggc CCCGGCC GGGGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.79 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in C/G and G/C flanks +gg_cgt_gga CCCGTCC GGGGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.37 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in C/G and T/A flanks +gg_gga_cgt CCGGACC GGCGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.47 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in G/C and A/T flanks +gg_ggc_cgg CCGGCCC GGCGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.86 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in G/C and C/G flanks +gg_ggg_cgc CCGGGCC GGCGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.46 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in G/C and G/C flanks +gg_ggt_cga CCGGTCC GGCGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.04 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in G/C and T/A flanks +gg_tga_agt CCTGACC GGAGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.63 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in T/A and A/T flanks +gg_tgc_agg CCTGCCC GGAGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.02 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in T/A and C/G flanks +gg_tgg_agc CCTGGCC GGAGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.62 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in T/A and G/C flanks +gg_tgt_aga CCTGTCC GGAGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.20 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/G in T/A and T/A flanks +gt_aga_ttt CCAGACC GGTTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.63 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in A/T and A/T flanks +gt_agc_ttg CCAGCCC GGTTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.64 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in A/T and C/G flanks +gt_agg_ttc CCAGGCC GGTTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.51 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in A/T and G/C flanks +gt_agt_tta CCAGTCC GGTTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.50 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in A/T and T/A flanks +gt_cga_gtt CCCGACC GGGTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.34 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in C/G and A/T flanks +gt_cgc_gtg CCCGCCC GGGTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.35 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in C/G and C/G flanks +gt_cgg_gtc CCCGGCC GGGTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.22 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in C/G and G/C flanks +gt_cgt_gta CCCGTCC GGGTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.21 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in C/G and T/A flanks +gt_gga_ctt CCGGACC GGCTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.56 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in G/C and A/T flanks +gt_ggc_ctg CCGGCCC GGCTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.57 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in G/C and C/G flanks +gt_ggg_ctc CCGGGCC GGCTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.44 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in G/C and G/C flanks +gt_ggt_cta CCGGTCC GGCTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.43 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in G/C and T/A flanks +gt_tga_att CCTGACC GGATTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.52 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in T/A and A/T flanks +gt_tgc_atg CCTGCCC GGATGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in T/A and C/G flanks +gt_tgg_atc CCTGGCC GGATCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.40 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in T/A and G/C flanks +gt_tgt_ata CCTGTCC GGATAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.39 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge G/T in T/A and T/A flanks +tc_ata_tct CCATACC GGTCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.16 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in A/T and A/T flanks +tc_atc_tcg CCATCCC GGTCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.53 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in A/T and C/G flanks +tc_atg_tcc CCATGCC GGTCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.68 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in A/T and G/C flanks +tc_att_tca CCATTCC GGTCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.25 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in A/T and T/A flanks +tc_cta_gct CCCTACC GGGCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.23 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in C/G and A/T flanks +tc_ctc_gcg CCCTCCC GGGCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.60 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in C/G and C/G flanks +tc_ctg_gcc CCCTGCC GGGCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.75 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in C/G and G/C flanks +tc_ctt_gca CCCTTCC GGGCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.32 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in C/G and T/A flanks +tc_gta_cct CCGTACC GGCCTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.97 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in G/C and A/T flanks +tc_gtc_ccg CCGTCCC GGCCGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.34 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in G/C and C/G flanks +tc_gtg_ccc CCGTGCC GGCCCGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.49 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in G/C and G/C flanks +tc_gtt_cca CCGTTCC GGCCAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 4.06 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in G/C and T/A flanks +tc_tta_act CCTTACC GGACTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.30 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in T/A and A/T flanks +tc_ttc_acg CCTTCCC GGACGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.67 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in T/A and C/G flanks +tc_ttg_acc CCTTGCC GGACCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.82 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in T/A and G/C flanks +tc_ttt_aca CCTTTCC GGACAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.39 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/C in T/A and T/A flanks +tg_ata_tgt CCATACC GGTGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.96 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in A/T and A/T flanks +tg_atc_tgg CCATCCC GGTGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.33 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in A/T and C/G flanks +tg_atg_tgc CCATGCC GGTGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.93 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in A/T and G/C flanks +tg_att_tga CCATTCC GGTGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.66 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in A/T and T/A flanks +tg_cta_ggt CCCTACC GGGGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.97 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in C/G and A/T flanks +tg_ctc_ggg CCCTCCC GGGGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.34 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in C/G and C/G flanks +tg_ctg_ggc CCCTGCC GGGGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.94 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in C/G and G/C flanks +tg_ctt_gga CCCTTCC GGGGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.67 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in C/G and T/A flanks +tg_gta_cgt CCGTACC GGCGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.86 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in G/C and A/T flanks +tg_gtc_cgg CCGTCCC GGCGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.23 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in G/C and C/G flanks +tg_gtg_cgc CCGTGCC GGCGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 1.83 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in G/C and G/C flanks +tg_gtt_cga CCGTTCC GGCGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.56 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in G/C and T/A flanks +tg_tta_agt CCTTACC GGAGTGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.35 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in T/A and A/T flanks +tg_ttc_agg CCTTCCC GGAGGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.72 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in T/A and C/G flanks +tg_ttg_agc CCTTGCC GGAGCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.32 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in T/A and G/C flanks +tg_ttt_aga CCTTTCC GGAGAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.05 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/G in T/A and T/A flanks +tt_ata_ttt CCATACC GGTTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.83 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in A/T and A/T flanks +tt_atc_ttg CCATCCC GGTTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.32 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in A/T and C/G flanks +tt_atg_ttc CCATGCC GGTTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.90 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in A/T and G/C flanks +tt_att_tta CCATTCC GGTTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.26 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in A/T and T/A flanks +tt_cta_gtt CCCTACC GGGTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.42 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in C/G and A/T flanks +tt_ctc_gtg CCCTCCC GGGTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.91 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in C/G and C/G flanks +tt_ctg_gtc CCCTGCC GGGTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.49 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in C/G and G/C flanks +tt_ctt_gta CCCTTCC GGGTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.85 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in C/G and T/A flanks +tt_gta_ctt CCGTACC GGCTTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.15 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in G/C and A/T flanks +tt_gtc_ctg CCGTCCC GGCTGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.64 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in G/C and C/G flanks +tt_gtg_ctc CCGTGCC GGCTCGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.22 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in G/C and G/C flanks +tt_gtt_cta CCGTTCC GGCTAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.58 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in G/C and T/A flanks +tt_tta_att CCTTACC GGATTGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 2.94 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in T/A and A/T flanks +tt_ttc_atg CCTTCCC GGATGGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.43 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in T/A and C/G flanks +tt_ttg_atc CCTTGCC GGATCGG 37 1.0 0.0 0.0 monovalent 1 1 0 
santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.01 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in T/A and G/C flanks
+tt_ttt_ata CCTTTCC GGATAGG 37 1.0 0.0 0.0 monovalent 1 1 0 santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1 3.37 1e-9 decrease SLH2004_T2_internal_mismatch_compiled_dimer_gauge T/T in T/A and T/A flanks
diff --git a/core/thermo/testdata/perfect_duplex_goldens.golden b/core/thermo/testdata/perfect_duplex_goldens.golden
new file mode 100644
index 0000000..2e2b1b1
--- /dev/null
+++ b/core/thermo/testdata/perfect_duplex_goldens.golden
@@ -0,0 +1,4 @@
+id seq target3to5 anneal_c na_m mg_m dntp_m primer_total_m salt_model tm_c margin_c dg_kcal tolerance
+fwd_salmonella GGAAAGACATATCCCAATACAGCAA CCTTTCTGTATAGGGTTATGTCGTT 60 0.05 0 0 2.5e-7 monovalent 54.9327554351536 -5.0672445648464 2.934553710523 1e-9
+rev_salmonella GTTTACCCATATCTTTGACGCTCTTA CAAATGGGTATAGAAACTGCGAGAAT 60 0.05 0 0 2.5e-7 monovalent 54.9668678338264 -5.0331321661736 3.063288090675 1e-9
+balanced_acgt ACGTACGTACGTACGTACGT TGCATGCATGCATGCATGCA 60 0.05 0 0 2.5e-7 monovalent 56.0612579387647 -3.9387420612353 1.928625479719 1e-9
diff --git a/core/thermo/testdata/salt_goldens.golden b/core/thermo/testdata/salt_goldens.golden
new file mode 100644
index 0000000..da5c1e6
--- /dev/null
+++ b/core/thermo/testdata/salt_goldens.golden
@@ -0,0 +1,5 @@
+id seq target3to5 anneal_c na_m mg_m dntp_m primer_total_m salt_model effective_na_m free_mg_m tm_c margin_c dg_kcal tolerance
+mono50 GGAAAGACATATCCCAATACAGCAA CCTTTCTGTATAGGGTTATGTCGTT 60 0.05 0 0 2.5e-7 monovalent 0.05 0 54.9327554351536 -5.0672445648464 2.934553710523 1e-9
+lite_mg3 GGAAAGACATATCCCAATACAGCAA CCTTTCTGTATAGGGTTATGTCGTT 60 0.05 0.003 0 2.5e-7 owczarzy-lite 0.258134571852 0.003 63.3566376107916 3.3566376107916 -1.895240909893 1e-9
+owczarzy08_mg3 GGAAAGACATATCCCAATACAGCAA CCTTTCTGTATAGGGTTATGTCGTT 60 0.05 0.003 0 2.5e-7 owczarzy08 0.05 0.003 62.3632523878526 2.3632523878526 -1.338301692992 1e-9
+owczarzy08_mg3_dntp08 GGAAAGACATATCCCAATACAGCAA CCTTTCTGTATAGGGTTATGTCGTT 60 0.05 0.003 0.0008 2.5e-7 owczarzy08 0.05 0.002211877134482 61.8726505016164 1.8726505016164 -1.062028477103 1e-9
diff --git a/core/thermo/testdata/structure_goldens.golden b/core/thermo/testdata/structure_goldens.golden
new file mode 100644
index 0000000..a063424
--- /dev/null
+++ b/core/thermo/testdata/structure_goldens.golden
@@ -0,0 +1,3 @@
+id kind mode seq_a seq_b tm_c dg_kcal stem_len loop_len bulge_count internal_loop_count tolerance
+hairpin_v2 hairpin hairpin GCGCGTTTTCGCGC 56.2369388901 0.426391392832 5 4 0 0 1e-9
+cross_v2 cross-dimer cross AAAACGCGCGCGCGCG TTTTCGCGCGCGCGCG 59.9950301859 0.00167676854828 12 0 0 0 1e-9
diff --git a/core/thermoaddons/conditions.go b/core/thermoaddons/conditions.go
index 984e55c..a4d8dab 100644
--- a/core/thermoaddons/conditions.go
+++ b/core/thermoaddons/conditions.go
@@ -2,56 +2,25 @@
 package thermoaddons

 import (
-	"fmt"
-	"math"
+	"ipcr-core/thermo"
 	"os"
 	"strings"
 )

-// Conditions is a lightweight holder for commonly tuned wet-lab knobs.
-type Conditions struct {
-	AnnealC           float64 // °C
-	NaM               float64 // monovalent cations, mol/L
-	MgM               float64 // magnesium, mol/L
-	PrimerTotalM      float64 // total primer concentration, mol/L
-	SelfComplementary bool
-}
+// Conditions is kept as a compatibility alias. New thermodynamic code should
+// use thermo.Conditions directly.
+type Conditions = thermo.Conditions

-// ParseConc parses "50mM", "250nM", "3uM" → mol/L.
-func ParseConc(s string) (float64, error) {
-	s = strings.TrimSpace(strings.ToLower(s))
-	unit := ""
-	val := 0.0
-	_, err := fmt.Sscanf(s, "%f%s", &val, &unit)
-	if err != nil {
-		return 0, fmt.Errorf("invalid conc %q: %w", s, err)
-	}
-	switch unit {
-	case "m", "":
-		return val, nil
-	case "mm":
-		return val * 1e-3, nil
-	case "um", "μm":
-		return val * 1e-6, nil
-	case "nm":
-		return val * 1e-9, nil
-	default:
-		return 0, fmt.Errorf("unknown unit %q in %q", unit, s)
-	}
-}
+// ParseConc delegates to the canonical concentration parser in core/thermo.
+func ParseConc(s string) (float64, error) { return thermo.ParseConc(s) }

-// EffectiveMonovalent returns a single “Na+-equivalent” to feed into salt
-// corrections. By default, we *do not* add Mg2+ (keeps current behavior),
-// but you can enable an Owczarzy-lite transform via env:
-//
-// IPCR_MG_EQ=owczarzy-lite → Na_eff = Na + 3.8*sqrt(Mg)
-//
-// This keeps us conservative and avoids silently changing users’ results.
-// You can swap the transform later without touching thermodynamic tables.
+// EffectiveMonovalent is a legacy compatibility wrapper. New code should choose
+// an explicit thermo.SaltModel and call thermo.EffectiveMonovalent instead.
 func EffectiveMonovalent(naM, mgM float64) float64 {
+	model := thermo.SaltModelMonovalent
 	mode := strings.TrimSpace(strings.ToLower(os.Getenv("IPCR_MG_EQ")))
-	if mode == "owczarzy-lite" && mgM > 0 {
-		return naM + 3.8*math.Sqrt(mgM)
+	if mode == thermo.SaltModelOwczarzyLite.String() {
+		model = thermo.SaltModelOwczarzyLite
 	}
-	return naM
+	return thermo.EffectiveMonovalent(naM, mgM, 0, model)
 }
diff --git a/core/thermoaddons/conditions_test.go b/core/thermoaddons/conditions_test.go
new file mode 100644
index 0000000..4c6b2cf
--- /dev/null
+++ b/core/thermoaddons/conditions_test.go
@@ -0,0 +1,19 @@
+package thermoaddons
+
+import (
+	"math"
+	"testing"
+)
+
+func TestParseConc_AcceptsMicroVariants(t *testing.T) {
+	cases := []string{"3uM", "3µM", "3μM"}
+	for _, tc := range cases {
+		got, err := ParseConc(tc)
+		if err != nil {
+			t.Fatalf("ParseConc(%q): %v", tc, err)
+		}
+		if math.Abs(got-3e-6) > 1e-15 {
+			t.Fatalf("ParseConc(%q)=%g, want %g", tc, got, 3e-6)
+		}
+	}
+}
diff --git a/core/thermoaddons/hairpin_dimer.go b/core/thermoaddons/hairpin_dimer.go
index 8e3274a..5deb69e 100644
--- a/core/thermoaddons/hairpin_dimer.go
+++ b/core/thermoaddons/hairpin_dimer.go
@@ -1,51 +1,42 @@
 // core/thermoaddons/hairpin_dimer.go
 package thermoaddons

-// HairpinPenalty returns a small, bounded °C penalty for simple hairpins in a
-// 5'→3' single-stranded DNA segment. It’s intentionally conservative and O(n^2):
-// - stem >= 4, loop >= 3
-// - penalty ~ 0.25 °C per stem base beyond 3, capped at 2.0 °C
-func HairpinPenalty(seq5to3 string) float64 {
-	b := []byte(seq5to3)
-	n := len(b)
-	bestStem := 0
+import (
+	"ipcr-core/thermo"
+	"math"
+)

-	comp := func(x byte) byte {
-		switch x {
-		case 'A', 'a':
-			return 'T'
-		case 'T', 't', 'U', 'u':
-			return 'A'
-		case 'C', 'c':
-			return 'G'
-		case 'G', 'g':
-			return 'C'
-		default:
-			return 0
-		}
-	}
+// HairpinPenalty returns a bounded °C-equivalent penalty for the strongest
+// nearest-neighbor hairpin stem found in a 5'→3' single-stranded DNA segment.
+// It is retained as a compatibility wrapper; new callers should use
+// thermo.BestHairpin when they need ΔG/Tm components.
+func HairpinPenalty(seq5to3 string) float64 {
+	return HairpinPenaltyWithConditions(seq5to3, thermo.DefaultConditions())
+}

-	for i := 0; i < n; i++ {
-		for j := i + 7; j < n; j++ { // loop >=3 → j-(i+1) >= 3 → j >= i+4; plus 3 more for stem room
-			// grow stem outwards from (i) and (j)
-			li, rj := i, j
-			stem := 0
-			for li >= 0 && rj < n && comp(b[li]) == b[rj] {
-				stem++
-				li--
-				rj++
-			}
-			if stem > bestStem {
-				bestStem = stem
-			}
-		}
+func HairpinPenaltyWithConditions(seq5to3 string, cond thermo.Conditions) float64 {
+	res, ok, err := thermo.BestHairpin(seq5to3, thermo.DefaultStructureOptions(cond))
+	if err != nil || !ok || res.DeltaGAtAnnealKcal >= 0 {
+		return 0
 	}
-	if bestStem < 4 {
+	pen := -1.5 * res.DeltaGAtAnnealKcal
+	if math.IsNaN(pen) || math.IsInf(pen, 0) || pen < 0 {
 		return 0
 	}
-	pen := 0.25 * float64(bestStem-3)
-	if pen > 2.0 {
-		pen = 2.0
+	if pen > 6.0 {
+		return 6.0
 	}
 	return pen
 }
+
+func BestHairpin(seq5to3 string, cond thermo.Conditions) (thermo.StructureResult, bool, error) {
+	return thermo.BestHairpin(seq5to3, thermo.DefaultStructureOptions(cond))
+}
+
+func BestSelfDimer(seq5to3 string, cond thermo.Conditions) (thermo.StructureResult, bool, error) {
+	return thermo.BestSelfDimer(seq5to3, thermo.DefaultStructureOptions(cond))
+}
+
+func BestCrossDimer(a5to3, b5to3 string, cond thermo.Conditions) (thermo.StructureResult, bool, error) {
+	return thermo.BestCrossDimer(a5to3, b5to3, thermo.DefaultStructureOptions(cond))
+}
diff --git a/core/thermoaddons/nnparams.go b/core/thermoaddons/nnparams.go
deleted file mode 100644
index 32c015c..0000000
--- a/core/thermoaddons/nnparams.go
+++ /dev/null
@@ -1,23 +0,0 @@
-package thermoaddons
-
-const Rcal = 1.987
-
-var nnDH = map[string]float64{
-	"AA": -7.9, "TT": -7.9, "AT": -7.2, "TA": -7.2,
-	"CA": -8.5, "TG": -8.5, "GT": -8.4, "AC": -8.4,
-	"CT": -7.8, "AG": -7.8, "GA": -8.2, "TC": -8.2,
-	"CG": -10.6, "GC": -9.8, "GG": -8.0, "CC": -8.0,
-}
-
-var nnDS = map[string]float64{
-	"AA": -22.2, "TT": -22.2, "AT": -20.4, "TA": -21.3,
-	"CA": -22.7, "TG": -22.7, "GT": -22.4, "AC": -22.4,
-	"CT": -21.0, "AG": -21.0, "GA": -22.2, "TC": -22.2,
-	"CG": -27.2, "GC": -24.4, "GG": -19.9, "CC": -19.9,
-}
-
-const (
-	initDH = 0.2
-	initDS = -5.7
-)
-const symmetryDS = -1.4
diff --git a/core/thermoaddons/tm.go b/core/thermoaddons/tm.go
index 5bc187c..68be800 100644
--- a/core/thermoaddons/tm.go
+++ b/core/thermoaddons/tm.go
@@ -2,7 +2,7 @@ package thermoaddons

 import (
 	"errors"
-	"math"
+	"ipcr-core/thermo"
 	"strings"
 )

@@ -11,37 +11,69 @@ func TmNearestNeighbor(primer5to3 string, cond Conditions) (tmC, dH, dS float64,
 	if len(s) < 2 {
 		return 0, 0, 0, errors.New("sequence too short")
 	}
-	dH = initDH
-	dS = initDS
-	for i := 0; i < len(s)-1; i++ {
-		dh, okH := nnDH[s[i:i+2]]
-		ds, okS := nnDS[s[i:i+2]]
-		if !okH || !okS {
-			return 0, 0, 0, errors.New("invalid base (need A/C/G/T)")
-		}
-		dH += dh
-		dS += ds
+	target3to5, ok := complement3to5(s)
+	if !ok {
+		return 0, 0, 0, errors.New("invalid base (need A/C/G/T)")
 	}
-	if cond.SelfComplementary {
-		dS += symmetryDS
+
+	cond = cond.WithDefaults()
+	if cond.SaltModel == "" {
+		cond.SaltModel = thermo.SaltModelMonovalent
 	}
-	naEff := EffectiveMonovalent(cond.NaM, cond.MgM)
-	if naEff <= 0 {
-		naEff = 1e-6
+	if isSelfComplementary(s) {
+		cond.SelfComplementary = true
 	}
-	dS += 0.368 * float64(len(s)-1) * math.Log(naEff)
-
-	ct := math.Max(cond.PrimerTotalM, 1e-12)
-	cfactor := 4.0
-	if cond.SelfComplementary {
-		cfactor = 1.0
+	res, err := thermo.Tm(s, target3to5, cond.TmInput())
+	if err != nil {
+		return 0, 0, 0, err
 	}
-	den := dS + Rcal*math.Log(ct/cfactor)
-	tmK := (dH*1000.0)/den + 273.15
-	return tmK - 273.15, dH, dS, nil
+	return res.TmC, res.DH_kcal, res.DS_Na, nil
 }

 func DeltaGAt(dHkcal, dScal, tempC float64) float64 {
 	tK := tempC + 273.15
 	return dHkcal - (tK * dScal / 1000.0)
 }
+
+func complement3to5(s string) (string, bool) {
+	out := make([]byte, len(s))
+	for i := 0; i < len(s); i++ {
+		switch s[i] {
+		case 'A':
+			out[i] = 'T'
+		case 'C':
+			out[i] = 'G'
+		case 'G':
+			out[i] = 'C'
+		case 'T':
+			out[i] = 'A'
+		default:
+			return "", false
+		}
+	}
+	return string(out), true
+}
+
+func reverseComplement(s string) (string, bool) {
+	out := make([]byte, len(s))
+	for i := 0; i < len(s); i++ {
+		switch s[i] {
+		case 'A':
+			out[len(s)-1-i] = 'T'
+		case 'C':
+			out[len(s)-1-i] = 'G'
+		case 'G':
+			out[len(s)-1-i] = 'C'
+		case 'T':
+			out[len(s)-1-i] = 'A'
+		default:
+			return "", false
+		}
+	}
+	return string(out), true
+}
+
+func isSelfComplementary(s string) bool {
+	rc, ok := reverseComplement(s)
+	return ok && rc == s
+}
diff --git a/core/thermoaddons/tm_test.go b/core/thermoaddons/tm_test.go
new file mode 100644
index 0000000..1fcb71f
--- /dev/null
+++ b/core/thermoaddons/tm_test.go
@@ -0,0 +1,27 @@
+package thermoaddons
+
+import (
+	"ipcr-core/thermo"
+	"math"
+	"testing"
+)
+
+func TestTmNearestNeighborDelegatesToCanonicalThermo(t *testing.T) {
+	primer := "GCGCGATATCGC"
+	cond := Conditions{NaM: 0.05, PrimerTotalM: 2.5e-7, SaltModel: thermo.SaltModelMonovalent}
+	gotTm, gotDH, gotDS, err := TmNearestNeighbor(primer, cond)
+	if err != nil {
+		t.Fatalf("TmNearestNeighbor: %v", err)
+	}
+
target, _ := complement3to5(primer) + want, err := thermo.Tm(primer, target, cond.WithDefaults().TmInput()) + if err != nil { + t.Fatalf("thermo.Tm: %v", err) + } + if math.Abs(gotTm-want.TmC) > 1e-9 || math.Abs(gotDH-want.DH_kcal) > 1e-9 || math.Abs(gotDS-want.DS_Na) > 1e-9 { + t.Fatalf("got Tm/DH/DS %.12g %.12g %.12g, want %.12g %.12g %.12g", gotTm, gotDH, gotDS, want.TmC, want.DH_kcal, want.DS_Na) + } + if gotTm > 120 || gotTm < -20 { + t.Fatalf("TmNearestNeighbor returned implausible Celsius value: %g", gotTm) + } +} diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 4c267f3..35939d1 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -31,6 +31,24 @@ - **Only writers know about “pretty”.** - **Engine never depends upward.** (no imports of app, pipeline, writers, cli, output) +## Thermodynamic modeling boundary + +`ipcr-thermo` is implemented as a ranking layer over the core amplicon engine. +The engine still finds candidate products; the thermo visitor annotates and +ranks those products using model/profile metadata. + +Keep these boundaries explicit: + +- Thermodynamic modes such as `nn-duplex-v1` and `nn-structure-v1` describe how + primer/probe/structure terms are calculated. +- Score profiles such as `binding`, `pcr`, and `gel` describe how those terms are + combined for ranking. +- Fallback policies, IUPAC expansion status, salt model, and probe score mode are + output metadata, not presentation-only details. +- Empirical profiles must not be documented as full PCR kinetics. + +Detailed release-claim guidance is in [`docs/THERMO_MODELS.md`](./THERMO_MODELS.md). Release/smoke-test guidance is in [`docs/THERMO_RELEASE_CHECKLIST.md`](./THERMO_RELEASE_CHECKLIST.md). + ## Future split If you ever need external reuse: lift `engine`, `primer`, `probe`, `oligo`, `fasta` into a separate module (`ipcr-core`) and keep `app`, `appcore`, `writers`, `output`, `pretty` in this repo. 
diff --git a/docs/THERMO_MODELS.md b/docs/THERMO_MODELS.md new file mode 100644 index 0000000..538d99d --- /dev/null +++ b/docs/THERMO_MODELS.md @@ -0,0 +1,183 @@ +# Thermodynamic models, score profiles, and release claims + +`ipcr-thermo` is a thermodynamically informed ranking tool. It is not a full +PCR kinetics simulator. The implementation combines nearest-neighbor duplex +terms, explicit fallback labels, secondary-structure competition terms, and +empirical PCR/gel score profiles so users can rank candidate amplicons and see +why a product was favored or filtered. + +The practical release claim is: + +> `ipcr-thermo` provides nearest-neighbor-informed in-silico PCR ranking with +> explicit approximation metadata and deterministic outputs. + +Do not describe the current implementation as fully thermodynamically faithful +PCR amplification modeling. PCR yield, gel brightness, polymerase kinetics, +modified probes, terminal/tandem mismatch chemistry, and complete loop tables +still require calibration or additional chemistry-specific parameters. + +## Thermodynamic implementation modes + +| Mode | Purpose | Notes | +| ------------------ | --------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `legacy-heuristic` | Historical score path | Maintained for backward compatibility. Scores are not directly comparable with NN modes. | +| `nn-duplex-v1` | Primer-template nearest-neighbor duplex ranking | Uses runtime conditions, salt model, primer concentration, IUPAC thermo policy, explicit mismatch source/parameter-set metadata, and SantaLucia-Hicks 2004 terminal dangling-end terms when template flanks are available. 
| +| `nn-structure-v1` | `nn-duplex-v1` plus primer hairpin/self-dimer/cross-dimer competition | Uses the current secondary-structure evaluator and reports structure policy/model metadata. | + +## Structure model labels + +| Label | Meaning | +| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| `nn-contiguous-stem-v1` | Contiguous Watson-Crick stem model. Preserved as the simplest secondary-structure baseline. | +| `nn-stem-loop-v2` | Bounded gapped-stem model with bulges, small internal loops, asymmetric-loop penalties, and structure dangling-end approximations. | + +`nn-stem-loop-v2` is a bounded approximation. It does not replace a full dynamic +programming secondary-structure engine with complete loop, bulge, dangling-end, +and coaxial-stacking parameter tables. + +## Score profiles + +| Profile | Intended question | Formula sketch | +| --------- | ---------------------------------------------------------------------- | --------------------------------------------------------- | +| `binding` | Which primer pair binds best under the configured thermodynamic model? | Primer-template score, minus enabled structure penalties. | +| `pcr` | Which product is expected to amplify efficiently? | `binding + extension_bonus - length_penalty`. | +| `gel` | Which product is expected to look strongest on an agarose gel? | `pcr + band_mass_bonus`. | + +`pcr` and `gel` are empirical ranking profiles. They are useful for reproducing +observed product dominance, but they are not calibrated polymerase kinetics or a +quantitative fluorescence/gel-intensity model. 
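The formula sketches in the score-profile table can be illustrated with a small Go example. This is a minimal sketch under stated assumptions, not the `ipcr-thermo` implementation: the `Terms` struct, its field names, and `ProfileScore` are hypothetical, and the numbers are made-up °C-equivalent values chosen only to show how the three profiles nest.

```go
package main

import "fmt"

// Terms holds hypothetical per-product score components (°C-equivalent).
// The field names are illustrative and are not the ipcr-thermo API.
type Terms struct {
	Binding        float64 // primer-template score minus enabled structure penalties
	ExtensionBonus float64 // empirical extension term
	LengthPenalty  float64 // empirical long-product penalty
	BandMassBonus  float64 // empirical band-mass proxy
}

// ProfileScore combines the terms following the documented formula sketches:
// pcr = binding + extension_bonus - length_penalty, gel = pcr + band_mass_bonus.
func ProfileScore(profile string, t Terms) (float64, error) {
	pcr := t.Binding + t.ExtensionBonus - t.LengthPenalty
	switch profile {
	case "binding":
		return t.Binding, nil
	case "pcr":
		return pcr, nil
	case "gel":
		return pcr + t.BandMassBonus, nil
	default:
		return 0, fmt.Errorf("unknown score profile %q", profile)
	}
}

func main() {
	// Made-up component values for a single candidate product.
	t := Terms{Binding: 10, ExtensionBonus: 2, LengthPenalty: 0.5, BandMassBonus: 1}
	for _, p := range []string{"binding", "pcr", "gel"} {
		s, err := ProfileScore(p, t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s\t%g\n", p, s)
	}
}
```

Because each profile adds empirical terms on top of the previous one, the same product gets a different number under each profile, which is why scores should only be ranked within a single profile.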
+ +### Choosing a profile + +| Use case | Recommended profile | Rationale | +| --------------------------------------------- | -------------------------- | ------------------------------------------------------------------------------- | +| Primer-design triage | `binding` | Least empirical; closest to primer-template/structure thermodynamics. | +| Multiplex product prioritization | `pcr` | Adds extension and long-product penalties without treating band mass as signal. | +| Comparing against agarose-gel band prominence | `gel` | Adds a band-mass proxy so short products are not automatically favored. | +| Debugging model changes | `binding --thermo-details` | Keeps the score closest to the underlying terms and exposes metadata. | + +Scores from different profiles should be treated as different ranking scales. A +`gel` score is not directly comparable with a `binding` score even when the same +amplicon and conditions are used. + +## Salt and concentration models + +| Salt model | Meaning | +| --------------- | ------------------------------------------------------------------------------ | +| `monovalent` | Monovalent nearest-neighbor salt correction. | +| `owczarzy-lite` | Mg-to-effective-Na approximation for compatibility and continuity. | +| `owczarzy08` | Mixed monovalent/divalent correction with dNTP-adjusted free Mg approximation. | + +When reporting results, keep the raw and effective ionic conditions visible: +`na_m`, `mg_m`, `dntp_m`, `effective_na_m`, and `free_mg_m`. + +## IUPAC thermodynamics policy + +Thermodynamic scoring supports explicit IUPAC expansion policies: + +| Policy | Behavior | +| ----------- | ----------------------------------------------------------------------------------------------- | +| `strict` | Reject non-ACGT primers/probes in NN thermodynamics. | +| `worst` | Expand concrete variants and use the weakest score. Recommended for conservative assay ranking. | +| `best` | Use the strongest concrete variant. 
| +| `mean` | Average concrete variants. | +| `enumerate` | Emit per-expansion diagnostics where supported. | + +Always report the IUPAC metadata when present: +`iupac_thermo_policy`, `iupac_expansion_count`, `iupac_expansion_capped`, and +`iupac_effective_variant`. + +## Probe thermodynamics + +Probe thermodynamics reuses the primer-template NN machinery for unmodified DNA +probe/site duplexes. The current modes are: + +| Mode | Behavior | +| ---------- | --------------------------------------------------------------------------- | +| `annotate` | Compute and report probe thermodynamics without changing product score. | +| `gate` | Penalize or suppress products that fail probe presence/margin requirements. | +| `blend` | Blend probe margin into the product score using `--probe-weight`. | + +Modified probes are not fully modeled. In particular, MGB, LNA, molecular beacon, +quencher, and dye effects are not automatically calibrated. For MGB assays, use +`--probe-score-mode annotate` or `--probe-thermo=false` unless a calibrated probe +modifier model is added. + +## Fallback metadata that should remain visible + +Thermodynamic outputs should expose when approximate paths were used. The most +important labels are: + +- mismatch policy, source labels, parameter sets, citations, and fallback counts, +- terminal mismatch and template dangling-end fields, +- structure policy/model, +- salt model and free/effective ion concentrations, +- IUPAC policy and expansion/capping status, +- score profile, +- probe score mode and gate penalty. + +These fields are part of the release story: a result can be useful even when it +uses approximations, as long as the approximation is visible. + +## Output comparability rules + +- Compare scores only within the same `thermo_model`, `score_profile`, salt model, + IUPAC policy, probe score mode, and annealing conditions. +- Treat `legacy-heuristic` scores as historical compatibility scores, not as the + same unit scale as NN modes. 
+- When reporting a ranked panel, include the model labels and conditions used to + generate the ranking. +- Prefer JSON/JSONL or `--thermo-details` TSV for release examples because scalar + scores alone hide fallback and approximation metadata. + +## Known remaining limitations + +The current thermodynamic implementation is intentionally transparent about these +limits: + +1. Curated mismatch triplets are complete for isolated internal single-base + A/C/G/T DNA/DNA mismatches in Watson-Crick flanking contexts. Terminal + target/template dangling ends next to Watson-Crick closing pairs use the + SantaLucia-Hicks 2004 Table 3 DNA/DNA parameter set when flanking bases are + available. +2. Fallback terms remain possible for terminal mismatches, tandem/clustered + mismatches, target `N`, degenerate/IUPAC-expanded edge cases, and + modified-probe chemistry. +3. `owczarzy08` uses a practical mixed-salt/free-Mg approximation, not a full + activity-coefficient chemistry model. +4. `nn-stem-loop-v2` is not a complete secondary-structure dynamic-programming + engine. +5. PCR and gel score profiles are empirical rankers, not full amplification + kinetics. +6. Modified probe chemistries such as MGB require opt-in calibration. +7. Scores from different thermo modes or score profiles should not be compared + as if they were on one universal physical scale. + +## Release checklist + +The operational checklist lives in [THERMO_RELEASE_CHECKLIST.md](./THERMO_RELEASE_CHECKLIST.md). + +Before advertising a release as thermodynamically informed, verify: + +```bash +go test ./... -count=1 +(cd core && go test ./... -count=1) +go test -tags thermo ./... 
-count=1 +golangci-lint run +make build +``` + +Also check representative CLI output with: + +```bash +bin/ipcr-thermo --examples +bin/ipcr-thermo --help +``` + +and at least one `--thermo-details` run for each profile: + +```bash +--score-profile binding +--score-profile pcr +--score-profile gel +``` diff --git a/docs/THERMO_RELEASE_CHECKLIST.md b/docs/THERMO_RELEASE_CHECKLIST.md new file mode 100644 index 0000000..1e08070 --- /dev/null +++ b/docs/THERMO_RELEASE_CHECKLIST.md @@ -0,0 +1,98 @@ +# Thermodynamic release checklist + +Use this checklist before releasing or advertising changes to `ipcr-thermo`. It +keeps the public claim aligned with the actual model: thermodynamically informed +ranking with explicit approximation metadata. + +## Required automated checks + +Run these from the repository root: + +```bash +go test ./... -count=1 +(cd core && go test ./... -count=1) +go test -tags thermo ./... -count=1 +golangci-lint run +make build +preflight +``` + +If `golangci-lint` is unavailable in a local environment, do not mark the release +ready until CI or another machine has run it. + +## Required smoke tests + +Run at least one representative command for each row. Use real FASTA fixtures +when available and keep the command/output in release notes or test artifacts. + +| Scenario | Example options | Expected check | +| --------------------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| Legacy compatibility | default `ipcr-thermo` invocation | Existing historical rows still appear. | +| NN duplex | `--thermo-model nn-duplex-v1 --thermo-details` | Output includes NN model, salt model, margins, mismatch metadata, and dangling-end fields when template flanks are available. | +| NN structure | `--thermo-model nn-structure-v1 --thermo-details` | Output includes structure policy/model and dimer/hairpin fields. 
| +| Score profiles | `--score-profile binding`, `pcr`, `gel` | Product order changes only for documented profile reasons. | +| IUPAC thermo | `--iupac-thermo-policy worst` with a degenerate primer | Output includes expansion count, capped status, effective variant. | +| Mixed salt/dNTP | `--salt-model owczarzy08 --mg 3mM --dntp 0.8mM` | Output includes raw/effective ion fields. | +| Probe annotate | `--probe ... --probe-thermo --probe-score-mode annotate` | Probe fields populate without filtering the product. | +| Probe gate | `--probe ... --probe-thermo --probe-score-mode gate` | Failing probes are filtered/penalized in a documented way. | +| Modified probe caveat | MGB/LNA style assay with `annotate` or `--probe-thermo=false` | Documentation says unmodified-DNA thermo is not calibrated for modified probes. | + +## Output metadata checklist + +A release example that uses thermodynamics should expose enough metadata to audit +the score. Prefer JSON/JSONL or `--thermo-details` TSV. Check for these fields +when the corresponding layer is enabled: + +- `thermo_model` / model label +- `score_profile` +- salt model and ionic concentrations (`na_m`, `mg_m`, `dntp_m`, free/effective ions) +- IUPAC policy, expansion count, cap status, and effective variant +- mismatch policy, fallback counts, source labels, parameter sets/citations, and mismatch penalties +- terminal mismatch and table-backed dangling-end fields when present +- structure policy/model and component penalties +- probe score mode, probe margin, gate penalty, and probe IUPAC metadata + +## Release wording checklist + +Acceptable: + +- “thermodynamically informed ranking” +- “nearest-neighbor based primer/probe scoring” +- “explicit fallback and approximation metadata” +- “empirical PCR/gel score profiles” + +Avoid unless a separately validated kinetics model is added: + +- “fully thermodynamically faithful PCR simulation” +- “absolute amplification yield prediction” +- “quantitative gel-intensity prediction” 
+- “MGB/LNA probe thermodynamics” without a named calibrated modifier model + +## Model-change checklist + +When adding or changing a model term: + +1. Add a stable model/policy label. +2. Expose the term in JSON/JSONL and `--thermo-details` where practical. +3. Add or update a unit test for monotonic behavior and a regression test for a + representative fixture. +4. Document whether the term is literature-parameterized, heuristic, or empirical. +5. Confirm scores from old and new profiles are not described as interchangeable. + +## Known release caveats + +Document these until they are replaced with calibrated/literature-backed models: + +- Curated mismatch triplets cover isolated internal single-base A/C/G/T DNA/DNA + mismatches. +- Template-adjacent terminal dangling ends next to Watson-Crick closing pairs use + SantaLucia-Hicks 2004 Table 3 when flanking bases are available. +- Fallback mismatch terms remain part of the model for terminal, tandem/clustered, + target-`N`, degenerate-edge, and modified-probe contexts. +- `nn-stem-loop-v2` is a bounded structure approximation, not a full partition + function or dynamic-programming thermodynamic structure engine. +- `owczarzy08` and related salt handling are practical approximations that should + be checked against goldens. +- `pcr` and `gel` are empirical ranking profiles. +- Modified probes such as MGB probes require explicit calibration before gate or + blend mode should be trusted. 
diff --git a/internal/output/json.go b/internal/output/json.go index 1a4e7f1..c78117b 100644 --- a/internal/output/json.go +++ b/internal/output/json.go @@ -25,11 +25,202 @@ func ToAPIProduct(p engine.Product) api.ProductV1 { Seq: p.Seq, SourceFile: p.SourceFile, } + if p.Thermo != nil { + v.Thermo = &api.ThermoDetailsV1{ + Model: p.Thermo.Model, + SaltModel: p.Thermo.SaltModel, + NaM: p.Thermo.NaM, + MgM: p.Thermo.MgM, + DntpM: p.Thermo.DntpM, + EffectiveNaM: p.Thermo.EffectiveNaM, + FreeMgM: p.Thermo.FreeMgM, + AnnealTempC: p.Thermo.AnnealTempC, + IUPACPolicy: p.Thermo.IUPACPolicy, + IUPACThermoPolicy: p.Thermo.IUPACThermoPolicy, + IUPACExpansionCount: p.Thermo.IUPACExpansionCount, + IUPACExpansionCapped: p.Thermo.IUPACExpansionCapped, + IUPACEffectiveVariant: p.Thermo.IUPACEffectiveVariant, + IUPACVariants: toAPIIUPACVariants(p.Thermo.IUPACVariants), + MismatchPolicy: p.Thermo.MismatchPolicy, + StructurePolicy: p.Thermo.StructurePolicy, + ScoreProfile: p.Thermo.ScoreProfile, + ScoreC: p.Thermo.ScoreC, + BaseScoreC: p.Thermo.BaseScoreC, + AmpliconAdjustmentC: p.Thermo.AmpliconAdjustmentC, + ExtensionLogit: p.Thermo.ExtensionLogit, + ExtensionBonusC: p.Thermo.ExtensionBonusC, + LengthPenaltyC: p.Thermo.LengthPenaltyC, + BandMassBonusC: p.Thermo.BandMassBonusC, + StructurePenaltyC: p.Thermo.StructurePenaltyC, + LimitingSide: p.Thermo.LimitingSide, + Fwd: toAPIThermoEndpoint(p.Thermo.Fwd), + Rev: toAPIThermoEndpoint(p.Thermo.Rev), + Probe: toAPIProbeThermo(p.Thermo.Probe), + WorstHairpin: toAPIThermoStructure(p.Thermo.WorstHairpin), + WorstSelfDimer: toAPIThermoStructure(p.Thermo.WorstSelfDimer), + CrossDimer: toAPIThermoStructure(p.Thermo.CrossDimer), + PanelCrossDimer: toAPIThermoStructure(p.Thermo.PanelCrossDimer), + PanelCrossDimerPenaltyC: p.Thermo.PanelCrossDimerPenaltyC, + PanelCrossDimerBurdenC: p.Thermo.PanelCrossDimerBurdenC, + PanelCrossDimerCount: p.Thermo.PanelCrossDimerCount, + } + } // Conditionally attach Score (thermo-only). 
applyScoreToAPI(&v, p) return v } +func toAPIProbeThermo(src *engine.ProbeThermoDetails) *api.ProbeThermoV1 { + if src == nil { + return nil + } + return &api.ProbeThermoV1{ + Name: src.Name, + Seq: src.Seq, + Found: src.Found, + Strand: src.Strand, + Pos: src.Pos, + MM: src.MM, + Site: src.Site, + ScoreMode: src.ScoreMode, + MinMarginC: src.MinMarginC, + ScoreContributionC: src.ScoreContributionC, + GatePenaltyC: src.GatePenaltyC, + IUPACThermoPolicy: src.IUPACThermoPolicy, + IUPACExpansionCount: src.IUPACExpansionCount, + IUPACExpansionCapped: src.IUPACExpansionCapped, + IUPACEffectiveVariant: src.IUPACEffectiveVariant, + TmC: src.TmC, + AnnealMarginC: src.AnnealMarginC, + DeltaGAtAnnealKcal: src.DeltaGAtAnnealKcal, + MismatchPenaltyC: src.MismatchPenaltyC, + MismatchDeltaGKcal: src.MismatchDeltaGKcal, + MismatchCount: src.MismatchCount, + MismatchFallbackCount: src.MismatchFallbackCount, + MismatchTripletCount: src.MismatchTripletCount, + MismatchCuratedPairCount: src.MismatchCuratedPairCount, + MismatchSources: append([]string(nil), src.MismatchSources...), + MismatchParameterSets: append([]string(nil), src.MismatchParameterSets...), + MismatchCitations: append([]string(nil), src.MismatchCitations...), + MismatchParameterNotes: append([]string(nil), src.MismatchParameterNotes...), + TerminalMismatchPenaltyC: src.TerminalMismatchPenaltyC, + TerminalMismatchDeltaGKcal: src.TerminalMismatchDeltaGKcal, + TerminalMismatchCount: src.TerminalMismatchCount, + FivePrimeTerminalMismatchCount: src.FivePrimeTerminalMismatchCount, + ThreePrimeTerminalMismatchCount: src.ThreePrimeTerminalMismatchCount, + TerminalMismatchSources: append([]string(nil), src.TerminalMismatchSources...), + TerminalMismatchParameterSets: append([]string(nil), src.TerminalMismatchParameterSets...), + TerminalMismatchCitations: append([]string(nil), src.TerminalMismatchCitations...), + TerminalMismatchParameterNotes: append([]string(nil), src.TerminalMismatchParameterNotes...), + MismatchPolicy: 
src.MismatchPolicy, + HasNonWatsonCrick: src.HasNonWatsonCrick, + UsedHeuristicAdjust: src.UsedHeuristicAdjust, + } +} + +func toAPIIUPACVariants(src []engine.ThermoVariant) []api.ThermoIUPACVariantV1 { + if len(src) == 0 { + return nil + } + out := make([]api.ThermoIUPACVariantV1, len(src)) + for i, v := range src { + out[i] = api.ThermoIUPACVariantV1{ + FwdVariant: v.FwdPrimer, + RevVariant: v.RevPrimer, + ScoreC: v.ScoreC, + BaseScoreC: v.BaseScoreC, + StructurePenaltyC: v.StructurePenaltyC, + LimitingSide: v.LimitingSide, + FwdTmC: v.FwdTmC, + RevTmC: v.RevTmC, + FwdMarginC: v.FwdMarginC, + RevMarginC: v.RevMarginC, + } + } + return out +} + +func toAPIThermoEndpoint(src engine.ThermoEndpoint) api.ThermoEndpointV1 { + return api.ThermoEndpointV1{ + Side: src.Side, + TmC: src.TmC, + AnnealMarginC: src.AnnealMarginC, + DeltaGAtAnnealKcal: src.DeltaGAtAnnealKcal, + MismatchPenaltyC: src.MismatchPenaltyC, + MismatchDeltaGKcal: src.MismatchDeltaGKcal, + TerminalMismatchPenaltyC: src.TerminalMismatchPenaltyC, + TerminalMismatchDeltaGKcal: src.TerminalMismatchDeltaGKcal, + DanglingEndAdjustmentC: src.DanglingEndAdjustmentC, + DanglingEndDeltaGKcal: src.DanglingEndDeltaGKcal, + DanglingEndCount: src.DanglingEndCount, + MismatchCount: src.MismatchCount, + FivePrimeMismatchCount: src.FivePrimeMismatchCount, + ThreePrimeMismatchCount: src.ThreePrimeMismatchCount, + FivePrimeTerminalMismatchCount: src.FivePrimeTerminalMismatchCount, + ThreePrimeTerminalMismatchCount: src.ThreePrimeTerminalMismatchCount, + TerminalMismatchCount: src.TerminalMismatchCount, + FivePrimeTerminalMismatchPenaltyC: src.FivePrimeTerminalMismatchPenaltyC, + ThreePrimeTerminalMismatchPenaltyC: src.ThreePrimeTerminalMismatchPenaltyC, + MismatchFallbackCount: src.MismatchFallbackCount, + MismatchTripletCount: src.MismatchTripletCount, + MismatchCuratedPairCount: src.MismatchCuratedPairCount, + MismatchSources: append([]string(nil), src.MismatchSources...), + MismatchParameterSets: append([]string(nil), 
src.MismatchParameterSets...), + MismatchCitations: append([]string(nil), src.MismatchCitations...), + MismatchParameterNotes: append([]string(nil), src.MismatchParameterNotes...), + TerminalMismatchSources: append([]string(nil), src.TerminalMismatchSources...), + TerminalMismatchParameterSets: append([]string(nil), src.TerminalMismatchParameterSets...), + TerminalMismatchCitations: append([]string(nil), src.TerminalMismatchCitations...), + TerminalMismatchParameterNotes: append([]string(nil), src.TerminalMismatchParameterNotes...), + EffectiveDenomCalK: src.EffectiveDenomCalK, + MismatchPolicy: src.MismatchPolicy, + EndEffectPolicy: src.EndEffectPolicy, + HasNonWatsonCrick: src.HasNonWatsonCrick, + UsedHeuristicAdjust: src.UsedHeuristicAdjust, + } +} + +func toAPIThermoStructure(src *engine.ThermoStructure) *api.ThermoStructureV1 { + if src == nil { + return nil + } + return &api.ThermoStructureV1{ + Kind: src.Kind, + Model: src.Model, + QueryA: src.QueryA, + QueryB: src.QueryB, + DeltaGAtAnnealKcal: src.DeltaGAtAnnealKcal, + TmC: src.TmC, + AnnealMarginC: src.AnnealMarginC, + StemLen: src.StemLen, + LoopLen: src.LoopLen, + AStart: src.AStart, + AEnd: src.AEnd, + BStart: src.BStart, + BEnd: src.BEnd, + ThreePrimeAnchored: src.ThreePrimeAnchored, + BothThreePrimeAnchor: src.BothThreePrimeAnchor, + SegmentCount: src.SegmentCount, + BulgeCount: src.BulgeCount, + InternalLoopCount: src.InternalLoopCount, + DanglingEndCount: src.DanglingEndCount, + LoopPenaltyKcal: src.LoopPenaltyKcal, + BulgePenaltyKcal: src.BulgePenaltyKcal, + InternalLoopPenaltyKcal: src.InternalLoopPenaltyKcal, + StructureDanglingDeltaGKcal: src.StructureDanglingDeltaGKcal, + EnsembleDeltaGAtAnnealKcal: src.EnsembleDeltaGAtAnnealKcal, + PartitionFunction: src.PartitionFunction, + EnsembleWeight: src.EnsembleWeight, + EnsembleCandidateCount: src.EnsembleCandidateCount, + DPCellCount: src.DPCellCount, + DPStateCount: src.DPStateCount, + DPExpectedPairs: src.DPExpectedPairs, + DPMFEDeltaGAtAnnealKcal: 
src.DPMFEDeltaGAtAnnealKcal, + DPEnsembleDeltaGAtAnnealKcal: src.DPEnsembleDeltaGAtAnnealKcal, + PenaltyC: src.PenaltyC, + } +} + func toAPIProducts(list []engine.Product) []api.ProductV1 { out := make([]api.ProductV1, 0, len(list)) for _, p := range list { diff --git a/internal/output/rows.go b/internal/output/rows.go index a607c36..34cbc7a 100644 --- a/internal/output/rows.go +++ b/internal/output/rows.go @@ -33,3 +33,204 @@ func FormatRowTSVWithScore(p engine.Product) string { base := FormatBaseRowTSV(p) return fmt.Sprintf("%s\t%g", base, p.Score) } + +const ThermoDetailsTSVHeader = "thermo_model\tsalt_model\tna_m\tmg_m\tdntp_m\teffective_na_m\tfree_mg_m\tanneal_temp_c\tiupac_thermo_policy\tiupac_expansion_count\tiupac_expansion_capped\tiupac_effective_variant\tscore_profile\tbase_score_c\tfinal_score_c\tamplicon_adjustment_c\textension_logit\textension_bonus_c\tlength_penalty_c\tband_mass_bonus_c\tstructure_penalty_c\tlimiting_side\tfwd_tm_c\trev_tm_c\tfwd_margin_c\trev_margin_c\tfwd_dg_kcal\trev_dg_kcal\tfwd_mismatch_penalty_c\trev_mismatch_penalty_c\tfwd_mismatch_count\trev_mismatch_count\tfwd_3p_mismatch_count\trev_3p_mismatch_count\tfwd_mismatch_fallback_count\trev_mismatch_fallback_count\tfwd_mismatch_dg_kcal\trev_mismatch_dg_kcal\tfwd_terminal_mismatch_penalty_c\trev_terminal_mismatch_penalty_c\tfwd_5p_terminal_mismatch_penalty_c\trev_5p_terminal_mismatch_penalty_c\tfwd_3p_terminal_mismatch_penalty_c\trev_3p_terminal_mismatch_penalty_c\tfwd_terminal_mismatch_dg_kcal\trev_terminal_mismatch_dg_kcal\tfwd_dangling_end_adjustment_c\trev_dangling_end_adjustment_c\tfwd_dangling_end_dg_kcal\trev_dangling_end_dg_kcal\tfwd_end_effect_policy\trev_end_effect_policy\thairpin_penalty_c\tself_dimer_penalty_c\tcross_dimer_penalty_c\tpanel_cross_dimer_penalty_c\tpanel_cross_dimer_burden_c\tpanel_cross_dimer_count\tpanel_cross_dimer_partner\tprobe_found\tprobe_score_mode\tprobe_name\tprobe_seq\tprobe_strand\tprobe_pos\tprobe_mm\tprobe_site\tprobe_tm_c\tprobe_margin_c\tprobe
_dg_kcal\tprobe_mismatch_penalty_c\tprobe_mismatch_dg_kcal\tprobe_iupac_thermo_policy\tprobe_iupac_expansion_count\tprobe_iupac_expansion_capped\tprobe_iupac_effective_variant\tprobe_score_contribution_c\tprobe_gate_penalty_c\tfwd_mismatch_policy\trev_mismatch_policy\tfwd_mismatch_triplet_count\trev_mismatch_triplet_count\tfwd_mismatch_curated_pair_count\trev_mismatch_curated_pair_count\tfwd_mismatch_sources\trev_mismatch_sources\tfwd_mismatch_parameter_sets\trev_mismatch_parameter_sets\tfwd_mismatch_citations\trev_mismatch_citations\tfwd_mismatch_parameter_notes\trev_mismatch_parameter_notes\tprobe_mismatch_count\tprobe_mismatch_fallback_count\tprobe_mismatch_triplet_count\tprobe_mismatch_curated_pair_count\tprobe_mismatch_policy\tprobe_mismatch_sources\tprobe_mismatch_parameter_sets\tprobe_mismatch_citations\tprobe_mismatch_parameter_notes\tfwd_terminal_mismatch_sources\trev_terminal_mismatch_sources\tfwd_terminal_mismatch_parameter_sets\trev_terminal_mismatch_parameter_sets\tfwd_terminal_mismatch_citations\trev_terminal_mismatch_citations\tfwd_terminal_mismatch_parameter_notes\trev_terminal_mismatch_parameter_notes\tprobe_terminal_mismatch_sources\tprobe_terminal_mismatch_parameter_sets\tprobe_terminal_mismatch_citations\tprobe_terminal_mismatch_parameter_notes" + +func thermoFloat(x float64) string { + return strconv.FormatFloat(x, 'g', -1, 64) +} + +func thermoStrings(values []string) string { + return strings.Join(values, "|") +} + +// FormatThermoDetailsTSV returns optional NN thermodynamic component columns. +// Legacy/heuristic rows emit empty fields so TSV width remains stable when +// --thermo-details is requested. 
+func FormatThermoDetailsTSV(p engine.Product) string { + fields := make([]string, len(strings.Split(ThermoDetailsTSVHeader, "\t"))) + if p.Thermo == nil { + return strings.Join(fields, "\t") + } + t := p.Thermo + fields[0] = t.Model + fields[1] = t.SaltModel + fields[2] = thermoFloat(t.NaM) + fields[3] = thermoFloat(t.MgM) + fields[4] = thermoFloat(t.DntpM) + fields[5] = thermoFloat(t.EffectiveNaM) + fields[6] = thermoFloat(t.FreeMgM) + fields[7] = thermoFloat(t.AnnealTempC) + fields[8] = t.IUPACThermoPolicy + if t.IUPACExpansionCount > 0 { + fields[9] = strconv.Itoa(t.IUPACExpansionCount) + } + if t.IUPACExpansionCapped { + fields[10] = "true" + } + fields[11] = t.IUPACEffectiveVariant + fields[12] = t.ScoreProfile + fields[13] = thermoFloat(t.BaseScoreC) + fields[14] = thermoFloat(t.ScoreC) + fields[15] = thermoFloat(t.AmpliconAdjustmentC) + fields[16] = thermoFloat(t.ExtensionLogit) + fields[17] = thermoFloat(t.ExtensionBonusC) + fields[18] = thermoFloat(t.LengthPenaltyC) + fields[19] = thermoFloat(t.BandMassBonusC) + fields[20] = thermoFloat(t.StructurePenaltyC) + fields[21] = t.LimitingSide + fields[22] = thermoFloat(t.Fwd.TmC) + fields[23] = thermoFloat(t.Rev.TmC) + fields[24] = thermoFloat(t.Fwd.AnnealMarginC) + fields[25] = thermoFloat(t.Rev.AnnealMarginC) + fields[26] = thermoFloat(t.Fwd.DeltaGAtAnnealKcal) + fields[27] = thermoFloat(t.Rev.DeltaGAtAnnealKcal) + fields[28] = thermoFloat(t.Fwd.MismatchPenaltyC) + fields[29] = thermoFloat(t.Rev.MismatchPenaltyC) + if t.Fwd.MismatchCount > 0 { + fields[30] = strconv.Itoa(t.Fwd.MismatchCount) + } + if t.Rev.MismatchCount > 0 { + fields[31] = strconv.Itoa(t.Rev.MismatchCount) + } + if t.Fwd.ThreePrimeMismatchCount > 0 { + fields[32] = strconv.Itoa(t.Fwd.ThreePrimeMismatchCount) + } + if t.Rev.ThreePrimeMismatchCount > 0 { + fields[33] = strconv.Itoa(t.Rev.ThreePrimeMismatchCount) + } + if t.Fwd.MismatchFallbackCount > 0 { + fields[34] = strconv.Itoa(t.Fwd.MismatchFallbackCount) + } + if 
t.Rev.MismatchFallbackCount > 0 { + fields[35] = strconv.Itoa(t.Rev.MismatchFallbackCount) + } + fields[36] = thermoFloat(t.Fwd.MismatchDeltaGKcal) + fields[37] = thermoFloat(t.Rev.MismatchDeltaGKcal) + fields[38] = thermoFloat(t.Fwd.TerminalMismatchPenaltyC) + fields[39] = thermoFloat(t.Rev.TerminalMismatchPenaltyC) + fields[40] = thermoFloat(t.Fwd.FivePrimeTerminalMismatchPenaltyC) + fields[41] = thermoFloat(t.Rev.FivePrimeTerminalMismatchPenaltyC) + fields[42] = thermoFloat(t.Fwd.ThreePrimeTerminalMismatchPenaltyC) + fields[43] = thermoFloat(t.Rev.ThreePrimeTerminalMismatchPenaltyC) + fields[44] = thermoFloat(t.Fwd.TerminalMismatchDeltaGKcal) + fields[45] = thermoFloat(t.Rev.TerminalMismatchDeltaGKcal) + fields[46] = thermoFloat(t.Fwd.DanglingEndAdjustmentC) + fields[47] = thermoFloat(t.Rev.DanglingEndAdjustmentC) + fields[48] = thermoFloat(t.Fwd.DanglingEndDeltaGKcal) + fields[49] = thermoFloat(t.Rev.DanglingEndDeltaGKcal) + fields[50] = t.Fwd.EndEffectPolicy + fields[51] = t.Rev.EndEffectPolicy + if t.WorstHairpin != nil { + fields[52] = thermoFloat(t.WorstHairpin.PenaltyC) + } + if t.WorstSelfDimer != nil { + fields[53] = thermoFloat(t.WorstSelfDimer.PenaltyC) + } + if t.CrossDimer != nil { + fields[54] = thermoFloat(t.CrossDimer.PenaltyC) + } + fields[55] = thermoFloat(t.PanelCrossDimerPenaltyC) + fields[56] = thermoFloat(t.PanelCrossDimerBurdenC) + if t.PanelCrossDimerCount > 0 { + fields[57] = strconv.Itoa(t.PanelCrossDimerCount) + } + if t.PanelCrossDimer != nil { + fields[58] = t.PanelCrossDimer.QueryA + "~" + t.PanelCrossDimer.QueryB + } + if t.Probe != nil { + if t.Probe.Found { + fields[59] = "true" + } else { + fields[59] = "false" + } + fields[60] = t.Probe.ScoreMode + fields[61] = t.Probe.Name + fields[62] = t.Probe.Seq + fields[63] = t.Probe.Strand + if t.Probe.Found { + fields[64] = strconv.Itoa(t.Probe.Pos) + fields[65] = strconv.Itoa(t.Probe.MM) + } + fields[66] = t.Probe.Site + fields[67] = thermoFloat(t.Probe.TmC) + fields[68] = 
thermoFloat(t.Probe.AnnealMarginC) + fields[69] = thermoFloat(t.Probe.DeltaGAtAnnealKcal) + fields[70] = thermoFloat(t.Probe.MismatchPenaltyC) + fields[71] = thermoFloat(t.Probe.MismatchDeltaGKcal) + fields[72] = t.Probe.IUPACThermoPolicy + if t.Probe.IUPACExpansionCount > 0 { + fields[73] = strconv.Itoa(t.Probe.IUPACExpansionCount) + } + if t.Probe.IUPACExpansionCapped { + fields[74] = "true" + } + fields[75] = t.Probe.IUPACEffectiveVariant + fields[76] = thermoFloat(t.Probe.ScoreContributionC) + fields[77] = thermoFloat(t.Probe.GatePenaltyC) + } + fields[78] = t.Fwd.MismatchPolicy + fields[79] = t.Rev.MismatchPolicy + if t.Fwd.MismatchTripletCount > 0 { + fields[80] = strconv.Itoa(t.Fwd.MismatchTripletCount) + } + if t.Rev.MismatchTripletCount > 0 { + fields[81] = strconv.Itoa(t.Rev.MismatchTripletCount) + } + if t.Fwd.MismatchCuratedPairCount > 0 { + fields[82] = strconv.Itoa(t.Fwd.MismatchCuratedPairCount) + } + if t.Rev.MismatchCuratedPairCount > 0 { + fields[83] = strconv.Itoa(t.Rev.MismatchCuratedPairCount) + } + fields[84] = thermoStrings(t.Fwd.MismatchSources) + fields[85] = thermoStrings(t.Rev.MismatchSources) + fields[86] = thermoStrings(t.Fwd.MismatchParameterSets) + fields[87] = thermoStrings(t.Rev.MismatchParameterSets) + fields[88] = thermoStrings(t.Fwd.MismatchCitations) + fields[89] = thermoStrings(t.Rev.MismatchCitations) + fields[90] = thermoStrings(t.Fwd.MismatchParameterNotes) + fields[91] = thermoStrings(t.Rev.MismatchParameterNotes) + if t.Probe != nil { + if t.Probe.MismatchCount > 0 { + fields[92] = strconv.Itoa(t.Probe.MismatchCount) + } + if t.Probe.MismatchFallbackCount > 0 { + fields[93] = strconv.Itoa(t.Probe.MismatchFallbackCount) + } + if t.Probe.MismatchTripletCount > 0 { + fields[94] = strconv.Itoa(t.Probe.MismatchTripletCount) + } + if t.Probe.MismatchCuratedPairCount > 0 { + fields[95] = strconv.Itoa(t.Probe.MismatchCuratedPairCount) + } + fields[96] = t.Probe.MismatchPolicy + fields[97] = thermoStrings(t.Probe.MismatchSources) + 
fields[98] = thermoStrings(t.Probe.MismatchParameterSets) + fields[99] = thermoStrings(t.Probe.MismatchCitations) + fields[100] = thermoStrings(t.Probe.MismatchParameterNotes) + } + fields[101] = thermoStrings(t.Fwd.TerminalMismatchSources) + fields[102] = thermoStrings(t.Rev.TerminalMismatchSources) + fields[103] = thermoStrings(t.Fwd.TerminalMismatchParameterSets) + fields[104] = thermoStrings(t.Rev.TerminalMismatchParameterSets) + fields[105] = thermoStrings(t.Fwd.TerminalMismatchCitations) + fields[106] = thermoStrings(t.Rev.TerminalMismatchCitations) + fields[107] = thermoStrings(t.Fwd.TerminalMismatchParameterNotes) + fields[108] = thermoStrings(t.Rev.TerminalMismatchParameterNotes) + if t.Probe != nil { + fields[109] = thermoStrings(t.Probe.TerminalMismatchSources) + fields[110] = thermoStrings(t.Probe.TerminalMismatchParameterSets) + fields[111] = thermoStrings(t.Probe.TerminalMismatchCitations) + fields[112] = thermoStrings(t.Probe.TerminalMismatchParameterNotes) + } + return strings.Join(fields, "\t") +} + +func FormatRowTSVWithThermoDetails(p engine.Product) string { + return FormatBaseRowTSV(p) + "\t" + FormatThermoDetailsTSV(p) +} + +func FormatRowTSVWithScoreAndThermoDetails(p engine.Product) string { + return FormatRowTSVWithScore(p) + "\t" + FormatThermoDetailsTSV(p) +} diff --git a/internal/thermoapp/app.go b/internal/thermoapp/app.go index 5a65313..b323cde 100644 --- a/internal/thermoapp/app.go +++ b/internal/thermoapp/app.go @@ -10,12 +10,13 @@ import ( "ipcr-core/engine" "ipcr-core/oligo" "ipcr-core/primer" - "ipcr-core/thermoaddons" + "ipcr-core/thermo" "ipcr/internal/appcore" "ipcr/internal/clibase" "ipcr/internal/cmdutil" "ipcr/internal/common" "ipcr/internal/thermocli" + "ipcr/internal/thermomodel" "ipcr/internal/thermovisitors" "ipcr/internal/version" "ipcr/internal/writers" @@ -105,26 +106,74 @@ func pairsFromOligos(oligs []primer.Oligo, minLen, maxLen int, includeSelf bool) return out } +func panelRefsFromOligos(oligs []primer.Oligo) 
[]thermovisitors.PrimerRef { + out := make([]thermovisitors.PrimerRef, 0, len(oligs)) + for _, o := range oligs { + out = append(out, thermovisitors.PrimerRef{ID: o.ID, Seq: strings.ToUpper(o.Seq)}) + } + return out +} + +func panelRefsFromPairs(pairs []primer.Pair) []thermovisitors.PrimerRef { + out := make([]thermovisitors.PrimerRef, 0, len(pairs)*2) + for _, p := range pairs { + id := strings.TrimSpace(p.ID) + if id == "" { + id = "pair" + } + out = append(out, thermovisitors.PrimerRef{ID: id + ":fwd", Seq: strings.ToUpper(p.Forward)}) + out = append(out, thermovisitors.PrimerRef{ID: id + ":rev", Seq: strings.ToUpper(p.Reverse)}) + } + return out +} + // parseMolar: "250nM" → 2.5e-7; "50mM" → 5e-2 func parseMolar(s string) (float64, error) { - return thermoaddons.ParseConc(s) + return thermo.ParseConc(s) +} + +func isStrictACGTSeq(s string) bool { + if s == "" { + return false + } + for i := 0; i < len(s); i++ { + switch s[i] { + case 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't': + default: + return false + } + } + return true +} + +func validateNNPrimers(mode thermomodel.Mode, iupacPolicy string, pairs []primer.Pair) error { + if iupacPolicy != thermo.IUPACThermoPolicyStrict { + return nil + } + for _, pair := range pairs { + if !isStrictACGTSeq(pair.Forward) || !isStrictACGTSeq(pair.Reverse) { + return fmt.Errorf("--thermo-model %s with --iupac-thermo-policy strict requires A/C/G/T primers; pair %q contains degenerate/IUPAC bases", mode, pair.ID) + } + } + return nil } /* ---------- writer (forces NeedSeq + score column + rank-by-score) ---------- */ type thermoWF struct { - Format string - Sort bool - Header bool - Pretty bool - IncludeScore bool - RankByScore bool + Format string + Sort bool + Header bool + Pretty bool + IncludeScore bool + RankByScore bool + ThermoDetails bool } func (w thermoWF) NeedSites() bool { return false } func (w thermoWF) NeedSeq() bool { return true } func (w thermoWF) Start(out io.Writer, bufSize int) (chan<- engine.Product, <-chan 
error) { - return writers.StartProductWriter(out, w.Format, w.Sort, w.Header, w.Pretty, w.IncludeScore, w.RankByScore, bufSize) + return writers.StartProductWriterWithThermoDetails(out, w.Format, w.Sort, w.Header, w.Pretty, w.IncludeScore, w.RankByScore, w.ThermoDetails, bufSize) } /* ----------------------------- main app ----------------------------- */ @@ -210,6 +259,7 @@ func RunContext(parent context.Context, argv []string, stdout, stderr io.Writer) } var pairs []primer.Pair + var panelRefs []thermovisitors.PrimerRef if hasOligoMode { var oligs []primer.Oligo if opts.OligosTSV != "" { @@ -233,6 +283,7 @@ func RunContext(parent context.Context, argv []string, stdout, stderr io.Writer) return 2 } pairs = pairsFromOligos(oligs, opts.MinLen, opts.MaxLen, opts.Self) + panelRefs = panelRefsFromOligos(oligs) if len(pairs) == 0 { _, _ = fmt.Fprintln(stderr, "error: need ≥2 oligos for pairing (or enable --self)") return 2 @@ -257,11 +308,13 @@ func RunContext(parent context.Context, argv []string, stdout, stderr io.Writer) if opts.Self { pairs = common.AddSelfPairsUnique(pairs) } + panelRefs = panelRefsFromPairs(pairs) } // Parse solution conditions (warn and default on errors) naM, errNa := parseMolar(opts.NaSpec) mgM, errMg := parseMolar(opts.MgSpec) + dntpM, errDntp := parseMolar(opts.DntpSpec) ctM, errCt := parseMolar(opts.PrimerConcSpec) if errNa != nil { cmdutil.Warnf(stderr, opts.Quiet, "bad --na %q: %v (using 50mM)", opts.NaSpec, errNa) @@ -271,22 +324,83 @@ func RunContext(parent context.Context, argv []string, stdout, stderr io.Writer) cmdutil.Warnf(stderr, opts.Quiet, "bad --mg %q: %v (using 3mM)", opts.MgSpec, errMg) mgM = 0.003 } + if errDntp != nil { + cmdutil.Warnf(stderr, opts.Quiet, "bad --dntp %q: %v (using 0mM)", opts.DntpSpec, errDntp) + dntpM = 0 + } if errCt != nil { cmdutil.Warnf(stderr, opts.Quiet, "bad --primer-conc %q: %v (using 250nM)", opts.PrimerConcSpec, errCt) ctM = 2.5e-7 } - // Effective monovalent (optional Owczarzy-lite via env) - 
naEff := thermoaddons.EffectiveMonovalent(naM, mgM) + saltModel, err := thermo.ParseSaltModel(opts.SaltModel) + if err != nil { + _, _ = fmt.Fprintln(stderr, err) + return 2 + } + conditions := thermo.Conditions{ + AnnealC: opts.AnnealTempC, + NaM: naM, + MgM: mgM, + DntpM: dntpM, + PrimerTotalM: ctM, + SaltModel: saltModel, + } + naEff := conditions.EffectiveNaM() + + mode, err := thermomodel.Parse(opts.ThermoModel) + if err != nil { + _, _ = fmt.Fprintln(stderr, err) + return 2 + } + if !mode.Implemented() { + _, _ = fmt.Fprintf(stderr, "--thermo-model %q is reserved for staged rollout but is not implemented yet; use %q\n", mode, thermomodel.LegacyHeuristic) + return 2 + } + if mode == thermomodel.LegacyHeuristic && strings.TrimSpace(opts.Probe) != "" && opts.ProbeThermo { + // Probe thermodynamics requires NN endpoint details. Preserve legacy behavior + // for runs without probes, but make an explicit --probe-thermo request useful + // even when the user leaves --thermo-model at its historical default. 
+ mode = thermomodel.NNDuplexV1 + } + if mode == thermomodel.NNDuplexV1 || mode == thermomodel.NNStructureV1 { + if err := validateNNPrimers(mode, opts.IUPACThermoPolicy, pairs); err != nil { + _, _ = fmt.Fprintln(stderr, err) + return 2 + } + } // Build scorer (visitor) scorer := thermovisitors.Score{ - AnnealTempC: opts.AnnealTempC, - Na_M: naEff, - PrimerConc_M: ctM, - AllowIndels: opts.AllowIndel, - LengthBiasOn: false, // reserved; keep behavior stable - SingleStranded: opts.SingleStranded, + Model: mode, + Conditions: conditions, + AnnealTempC: opts.AnnealTempC, + Na_M: naEff, + PrimerConc_M: ctM, + AllowIndels: opts.AllowIndel, + LengthBiasOn: false, // reserved; keep behavior stable + SingleStranded: opts.SingleStranded, + StructHairpin: opts.StructHairpin, + StructDimer: opts.StructDimer, + StructScale: opts.StructScale, + PanelPrimers: panelRefs, + IUPACThermoPolicy: opts.IUPACThermoPolicy, + IUPACThermoMaxExpansions: opts.IUPACThermoMaxExpansions, + ScoreProfile: opts.ScoreProfile, + ExtAlpha: opts.ExtAlpha, + ExtWeight: opts.ExtWeight, + LenKneeBP: opts.LenKneeBP, + LenSteep: opts.LenSteep, + LenMaxPenC: opts.LenMaxPenC, + BindWeight: opts.BindWeight, + BandMassWeight: opts.BandMassWeight, + ProbeSeq: opts.Probe, + ProbeName: opts.ProbeName, + ProbeMaxMM: opts.ProbeMaxMM, + ProbeThermo: opts.ProbeThermo, + ProbeScoreMode: opts.ProbeScoreMode, + ProbeMinMarginC: opts.ProbeMinMarginC, + ProbeWeight: opts.ProbeWeight, // NEW: enable auto-denominator when requested UseAutoDenom: strings.ToLower(opts.DenomMode) == "auto", } @@ -315,12 +429,13 @@ func RunContext(parent context.Context, argv []string, stdout, stderr io.Writer) // Writer: always include score; rank-by-score if requested rankByScore := strings.ToLower(opts.Rank) != "coord" wf := thermoWF{ - Format: opts.Output, - Sort: true, - Header: opts.Header, - Pretty: opts.Pretty, - IncludeScore: true, - RankByScore: rankByScore, + Format: opts.Output, + Sort: true, + Header: opts.Header, + Pretty: 
opts.Pretty, + IncludeScore: true, + RankByScore: rankByScore, + ThermoDetails: opts.ThermoDetails, } return appcore.Run[engine.Product](parent, outw, stderr, coreOpts, pairs, scorer.Visit, wf) diff --git a/internal/thermocli/options.go b/internal/thermocli/options.go index 9c29b08..6602cc8 100644 --- a/internal/thermocli/options.go +++ b/internal/thermocli/options.go @@ -4,8 +4,10 @@ import ( "flag" "fmt" "io" + "ipcr-core/thermo" "ipcr/internal/clibase" "ipcr/internal/cliutil" + "ipcr/internal/thermomodel" "strings" ) @@ -26,13 +28,22 @@ type Options struct { AnnealTempC float64 NaSpec string MgSpec string + DntpSpec string PrimerConcSpec string + SaltModel string AllowIndel bool // NEW: ssDNA mode (BS-PCR) SingleStranded bool - // NEW: denominator mode for ΔΔG→ΔTm conversion: "fixed" (D=200) or "auto" + // Model mode for staged thermodynamic implementations. + ThermoModel string + + // Thermodynamic policy for degenerate/IUPAC primers in NN modes. + IUPACThermoPolicy string + IUPACThermoMaxExpansions int + + // Denominator mode for ΔΔG→ΔTm conversion: "fixed" (D=200) or "auto". 
DenomMode string // Oligo input @@ -40,24 +51,30 @@ type Options struct { OligosTSV string // Probe - Probe string - ProbeName string - ProbeMaxMM int - ProbeWeight float64 + Probe string + ProbeName string + ProbeMaxMM int + ProbeWeight float64 + ProbeThermo bool + ProbeScoreMode string + ProbeMinMarginC float64 // Ranking/output (NOTE: score is always included in ipcr-thermo) - Rank string + Rank string + ThermoDetails bool + ScoreProfile string // Thermo addons (thermo-only) - ExtAlpha float64 - LenKneeBP int - LenSteep float64 - LenMaxPenC float64 - StructHairpin bool - StructDimer bool - StructScale float64 - BindWeight float64 - ExtWeight float64 + ExtAlpha float64 + LenKneeBP int + LenSteep float64 + LenMaxPenC float64 + StructHairpin bool + StructDimer bool + StructScale float64 + BindWeight float64 + ExtWeight float64 + BandMassWeight float64 } func NewFlagSet(name string) *flag.FlagSet { @@ -77,23 +94,32 @@ func NewFlagSet(name string) *flag.FlagSet { _, _ = fmt.Fprintf(out, " --anneal-temp float Annealing temperature (°C) [%s]\n", "60") _, _ = fmt.Fprintf(out, " --na string Monovalent salt, e.g., 50mM [%s]\n", "50mM") _, _ = fmt.Fprintf(out, " --mg string Mg2+, e.g., 3mM [%s]\n", "3mM") + _, _ = fmt.Fprintf(out, " --dntp string Total dNTP, e.g., 200uM [%s]\n", "0mM") _, _ = fmt.Fprintf(out, " --primer-conc string Primer concentration, e.g., 250nM [%s]\n", "250nM") + _, _ = fmt.Fprintf(out, " --salt-model string Salt model: %s [%s]\n", thermo.KnownSaltModels(), thermo.SaltModelMonovalent) _, _ = fmt.Fprintln(out, " --allow-indel Allow a single 1-nt gap (bulge) per primer [false]") _, _ = fmt.Fprintln(out, " --single-stranded Treat target as ssDNA (BS-PCR): tiny dangling-end bonus + target-hairpin penalty [false]") - // NEW: + _, _ = fmt.Fprintf(out, " --thermo-model string Scoring model: %s [%s]\n", thermomodel.KnownList(), thermomodel.Default()) + _, _ = fmt.Fprintln(out, " --iupac-thermo-policy string Degenerate-primer NN policy: strict | worst | best | 
mean | enumerate [worst]") + _, _ = fmt.Fprintln(out, " --iupac-thermo-max-expansions int Max concrete primer-pair expansions [256]") _, _ = fmt.Fprintf(out, " --denom string ΔΔG→ΔTm denominator: fixed | auto [%s]\n", "fixed") _, _ = fmt.Fprintln(out, "\nProbe (optional):") _, _ = fmt.Fprintf(out, " --probe string Internal probe (5'→3') [%s]\n", "") _, _ = fmt.Fprintf(out, " --probe-name string Probe label [%s]\n", "probe") _, _ = fmt.Fprintf(out, " --probe-max-mm int Max probe mismatches allowed [%s]\n", "0") + _, _ = fmt.Fprintln(out, " --probe-thermo Score internal probe thermodynamics when --probe is supplied [true]") + _, _ = fmt.Fprintln(out, " --probe-score-mode string Probe thermo mode: annotate | gate | blend [gate]") + _, _ = fmt.Fprintf(out, " --probe-min-margin float Minimum probe annealing margin for gate mode (°C) [%s]\n", "0") _, _ = fmt.Fprintf(out, " --probe-weight float Blend [0..1]: (1=min of margins) [%s]\n", "1.0") _, _ = fmt.Fprintln(out, "\nThermo extensions (scoring only; thermo binary):") + _, _ = fmt.Fprintf(out, " --score-profile string Score profile: binding | pcr | gel [%s]\n", "binding") _, _ = fmt.Fprintf(out, " --ext-alpha float Slope for extension prob vs margin [%s]\n", "0.45") _, _ = fmt.Fprintf(out, " --length-knee-bp int Soft-knee start (bp) for length bias [%s]\n", "550") _, _ = fmt.Fprintf(out, " --length-steep float Soft-knee steepness [%s]\n", "0.003") _, _ = fmt.Fprintf(out, " --length-max-pen float Max °C-equivalent length penalty [%s]\n", "10") + _, _ = fmt.Fprintf(out, " --band-mass-weight float Gel profile bonus per 2× amplicon mass [%s]\n", "15") _, _ = fmt.Fprintln(out, " --struct-hairpin Penalize hairpins [true]") _, _ = fmt.Fprintln(out, " --struct-dimer Penalize primer-dimers [true]") _, _ = fmt.Fprintf(out, " --struct-scale float Structural penalties scale [%s]\n", "1.0") @@ -103,6 +129,7 @@ func NewFlagSet(name string) *flag.FlagSet { _, _ = fmt.Fprintln(out, "\nRanking & outputs (thermo):") _, _ = 
fmt.Fprintln(out, " score field is always included in outputs (TSV/JSON/JSONL).") _, _ = fmt.Fprintf(out, " --rank string Order by: score | coord [%s]\n", "score") + _, _ = fmt.Fprintln(out, " --thermo-details Add NN thermo component columns to text/TSV output [false]") _, _ = fmt.Fprintln(out, " (default is score; pass --rank coord to keep coordinate order.)") }) return fs @@ -135,28 +162,38 @@ func ParseArgs(fs *flag.FlagSet, argv []string) (Options, error) { fs.Float64Var(&o.AnnealTempC, "anneal-temp", 60, "annealing temperature (°C)") fs.StringVar(&o.NaSpec, "na", "50mM", "monovalent salt (e.g., 50mM)") fs.StringVar(&o.MgSpec, "mg", "3mM", "Mg2+ (e.g., 3mM)") + fs.StringVar(&o.DntpSpec, "dntp", "0mM", "total dNTP concentration (e.g., 200uM)") fs.StringVar(&o.PrimerConcSpec, "primer-conc", "250nM", "primer concentration (e.g., 250nM)") + fs.StringVar(&o.SaltModel, "salt-model", thermo.SaltModelMonovalent.String(), "salt model: "+thermo.KnownSaltModels()) fs.BoolVar(&o.AllowIndel, "allow-indel", false, "allow a single 1-nt gap (bulge) per primer") fs.BoolVar(&o.SingleStranded, "single-stranded", false, "target is ssDNA (BS-PCR mode)") - // NEW: denom mode + fs.StringVar(&o.ThermoModel, "thermo-model", thermomodel.Default().String(), "scoring model: "+thermomodel.KnownList()) + fs.StringVar(&o.IUPACThermoPolicy, "iupac-thermo-policy", thermo.IUPACThermoPolicyWorst, "degenerate-primer NN policy: strict | worst | best | mean | enumerate") + fs.IntVar(&o.IUPACThermoMaxExpansions, "iupac-thermo-max-expansions", 256, "max concrete primer-pair expansions") fs.StringVar(&o.DenomMode, "denom", "fixed", "ΔΔG→ΔTm denominator: fixed | auto") fs.StringVar(&o.Probe, "probe", "", "internal probe (5'→3') [optional]") fs.StringVar(&o.ProbeName, "probe-name", "probe", "probe label") fs.IntVar(&o.ProbeMaxMM, "probe-max-mm", 0, "max probe mismatches [0]") + fs.BoolVar(&o.ProbeThermo, "probe-thermo", true, "score internal probe thermodynamics when --probe is supplied; implies 
nn-duplex-v1 when the thermo model is left at legacy default") + fs.StringVar(&o.ProbeScoreMode, "probe-score-mode", "gate", "probe thermo mode: annotate | gate | blend") + fs.Float64Var(&o.ProbeMinMarginC, "probe-min-margin", 0, "minimum probe annealing margin for gate mode (°C)") fs.Float64Var(&o.ProbeWeight, "probe-weight", 1.0, "blend [0..1]: 1 favors probe strongly") fs.StringVar(&o.Rank, "rank", "score", "order by: score | coord") + fs.BoolVar(&o.ThermoDetails, "thermo-details", false, "add NN thermo component columns to text/TSV output") fs.BoolVar(&help, "h", false, "show this help [false]") fs.BoolVar(&showExamples, "examples", false, "show quickstart examples and exit [false]") // Thermo addons (with defaults) + fs.StringVar(&o.ScoreProfile, "score-profile", "binding", "score profile: binding | pcr | gel") fs.Float64Var(&o.ExtAlpha, "ext-alpha", 0.45, "slope for extension prob vs margin") fs.IntVar(&o.LenKneeBP, "length-knee-bp", 550, "soft-knee start (bp)") fs.Float64Var(&o.LenSteep, "length-steep", 0.003, "soft-knee steepness") fs.Float64Var(&o.LenMaxPenC, "length-max-pen", 10, "max length penalty (°C)") + fs.Float64Var(&o.BandMassWeight, "band-mass-weight", 15, "gel profile band-mass bonus per 2x amplicon length") fs.BoolVar(&o.StructHairpin, "struct-hairpin", true, "penalize hairpins") fs.BoolVar(&o.StructDimer, "struct-dimer", true, "penalize primer-dimers") fs.Float64Var(&o.StructScale, "struct-scale", 1.0, "scale for structural penalties") @@ -198,12 +235,65 @@ func ParseArgs(fs *flag.FlagSet, argv []string) (Options, error) { default: return o, fmt.Errorf("--rank must be 'coord' or 'score'") } - // NEW: validate denom + mode, err := thermomodel.Parse(o.ThermoModel) + if err != nil { + return o, err + } + o.ThermoModel = mode.String() + if !mode.Implemented() { + return o, fmt.Errorf("--thermo-model %q is reserved for staged rollout but is not implemented yet; use %q", mode, thermomodel.LegacyHeuristic) + } + + modeSalt, err := 
thermo.ParseSaltModel(o.SaltModel) + if err != nil { + return o, err + } + o.SaltModel = modeSalt.String() + + policy, err := thermo.ParseIUPACThermoPolicy(o.IUPACThermoPolicy) + if err != nil { + return o, err + } + o.IUPACThermoPolicy = policy + if o.IUPACThermoMaxExpansions < 1 { + return o, fmt.Errorf("--iupac-thermo-max-expansions must be >= 1") + } + switch strings.ToLower(o.DenomMode) { case "fixed", "auto": default: return o, fmt.Errorf("--denom must be 'fixed' or 'auto'") } + switch strings.ToLower(o.ScoreProfile) { + case "binding", "pcr", "gel": + o.ScoreProfile = strings.ToLower(o.ScoreProfile) + default: + return o, fmt.Errorf("--score-profile must be 'binding', 'pcr', or 'gel'") + } + if o.ExtAlpha < 0 { + return o, fmt.Errorf("--ext-alpha must be >= 0") + } + if o.LenKneeBP < 0 { + return o, fmt.Errorf("--length-knee-bp must be >= 0") + } + if o.LenSteep < 0 { + return o, fmt.Errorf("--length-steep must be >= 0") + } + if o.LenMaxPenC < 0 { + return o, fmt.Errorf("--length-max-pen must be >= 0") + } + if o.BandMassWeight < 0 { + return o, fmt.Errorf("--band-mass-weight must be >= 0") + } + switch strings.ToLower(o.ProbeScoreMode) { + case "annotate", "gate", "blend": + o.ProbeScoreMode = strings.ToLower(o.ProbeScoreMode) + default: + return o, fmt.Errorf("--probe-score-mode must be 'annotate', 'gate', or 'blend'") + } + if o.ProbeMinMarginC < -100 || o.ProbeMinMarginC > 100 { + return o, fmt.Errorf("--probe-min-margin must be within [-100,100]") + } if o.ProbeWeight < 0 || o.ProbeWeight > 1 { return o, fmt.Errorf("--probe-weight must be in [0,1]") } diff --git a/internal/thermocli/options_test.go b/internal/thermocli/options_test.go new file mode 100644 index 0000000..bf66e6d --- /dev/null +++ b/internal/thermocli/options_test.go @@ -0,0 +1,218 @@ +package thermocli + +import ( + "flag" + "io" + "ipcr-core/thermo" + "ipcr/internal/thermomodel" + "strings" + "testing" +) + +func parseArgsForTest(args ...string) (Options, error) { + fs := 
NewFlagSet("ipcr-thermo") + fs.SetOutput(io.Discard) + return ParseArgs(fs, args) +} + +func minimalArgs() []string { + return []string{"--forward", "ACGT", "--reverse", "ACGT", "--sequences", "ref.fa"} +} + +func TestParseArgs_DefaultThermoModelIsLegacyHeuristic(t *testing.T) { + opts, err := parseArgsForTest(minimalArgs()...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.ThermoModel != thermomodel.LegacyHeuristic.String() { + t.Fatalf("got model %q, want %q", opts.ThermoModel, thermomodel.LegacyHeuristic) + } +} + +func TestParseArgs_ExplicitLegacyThermoModel(t *testing.T) { + args := append(minimalArgs(), "--thermo-model", "legacy-heuristic") + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.ThermoModel != thermomodel.LegacyHeuristic.String() { + t.Fatalf("got model %q, want %q", opts.ThermoModel, thermomodel.LegacyHeuristic) + } +} + +func TestParseArgs_UnknownThermoModelRejected(t *testing.T) { + args := append(minimalArgs(), "--thermo-model", "bogus") + _, err := parseArgsForTest(args...) + if err == nil { + t.Fatal("expected unknown model error") + } + if !strings.Contains(err.Error(), "unknown thermo model") { + t.Fatalf("unexpected error: %v", err) + } +} + +func TestParseArgs_NNDuplexThermoModelAccepted(t *testing.T) { + args := append(minimalArgs(), "--thermo-model", thermomodel.NNDuplexV1.String()) + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.ThermoModel != thermomodel.NNDuplexV1.String() { + t.Fatalf("got model %q, want %q", opts.ThermoModel, thermomodel.NNDuplexV1) + } +} + +func TestParseArgs_NNStructureThermoModelAccepted(t *testing.T) { + args := append(minimalArgs(), "--thermo-model", thermomodel.NNStructureV1.String()) + opts, err := parseArgsForTest(args...) 
+ if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.ThermoModel != thermomodel.NNStructureV1.String() { + t.Fatalf("got model %q, want %q", opts.ThermoModel, thermomodel.NNStructureV1) + } +} + +func TestParseArgs_HelpShowsThermoModelFlag(t *testing.T) { + fs := NewFlagSet("ipcr-thermo") + fs.SetOutput(io.Discard) + if _, err := ParseArgs(fs, []string{"-h"}); err != flag.ErrHelp { + t.Fatalf("expected flag.ErrHelp, got %v", err) + } + found := false + fs.VisitAll(func(f *flag.Flag) { + if f.Name == "thermo-model" { + found = true + } + }) + if !found { + t.Fatal("expected --thermo-model flag to be registered") + } +} + +func TestParseArgs_DefaultSaltModelIsMonovalent(t *testing.T) { + opts, err := parseArgsForTest(minimalArgs()...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.SaltModel != thermo.SaltModelMonovalent.String() { + t.Fatalf("got salt model %q, want %q", opts.SaltModel, thermo.SaltModelMonovalent) + } +} + +func TestParseArgs_UnknownSaltModelRejected(t *testing.T) { + args := append(minimalArgs(), "--salt-model", "hidden-env") + _, err := parseArgsForTest(args...) + if err == nil { + t.Fatal("expected unknown salt model error") + } + if !strings.Contains(err.Error(), "unknown salt model") { + t.Fatalf("unexpected error: %v", err) + } +} + +func TestParseArgs_ThermoDetailsFlag(t *testing.T) { + args := append(minimalArgs(), "--thermo-details") + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if !opts.ThermoDetails { + t.Fatal("expected --thermo-details to be enabled") + } +} + +func TestParseArgs_Owczarzy08SaltModelAccepted(t *testing.T) { + args := append(minimalArgs(), "--salt-model", thermo.SaltModelOwczarzy08.String()) + opts, err := parseArgsForTest(args...) 
+ if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.SaltModel != thermo.SaltModelOwczarzy08.String() { + t.Fatalf("got salt model %q, want %q", opts.SaltModel, thermo.SaltModelOwczarzy08) + } +} + +func TestParseArgs_DNTPFlag(t *testing.T) { + args := append(minimalArgs(), "--dntp", "800uM") + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.DntpSpec != "800uM" { + t.Fatalf("got dNTP spec %q, want 800uM", opts.DntpSpec) + } +} + +func TestParseArgs_DefaultIUPACThermoPolicyIsWorst(t *testing.T) { + opts, err := parseArgsForTest(minimalArgs()...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.IUPACThermoPolicy != thermo.IUPACThermoPolicyWorst { + t.Fatalf("got policy %q, want %q", opts.IUPACThermoPolicy, thermo.IUPACThermoPolicyWorst) + } + if opts.IUPACThermoMaxExpansions != 256 { + t.Fatalf("got max expansions %d, want 256", opts.IUPACThermoMaxExpansions) + } +} + +func TestParseArgs_IUPACThermoPolicyAndCap(t *testing.T) { + args := append(minimalArgs(), "--iupac-thermo-policy", "enumerate", "--iupac-thermo-max-expansions", "17") + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.IUPACThermoPolicy != thermo.IUPACThermoPolicyEnumerate || opts.IUPACThermoMaxExpansions != 17 { + t.Fatalf("unexpected IUPAC policy/cap: %+v", opts) + } +} + +func TestParseArgs_IUPACThermoPolicyRejectsInvalid(t *testing.T) { + args := append(minimalArgs(), "--iupac-thermo-policy", "median") + _, err := parseArgsForTest(args...) + if err == nil { + t.Fatal("expected invalid IUPAC thermo policy error") + } + if !strings.Contains(err.Error(), "unknown IUPAC thermo policy") { + t.Fatalf("unexpected error: %v", err) + } +} + +func TestParseArgs_DefaultProbeScoreModeIsGate(t *testing.T) { + opts, err := parseArgsForTest(minimalArgs()...) 
+ if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if !opts.ProbeThermo { + t.Fatal("expected probe thermodynamics to be enabled by default") + } + if opts.ProbeScoreMode != "gate" { + t.Fatalf("got probe score mode %q, want gate", opts.ProbeScoreMode) + } +} + +func TestParseArgs_ProbeScoreModeRejectsInvalid(t *testing.T) { + args := append(minimalArgs(), "--probe-score-mode", "median") + _, err := parseArgsForTest(args...) + if err == nil { + t.Fatal("expected probe score mode error") + } + if !strings.Contains(err.Error(), "--probe-score-mode") { + t.Fatalf("unexpected error: %v", err) + } +} + +func TestParseArgs_ProbeThermoCanBeDisabled(t *testing.T) { + args := append(minimalArgs(), "--probe-thermo=false", "--probe-score-mode", "annotate", "--probe-min-margin", "2.5") + opts, err := parseArgsForTest(args...) + if err != nil { + t.Fatalf("ParseArgs returned error: %v", err) + } + if opts.ProbeThermo { + t.Fatal("expected probe thermodynamics to be disabled") + } + if opts.ProbeScoreMode != "annotate" || opts.ProbeMinMarginC != 2.5 { + t.Fatalf("unexpected probe options: %+v", opts) + } +} diff --git a/internal/thermointegration/denom_subtests_test.go b/internal/thermointegration/denom_subtests_test.go index ec2cc9f..56c784c 100644 --- a/internal/thermointegration/denom_subtests_test.go +++ b/internal/thermointegration/denom_subtests_test.go @@ -112,3 +112,138 @@ func TestThermo_DenomMode_Subtests(t *testing.T) { } }) } + +func TestThermo_LegacyModelGoldenOutput(t *testing.T) { + fa := writeFA2(t, "thermo_stage0.fa", ">s\nACGTACAAAAAAGGTACC\n") + t.Cleanup(func() { _ = os.Remove(fa) }) + + baseArgs := []string{ + "--forward", "AAGTAC", + "--reverse", "GGTACC", + "--sequences", fa, + "--output", "text", + "--sort", + "--rank", "score", + "--mismatches", "1", + "--seed-length", "3", + "--terminal-window", "0", + "--self=false", + } + + want := 
"source_file\tsequence_id\texperiment_id\tstart\tend\tlength\ttype\tfwd_mm\trev_mm\tfwd_mm_i\trev_mm_i\tscore\n" + + "thermo_stage0.fa\ts\tmanual\t0\t18\t18\tforward\t1\t0\t1\t\t-18.975\n" + + "thermo_stage0.fa\ts\tmanual\t11\t18\t7\tforward\t1\t0\t1\t\t-29.625\n" + + gotDefault := runThermo(t, baseArgs) + if gotDefault != want { + t.Fatalf("legacy default output changed:\ngot:\n%s\nwant:\n%s", gotDefault, want) + } + + gotExplicit := runThermo(t, append(append([]string{}, baseArgs...), "--thermo-model", "legacy-heuristic")) + if gotExplicit != want { + t.Fatalf("explicit legacy output changed:\ngot:\n%s\nwant:\n%s", gotExplicit, want) + } +} + +func TestThermo_NNDuplexModelAcceptedAndTemperatureAware(t *testing.T) { + fa := writeFA2(t, "thermo_nn_duplex.fa", ">s\nACGTACGTACGTACGTACGTAAAAAAACGTACGTACGTACGTACGT\n") + t.Cleanup(func() { _ = os.Remove(fa) }) + + baseArgs := []string{ + "--forward", "ACGTACGTACGTACGTACGT", + "--reverse", "ACGTACGTACGTACGTACGT", + "--sequences", fa, + "--output", "text", + "--rank", "score", + "--self=false", + "--thermo-model", "nn-duplex-v1", + "--iupac-thermo-policy", "strict", + } + + lowAnneal := firstScoreFromTSV(t, runThermo(t, append(append([]string{}, baseArgs...), "--anneal-temp", "55"))) + highAnneal := firstScoreFromTSV(t, runThermo(t, append(append([]string{}, baseArgs...), "--anneal-temp", "70"))) + + if !(lowAnneal > highAnneal) { + t.Fatalf("expected lower anneal temp to improve NN margin score; 55C=%g 70C=%g", lowAnneal, highAnneal) + } +} + +func TestThermo_NNDuplexJSONIncludesThermoComponents(t *testing.T) { + fa := writeFA2(t, "thermo_nn_duplex_json.fa", ">s\nACGTACGTACGTACGTACGTAAAAAAACGTACGTACGTACGTACGT\n") + t.Cleanup(func() { _ = os.Remove(fa) }) + + out := runThermo(t, []string{ + "--forward", "ACGTACGTACGTACGTACGT", + "--reverse", "ACGTACGTACGTACGTACGT", + "--sequences", fa, + "--output", "json", + "--sort", + "--self=false", + "--thermo-model", "nn-duplex-v1", + }) + for _, want := range []string{`"thermo"`, 
`"model": "nn-duplex-v1"`, `"anneal_margin_c"`, `"delta_g_at_anneal_kcal"`, `"dangling_end_adjustment_c"`} { + if !strings.Contains(out, want) { + t.Fatalf("expected JSON output to contain %s; got:\n%s", want, out) + } + } +} + +func TestThermo_NNDuplexModelRejectsDegeneratePrimers(t *testing.T) { + fa := writeFA2(t, "thermo_nn_duplex_iupac.fa", ">s\nACGTACAAAAAAGGTACC\n") + t.Cleanup(func() { _ = os.Remove(fa) }) + + var out, errB bytes.Buffer + code := thermoapp.Run([]string{ + "--forward", "ACGRAC", + "--reverse", "GGTACC", + "--sequences", fa, + "--thermo-model", "nn-duplex-v1", + "--iupac-thermo-policy", "strict", + }, &out, &errB) + if code != 2 { + t.Fatalf("expected exit 2 for strict NN IUPAC policy, got %d stdout=%q stderr=%q", code, out.String(), errB.String()) + } + if !strings.Contains(errB.String(), "strict requires A/C/G/T") { + t.Fatalf("unexpected stderr: %q", errB.String()) + } +} + +func rcForIntegration(s string) string { + b := make([]byte, len(s)) + for i := 0; i < len(s); i++ { + switch s[i] { + case 'A': + b[len(s)-1-i] = 'T' + case 'C': + b[len(s)-1-i] = 'G' + case 'G': + b[len(s)-1-i] = 'C' + case 'T': + b[len(s)-1-i] = 'A' + default: + b[len(s)-1-i] = 'N' + } + } + return string(b) +} + +func TestThermo_NNStructureModelAcceptedAndReportsDimers(t *testing.T) { + fwd := "GCGCGCGC" + rev := "GCGCGCGC" + fa := writeFA2(t, "thermo_structure_model.fa", ">s\n"+fwd+"AAAA"+rcForIntegration(rev)+"\n") + t.Cleanup(func() { _ = os.Remove(fa) }) + + out := runThermo(t, []string{ + "--forward", fwd, + "--reverse", rev, + "--sequences", fa, + "--output", "json", + "--self=false", + "--thermo-model", "nn-structure-v1", + }) + for _, want := range []string{`"model": "nn-structure-v1"`, `"structure_penalty_c"`, `"cross_dimer"`} { + if !strings.Contains(out, want) { + t.Fatalf("expected JSON output to contain %s; got:\n%s", want, out) + } + } +} diff --git a/internal/thermointegration/integration_test.go b/internal/thermointegration/integration_test.go index 
e16566a..9d8edb3 100644 --- a/internal/thermointegration/integration_test.go +++ b/internal/thermointegration/integration_test.go @@ -4,6 +4,7 @@ import ( "bytes" "ipcr/internal/thermoapp" "os" + "strings" "testing" ) @@ -40,3 +41,212 @@ func TestThermo_EndToEnd_TSVWithScore(t *testing.T) { t.Fatalf("expected 'score' in header:\n%s", s) } } + +func TestThermo_EndToEnd_TSVWithThermoDetails(t *testing.T) { + fa := writeFA(t, "thermo_details_it.fa", ">s\nACGTACGTACGTACGTACGTAAAAACGTACGTACGTACGTACGT\n") + defer func() { _ = os.Remove(fa) }() + + var out, errB bytes.Buffer + code := thermoapp.Run([]string{ + "--forward", "ACGTACGTACGTACGTACGT", + "--reverse", "ACGTACGTACGTACGTACGT", + "--sequences", fa, + "--output", "text", + "--thermo-model", "nn-duplex-v1", + "--thermo-details", + }, &out, &errB) + if code != 0 { + t.Fatalf("exit %d err=%s", code, errB.String()) + } + s := out.String() + if !bytes.Contains(out.Bytes(), []byte("thermo_model\tsalt_model\tna_m\tmg_m\tdntp_m\teffective_na_m\tfree_mg_m\tanneal_temp_c")) { + t.Fatalf("expected thermo details header:\n%s", s) + } + if !bytes.Contains(out.Bytes(), []byte("nn-duplex-v1")) { + t.Fatalf("expected nn-duplex-v1 detail row:\n%s", s) + } +} + +func rc5to3IT(s string) string { + out := make([]byte, len(s)) + for i := range s { + switch s[i] { + case 'A': + out[len(s)-1-i] = 'T' + case 'C': + out[len(s)-1-i] = 'G' + case 'G': + out[len(s)-1-i] = 'C' + case 'T': + out[len(s)-1-i] = 'A' + default: + out[len(s)-1-i] = 'N' + } + } + return string(out) +} + +func tsvColumnValue(t *testing.T, text, column string) string { + t.Helper() + lines := strings.Split(strings.TrimSpace(text), "\n") + if len(lines) < 2 { + t.Fatalf("expected header and row, got %q", text) + } + head := strings.Split(lines[0], "\t") + row := strings.Split(lines[1], "\t") + for i, h := range head { + if h == column { + if i >= len(row) { + t.Fatalf("column %q index %d beyond row width %d", column, i, len(row)) + } + return row[i] + } + } + 
t.Fatalf("missing column %q in header %q", column, lines[0]) + return "" +} + +func TestThermo_MismatchProvenanceAppearsInTSVJSONAndJSONL(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "TTGCGTATCGATCGTACGTA" + leftSite := []byte(fwd) + leftSite[6] = 'A' // G/T internal mismatch in the primer-template duplex. + fa := writeFA(t, "thermo_mismatch_provenance.fa", ">s\n"+string(leftSite)+"AAAA"+rc5to3IT(rev)+"\n") + defer func() { _ = os.Remove(fa) }() + + baseArgs := []string{ + "--forward", fwd, + "--reverse", rev, + "--sequences", fa, + "--mismatches", "1", + "--thermo-model", "nn-duplex-v1", + } + + var tsvOut, tsvErr bytes.Buffer + code := thermoapp.Run(append(append([]string{}, baseArgs...), "--output", "text", "--thermo-details"), &tsvOut, &tsvErr) + if code != 0 { + t.Fatalf("tsv exit %d err=%s", code, tsvErr.String()) + } + tsv := tsvOut.String() + if got := tsvColumnValue(t, tsv, "fwd_mismatch_sources"); got != "triplet-ddg" { + t.Fatalf("fwd_mismatch_sources: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_mismatch_parameter_sets"); got != "santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1" { + t.Fatalf("fwd_mismatch_parameter_sets: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_mismatch_citations"); !strings.Contains(got, "SantaLucia & Hicks 2004") { + t.Fatalf("fwd_mismatch_citations missing citation: got %q\n%s", got, tsv) + } + + for _, format := range []string{"json", "jsonl"} { + var out, errB bytes.Buffer + code := thermoapp.Run(append(append([]string{}, baseArgs...), "--output", format), &out, &errB) + if code != 0 { + t.Fatalf("%s exit %d err=%s", format, code, errB.String()) + } + if !bytes.Contains(out.Bytes(), []byte("mismatch_parameter_sets")) || + !bytes.Contains(out.Bytes(), []byte("santalucia-hicks-2004-internal-mismatch-compiled-dimer-gauge-v1")) || + !bytes.Contains(out.Bytes(), []byte("mismatch_sources")) { + t.Fatalf("expected mismatch provenance in %s output:\n%s", format, 
out.String()) + } + } +} + +func TestThermo_TerminalMismatchProvenanceAppearsInTSVJSONAndJSONL(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "TTGCGTATCGATCGTACGTA" + leftSite := []byte(fwd) + leftSite[len(leftSite)-1] = 'A' // primer 3' terminal T/T mismatch after target-orientation conversion. + fa := writeFA(t, "thermo_terminal_mismatch_provenance.fa", ">s\n"+string(leftSite)+"AAAA"+rc5to3IT(rev)+"\n") + defer func() { _ = os.Remove(fa) }() + + baseArgs := []string{ + "--forward", fwd, + "--reverse", rev, + "--sequences", fa, + "--mismatches", "1", + "--terminal-window", "0", + "--seed-length", "-1", + "--thermo-model", "nn-duplex-v1", + } + + var tsvOut, tsvErr bytes.Buffer + code := thermoapp.Run(append(append([]string{}, baseArgs...), "--output", "text", "--thermo-details"), &tsvOut, &tsvErr) + if code != 0 { + t.Fatalf("tsv exit %d err=%s", code, tsvErr.String()) + } + tsv := tsvOut.String() + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_sources"); got != "ipcr-terminal-mismatch-heuristic" { + t.Fatalf("fwd_terminal_mismatch_sources: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_parameter_sets"); got != "ipcr-terminal-mismatch-heuristic-v1" { + t.Fatalf("fwd_terminal_mismatch_parameter_sets: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_citations"); !strings.Contains(got, "ipcr internal terminal-mismatch heuristic") { + t.Fatalf("fwd_terminal_mismatch_citations missing heuristic citation: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_parameter_notes"); !strings.Contains(got, "Empirical fixed terminal mismatch") { + t.Fatalf("fwd_terminal_mismatch_parameter_notes missing heuristic note: got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_penalty_c"); got == "" || got == "0" { + t.Fatalf("expected nonzero fwd_terminal_mismatch_penalty_c, got %q\n%s", got, tsv) + } + if got := 
tsvColumnValue(t, tsv, "fwd_3p_terminal_mismatch_penalty_c"); got == "" || got == "0" { + t.Fatalf("expected nonzero fwd_3p_terminal_mismatch_penalty_c, got %q\n%s", got, tsv) + } + if got := tsvColumnValue(t, tsv, "fwd_terminal_mismatch_dg_kcal"); got == "" || got == "0" { + t.Fatalf("expected nonzero fwd_terminal_mismatch_dg_kcal, got %q\n%s", got, tsv) + } + + for _, format := range []string{"json", "jsonl"} { + var out, errB bytes.Buffer + code := thermoapp.Run(append(append([]string{}, baseArgs...), "--output", format), &out, &errB) + if code != 0 { + t.Fatalf("%s exit %d err=%s", format, code, errB.String()) + } + for _, want := range [][]byte{ + []byte("terminal_mismatch_sources"), + []byte("ipcr-terminal-mismatch-heuristic"), + []byte("terminal_mismatch_parameter_sets"), + []byte("ipcr-terminal-mismatch-heuristic-v1"), + []byte("terminal_mismatch_citations"), + []byte("terminal_mismatch_parameter_notes"), + []byte("terminal_mismatch_count"), + []byte("terminal_mismatch_penalty_c"), + } { + if !bytes.Contains(out.Bytes(), want) { + t.Fatalf("expected %q in %s output:\n%s", want, format, out.String()) + } + } + } +} + +func TestThermo_ProbeThermoDefaultModelAutoPromotesToNNDuplex(t *testing.T) { + fwd := "GCGCGCGCGCGCGCGCGCGC" + rev := "CGCGCGCGCGCGCGCGCGCG" + probe := "GCGCGATCGCGATCGCGCGC" + fa := writeFA(t, "thermo_probe_auto_nn.fa", ">s\n"+fwd+"AAAA"+probe+"AAAA"+rc5to3IT(rev)+"\n") + defer func() { _ = os.Remove(fa) }() + + var out, errB bytes.Buffer + code := thermoapp.Run([]string{ + "--forward", fwd, + "--reverse", rev, + "--sequences", fa, + "--probe", probe, + "--probe-thermo", + "--probe-min-margin", "-100", + "--output", "text", + "--thermo-details", + }, &out, &errB) + if code != 0 { + t.Fatalf("exit %d err=%s", code, errB.String()) + } + s := out.String() + if !bytes.Contains(out.Bytes(), []byte("nn-duplex-v1")) { + t.Fatalf("expected implicit nn-duplex-v1 scoring when --probe-thermo is used with default model:\n%s", s) + } + if 
!bytes.Contains(out.Bytes(), []byte("probe_found")) || !bytes.Contains(out.Bytes(), []byte("\ttrue\tgate\tprobe\t")) { + t.Fatalf("expected populated probe thermo detail columns:\n%s", s) + } +} diff --git a/internal/thermointegration/thermo_release_docs_test.go b/internal/thermointegration/thermo_release_docs_test.go new file mode 100644 index 0000000..2cf8d5a --- /dev/null +++ b/internal/thermointegration/thermo_release_docs_test.go @@ -0,0 +1,37 @@ +package thermointegration + +import ( + "os" + "path/filepath" + "runtime" + "strings" + "testing" +) + +func TestThermoReleaseDocsDeclareModelBoundaries(t *testing.T) { + _, file, _, ok := runtime.Caller(0) + if !ok { + t.Fatal("runtime.Caller failed") + } + repoRoot := filepath.Clean(filepath.Join(filepath.Dir(file), "../..")) + data, err := os.ReadFile(filepath.Join(repoRoot, "docs", "THERMO_MODELS.md")) + if err != nil { + t.Fatalf("read thermo model docs: %v", err) + } + doc := string(data) + for _, want := range []string{ + "PCR kinetics simulator", + "nn-duplex-v1", + "nn-structure-v1", + "binding", + "pcr", + "gel", + "owczarzy08", + "IUPAC thermodynamics policy", + "Modified probes are not fully modeled", + } { + if !strings.Contains(doc, want) { + t.Fatalf("thermo docs missing %q", want) + } + } +} diff --git a/internal/thermomodel/model.go b/internal/thermomodel/model.go new file mode 100644 index 0000000..b3e426b --- /dev/null +++ b/internal/thermomodel/model.go @@ -0,0 +1,66 @@ +package thermomodel + +import ( + "fmt" + "strings" +) + +// Mode identifies the scoring model used by ipcr-thermo. The explicit mode is +// intentionally separate from lower-level knobs such as --denom so future +// thermodynamic implementations can be introduced without changing legacy +// behavior by accident. +type Mode string + +const ( + // LegacyHeuristic is the current shipped behavior: heuristic primer-template + // mismatch scoring with the existing fixed/auto denominator switch. 
+ LegacyHeuristic Mode = "legacy-heuristic" + + // NNDuplexV1 is the nearest-neighbor primer-template duplex implementation. + // It computes condition-aware perfect-duplex thermodynamics and applies the + // explicit mismatch fallback policy when the target site is imperfect. + NNDuplexV1 Mode = "nn-duplex-v1" + + // NNStructureV1 adds nearest-neighbor secondary-structure competition terms: + // primer hairpins, self-dimers, and forward/reverse cross-dimers. + NNStructureV1 Mode = "nn-structure-v1" +) + +// Default returns the behavior-preserving model. +func Default() Mode { return LegacyHeuristic } + +// Parse validates and normalizes a model name. +func Parse(raw string) (Mode, error) { + s := strings.TrimSpace(strings.ToLower(raw)) + if s == "" { + return Default(), nil + } + switch Mode(s) { + case LegacyHeuristic, NNDuplexV1, NNStructureV1: + return Mode(s), nil + default: + return "", fmt.Errorf("unknown thermo model %q; expected one of: %s", raw, KnownList()) + } +} + +// Known returns all reserved mode names in rollout order. +func Known() []Mode { + return []Mode{LegacyHeuristic, NNDuplexV1, NNStructureV1} +} + +// KnownList returns all reserved mode names as CLI help text. +func KnownList() string { + modes := Known() + parts := make([]string, 0, len(modes)) + for _, mode := range modes { + parts = append(parts, mode.String()) + } + return strings.Join(parts, " | ") +} + +func (m Mode) String() string { return string(m) } + +// Implemented reports whether the mode is executable in this patch. 
+func (m Mode) Implemented() bool { + return m == LegacyHeuristic || m == NNDuplexV1 || m == NNStructureV1 +} diff --git a/internal/thermomodel/model_test.go b/internal/thermomodel/model_test.go new file mode 100644 index 0000000..5426375 --- /dev/null +++ b/internal/thermomodel/model_test.go @@ -0,0 +1,39 @@ +package thermomodel + +import "testing" + +func TestParseDefaultsToLegacyHeuristic(t *testing.T) { + got, err := Parse("") + if err != nil { + t.Fatalf("Parse returned error: %v", err) + } + if got != LegacyHeuristic { + t.Fatalf("got %q, want %q", got, LegacyHeuristic) + } +} + +func TestParseKnownModes(t *testing.T) { + for _, mode := range Known() { + got, err := Parse(mode.String()) + if err != nil { + t.Fatalf("Parse(%q): %v", mode, err) + } + if got != mode { + t.Fatalf("Parse(%q) = %q", mode, got) + } + } +} + +func TestImplementedModes(t *testing.T) { + for _, mode := range []Mode{LegacyHeuristic, NNDuplexV1, NNStructureV1} { + if !mode.Implemented() { + t.Fatalf("%q should be implemented", mode) + } + } +} + +func TestParseRejectsUnknownMode(t *testing.T) { + if _, err := Parse("bogus"); err == nil { + t.Fatal("expected unknown mode error") + } +} diff --git a/internal/thermovisitors/denom_test.go b/internal/thermovisitors/denom_test.go index 77e4324..0cd46d9 100644 --- a/internal/thermovisitors/denom_test.go +++ b/internal/thermovisitors/denom_test.go @@ -41,3 +41,13 @@ func TestAutoDenomInfluencesPenalty(t *testing.T) { t.Fatalf("expected auto denom to change penalty: fixed=%.6f auto=%.6f (D=%.2f)", pFixed, pAuto, dAuto) } } + +func TestDenomForPrimerUsesCanonicalConditionsObject(t *testing.T) { + base := Score{Na_M: 0.05, PrimerConc_M: 2.5e-7} + fromLegacyFields := base.denomForPrimer("ACGTACGTAC") + + fromConditions := Score{Conditions: base.conditions()}.denomForPrimer("ACGTACGTAC") + if fromLegacyFields != fromConditions { + t.Fatalf("conditions object changed denom: legacy-fields=%g conditions=%g", fromLegacyFields, fromConditions) + } +} diff 
--git a/internal/thermovisitors/score.go b/internal/thermovisitors/score.go index e5f1790..18634ef 100644 --- a/internal/thermovisitors/score.go +++ b/internal/thermovisitors/score.go @@ -2,9 +2,12 @@ package thermovisitors import ( + "fmt" "ipcr-core/engine" + probeanno "ipcr-core/probe" "ipcr-core/thermo" "ipcr-core/thermoaddons" + "ipcr/internal/thermomodel" "math" "os" "strings" @@ -18,23 +21,215 @@ const ( K5 = 3 // 5' end is harsher across first K5 bases K3 = 3 // 3' end is harshest across last K3 bases PROBE_NOT_FOUND_PEN = 12.0 + + iupacPolicyStrictACGT = "strict-acgt" + mismatchPolicyNNPerfect = thermo.MismatchPolicyPerfect + mismatchPolicyHeuristicFallback = thermo.MismatchPolicyImperfectHeuristicFallback + mismatchPolicyMixed = "nn-perfect-or-nn-imperfect-v1" + structurePolicyNNStemV1 = thermo.StructureModelContiguousStemV1 + structurePolicyNNStemLoopV2 = thermo.StructureModelStemLoopV2 + structurePolicyNNPartitionV1 = thermo.StructureModelPartitionV1 + + scoreProfileBinding = "binding" + scoreProfilePCR = "pcr" + scoreProfileGel = "gel" + + probeScoreModeAnnotate = "annotate" + probeScoreModeGate = "gate" + probeScoreModeBlend = "blend" + + defaultBandMassWeightC = 15.0 + bandMassRefBP = 100.0 ) +// PrimerRef identifies one primer in the current panel/pool for panel-wide +// dimer competition checks. Seq is expected in 5′→3′ orientation. +type PrimerRef struct { + ID string + Seq string +} + // Score is the thermo-scoring visitor config. 
type Score struct { + Model thermomodel.Mode + Conditions thermo.Conditions + AnnealTempC float64 Na_M float64 PrimerConc_M float64 AllowIndels bool LengthBiasOn bool SingleStranded bool // read (OR'd with env) to enable ssDNA tweaks + StructHairpin bool + StructDimer bool StructScale float64 + PanelPrimers []PrimerRef + + IUPACThermoPolicy string + IUPACThermoMaxExpansions int + + // ScoreProfile controls whether NN model scores remain pure primer-template + // binding margins or include PCR/gel-observable amplicon-level terms. + ScoreProfile string + ExtAlpha float64 + ExtWeight float64 + LenKneeBP int + LenSteep float64 + LenMaxPenC float64 + BindWeight float64 + BandMassWeight float64 + + ProbeSeq string + ProbeName string + ProbeMaxMM int + ProbeThermo bool + ProbeScoreMode string + ProbeMinMarginC float64 + ProbeWeight float64 // Opt-in: compute ΔΔG→ΔTm denominator from solution conditions. // Default false keeps the historical fixed D=200 path. UseAutoDenom bool } +func (v Score) conditions() thermo.Conditions { + c := v.Conditions + if c.AnnealC == 0 { + c.AnnealC = v.AnnealTempC + } + if c.NaM == 0 { + c.NaM = v.Na_M + } + if c.PrimerTotalM == 0 { + c.PrimerTotalM = v.PrimerConc_M + } + if c.SaltModel == "" { + c.SaltModel = thermo.SaltModelMonovalent + } + return c.WithDefaults() +} + +func (v Score) scoreProfile() string { + switch strings.ToLower(strings.TrimSpace(v.ScoreProfile)) { + case "", scoreProfileBinding: + return scoreProfileBinding + case scoreProfilePCR: + return scoreProfilePCR + case scoreProfileGel: + return scoreProfileGel + default: + return scoreProfileBinding + } +} + +func (v Score) iupacThermoPolicy() string { + policy, err := thermo.ParseIUPACThermoPolicy(v.IUPACThermoPolicy) + if err != nil { + return thermo.IUPACThermoPolicyWorst + } + return policy +} + +func (v Score) iupacThermoMaxExpansions() int { + if v.IUPACThermoMaxExpansions < 1 { + return 256 + } + return v.IUPACThermoMaxExpansions +} + +func (v Score) extAlpha() 
float64 { + if v.ExtAlpha == 0 { + return 0.45 + } + if v.ExtAlpha < 0 { + return 0 + } + return v.ExtAlpha +} + +func (v Score) extWeight() float64 { + if v.ExtWeight == 0 { + return 1 + } + return v.ExtWeight +} + +func (v Score) lenKneeBP() int { + if v.LenKneeBP <= 0 { + return 550 + } + return v.LenKneeBP +} + +func (v Score) lenSteep() float64 { + if v.LenSteep == 0 { + return 0.003 + } + if v.LenSteep < 0 { + return 0 + } + return v.LenSteep +} + +func (v Score) lenMaxPenC() float64 { + if v.LenMaxPenC == 0 { + return 10 + } + if v.LenMaxPenC < 0 { + return 0 + } + return v.LenMaxPenC +} + +func (v Score) bindWeight() float64 { + if v.BindWeight == 0 { + return 1 + } + return v.BindWeight +} + +func (v Score) bandMassWeight() float64 { + if v.BandMassWeight == 0 { + return defaultBandMassWeightC + } + return v.BandMassWeight +} + +func (v Score) probeName() string { + name := strings.TrimSpace(v.ProbeName) + if name == "" { + return "probe" + } + return name +} + +func (v Score) probeScoreMode() string { + switch strings.ToLower(strings.TrimSpace(v.ProbeScoreMode)) { + case probeScoreModeAnnotate: + return probeScoreModeAnnotate + case "", probeScoreModeGate: + return probeScoreModeGate + case probeScoreModeBlend: + return probeScoreModeBlend + default: + return probeScoreModeGate + } +} + +func (v Score) probeWeight() float64 { + if v.ProbeWeight < 0 { + return 0 + } + if v.ProbeWeight > 1 { + return 1 + } + return v.ProbeWeight +} + +func (v Score) probeThermoEnabled() bool { + return strings.TrimSpace(v.ProbeSeq) != "" && v.ProbeThermo +} + // Public helper used by tests/tools. 
func (v *Score) Penalty(primer5to3, tgt3to5 string, denom float64) float64 { ssOn := v.SingleStranded || singleStrandedMode() @@ -133,6 +328,16 @@ func comp5to3(top string) string { return string(out) } +func absFiniteOrFallback(x, fallback float64) float64 { + if math.IsNaN(x) || math.IsInf(x, 0) || x == 0 { + return fallback + } + if x < 0 { + return -x + } + return x +} + // Env-based ssDNA toggle (kept for backwards compatibility). func singleStrandedMode() bool { v := strings.TrimSpace(strings.ToLower(os.Getenv("IPCR_SINGLE_STRANDED"))) @@ -259,29 +464,24 @@ func alignPenaltyC_contextualD_ss(primer5to3, tgt3to5 string, allowGap bool, den // We return the absolute value so ΔΔG→ΔTm scaling is a positive magnitude. func (v Score) denomForPrimer(primer5to3 string) float64 { p := toUpperACGT(primer5to3) - if p == "" || v.Na_M <= 0 || v.PrimerConc_M <= 0 { + cond := v.conditions() + if p == "" || cond.NaM <= 0 || cond.PrimerTotalM <= 0 { return 200.0 } - // Build 3'→5' complement for Tm() + // Build 3'→5' complement for Tm(). t3 := comp5to3(p) - // Self-compl check: rc == p (5'→3') + // Self-compl check: rc == p (5'→3'). rc := rev(comp5to3(p)) - x := 4 - if rc == p { - x = 1 - } + cond.SelfComplementary = rc == p + in := cond.TmInput() - res, err := thermo.Tm(p, t3, thermo.TmInput{ - CT: v.PrimerConc_M, - Na: v.Na_M, - X: x, - }) + res, err := thermo.Tm(p, t3, in) if err != nil { return 200.0 } - D := res.DS_Na + thermo.Rcal*math.Log(v.PrimerConc_M/float64(x)) - // Go 1.22-safe "finite" check, then take magnitude + D := res.DS_Na + thermo.Rcal*math.Log(in.CT/float64(in.X)) + // Go 1.22-safe "finite" check, then take magnitude. 
if math.IsNaN(D) || math.IsInf(D, 0) || D == 0 { return 200.0 } @@ -291,10 +491,1035 @@ func (v Score) denomForPrimer(primer5to3 string) float64 { return D } +func appendUniqueString(dst []string, value string) []string { + value = strings.TrimSpace(value) + if value == "" { + return dst + } + for _, existing := range dst { + if existing == value { + return dst + } + } + return append(dst, value) +} + +func appendUniqueStrings(dst []string, values []string) []string { + for _, value := range values { + dst = appendUniqueString(dst, value) + } + return dst +} + +func mismatchProvenance(contribs []thermo.MismatchContribution) (sources, parameterSets, citations, notes []string) { + for _, c := range contribs { + sources = appendUniqueString(sources, string(c.Source)) + parameterSets = appendUniqueString(parameterSets, c.ParameterSet) + citations = appendUniqueString(citations, c.Citation) + notes = appendUniqueString(notes, c.ParameterNote) + } + return sources, parameterSets, citations, notes +} + +func terminalMismatchProvenance(contribs []thermo.MismatchContribution) (sources, parameterSets, citations, notes []string) { + for _, c := range contribs { + if c.TerminalPenaltyC == 0 && c.TerminalSource == "" && c.TerminalParameterSet == "" && c.TerminalCitation == "" && c.TerminalParameterNote == "" { + continue + } + sources = appendUniqueString(sources, c.TerminalSource) + parameterSets = appendUniqueString(parameterSets, c.TerminalParameterSet) + citations = appendUniqueString(citations, c.TerminalCitation) + notes = appendUniqueString(notes, c.TerminalParameterNote) + } + return sources, parameterSets, citations, notes +} + +func endpointFromDuplex(side string, d thermo.DuplexResult, mismatchPenaltyC float64, policy string, hasNonWC, heuristic bool) engine.ThermoEndpoint { + return engine.ThermoEndpoint{ + Side: side, + TmC: d.TmC, + AnnealMarginC: d.AnnealMarginC, + DeltaGAtAnnealKcal: d.DeltaGAtAnnealKcal, + MismatchPenaltyC: mismatchPenaltyC, + 
EffectiveDenomCalK: absFiniteOrFallback(d.EffectiveDenomCalK, 200.0), + MismatchPolicy: policy, + EndEffectPolicy: thermo.EndEffectPolicyNone, + HasNonWatsonCrick: hasNonWC, + UsedHeuristicAdjust: heuristic, + } +} + +func endpointFromImperfect(side string, d thermo.ImperfectDuplexResult) engine.ThermoEndpoint { + sources, parameterSets, citations, notes := mismatchProvenance(d.Contributions) + terminalSources, terminalParameterSets, terminalCitations, terminalNotes := terminalMismatchProvenance(d.Contributions) + return engine.ThermoEndpoint{ + Side: side, + TmC: d.TmC, + AnnealMarginC: d.AnnealMarginC, + DeltaGAtAnnealKcal: d.DeltaGAtAnnealKcal, + MismatchPenaltyC: d.MismatchPenaltyC, + MismatchDeltaGKcal: d.DeltaGPenaltyKcal, + TerminalMismatchPenaltyC: d.TerminalMismatchPenaltyC, + TerminalMismatchDeltaGKcal: d.TerminalMismatchDeltaGKcal, + DanglingEndAdjustmentC: d.DanglingEndAdjustmentC, + DanglingEndDeltaGKcal: d.DanglingEndDeltaGKcal, + DanglingEndCount: d.DanglingEndCount, + MismatchCount: d.MismatchCount, + FivePrimeMismatchCount: d.FivePrimeMismatchCount, + ThreePrimeMismatchCount: d.ThreePrimeMismatchCount, + FivePrimeTerminalMismatchCount: d.FivePrimeTerminalMismatchCount, + ThreePrimeTerminalMismatchCount: d.ThreePrimeTerminalMismatchCount, + TerminalMismatchCount: d.TerminalMismatchCount, + FivePrimeTerminalMismatchPenaltyC: d.FivePrimeTerminalMismatchPenaltyC, + ThreePrimeTerminalMismatchPenaltyC: d.ThreePrimeTerminalMismatchPenaltyC, + MismatchFallbackCount: d.HeuristicFallbackCount + d.DefaultFallbackCount, + MismatchTripletCount: d.TripletTmCount + d.TripletDeltaGCount, + MismatchCuratedPairCount: d.CuratedPairCount, + MismatchSources: sources, + MismatchParameterSets: parameterSets, + MismatchCitations: citations, + MismatchParameterNotes: notes, + TerminalMismatchSources: terminalSources, + TerminalMismatchParameterSets: terminalParameterSets, + TerminalMismatchCitations: terminalCitations, + TerminalMismatchParameterNotes: terminalNotes, + 
EffectiveDenomCalK: absFiniteOrFallback(d.EffectiveDenomCalK, 200.0), + MismatchPolicy: d.MismatchPolicy, + EndEffectPolicy: d.EndEffectPolicy, + HasNonWatsonCrick: d.HasNonWatsonCrick, + UsedHeuristicAdjust: d.UsedHeuristicAdjust, + } +} + +func (v Score) scoreNNDuplexEndpoint(side, primer5to3, target3to5 string, dangling thermo.DanglingEndContext) (engine.ThermoEndpoint, error) { + primer := toUpperACGT(primer5to3) + if primer == "" { + return engine.ThermoEndpoint{}, fmt.Errorf("nn-duplex-v1 requires A/C/G/T primers; %s primer contains unsupported bases", side) + } + target := toUpperACGTAllowN(target3to5) + if target == "" { + return engine.ThermoEndpoint{}, fmt.Errorf("nn-duplex-v1 requires A/C/G/T/N target sites; %s target contains unsupported bases", side) + } + if len(primer) != len(target) { + return engine.ThermoEndpoint{}, fmt.Errorf("nn-duplex-v1 %s endpoint length mismatch: primer=%d target=%d", side, len(primer), len(target)) + } + + cond := v.conditions() + ssOn := v.SingleStranded || singleStrandedMode() + if !v.AllowIndels && !ssOn { + imperfect, err := thermo.ImperfectDuplexWithOptionsAndContext(primer, target, cond, thermo.DefaultImperfectDuplexOptions(), dangling) + if err != nil { + return engine.ThermoEndpoint{}, err + } + return endpointFromImperfect(side, imperfect), nil + } + + // Gap-tolerant and ssDNA adjustments are not yet part of the NN imperfect + // duplex core. Preserve the historical DP fallback for those opt-in modes and + // label it explicitly. 
+ perfectTarget := comp5to3(primer) + base, err := thermo.PerfectDuplex(primer, perfectTarget, cond) + if err != nil { + return engine.ThermoEndpoint{}, err + } + denom := absFiniteOrFallback(base.EffectiveDenomCalK, 200.0) + penaltyC := alignPenaltyC_contextualD_ss(primer, target, v.AllowIndels, denom, ssOn) + deltaGPenalty := penaltyC * denom / 1000.0 + + adjusted := base + adjusted.TmC = base.TmC - penaltyC + adjusted.AnnealMarginC = adjusted.TmC - cond.AnnealC + adjusted.DeltaGAtAnnealKcal = base.DeltaGAtAnnealKcal + deltaGPenalty + adjusted.EffectiveDenomCalK = denom + return endpointFromDuplex(side, adjusted, penaltyC, mismatchPolicyHeuristicFallback, true, true), nil +} + +func (v Score) scoreNNDuplexComponents(p engine.Product) (engine.ThermoEndpoint, engine.ThermoEndpoint, float64, string, thermo.Conditions, error) { + f := toUpperACGT(p.FwdPrimer) + r := toUpperACGT(p.RevPrimer) + if f == "" || r == "" { + return engine.ThermoEndpoint{}, engine.ThermoEndpoint{}, 0, "", thermo.Conditions{}, fmt.Errorf("nn-duplex-v1 requires A/C/G/T primers; degenerate/IUPAC primer scoring is not implemented yet") + } + if len(p.Seq) < len(f) || len(p.Seq) < len(r) { + return engine.ThermoEndpoint{}, engine.ThermoEndpoint{}, 0, "", thermo.Conditions{}, fmt.Errorf("nn-duplex-v1 requires product sequence long enough for both primer sites") + } + + leftSite := toUpperACGTAllowN(p.Seq[:len(f)]) + rightSite := toUpperACGTAllowN(p.Seq[len(p.Seq)-len(r):]) + if leftSite == "" || rightSite == "" { + return engine.ThermoEndpoint{}, engine.ThermoEndpoint{}, 0, "", thermo.Conditions{}, fmt.Errorf("nn-duplex-v1 requires A/C/G/T/N product sequence at primer sites") + } + + // The left site is in the same 5'→3' orientation as the forward primer. + // The right site is the reference-strand reverse complement of the reverse + // primer, so reversing the site gives the primer-aligned target strand 3'→5'. 
+ fwdTarget3 := comp5to3(leftSite) + revTarget3 := rev(rightSite) + fwdDangling := thermo.DanglingEndContext{} + revDangling := thermo.DanglingEndContext{} + if len(p.Seq) > len(f) { + // Forward primer binds the bottom/complement strand; the amplicon-interior + // base after the forward site is converted to the aligned template base. + fwdDangling.ThreePrimeBase = compBase(byte(unicode.ToUpper(rune(p.Seq[len(f)])))) + } + if len(p.Seq) > len(r) { + idx := len(p.Seq) - len(r) - 1 + if idx >= 0 { + // Reverse primer binds the top/reference strand; the amplicon-interior + // base before the reverse site is already the aligned template base. + revDangling.ThreePrimeBase = byte(unicode.ToUpper(rune(p.Seq[idx]))) + } + } + + fwd, err := v.scoreNNDuplexEndpoint("fwd", f, fwdTarget3, fwdDangling) + if err != nil { + return engine.ThermoEndpoint{}, engine.ThermoEndpoint{}, 0, "", thermo.Conditions{}, err + } + revEnd, err := v.scoreNNDuplexEndpoint("rev", r, revTarget3, revDangling) + if err != nil { + return engine.ThermoEndpoint{}, engine.ThermoEndpoint{}, 0, "", thermo.Conditions{}, err + } + + limitingSide := "fwd" + score := fwd.AnnealMarginC + if revEnd.AnnealMarginC < score { + score = revEnd.AnnealMarginC + limitingSide = "rev" + } + return fwd, revEnd, score, limitingSide, v.conditions(), nil +} + +func nnThermoDetails(model thermomodel.Mode, cond thermo.Conditions, fwd, revEnd engine.ThermoEndpoint, score float64, limitingSide string) *engine.ThermoDetails { + return &engine.ThermoDetails{ + Model: model.String(), + SaltModel: cond.SaltModel.String(), + NaM: cond.NaM, + MgM: cond.MgM, + DntpM: cond.DntpM, + EffectiveNaM: cond.EffectiveNaM(), + FreeMgM: cond.FreeMgM(), + AnnealTempC: cond.AnnealC, + IUPACPolicy: iupacPolicyStrictACGT, + MismatchPolicy: mismatchPolicyMixed, + ScoreProfile: scoreProfileBinding, + ScoreC: score, + BaseScoreC: score, + LimitingSide: limitingSide, + Fwd: fwd, + Rev: revEnd, + } +} + +func ampliconBandMassBonusC(bp int, weightC float64) 
float64 { + if bp <= 0 || weightC == 0 { + return 0 + } + ratio := float64(bp) / bandMassRefBP + if ratio <= 0 { + return 0 + } + return weightC * math.Log2(ratio) +} + +func (v Score) applyAmpliconProfile(p engine.Product, details *engine.ThermoDetails, score float64) float64 { + if details == nil { + return score + } + profile := v.scoreProfile() + details.ScoreProfile = profile + if profile == scoreProfileBinding { + details.ScoreC = score + return score + } + + limitingMargin := details.Fwd.AnnealMarginC + if details.Rev.AnnealMarginC < limitingMargin { + limitingMargin = details.Rev.AnnealMarginC + } + + bindingAdjustment := score * (v.bindWeight() - 1) + + extProb := thermoaddons.ExtensionProb(limitingMargin, v.extAlpha()) + extLogit := thermoaddons.Logit(extProb) + extBonus := v.extWeight() * extLogit + lengthPenalty := thermoaddons.LengthPenalty(p.Length, v.lenKneeBP(), v.lenSteep(), v.lenMaxPenC()) + + details.ExtensionLogit = extLogit + details.ExtensionBonusC = extBonus + details.LengthPenaltyC = lengthPenalty + + adjustment := bindingAdjustment + extBonus - lengthPenalty + if profile == scoreProfileGel { + bandBonus := ampliconBandMassBonusC(p.Length, v.bandMassWeight()) + details.BandMassBonusC = bandBonus + adjustment += bandBonus + } + details.AmpliconAdjustmentC = adjustment + score += adjustment + details.ScoreC = score + return score +} + +type primerPairVariant struct { + Fwd string + Rev string +} + +type nnVariantScorer func(engine.Product) (bool, engine.Product, error) + +func (v Score) primerPairVariants(p engine.Product) ([]primerPairVariant, bool, error) { + policy := v.iupacThermoPolicy() + if policy == thermo.IUPACThermoPolicyStrict { + f := toUpperACGT(p.FwdPrimer) + r := toUpperACGT(p.RevPrimer) + if f == "" || r == "" { + return nil, false, fmt.Errorf("NN thermodynamics with --iupac-thermo-policy strict requires A/C/G/T primers") + } + return []primerPairVariant{{Fwd: f, Rev: r}}, false, nil + } + + maxExp := 
v.iupacThermoMaxExpansions() + fwdExp, fwdCapped, err := thermo.ExpandIUPAC(p.FwdPrimer, maxExp) + if err != nil { + return nil, false, fmt.Errorf("forward primer IUPAC expansion: %w", err) + } + revExp, revCapped, err := thermo.ExpandIUPAC(p.RevPrimer, maxExp) + if err != nil { + return nil, false, fmt.Errorf("reverse primer IUPAC expansion: %w", err) + } + out := make([]primerPairVariant, 0, minInt(maxExp, len(fwdExp)*len(revExp))) + capped := fwdCapped || revCapped + for _, f := range fwdExp { + for _, r := range revExp { + if len(out) >= maxExp { + return out, true, nil + } + out = append(out, primerPairVariant{Fwd: f, Rev: r}) + } + } + return out, capped, nil +} + +func minInt(a, b int) int { + if a < b { + return a + } + return b +} + +func iupacVariantLabel(fwd, rev string) string { + return "fwd=" + fwd + ";rev=" + rev +} + +func thermoVariantSummary(p engine.Product) engine.ThermoVariant { + out := engine.ThermoVariant{ + FwdPrimer: p.FwdPrimer, + RevPrimer: p.RevPrimer, + ScoreC: p.Score, + } + if p.Thermo != nil { + out.ScoreC = p.Thermo.ScoreC + out.BaseScoreC = p.Thermo.BaseScoreC + out.StructurePenaltyC = p.Thermo.StructurePenaltyC + out.LimitingSide = p.Thermo.LimitingSide + out.FwdTmC = p.Thermo.Fwd.TmC + out.RevTmC = p.Thermo.Rev.TmC + out.FwdMarginC = p.Thermo.Fwd.AnnealMarginC + out.RevMarginC = p.Thermo.Rev.AnnealMarginC + } + return out +} + +func copyThermoDetails(src *engine.ThermoDetails) *engine.ThermoDetails { + if src == nil { + return nil + } + out := *src + if src.IUPACVariants != nil { + out.IUPACVariants = append([]engine.ThermoVariant(nil), src.IUPACVariants...) 
+ } + return &out +} + +func averageIUPACProducts(scored []engine.Product) engine.Product { + out := scored[0] + out.Thermo = copyThermoDetails(scored[0].Thermo) + n := float64(len(scored)) + out.Score = 0 + if out.Thermo != nil { + out.Thermo.ScoreC = 0 + out.Thermo.BaseScoreC = 0 + out.Thermo.AmpliconAdjustmentC = 0 + out.Thermo.ExtensionLogit = 0 + out.Thermo.ExtensionBonusC = 0 + out.Thermo.LengthPenaltyC = 0 + out.Thermo.BandMassBonusC = 0 + out.Thermo.StructurePenaltyC = 0 + out.Thermo.Fwd.TmC = 0 + out.Thermo.Rev.TmC = 0 + out.Thermo.Fwd.AnnealMarginC = 0 + out.Thermo.Rev.AnnealMarginC = 0 + out.Thermo.Fwd.DeltaGAtAnnealKcal = 0 + out.Thermo.Rev.DeltaGAtAnnealKcal = 0 + } + for _, p := range scored { + out.Score += p.Score + if out.Thermo != nil && p.Thermo != nil { + out.Thermo.ScoreC += p.Thermo.ScoreC + out.Thermo.BaseScoreC += p.Thermo.BaseScoreC + out.Thermo.AmpliconAdjustmentC += p.Thermo.AmpliconAdjustmentC + out.Thermo.ExtensionLogit += p.Thermo.ExtensionLogit + out.Thermo.ExtensionBonusC += p.Thermo.ExtensionBonusC + out.Thermo.LengthPenaltyC += p.Thermo.LengthPenaltyC + out.Thermo.BandMassBonusC += p.Thermo.BandMassBonusC + out.Thermo.StructurePenaltyC += p.Thermo.StructurePenaltyC + out.Thermo.Fwd.TmC += p.Thermo.Fwd.TmC + out.Thermo.Rev.TmC += p.Thermo.Rev.TmC + out.Thermo.Fwd.AnnealMarginC += p.Thermo.Fwd.AnnealMarginC + out.Thermo.Rev.AnnealMarginC += p.Thermo.Rev.AnnealMarginC + out.Thermo.Fwd.DeltaGAtAnnealKcal += p.Thermo.Fwd.DeltaGAtAnnealKcal + out.Thermo.Rev.DeltaGAtAnnealKcal += p.Thermo.Rev.DeltaGAtAnnealKcal + } + } + out.Score /= n + if out.Thermo != nil { + out.Thermo.ScoreC /= n + out.Thermo.BaseScoreC /= n + out.Thermo.AmpliconAdjustmentC /= n + out.Thermo.ExtensionLogit /= n + out.Thermo.ExtensionBonusC /= n + out.Thermo.LengthPenaltyC /= n + out.Thermo.BandMassBonusC /= n + out.Thermo.StructurePenaltyC /= n + out.Thermo.Fwd.TmC /= n + out.Thermo.Rev.TmC /= n + out.Thermo.Fwd.AnnealMarginC /= n + out.Thermo.Rev.AnnealMarginC /= n 
+ out.Thermo.Fwd.DeltaGAtAnnealKcal /= n + out.Thermo.Rev.DeltaGAtAnnealKcal /= n + out.Thermo.LimitingSide = "mean" + } + return out +} + +func annotateIUPACThermo(p *engine.Product, policy string, count int, capped bool, effective string, variants []engine.ThermoVariant) { + if p.Thermo == nil { + return + } + p.Thermo.IUPACThermoPolicy = policy + p.Thermo.IUPACExpansionCount = count + p.Thermo.IUPACExpansionCapped = capped + p.Thermo.IUPACEffectiveVariant = effective + p.Thermo.IUPACPolicy = "iupac-thermo-" + policy + if policy == thermo.IUPACThermoPolicyEnumerate { + p.Thermo.IUPACVariants = variants + } +} + +func (v Score) visitNNWithIUPAC(p engine.Product, scorer nnVariantScorer) (bool, engine.Product, error) { + variants, capped, err := v.primerPairVariants(p) + if err != nil { + return false, p, err + } + if len(variants) == 0 { + return false, p, fmt.Errorf("IUPAC thermo expansion produced no concrete primer variants") + } + policy := v.iupacThermoPolicy() + scored := make([]engine.Product, 0, len(variants)) + summaries := make([]engine.ThermoVariant, 0, len(variants)) + for _, variant := range variants { + q := p + q.FwdPrimer = variant.Fwd + q.RevPrimer = variant.Rev + ok, got, err := scorer(q) + if err != nil { + return false, p, err + } + if !ok { + continue + } + scored = append(scored, got) + summaries = append(summaries, thermoVariantSummary(got)) + } + if len(scored) == 0 { + return false, p, nil + } + + bestIdx := 0 + switch policy { + case thermo.IUPACThermoPolicyBest: + for i := 1; i < len(scored); i++ { + if scored[i].Score > scored[bestIdx].Score { + bestIdx = i + } + } + out := scored[bestIdx] + annotateIUPACThermo(&out, policy, len(scored), capped, iupacVariantLabel(out.FwdPrimer, out.RevPrimer), summaries) + return true, out, nil + case thermo.IUPACThermoPolicyMean, thermo.IUPACThermoPolicyEnumerate: + out := averageIUPACProducts(scored) + effective := "mean" + if policy == thermo.IUPACThermoPolicyEnumerate { + effective = "enumerate" + } 
+ annotateIUPACThermo(&out, policy, len(scored), capped, effective, summaries) + return true, out, nil + default: + // worst is the default and the most conservative assay-design behavior. + for i := 1; i < len(scored); i++ { + if scored[i].Score < scored[bestIdx].Score { + bestIdx = i + } + } + out := scored[bestIdx] + annotateIUPACThermo(&out, policy, len(scored), capped, iupacVariantLabel(out.FwdPrimer, out.RevPrimer), summaries) + return true, out, nil + } +} + +func (v Score) visitNNDuplexV1Strict(p engine.Product) (bool, engine.Product, error) { + fwd, revEnd, score, limitingSide, cond, err := v.scoreNNDuplexComponents(p) + if err != nil { + return false, p, err + } + details := nnThermoDetails(thermomodel.NNDuplexV1, cond, fwd, revEnd, score, limitingSide) + score = v.applyAmpliconProfile(p, details, score) + p.Score = score + p.Thermo = details + return true, p, nil +} + +func (v Score) visitNNDuplexV1(p engine.Product) (bool, engine.Product, error) { + return v.visitNNWithIUPAC(p, v.visitNNDuplexV1Strict) +} + +func structureFromResult(src thermo.StructureResult, penaltyC float64) *engine.ThermoStructure { + return structureFromResultWithLabels(src, penaltyC, "", "") +} + +func structureFromResultWithLabels(src thermo.StructureResult, penaltyC float64, queryA, queryB string) *engine.ThermoStructure { + if src.StemLen == 0 { + return nil + } + return &engine.ThermoStructure{ + Kind: src.Kind, + Model: src.Model, + QueryA: queryA, + QueryB: queryB, + DeltaGAtAnnealKcal: src.DeltaGAtAnnealKcal, + TmC: src.TmC, + AnnealMarginC: src.AnnealMarginC, + StemLen: src.StemLen, + LoopLen: src.LoopLen, + AStart: src.AStart, + AEnd: src.AEnd, + BStart: src.BStart, + BEnd: src.BEnd, + ThreePrimeAnchored: src.ThreePrimeAnchored, + BothThreePrimeAnchor: src.BothThreePrimeAnchor, + SegmentCount: src.SegmentCount, + BulgeCount: src.BulgeCount, + InternalLoopCount: src.InternalLoopCount, + DanglingEndCount: src.DanglingEndCount, + LoopPenaltyKcal: src.LoopPenaltyKcal, + 
BulgePenaltyKcal: src.BulgePenaltyKcal, + InternalLoopPenaltyKcal: src.InternalLoopPenaltyKcal, + StructureDanglingDeltaGKcal: src.DanglingAdjustmentKcal, + EnsembleDeltaGAtAnnealKcal: src.EnsembleDeltaGAtAnnealKcal, + PartitionFunction: src.PartitionFunction, + EnsembleWeight: src.EnsembleWeight, + EnsembleCandidateCount: src.EnsembleCandidateCount, + DPCellCount: src.DPCellCount, + DPStateCount: src.DPStateCount, + DPExpectedPairs: src.DPExpectedPairs, + DPMFEDeltaGAtAnnealKcal: src.DPMFEDeltaGAtAnnealKcal, + DPEnsembleDeltaGAtAnnealKcal: src.DPEnsembleDeltaGAtAnnealKcal, + PenaltyC: penaltyC, + } +} + +func structureCompetitionPenaltyC(src thermo.StructureResult, binding engine.ThermoEndpoint) float64 { + if src.StemLen == 0 || math.IsNaN(src.DeltaGAtAnnealKcal) || math.IsInf(src.DeltaGAtAnnealKcal, 0) { + return 0 + } + // Positive when the structure is close enough to compete with the relevant + // primer-template endpoint at annealing temperature. 3' anchored dimers get a + // larger competition window because they can seed extension. 
+ windowKcal := 1.0 + if src.Kind != thermo.StructureHairpin && src.ThreePrimeAnchored { + windowKcal = 2.0 + } + if src.BothThreePrimeAnchor { + windowKcal = 3.0 + } + competitiveKcal := binding.DeltaGAtAnnealKcal - src.DeltaGAtAnnealKcal + windowKcal + if competitiveKcal <= 0 { + return 0 + } + denom := absFiniteOrFallback(binding.EffectiveDenomCalK, 200.0) + penalty := competitiveKcal * 1000.0 / denom + if math.IsNaN(penalty) || math.IsInf(penalty, 0) || penalty < 0 { + return 0 + } + if penalty > 30 { + return 30 + } + return penalty +} + +func chooseWorseStructure(cur, cand *engine.ThermoStructure) *engine.ThermoStructure { + if cand == nil || cand.PenaltyC <= 0 { + return cur + } + if cur == nil || cand.PenaltyC > cur.PenaltyC { + return cand + } + if cand.PenaltyC == cur.PenaltyC && cand.DeltaGAtAnnealKcal < cur.DeltaGAtAnnealKcal { + return cand + } + return cur +} + +func (v Score) normalizePanelPrimers() []PrimerRef { + out := make([]PrimerRef, 0, len(v.PanelPrimers)) + seen := map[string]struct{}{} + maxExp := v.iupacThermoMaxExpansions() + for _, ref := range v.PanelPrimers { + id := strings.TrimSpace(ref.ID) + if id == "" { + id = strings.ToUpper(ref.Seq) + } + expanded, _, err := thermo.ExpandIUPAC(ref.Seq, maxExp) + if err != nil { + continue + } + for _, seq := range expanded { + key := seq + if _, ok := seen[key]; ok { + continue + } + seen[key] = struct{}{} + label := id + if len(expanded) > 1 { + label = id + "[" + seq + "]" + } + out = append(out, PrimerRef{ID: label, Seq: seq}) + } + } + return out +} + +func samePrimerSeq(a, b string) bool { + a = toUpperACGT(a) + b = toUpperACGT(b) + return a != "" && a == b +} + +type panelCrossDimerHit struct { + Result thermo.StructureResult + PenaltyC float64 + BurdenC float64 + Count int + QueryID string + PartnerID string +} + +func (v Score) bestPanelCrossDimer(fwdPrimer, revPrimer string, fwd, revEnd engine.ThermoEndpoint, cond thermo.Conditions) panelCrossDimerHit { + if len(v.PanelPrimers) == 0 { + 
return panelCrossDimerHit{} + } + queries := []struct { + ID string + Seq string + Binding engine.ThermoEndpoint + }{ + {ID: "fwd", Seq: fwdPrimer, Binding: fwd}, + {ID: "rev", Seq: revPrimer, Binding: revEnd}, + } + panel := v.normalizePanelPrimers() + seen := map[string]struct{}{} + var best panelCrossDimerHit + for _, q := range queries { + qSeq := toUpperACGT(q.Seq) + if qSeq == "" { + continue + } + for _, partner := range panel { + if samePrimerSeq(partner.Seq, fwdPrimer) || samePrimerSeq(partner.Seq, revPrimer) { + continue + } + key := q.ID + "\x00" + qSeq + "\x00" + partner.ID + "\x00" + partner.Seq + if _, ok := seen[key]; ok { + continue + } + seen[key] = struct{}{} + res, ok, err := thermo.BestCrossDimerPartition(qSeq, partner.Seq, thermo.DefaultStructureOptions(cond)) + if err != nil || !ok { + continue + } + pen := structureCompetitionPenaltyC(res, q.Binding) + if pen <= 0 { + continue + } + best.Count++ + best.BurdenC += pen + if pen > best.PenaltyC || (pen == best.PenaltyC && res.DeltaGAtAnnealKcal < best.Result.DeltaGAtAnnealKcal) { + best.Result = res + best.PenaltyC = pen + best.QueryID = q.ID + best.PartnerID = partner.ID + } + } + } + return best +} + +func (v Score) visitNNStructureV1Strict(p engine.Product) (bool, engine.Product, error) { + fwd, revEnd, baseScore, limitingSide, cond, err := v.scoreNNDuplexComponents(p) + if err != nil { + return false, p, err + } + + f := toUpperACGT(p.FwdPrimer) + r := toUpperACGT(p.RevPrimer) + scale := v.StructScale + if scale < 0 { + scale = 0 + } + + details := nnThermoDetails(thermomodel.NNStructureV1, cond, fwd, revEnd, baseScore, limitingSide) + details.StructurePolicy = structurePolicyNNPartitionV1 + details.BaseScoreC = baseScore + + totalPenalty := 0.0 + if v.StructHairpin { + if hp, ok, err := thermo.BestHairpinPartition(f, thermo.DefaultStructureOptions(cond)); err == nil && ok { + pen := structureCompetitionPenaltyC(hp, fwd) + details.WorstHairpin = chooseWorseStructure(details.WorstHairpin, 
structureFromResultWithLabels(hp, pen, "fwd", "fwd")) + totalPenalty += pen + } + if hp, ok, err := thermo.BestHairpinPartition(r, thermo.DefaultStructureOptions(cond)); err == nil && ok { + pen := structureCompetitionPenaltyC(hp, revEnd) + details.WorstHairpin = chooseWorseStructure(details.WorstHairpin, structureFromResultWithLabels(hp, pen, "rev", "rev")) + totalPenalty += pen + } + } + + if v.StructDimer { + if sd, ok, err := thermo.BestSelfDimerPartition(f, thermo.DefaultStructureOptions(cond)); err == nil && ok { + pen := structureCompetitionPenaltyC(sd, fwd) + details.WorstSelfDimer = chooseWorseStructure(details.WorstSelfDimer, structureFromResultWithLabels(sd, pen, "fwd", "fwd")) + totalPenalty += pen + } + if sd, ok, err := thermo.BestSelfDimerPartition(r, thermo.DefaultStructureOptions(cond)); err == nil && ok { + pen := structureCompetitionPenaltyC(sd, revEnd) + details.WorstSelfDimer = chooseWorseStructure(details.WorstSelfDimer, structureFromResultWithLabels(sd, pen, "rev", "rev")) + totalPenalty += pen + } + if xd, ok, err := thermo.BestCrossDimerPartition(f, r, thermo.DefaultStructureOptions(cond)); err == nil && ok { + pen := math.Max(structureCompetitionPenaltyC(xd, fwd), structureCompetitionPenaltyC(xd, revEnd)) + details.CrossDimer = chooseWorseStructure(details.CrossDimer, structureFromResultWithLabels(xd, pen, "fwd", "rev")) + totalPenalty += pen + } + panel := v.bestPanelCrossDimer(f, r, fwd, revEnd, cond) + if panel.PenaltyC > 0 { + details.PanelCrossDimer = structureFromResultWithLabels(panel.Result, panel.PenaltyC, panel.QueryID, panel.PartnerID) + details.PanelCrossDimerPenaltyC = panel.PenaltyC + details.PanelCrossDimerBurdenC = panel.BurdenC + details.PanelCrossDimerCount = panel.Count + totalPenalty += panel.PenaltyC + } + } + + totalPenalty *= scale + if details.WorstHairpin != nil { + details.WorstHairpin.PenaltyC *= scale + } + if details.WorstSelfDimer != nil { + details.WorstSelfDimer.PenaltyC *= scale + } + if details.CrossDimer 
!= nil { + details.CrossDimer.PenaltyC *= scale + } + if details.PanelCrossDimer != nil { + details.PanelCrossDimer.PenaltyC *= scale + details.PanelCrossDimerPenaltyC *= scale + details.PanelCrossDimerBurdenC *= scale + } + + score := baseScore - totalPenalty + details.StructurePenaltyC = totalPenalty + score = v.applyAmpliconProfile(p, details, score) + p.Score = score + p.Thermo = details + return true, p, nil +} + +func (v Score) visitNNStructureV1(p engine.Product) (bool, engine.Product, error) { + return v.visitNNWithIUPAC(p, v.visitNNStructureV1Strict) +} + +type scoredProbeVariant struct { + Variant string + Result thermo.ImperfectDuplexResult +} + +func probeTarget3to5(strand, site string) string { + site = strings.ToUpper(site) + if strand == "-" { + return rev(site) + } + return comp5to3(site) +} + +func probeVariantDetails(base engine.ProbeThermoDetails, chosen scoredProbeVariant, count int, capped bool, effective string) engine.ProbeThermoDetails { + res := chosen.Result + base.IUPACExpansionCount = count + base.IUPACExpansionCapped = capped + base.IUPACEffectiveVariant = effective + base.TmC = res.TmC + base.AnnealMarginC = res.AnnealMarginC + base.DeltaGAtAnnealKcal = res.DeltaGAtAnnealKcal + base.MismatchPenaltyC = res.MismatchPenaltyC + base.MismatchDeltaGKcal = res.DeltaGPenaltyKcal + base.MismatchCount = res.MismatchCount + base.MismatchFallbackCount = res.HeuristicFallbackCount + res.DefaultFallbackCount + base.MismatchTripletCount = res.TripletTmCount + res.TripletDeltaGCount + base.MismatchCuratedPairCount = res.CuratedPairCount + base.MismatchSources, base.MismatchParameterSets, base.MismatchCitations, base.MismatchParameterNotes = mismatchProvenance(res.Contributions) + base.TerminalMismatchPenaltyC = res.TerminalMismatchPenaltyC + base.TerminalMismatchDeltaGKcal = res.TerminalMismatchDeltaGKcal + base.TerminalMismatchCount = res.TerminalMismatchCount + base.FivePrimeTerminalMismatchCount = res.FivePrimeTerminalMismatchCount + 
base.ThreePrimeTerminalMismatchCount = res.ThreePrimeTerminalMismatchCount + base.TerminalMismatchSources, base.TerminalMismatchParameterSets, base.TerminalMismatchCitations, base.TerminalMismatchParameterNotes = terminalMismatchProvenance(res.Contributions) + base.MismatchPolicy = res.MismatchPolicy + base.HasNonWatsonCrick = res.HasNonWatsonCrick + base.UsedHeuristicAdjust = res.UsedHeuristicAdjust + return base +} + +func meanProbeDetails(base engine.ProbeThermoDetails, scored []scoredProbeVariant, capped bool, effective string) engine.ProbeThermoDetails { + base.IUPACExpansionCount = len(scored) + base.IUPACExpansionCapped = capped + base.IUPACEffectiveVariant = effective + if len(scored) == 0 { + return base + } + n := float64(len(scored)) + for _, s := range scored { + res := s.Result + base.TmC += res.TmC + base.AnnealMarginC += res.AnnealMarginC + base.DeltaGAtAnnealKcal += res.DeltaGAtAnnealKcal + base.MismatchPenaltyC += res.MismatchPenaltyC + base.MismatchDeltaGKcal += res.DeltaGPenaltyKcal + base.TerminalMismatchPenaltyC += res.TerminalMismatchPenaltyC + base.TerminalMismatchDeltaGKcal += res.TerminalMismatchDeltaGKcal + if res.TerminalMismatchCount > base.TerminalMismatchCount { + base.TerminalMismatchCount = res.TerminalMismatchCount + } + if res.FivePrimeTerminalMismatchCount > base.FivePrimeTerminalMismatchCount { + base.FivePrimeTerminalMismatchCount = res.FivePrimeTerminalMismatchCount + } + if res.ThreePrimeTerminalMismatchCount > base.ThreePrimeTerminalMismatchCount { + base.ThreePrimeTerminalMismatchCount = res.ThreePrimeTerminalMismatchCount + } + if res.MismatchCount > base.MismatchCount { + base.MismatchCount = res.MismatchCount + } + fallbacks := res.HeuristicFallbackCount + res.DefaultFallbackCount + if fallbacks > base.MismatchFallbackCount { + base.MismatchFallbackCount = fallbacks + } + triplets := res.TripletTmCount + res.TripletDeltaGCount + if triplets > base.MismatchTripletCount { + base.MismatchTripletCount = triplets + } + if 
res.CuratedPairCount > base.MismatchCuratedPairCount { + base.MismatchCuratedPairCount = res.CuratedPairCount + } + sources, parameterSets, citations, notes := mismatchProvenance(res.Contributions) + base.MismatchSources = appendUniqueStrings(base.MismatchSources, sources) + base.MismatchParameterSets = appendUniqueStrings(base.MismatchParameterSets, parameterSets) + base.MismatchCitations = appendUniqueStrings(base.MismatchCitations, citations) + base.MismatchParameterNotes = appendUniqueStrings(base.MismatchParameterNotes, notes) + terminalSources, terminalParameterSets, terminalCitations, terminalNotes := terminalMismatchProvenance(res.Contributions) + base.TerminalMismatchSources = appendUniqueStrings(base.TerminalMismatchSources, terminalSources) + base.TerminalMismatchParameterSets = appendUniqueStrings(base.TerminalMismatchParameterSets, terminalParameterSets) + base.TerminalMismatchCitations = appendUniqueStrings(base.TerminalMismatchCitations, terminalCitations) + base.TerminalMismatchParameterNotes = appendUniqueStrings(base.TerminalMismatchParameterNotes, terminalNotes) + if res.HasNonWatsonCrick { + base.HasNonWatsonCrick = true + } + if res.UsedHeuristicAdjust { + base.UsedHeuristicAdjust = true + } + } + base.TmC /= n + base.AnnealMarginC /= n + base.DeltaGAtAnnealKcal /= n + base.MismatchPenaltyC /= n + base.MismatchDeltaGKcal /= n + base.TerminalMismatchPenaltyC /= n + base.TerminalMismatchDeltaGKcal /= n + if base.MismatchFallbackCount > 0 { + base.MismatchPolicy = thermo.MismatchPolicyImperfectHeuristicFallback + } else if base.MismatchTripletCount > 0 { + base.MismatchPolicy = thermo.MismatchPolicyImperfectTriplet + } else if base.MismatchCuratedPairCount > 0 { + base.MismatchPolicy = thermo.MismatchPolicyImperfectCuratedPair + } else if base.MismatchCount > 0 { + base.MismatchPolicy = thermo.MismatchPolicyImperfectV1 + } else { + base.MismatchPolicy = thermo.MismatchPolicyPerfect + } + return base +} + +func (v Score) scoreProbeThermoDetails(p 
engine.Product) (engine.ProbeThermoDetails, error) { + probeSeq := strings.ToUpper(strings.TrimSpace(v.ProbeSeq)) + details := engine.ProbeThermoDetails{ + Name: v.probeName(), + Seq: probeSeq, + ScoreMode: v.probeScoreMode(), + MinMarginC: v.ProbeMinMarginC, + IUPACThermoPolicy: v.iupacThermoPolicy(), + } + ann := probeanno.AnnotateAmplicon(p.Seq, probeSeq, v.ProbeMaxMM) + details.Found = ann.Found + details.Strand = ann.Strand + details.Pos = ann.Pos + details.MM = ann.MM + details.Site = ann.Site + if !ann.Found { + return details, nil + } + + policy := v.iupacThermoPolicy() + expanded := []string{probeSeq} + capped := false + if policy == thermo.IUPACThermoPolicyStrict { + if !thermo.IsStrictACGT(probeSeq) { + return details, fmt.Errorf("--probe with --iupac-thermo-policy strict requires A/C/G/T probe sequence") + } + } else { + var err error + expanded, capped, err = thermo.ExpandIUPAC(probeSeq, v.iupacThermoMaxExpansions()) + if err != nil { + return details, fmt.Errorf("--probe %q: %v", probeSeq, err) + } + } + if len(expanded) == 0 { + return details, fmt.Errorf("--probe IUPAC expansion produced no concrete probe variants") + } + + target := probeTarget3to5(ann.Strand, ann.Site) + cond := v.conditions() + scored := make([]scoredProbeVariant, 0, len(expanded)) + for _, variant := range expanded { + res, err := thermo.ImperfectDuplexWithOptions(variant, target, cond, thermo.DefaultImperfectDuplexOptions()) + if err != nil { + return details, err + } + scored = append(scored, scoredProbeVariant{Variant: variant, Result: res}) + } + + bestIdx := 0 + switch policy { + case thermo.IUPACThermoPolicyBest: + for i := 1; i < len(scored); i++ { + if scored[i].Result.AnnealMarginC > scored[bestIdx].Result.AnnealMarginC { + bestIdx = i + } + } + return probeVariantDetails(details, scored[bestIdx], len(scored), capped, scored[bestIdx].Variant), nil + case thermo.IUPACThermoPolicyMean: + return meanProbeDetails(details, scored, capped, "mean"), nil + case 
thermo.IUPACThermoPolicyEnumerate: + return meanProbeDetails(details, scored, capped, "enumerate"), nil + default: + // worst is the default for assay-design conservatism. + for i := 1; i < len(scored); i++ { + if scored[i].Result.AnnealMarginC < scored[bestIdx].Result.AnnealMarginC { + bestIdx = i + } + } + return probeVariantDetails(details, scored[bestIdx], len(scored), capped, scored[bestIdx].Variant), nil + } +} + +func (v Score) applyProbeThermo(p engine.Product) (bool, engine.Product, error) { + if !v.probeThermoEnabled() || p.Thermo == nil { + return true, p, nil + } + details, err := v.scoreProbeThermoDetails(p) + if err != nil { + return false, p, err + } + + oldScore := p.Score + switch details.ScoreMode { + case probeScoreModeGate: + if !details.Found { + details.GatePenaltyC = PROBE_NOT_FOUND_PEN + p.Thermo.Probe = &details + return false, p, nil + } + if details.AnnealMarginC < details.MinMarginC { + details.GatePenaltyC = details.MinMarginC - details.AnnealMarginC + p.Thermo.Probe = &details + return false, p, nil + } + case probeScoreModeBlend: + if !details.Found { + p.Score -= PROBE_NOT_FOUND_PEN + } else { + w := v.probeWeight() + limiting := math.Min(oldScore, details.AnnealMarginC) + p.Score = (1-w)*oldScore + w*limiting + } + details.ScoreContributionC = p.Score - oldScore + case probeScoreModeAnnotate: + // Deliberately leave the primer-derived score untouched. + } + if p.Thermo != nil { + p.Thermo.Probe = &details + p.Thermo.ScoreC = p.Score + } + return true, p, nil +} + // Visit implements the appcore visitor for ipcr-thermo. // It computes a small penalty for the forward end (and conservatively for the reverse end), // then sets Score = -penalty so that higher is better. 
func (v Score) Visit(p engine.Product) (bool, engine.Product, error) { + mode := v.Model + if mode == "" { + mode = thermomodel.Default() + } + var ( + ok bool + out engine.Product + err error + ) + switch mode { + case thermomodel.LegacyHeuristic: + return v.visitLegacyHeuristic(p) + case thermomodel.NNDuplexV1: + ok, out, err = v.visitNNDuplexV1(p) + case thermomodel.NNStructureV1: + ok, out, err = v.visitNNStructureV1(p) + default: + return false, p, fmt.Errorf("thermo model %q is not implemented", mode) + } + if err != nil || !ok { + return ok, out, err + } + return v.applyProbeThermo(out) +} + +func (v Score) visitLegacyHeuristic(p engine.Product) (bool, engine.Product, error) { // Default conservative fixed denominator denomF, denomR := 200.0, 200.0 diff --git a/internal/thermovisitors/score_test.go b/internal/thermovisitors/score_test.go index e820c4c..4745177 100644 --- a/internal/thermovisitors/score_test.go +++ b/internal/thermovisitors/score_test.go @@ -3,7 +3,10 @@ package thermovisitors import ( "ipcr-core/engine" + "ipcr-core/thermo" + "ipcr/internal/thermomodel" "math" + "strings" "testing" ) @@ -49,28 +52,26 @@ func TestAlignPenalty_PositionEffects(t *testing.T) { } pIn := dpPenalty(pr, string(ti), false) - // Assert relative ordering: 3' > 5' > internal (position multiplier × chemistry) - if !(p3 > p5 && p5 > pIn) { - t.Fatalf("position penalties not ordered as expected: 3' %.2f, 5' %.2f, internal %.2f", p3, p5, pIn) + // Terminal-window weighting should still make the same terminal mismatch more + // severe at the 3' end than at the 5' end. Internal mismatch magnitude is now + // context dependent because exact SantaLucia-Hicks triplet overrides are used. + if !(p3 > p5 && pIn > 0) { + t.Fatalf("unexpected position penalties: 3' %.2f, 5' %.2f, internal %.2f", p3, p5, pIn) } } -func TestAlignPenalty_Chemistry_GTvsGA(t *testing.T) { - // Primer of G's so we can toggle the target at an internal position. 
+func TestAlignPenalty_UsesExactGTTripletOverride(t *testing.T) { + // Primer of G's gives an internal G/T triplet key of: + // 5'-GGG-3' / 3'-CTC-5'. The curated SantaLucia-Hicks compiled-gauge + // ΔΔG°37 for that context is 3.44 kcal/mol, so D=200 gives 17.20 °C. pr := "GGGGGGGGGG" - tgtPerfect := "CCCCCCCCCC" // 3'→5' + tgt := []byte("CCCCCCCCCC") // 3'→5' + tgt[4] = 'T' - // Internal index = 4 → compare chemistries for that column - tGT := []byte(tgtPerfect) - tGT[4] = 'T' // G•T wobble (milder) - pGT := dpPenalty(pr, string(tGT), false) - - tGA := []byte(tgtPerfect) - tGA[4] = 'A' // G•A (harsher than GT in our table) - pGA := dpPenalty(pr, string(tGA), false) - - if !(pGT < pGA) { - t.Fatalf("chemistry ordering failed: expected GT(%.2f) < GA(%.2f)", pGT, pGA) + pGT := dpPenalty(pr, string(tgt), false) + want := thermo.DeltaGToDeltaTm(3.44, 200.0) + if math.Abs(pGT-want) > 1e-9 { + t.Fatalf("GT triplet penalty: got %.12g want %.12g", pGT, want) } } @@ -93,6 +94,14 @@ func rc5to3(s string) string { return string(b) } +func perfectAmplicon(fwd, rev string, length int) string { + filler := length - len(fwd) - len(rev) + if filler < 0 { + filler = 0 + } + return fwd + strings.Repeat("A", filler) + rc5to3(rev) +} + func TestScore_ImprovesWithPerfectEnds(t *testing.T) { fwd := "ACGTAC" rev := "GGTACC" @@ -123,3 +132,485 @@ func TestScore_ImprovesWithPerfectEnds(t *testing.T) { t.Fatalf("NaN scores") } } + +func TestScore_DefaultModelMatchesExplicitLegacyHeuristic(t *testing.T) { + fwd := "ACGTAC" + rev := "GGTACC" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{ + FwdPrimer: fwd, RevPrimer: rev, + Seq: amp, Length: len(amp), Type: "forward", + } + + base := Score{AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7, AllowIndels: true} + _, gotDefault, err := base.Visit(p) + if err != nil { + t.Fatalf("default Visit returned error: %v", err) + } + + base.Model = thermomodel.LegacyHeuristic + _, gotLegacy, err := base.Visit(p) + if err != nil { + 
t.Fatalf("legacy Visit returned error: %v", err) + } + + if gotDefault.Score != gotLegacy.Score { + t.Fatalf("default model changed score: default=%g legacy=%g", gotDefault.Score, gotLegacy.Score) + } +} + +func TestScore_NNDuplexModelProducesThermoComponents(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "ACGTACGTACGTACGTACGT" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{ + FwdPrimer: fwd, RevPrimer: rev, + Seq: amp, Length: len(amp), Type: "forward", + } + + v := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, got, err := v.Visit(p) + if err != nil { + t.Fatalf("NNDuplex Visit returned error: %v", err) + } + if got.Thermo == nil { + t.Fatal("expected thermo components") + } + if got.Thermo.Model != thermomodel.NNDuplexV1.String() { + t.Fatalf("got model %q", got.Thermo.Model) + } + if got.Score != got.Thermo.ScoreC { + t.Fatalf("score/component mismatch: %g vs %g", got.Score, got.Thermo.ScoreC) + } + if got.Thermo.Fwd.MismatchPenaltyC != 0 || got.Thermo.Rev.MismatchPenaltyC != 0 { + t.Fatalf("perfect duplex should not have mismatch penalties: %+v", got.Thermo) + } +} + +func TestScore_NNDuplexAnnealTemperatureChangesScore(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "ACGTACGTACGTACGTACGT" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp} + + low := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 55, Na_M: 0.05, PrimerConc_M: 2.5e-7} + high := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 70, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, pLow, err := low.Visit(p) + if err != nil { + t.Fatalf("low anneal Visit: %v", err) + } + _, pHigh, err := high.Visit(p) + if err != nil { + t.Fatalf("high anneal Visit: %v", err) + } + if !(pLow.Score > pHigh.Score) { + t.Fatalf("expected lower anneal temp to produce higher margin score: low=%g high=%g", pLow.Score, pHigh.Score) + } +} + +func TestScore_NNDuplexMismatchUsesFallbackAndLowersScore(t 
*testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "ACGTACGTACGTACGTACGT" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp} + v := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, perfect, err := v.Visit(p) + if err != nil { + t.Fatalf("perfect Visit: %v", err) + } + + badAmp := []byte(amp) + badAmp[len(fwd)-1] = 'A' + if amp[len(fwd)-1] == 'A' { + badAmp[len(fwd)-1] = 'C' + } + p.Seq = string(badAmp) + _, mismatched, err := v.Visit(p) + if err != nil { + t.Fatalf("mismatch Visit: %v", err) + } + if !(perfect.Score > mismatched.Score) { + t.Fatalf("expected mismatch to lower NN score: perfect=%g mismatched=%g", perfect.Score, mismatched.Score) + } + if mismatched.Thermo == nil || !mismatched.Thermo.Fwd.HasNonWatsonCrick || mismatched.Thermo.Fwd.MismatchCuratedPairCount != 1 || mismatched.Thermo.Fwd.UsedHeuristicAdjust { + t.Fatalf("expected fwd curated pair-family mismatch details, got %+v", mismatched.Thermo) + } + if mismatched.Thermo.Fwd.MismatchCount != 1 || mismatched.Thermo.Fwd.ThreePrimeMismatchCount != 1 { + t.Fatalf("expected one 3' mismatch to be reported, got %+v", mismatched.Thermo.Fwd) + } + if mismatched.Thermo.Fwd.MismatchPolicy != thermo.MismatchPolicyImperfectCuratedPair { + t.Fatalf("unexpected mismatch policy: %+v", mismatched.Thermo.Fwd) + } +} + +func TestScore_NNStructureModelAddsStructureComponents(t *testing.T) { + fwd := "GCGCGCGC" + rev := "GCGCGCGC" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp} + + duplex := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, base, err := duplex.Visit(p) + if err != nil { + t.Fatalf("NNDuplex Visit: %v", err) + } + + structure := Score{ + Model: thermomodel.NNStructureV1, + AnnealTempC: 60, + Na_M: 0.05, + PrimerConc_M: 2.5e-7, + StructHairpin: true, + StructDimer: true, + StructScale: 1.0, + } + _, got, err := 
structure.Visit(p) + if err != nil { + t.Fatalf("NNStructure Visit: %v", err) + } + if got.Thermo == nil || got.Thermo.Model != thermomodel.NNStructureV1.String() { + t.Fatalf("expected nn-structure-v1 details, got %+v", got.Thermo) + } + if got.Thermo.CrossDimer == nil { + t.Fatalf("expected cross-dimer component, got %+v", got.Thermo) + } + if got.Thermo.StructurePenaltyC <= 0 { + t.Fatalf("expected positive structure penalty, got %+v", got.Thermo) + } + if !(got.Score < base.Score) { + t.Fatalf("expected structure-aware score to be lower than duplex-only score: structure=%g duplex=%g", got.Score, base.Score) + } +} + +func TestScore_NNDuplexBaseScoreMatchesFinalScore(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "ACGTACGTACGTACGTACGT" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Type: "forward"} + + v := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, got, err := v.Visit(p) + if err != nil { + t.Fatalf("NNDuplex Visit returned error: %v", err) + } + if got.Thermo == nil { + t.Fatal("expected thermo details") + } + if got.Thermo.BaseScoreC != got.Thermo.ScoreC || got.Score != got.Thermo.BaseScoreC { + t.Fatalf("expected duplex base/final score parity, got score=%g thermo=%+v", got.Score, got.Thermo) + } +} + +func TestScore_NNStructurePanelCrossDimerPenalty(t *testing.T) { + fwd := "AAAACGCGCGCGCGCG" + rev := "TTTTATATATATATAT" + partner := "TTTTCGCGCGCGCGCG" + amp := fwd + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Type: "forward"} + + v := Score{ + Model: thermomodel.NNStructureV1, + AnnealTempC: 60, + Na_M: 0.05, + PrimerConc_M: 2.5e-7, + StructHairpin: false, + StructDimer: true, + StructScale: 1, + PanelPrimers: []PrimerRef{ + {ID: "current-fwd", Seq: fwd}, + {ID: "current-rev", Seq: rev}, + {ID: "panel_partner", Seq: partner}, + }, + } + _, got, err := v.Visit(p) + if err != nil { + t.Fatalf("NNStructure 
Visit returned error: %v", err) + } + if got.Thermo == nil { + t.Fatal("expected thermo details") + } + if got.Thermo.PanelCrossDimer == nil { + t.Fatalf("expected panel cross-dimer details, got %+v", got.Thermo) + } + if got.Thermo.PanelCrossDimerPenaltyC <= 0 || got.Thermo.PanelCrossDimerBurdenC <= 0 || got.Thermo.PanelCrossDimerCount <= 0 { + t.Fatalf("expected positive panel cross-dimer penalty/burden/count, got %+v", got.Thermo) + } + if got.Thermo.PanelCrossDimer.QueryB != "panel_partner" { + t.Fatalf("expected panel partner label, got %+v", got.Thermo.PanelCrossDimer) + } + if !(got.Score < got.Thermo.BaseScoreC) { + t.Fatalf("expected panel dimer penalty to lower score: score=%g base=%g", got.Score, got.Thermo.BaseScoreC) + } +} + +func TestScore_GelProfileAddsAmpliconObservableTerms(t *testing.T) { + // Xiong-style Salmonella multiplex primers. The binding-only NN score ranks + // the short, high-Tm product highest; gel-observable ranking should be able + // to include amplicon mass and extension penalties as explicit components. 
+ o1 := "ATGTCTATAAGCACCACAATG" + o2 := "TCATTTCAATAATGATTCAAGC" + o3 := "CATTCTGACCTTTAAGCCGGTCAATGAG" + o4 := "CCAAAAAGCGAGACCTCAAACTTACTCAG" + o5 := "GCGGACGTCATTGTCACTAACCCGACG" + o6 := "TCTAAAGTGGGAACCCGATGTTCAGCG" + + p155 := engine.Product{ExperimentID: "O5+O6", FwdPrimer: o5, RevPrimer: o6, Seq: perfectAmplicon(o5, o6, 155), Length: 155} + p339 := engine.Product{ExperimentID: "O3+O4", FwdPrimer: o3, RevPrimer: o4, Seq: perfectAmplicon(o3, o4, 339), Length: 339} + p882 := engine.Product{ExperimentID: "O1+O2", FwdPrimer: o1, RevPrimer: o2, Seq: perfectAmplicon(o1, o2, 882), Length: 882} + + binding := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, b155, err := binding.Visit(p155) + if err != nil { + t.Fatalf("binding 155 Visit: %v", err) + } + _, b339, err := binding.Visit(p339) + if err != nil { + t.Fatalf("binding 339 Visit: %v", err) + } + if !(b155.Score > b339.Score) { + t.Fatalf("expected binding-only score to prefer short high-Tm product: 155=%g 339=%g", b155.Score, b339.Score) + } + + gel := Score{ + Model: thermomodel.NNStructureV1, + AnnealTempC: 60, + Na_M: 0.05, + PrimerConc_M: 2.5e-7, + StructHairpin: true, + StructDimer: true, + StructScale: 1, + ScoreProfile: scoreProfileGel, + ExtAlpha: 0.45, + ExtWeight: 1, + LenKneeBP: 550, + LenSteep: 0.003, + LenMaxPenC: 10, + BandMassWeight: 15, + } + _, g155, err := gel.Visit(p155) + if err != nil { + t.Fatalf("gel 155 Visit: %v", err) + } + _, g339, err := gel.Visit(p339) + if err != nil { + t.Fatalf("gel 339 Visit: %v", err) + } + _, g882, err := gel.Visit(p882) + if err != nil { + t.Fatalf("gel 882 Visit: %v", err) + } + + if g339.Thermo == nil || g339.Thermo.ScoreProfile != scoreProfileGel { + t.Fatalf("expected gel thermo details, got %+v", g339.Thermo) + } + if g339.Thermo.BandMassBonusC <= g155.Thermo.BandMassBonusC { + t.Fatalf("expected longer visible product to get larger band-mass term: 339=%g 155=%g", g339.Thermo.BandMassBonusC, 
g155.Thermo.BandMassBonusC) + } + if g882.Thermo.LengthPenaltyC <= 0 { + t.Fatalf("expected long product extension/length penalty, got %+v", g882.Thermo) + } + if !(g339.Score > g882.Score && g882.Score > g155.Score) { + t.Fatalf("expected gel profile rank 339 > 882 > 155; got 339=%g 882=%g 155=%g", g339.Score, g882.Score, g155.Score) + } +} + +func TestScore_NNDuplexReportsEndEffectComponents(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "ACGTACGTACGTACGTACGT" + amp := fwd + "GAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + v := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + + _, got, err := v.Visit(p) + if err != nil { + t.Fatalf("NNDuplex Visit: %v", err) + } + if got.Thermo == nil { + t.Fatal("expected thermo details") + } + if got.Thermo.Fwd.DanglingEndCount != 1 || got.Thermo.Fwd.DanglingEndAdjustmentC <= 0 { + t.Fatalf("expected fwd dangling-end diagnostics, got %+v", got.Thermo.Fwd) + } + if got.Thermo.Fwd.EndEffectPolicy != thermo.EndEffectPolicyTemplateDanglingV1 { + t.Fatalf("unexpected fwd end-effect policy: %+v", got.Thermo.Fwd) + } + + badAmp := []byte(amp) + badAmp[len(fwd)-1] = 'A' + if amp[len(fwd)-1] == 'A' { + badAmp[len(fwd)-1] = 'C' + } + p.Seq = string(badAmp) + _, mismatched, err := v.Visit(p) + if err != nil { + t.Fatalf("mismatched NNDuplex Visit: %v", err) + } + if mismatched.Thermo.Fwd.ThreePrimeTerminalMismatchCount != 1 || mismatched.Thermo.Fwd.TerminalMismatchPenaltyC <= 0 { + t.Fatalf("expected explicit 3' terminal mismatch diagnostics, got %+v", mismatched.Thermo.Fwd) + } + if mismatched.Thermo.Fwd.EndEffectPolicy != thermo.EndEffectPolicyTerminalMismatchV1 { + t.Fatalf("unexpected terminal end-effect policy: %+v", mismatched.Thermo.Fwd) + } +} + +func TestScore_IUPACThermoPolicyWorstBestMeanEnumerate(t *testing.T) { + fwdDegenerate := "ACGRACGTACGT" + fwdBestVariant := "ACGAACGTACGT" + rev := "ACGTACGTACGT" + 
amp := fwdBestVariant + strings.Repeat("A", 12) + rc5to3(rev) + p := engine.Product{FwdPrimer: fwdDegenerate, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + base := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + + worstVisitor := base + worstVisitor.IUPACThermoPolicy = thermo.IUPACThermoPolicyWorst + _, worst, err := worstVisitor.Visit(p) + if err != nil { + t.Fatalf("worst Visit: %v", err) + } + bestVisitor := base + bestVisitor.IUPACThermoPolicy = thermo.IUPACThermoPolicyBest + _, best, err := bestVisitor.Visit(p) + if err != nil { + t.Fatalf("best Visit: %v", err) + } + meanVisitor := base + meanVisitor.IUPACThermoPolicy = thermo.IUPACThermoPolicyMean + _, mean, err := meanVisitor.Visit(p) + if err != nil { + t.Fatalf("mean Visit: %v", err) + } + enumVisitor := base + enumVisitor.IUPACThermoPolicy = thermo.IUPACThermoPolicyEnumerate + _, enumerated, err := enumVisitor.Visit(p) + if err != nil { + t.Fatalf("enumerate Visit: %v", err) + } + + if best.Score <= worst.Score { + t.Fatalf("expected best score > worst score; best=%g worst=%g", best.Score, worst.Score) + } + if !(mean.Score > worst.Score && mean.Score < best.Score) { + t.Fatalf("expected mean between worst and best; mean=%g worst=%g best=%g", mean.Score, worst.Score, best.Score) + } + if worst.Thermo == nil || best.Thermo == nil || mean.Thermo == nil || enumerated.Thermo == nil { + t.Fatal("expected thermo details for all IUPAC policies") + } + if worst.Thermo.IUPACThermoPolicy != thermo.IUPACThermoPolicyWorst || worst.Thermo.IUPACExpansionCount != 2 { + t.Fatalf("unexpected worst metadata: %+v", worst.Thermo) + } + if worst.Thermo.IUPACEffectiveVariant == best.Thermo.IUPACEffectiveVariant { + t.Fatalf("expected worst and best to select different variants; both=%q", worst.Thermo.IUPACEffectiveVariant) + } + if !strings.Contains(worst.Thermo.IUPACEffectiveVariant, "fwd=") || !strings.Contains(best.Thermo.IUPACEffectiveVariant, "fwd=") { + 
t.Fatalf("expected concrete effective variants; worst=%q best=%q", worst.Thermo.IUPACEffectiveVariant, best.Thermo.IUPACEffectiveVariant) + } + if enumerated.Thermo.IUPACThermoPolicy != thermo.IUPACThermoPolicyEnumerate || len(enumerated.Thermo.IUPACVariants) != 2 { + t.Fatalf("expected enumerated variants, got %+v", enumerated.Thermo) + } +} + +func TestScore_IUPACThermoExpansionCap(t *testing.T) { + fwd := "NNNN" + rev := "ACGT" + amp := "AAAA" + strings.Repeat("A", 12) + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + v := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7, IUPACThermoPolicy: thermo.IUPACThermoPolicyWorst, IUPACThermoMaxExpansions: 3} + _, got, err := v.Visit(p) + if err != nil { + t.Fatalf("Visit: %v", err) + } + if got.Thermo == nil || got.Thermo.IUPACExpansionCount != 3 || !got.Thermo.IUPACExpansionCapped { + t.Fatalf("expected capped IUPAC metadata, got %+v", got.Thermo) + } +} + +func TestScore_ProbeThermoGateKeepsFoundProbeAndAnnotates(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "TGCATGCATGCATGCATGCA" + probe := "GATTACAGATTACAGATTAC" + amp := fwd + "AAAA" + probe + "AAAA" + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + + v := Score{ + Model: thermomodel.NNDuplexV1, + AnnealTempC: 60, + Na_M: 0.05, + PrimerConc_M: 2.5e-7, + ProbeSeq: probe, + ProbeName: "p1", + ProbeThermo: true, + ProbeScoreMode: probeScoreModeGate, + ProbeMinMarginC: -100, + } + keep, got, err := v.Visit(p) + if err != nil { + t.Fatalf("Visit returned error: %v", err) + } + if !keep { + t.Fatal("expected probe-gated product to be kept") + } + if got.Thermo == nil || got.Thermo.Probe == nil { + t.Fatalf("expected probe thermo details, got %+v", got.Thermo) + } + pr := got.Thermo.Probe + if !pr.Found || pr.Name != "p1" || pr.Seq != probe || pr.ScoreMode != probeScoreModeGate { + 
t.Fatalf("unexpected probe annotation: %+v", pr) + } + if pr.TmC == 0 || pr.IUPACExpansionCount != 1 || pr.IUPACEffectiveVariant != probe { + t.Fatalf("expected populated probe thermo fields, got %+v", pr) + } +} + +func TestScore_ProbeThermoGateDropsMissingProbe(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "TGCATGCATGCATGCATGCA" + amp := fwd + strings.Repeat("A", 40) + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + + v := Score{ + Model: thermomodel.NNDuplexV1, + AnnealTempC: 60, + Na_M: 0.05, + PrimerConc_M: 2.5e-7, + ProbeSeq: "GATTACAGATTACAGATTAC", + ProbeThermo: true, + ProbeScoreMode: probeScoreModeGate, + } + keep, _, err := v.Visit(p) + if err != nil { + t.Fatalf("Visit returned error: %v", err) + } + if keep { + t.Fatal("expected gate mode to drop products missing the probe") + } +} + +func TestScore_ProbeThermoBlendPenalizesMissingProbe(t *testing.T) { + fwd := "ACGTACGTACGTACGTACGT" + rev := "TGCATGCATGCATGCATGCA" + amp := fwd + strings.Repeat("A", 40) + rc5to3(rev) + p := engine.Product{FwdPrimer: fwd, RevPrimer: rev, Seq: amp, Length: len(amp), Type: "forward"} + + base := Score{Model: thermomodel.NNDuplexV1, AnnealTempC: 60, Na_M: 0.05, PrimerConc_M: 2.5e-7} + _, withoutProbe, err := base.Visit(p) + if err != nil { + t.Fatalf("base Visit returned error: %v", err) + } + withProbe := base + withProbe.ProbeSeq = "GATTACAGATTACAGATTAC" + withProbe.ProbeThermo = true + withProbe.ProbeScoreMode = probeScoreModeBlend + keep, got, err := withProbe.Visit(p) + if err != nil { + t.Fatalf("probe Visit returned error: %v", err) + } + if !keep { + t.Fatal("blend mode should not drop missing-probe products") + } + if !(got.Score < withoutProbe.Score) { + t.Fatalf("expected missing probe to penalize score: base=%g got=%g", withoutProbe.Score, got.Score) + } + if got.Thermo == nil || got.Thermo.Probe == nil || got.Thermo.Probe.ScoreContributionC >= 0 { + t.Fatalf("expected negative probe 
score contribution, got %+v", got.Thermo) + } +} diff --git a/internal/writers/product.go b/internal/writers/product.go index 66cd581..b61bcd1 100644 --- a/internal/writers/product.go +++ b/internal/writers/product.go @@ -9,13 +9,14 @@ import ( ) type productArgs struct { - Sort bool - Header bool - Pretty bool - Opt pretty.Options - Scores bool // NEW: include 'score' in TSV - RankByScore bool // NEW: prefer score sort over coord - In <-chan engine.Product + Sort bool + Header bool + Pretty bool + Opt pretty.Options + Scores bool // NEW: include 'score' in TSV + RankByScore bool // NEW: prefer score sort over coord + ThermoDetails bool // append compact NN thermodynamic diagnostics in text/TSV + In <-chan engine.Product } func drainProducts(ch <-chan engine.Product) []engine.Product { @@ -81,19 +82,25 @@ func init() { if args.Scores { h += "\tscore" // <- assignOp fix (was: h = h + "\tscore") } + if args.ThermoDetails { + h += "\t" + output.ThermoDetailsTSVHeader + } _, err := io.WriteString(w, h+"\n") return err } writeRow := func(p engine.Product) error { - if args.Scores { - if _, err := io.WriteString(w, output.FormatRowTSVWithScore(p)+"\n"); err != nil { - return err - } - } else { - if _, err := io.WriteString(w, output.FormatBaseRowTSV(p)+"\n"); err != nil { - return err - } + row := output.FormatBaseRowTSV(p) + switch { + case args.Scores && args.ThermoDetails: + row = output.FormatRowTSVWithScoreAndThermoDetails(p) + case args.Scores: + row = output.FormatRowTSVWithScore(p) + case args.ThermoDetails: + row = output.FormatRowTSVWithThermoDetails(p) + } + if _, err := io.WriteString(w, row+"\n"); err != nil { + return err } if args.Pretty { if _, err := io.WriteString(w, pretty.RenderProductWithOptions(p, args.Opt)); err != nil { @@ -137,10 +144,18 @@ func init() { // Public API (updated to carry scores & rank-by-score) func StartProductWriter(out io.Writer, format string, sort, header, prettyMode, includeScore, rankByScore bool, bufSize int) (chan<- 
engine.Product, <-chan error) { - return StartProductWriterWithPrettyOptions(out, format, sort, header, prettyMode, includeScore, rankByScore, pretty.DefaultOptions, bufSize) + return StartProductWriterWithThermoDetails(out, format, sort, header, prettyMode, includeScore, rankByScore, false, bufSize) +} + +func StartProductWriterWithThermoDetails(out io.Writer, format string, sort, header, prettyMode, includeScore, rankByScore, thermoDetails bool, bufSize int) (chan<- engine.Product, <-chan error) { + return StartProductWriterWithPrettyOptionsAndThermoDetails(out, format, sort, header, prettyMode, includeScore, rankByScore, thermoDetails, pretty.DefaultOptions, bufSize) } func StartProductWriterWithPrettyOptions(out io.Writer, format string, sort, header, prettyMode, includeScore, rankByScore bool, popt pretty.Options, bufSize int) (chan<- engine.Product, <-chan error) { + return StartProductWriterWithPrettyOptionsAndThermoDetails(out, format, sort, header, prettyMode, includeScore, rankByScore, false, popt, bufSize) +} + +func StartProductWriterWithPrettyOptionsAndThermoDetails(out io.Writer, format string, sort, header, prettyMode, includeScore, rankByScore, thermoDetails bool, popt pretty.Options, bufSize int) (chan<- engine.Product, <-chan error) { if bufSize <= 0 { bufSize = 64 } @@ -148,13 +163,14 @@ func StartProductWriterWithPrettyOptions(out io.Writer, format string, sort, hea errCh := make(chan error, 1) go func() { err := WriteProduct(format, out, productArgs{ - Sort: sort, - Header: header, - Pretty: prettyMode, - Opt: popt, - Scores: includeScore, - RankByScore: rankByScore, - In: in, + Sort: sort, + Header: header, + Pretty: prettyMode, + Opt: popt, + Scores: includeScore, + RankByScore: rankByScore, + ThermoDetails: thermoDetails, + In: in, }) errCh <- err }() diff --git a/internal/writers/product_score_tsv_test.go b/internal/writers/product_score_tsv_test.go index 119b600..ae9cbde 100644 --- a/internal/writers/product_score_tsv_test.go +++ 
b/internal/writers/product_score_tsv_test.go @@ -31,3 +31,62 @@ func TestProductWriter_TSVScoreHeaderAndSort(t *testing.T) { t.Fatalf("expected first row to be score=3.2, got: %q", lines[1]) } } + +func TestProductWriter_TSVThermoDetailsHeaderAndRow(t *testing.T) { + var buf bytes.Buffer + in, done := StartProductWriterWithThermoDetails(&buf, "text", false, true, false, true, true, true, 4) + + in <- engine.Product{ + SourceFile: "ref.fa", + SequenceID: "s", + ExperimentID: "x", + Start: 0, + End: 10, + Length: 10, + Type: "forward", + Score: 1.5, + Thermo: &engine.ThermoDetails{ + Model: "nn-structure-v1", + SaltModel: "monovalent", + NaM: 0.05, + MgM: 0.003, + DntpM: 0.0008, + EffectiveNaM: 0.05, + FreeMgM: 0.0022, + AnnealTempC: 60, + ScoreProfile: "binding", + BaseScoreC: 3.5, + ScoreC: 1.5, + StructurePenaltyC: 2.0, + LimitingSide: "fwd", + PanelCrossDimerPenaltyC: 1.25, + PanelCrossDimerBurdenC: 2.75, + PanelCrossDimerCount: 2, + PanelCrossDimer: &engine.ThermoStructure{ + Kind: "cross-dimer", + QueryA: "fwd", + QueryB: "external", + PenaltyC: 1.25, + }, + }, + } + close(in) + if err := <-done; err != nil { + t.Fatalf("writer err: %v", err) + } + + out := buf.String() + lines := strings.Split(strings.TrimSpace(out), "\n") + if len(lines) != 2 { + t.Fatalf("unexpected TSV lines (%d): %q", len(lines), out) + } + if !strings.Contains(lines[0], "score\tthermo_model\tsalt_model\tna_m\tmg_m\tdntp_m\teffective_na_m\tfree_mg_m") || !strings.Contains(lines[0], "panel_cross_dimer_penalty_c") { + t.Fatalf("expected thermo details header, got: %q", lines[0]) + } + if !strings.Contains(lines[1], "\tnn-structure-v1\tmonovalent\t0.05\t0.003\t0.0008\t0.05\t0.0022\t60\t\t\t\t\tbinding\t3.5\t1.5") || !strings.Contains(lines[1], "\t2\tfwd") { + t.Fatalf("expected thermo detail values, got: %q", lines[1]) + } + if !strings.Contains(lines[1], "\t1.25\t2.75\t2\tfwd~external") { + t.Fatalf("expected panel cross-dimer details, got: %q", lines[1]) + } +} diff --git 
a/pkg/api/products_v1.go b/pkg/api/products_v1.go index 135086b..ce8fc51 100644 --- a/pkg/api/products_v1.go +++ b/pkg/api/products_v1.go @@ -19,6 +19,184 @@ type ProductV1 struct { // NEW: optional score, used by ipcr-thermo; omitted otherwise Score float64 `json:"score,omitempty"` + + // Optional thermodynamic score components, emitted by NN thermo models. + Thermo *ThermoDetailsV1 `json:"thermo,omitempty"` +} + +// ThermoDetailsV1 is an optional extension object for ipcr-thermo NN modes. +type ThermoDetailsV1 struct { + Model string `json:"model"` + SaltModel string `json:"salt_model"` + NaM float64 `json:"na_m,omitempty"` + MgM float64 `json:"mg_m,omitempty"` + DntpM float64 `json:"dntp_m,omitempty"` + EffectiveNaM float64 `json:"effective_na_m,omitempty"` + FreeMgM float64 `json:"free_mg_m,omitempty"` + AnnealTempC float64 `json:"anneal_temp_c"` + IUPACPolicy string `json:"iupac_policy"` + IUPACThermoPolicy string `json:"iupac_thermo_policy,omitempty"` + IUPACExpansionCount int `json:"iupac_expansion_count,omitempty"` + IUPACExpansionCapped bool `json:"iupac_expansion_capped,omitempty"` + IUPACEffectiveVariant string `json:"iupac_effective_variant,omitempty"` + IUPACVariants []ThermoIUPACVariantV1 `json:"iupac_variants,omitempty"` + MismatchPolicy string `json:"mismatch_policy"` + StructurePolicy string `json:"structure_policy,omitempty"` + ScoreProfile string `json:"score_profile,omitempty"` + ScoreC float64 `json:"score_c"` + BaseScoreC float64 `json:"base_score_c,omitempty"` + AmpliconAdjustmentC float64 `json:"amplicon_adjustment_c,omitempty"` + ExtensionLogit float64 `json:"extension_logit,omitempty"` + ExtensionBonusC float64 `json:"extension_bonus_c,omitempty"` + LengthPenaltyC float64 `json:"length_penalty_c,omitempty"` + BandMassBonusC float64 `json:"band_mass_bonus_c,omitempty"` + StructurePenaltyC float64 `json:"structure_penalty_c,omitempty"` + LimitingSide string `json:"limiting_side"` + Fwd ThermoEndpointV1 `json:"fwd"` + Rev ThermoEndpointV1 
`json:"rev"` + Probe *ProbeThermoV1 `json:"probe,omitempty"` + WorstHairpin *ThermoStructureV1 `json:"worst_hairpin,omitempty"` + WorstSelfDimer *ThermoStructureV1 `json:"worst_self_dimer,omitempty"` + CrossDimer *ThermoStructureV1 `json:"cross_dimer,omitempty"` + PanelCrossDimer *ThermoStructureV1 `json:"panel_cross_dimer,omitempty"` + PanelCrossDimerPenaltyC float64 `json:"panel_cross_dimer_penalty_c,omitempty"` + PanelCrossDimerBurdenC float64 `json:"panel_cross_dimer_burden_c,omitempty"` + PanelCrossDimerCount int `json:"panel_cross_dimer_count,omitempty"` +} + +// ProbeThermoV1 records internal-probe annotation plus NN probe-target +// thermodynamics for ipcr-thermo outputs. +type ProbeThermoV1 struct { + Name string `json:"name"` + Seq string `json:"seq"` + Found bool `json:"found"` + Strand string `json:"strand,omitempty"` + Pos int `json:"pos,omitempty"` + MM int `json:"mm,omitempty"` + Site string `json:"site,omitempty"` + ScoreMode string `json:"score_mode"` + MinMarginC float64 `json:"min_margin_c,omitempty"` + ScoreContributionC float64 `json:"score_contribution_c,omitempty"` + GatePenaltyC float64 `json:"gate_penalty_c,omitempty"` + IUPACThermoPolicy string `json:"iupac_thermo_policy,omitempty"` + IUPACExpansionCount int `json:"iupac_expansion_count,omitempty"` + IUPACExpansionCapped bool `json:"iupac_expansion_capped,omitempty"` + IUPACEffectiveVariant string `json:"iupac_effective_variant,omitempty"` + TmC float64 `json:"tm_c,omitempty"` + AnnealMarginC float64 `json:"anneal_margin_c,omitempty"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal,omitempty"` + MismatchPenaltyC float64 `json:"mismatch_penalty_c,omitempty"` + MismatchDeltaGKcal float64 `json:"mismatch_delta_g_kcal,omitempty"` + MismatchCount int `json:"mismatch_count,omitempty"` + MismatchFallbackCount int `json:"mismatch_fallback_count,omitempty"` + MismatchTripletCount int `json:"mismatch_triplet_count,omitempty"` + MismatchCuratedPairCount int 
`json:"mismatch_curated_pair_count,omitempty"` + MismatchSources []string `json:"mismatch_sources,omitempty"` + MismatchParameterSets []string `json:"mismatch_parameter_sets,omitempty"` + MismatchCitations []string `json:"mismatch_citations,omitempty"` + MismatchParameterNotes []string `json:"mismatch_parameter_notes,omitempty"` + TerminalMismatchPenaltyC float64 `json:"terminal_mismatch_penalty_c,omitempty"` + TerminalMismatchDeltaGKcal float64 `json:"terminal_mismatch_delta_g_kcal,omitempty"` + TerminalMismatchCount int `json:"terminal_mismatch_count,omitempty"` + FivePrimeTerminalMismatchCount int `json:"five_prime_terminal_mismatch_count,omitempty"` + ThreePrimeTerminalMismatchCount int `json:"three_prime_terminal_mismatch_count,omitempty"` + TerminalMismatchSources []string `json:"terminal_mismatch_sources,omitempty"` + TerminalMismatchParameterSets []string `json:"terminal_mismatch_parameter_sets,omitempty"` + TerminalMismatchCitations []string `json:"terminal_mismatch_citations,omitempty"` + TerminalMismatchParameterNotes []string `json:"terminal_mismatch_parameter_notes,omitempty"` + MismatchPolicy string `json:"mismatch_policy,omitempty"` + HasNonWatsonCrick bool `json:"has_non_watson_crick,omitempty"` + UsedHeuristicAdjust bool `json:"used_heuristic_adjust,omitempty"` +} + +// ThermoIUPACVariantV1 records one scored expansion of a degenerate primer pair. +type ThermoIUPACVariantV1 struct { + FwdVariant string `json:"fwd_variant"` + RevVariant string `json:"rev_variant"` + ScoreC float64 `json:"score_c"` + BaseScoreC float64 `json:"base_score_c,omitempty"` + StructurePenaltyC float64 `json:"structure_penalty_c,omitempty"` + LimitingSide string `json:"limiting_side,omitempty"` + FwdTmC float64 `json:"fwd_tm_c,omitempty"` + RevTmC float64 `json:"rev_tm_c,omitempty"` + FwdMarginC float64 `json:"fwd_margin_c,omitempty"` + RevMarginC float64 `json:"rev_margin_c,omitempty"` +} + +// ThermoStructureV1 describes a secondary-structure competitor. 
+type ThermoStructureV1 struct { + Kind string `json:"kind"` + Model string `json:"model,omitempty"` + QueryA string `json:"query_a,omitempty"` + QueryB string `json:"query_b,omitempty"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal"` + TmC float64 `json:"tm_c"` + AnnealMarginC float64 `json:"anneal_margin_c"` + StemLen int `json:"stem_len"` + LoopLen int `json:"loop_len,omitempty"` + AStart int `json:"a_start"` + AEnd int `json:"a_end"` + BStart int `json:"b_start"` + BEnd int `json:"b_end"` + ThreePrimeAnchored bool `json:"three_prime_anchored"` + BothThreePrimeAnchor bool `json:"both_three_prime_anchor,omitempty"` + SegmentCount int `json:"segment_count,omitempty"` + BulgeCount int `json:"bulge_count,omitempty"` + InternalLoopCount int `json:"internal_loop_count,omitempty"` + DanglingEndCount int `json:"dangling_end_count,omitempty"` + LoopPenaltyKcal float64 `json:"loop_penalty_kcal,omitempty"` + BulgePenaltyKcal float64 `json:"bulge_penalty_kcal,omitempty"` + InternalLoopPenaltyKcal float64 `json:"internal_loop_penalty_kcal,omitempty"` + StructureDanglingDeltaGKcal float64 `json:"structure_dangling_delta_g_kcal,omitempty"` + EnsembleDeltaGAtAnnealKcal float64 `json:"ensemble_delta_g_at_anneal_kcal,omitempty"` + PartitionFunction float64 `json:"partition_function,omitempty"` + EnsembleWeight float64 `json:"ensemble_weight,omitempty"` + EnsembleCandidateCount int `json:"ensemble_candidate_count,omitempty"` + DPCellCount int `json:"dp_cell_count,omitempty"` + DPStateCount int `json:"dp_state_count,omitempty"` + DPExpectedPairs float64 `json:"dp_expected_pairs,omitempty"` + DPMFEDeltaGAtAnnealKcal float64 `json:"dp_mfe_delta_g_at_anneal_kcal,omitempty"` + DPEnsembleDeltaGAtAnnealKcal float64 `json:"dp_ensemble_delta_g_at_anneal_kcal,omitempty"` + PenaltyC float64 `json:"penalty_c,omitempty"` +} + +// ThermoEndpointV1 describes a single primer-template endpoint. 
+type ThermoEndpointV1 struct { + Side string `json:"side"` + TmC float64 `json:"tm_c"` + AnnealMarginC float64 `json:"anneal_margin_c"` + DeltaGAtAnnealKcal float64 `json:"delta_g_at_anneal_kcal"` + MismatchPenaltyC float64 `json:"mismatch_penalty_c"` + MismatchDeltaGKcal float64 `json:"mismatch_delta_g_kcal,omitempty"` + TerminalMismatchPenaltyC float64 `json:"terminal_mismatch_penalty_c,omitempty"` + TerminalMismatchDeltaGKcal float64 `json:"terminal_mismatch_delta_g_kcal,omitempty"` + DanglingEndAdjustmentC float64 `json:"dangling_end_adjustment_c,omitempty"` + DanglingEndDeltaGKcal float64 `json:"dangling_end_delta_g_kcal,omitempty"` + DanglingEndCount int `json:"dangling_end_count,omitempty"` + MismatchCount int `json:"mismatch_count,omitempty"` + FivePrimeMismatchCount int `json:"five_prime_mismatch_count,omitempty"` + ThreePrimeMismatchCount int `json:"three_prime_mismatch_count,omitempty"` + FivePrimeTerminalMismatchCount int `json:"five_prime_terminal_mismatch_count,omitempty"` + ThreePrimeTerminalMismatchCount int `json:"three_prime_terminal_mismatch_count,omitempty"` + TerminalMismatchCount int `json:"terminal_mismatch_count,omitempty"` + FivePrimeTerminalMismatchPenaltyC float64 `json:"five_prime_terminal_mismatch_penalty_c,omitempty"` + ThreePrimeTerminalMismatchPenaltyC float64 `json:"three_prime_terminal_mismatch_penalty_c,omitempty"` + MismatchFallbackCount int `json:"mismatch_fallback_count,omitempty"` + MismatchTripletCount int `json:"mismatch_triplet_count,omitempty"` + MismatchCuratedPairCount int `json:"mismatch_curated_pair_count,omitempty"` + MismatchSources []string `json:"mismatch_sources,omitempty"` + MismatchParameterSets []string `json:"mismatch_parameter_sets,omitempty"` + MismatchCitations []string `json:"mismatch_citations,omitempty"` + MismatchParameterNotes []string `json:"mismatch_parameter_notes,omitempty"` + TerminalMismatchSources []string `json:"terminal_mismatch_sources,omitempty"` + TerminalMismatchParameterSets []string 
`json:"terminal_mismatch_parameter_sets,omitempty"` + TerminalMismatchCitations []string `json:"terminal_mismatch_citations,omitempty"` + TerminalMismatchParameterNotes []string `json:"terminal_mismatch_parameter_notes,omitempty"` + EffectiveDenomCalK float64 `json:"effective_denom_cal_per_k_mol"` + MismatchPolicy string `json:"mismatch_policy"` + EndEffectPolicy string `json:"end_effect_policy,omitempty"` + HasNonWatsonCrick bool `json:"has_non_watson_crick"` + UsedHeuristicAdjust bool `json:"used_heuristic_adjust"` } // AnnotatedProductV1 is the stable schema for probe-annotated outputs.