A research paper and a working PyTorch package that take the recent (2024–2026) wave of optimizer breakthroughs and rebuild them as plain penalties you add to the loss.
- Paper: Topological_Loss_Engineering.pdf (14 pages: full math, proofs, figures, references)
- Package: toploss/ (install it and use the five regularizers as drop-in modules)
- DOI: 10.5281/zenodo.20480620

```bash
pip install toploss
```

Figure: The TLE recipe. Five 2024–2026 optimizer breakthroughs (top) become five differentiable loss penalties (middle), composed into one objective that a vanilla SGD/Adam minimizes, with no custom optimizer.

Modern training tricks usually live inside the optimizer, the piece of code that decides how to update the model's weights each step. Methods like SAM, Cautious Weight Decay, and AdEMAMix all change the optimizer so the model lands in a "flatter," better-behaved spot, which tends to generalize better. The downside is that each one is a custom optimizer with extra memory and special update rules, and they are hard to mix and match.
Our core idea is a swap. Instead of changing the optimizer, we change the loss, the single number the model tries to make small. We show that the useful behavior of those optimizers can be rewritten as an extra term added to the loss. Once you do that, an ordinary optimizer like Adam reproduces the same effect, with no special machinery. We call the general recipe Topological Loss Engineering, and we prove when it works with an Optimizer–Regularizer Correspondence Principle.
From that principle we built five new penalties. Here is each one in one sentence.
- SASP makes the model prefer flat, stable solutions. We found an exact formula for it that adds about 15% wall-clock overhead in our experiments, while the usual method (SAM) roughly doubles training time (+107%).
- CVP is a smarter weight decay. Normal weight decay shrinks every weight blindly and can ruin a good solution. CVP only shrinks a weight when doing so is safe.
- NGER balances training across several tasks at once. It ignores noise it cannot fix and spends effort where the model can still improve.
- SBMP makes the model robust to small changes in its input by gently penalizing inconsistency, instead of hacking noise into the network by hand.
- DMTA keeps the training trajectory pointed in a consistent long-term direction, copying the memory benefit of AdEMAMix without its extra optimizer state.
What is genuinely new here:
- The correspondence principle itself: a clean statement (with proof) of when an optimizer trick can be moved into the loss, and a stop-gradient trick for the cases that are not straightforward.
- The exact, low-overhead formula for SASP. Sharpness-aware training normally needs a second pass through the network. We show that for a standard classifier head it reduces to a short formula in quantities you already computed.
- Five concrete, tested penalties plus a single composite loss, all packaged so anyone can use them with one import.
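
The closed-form idea behind SASP can be illustrated on a linear softmax head. For logits z = W h with cross-entropy loss, the per-sample gradient with respect to W is the outer product (p - e_y) h^T, so its squared norm factors as ||p - e_y||² · ||h||², and the empirical-Fisher trace is the batch mean of that product. The sketch below checks this identity in plain Python. The function names are hypothetical and this is not the toploss implementation, which may differ in scaling, bias handling, and other details.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def fisher_trace_closed_form(logits, feats, targets):
    """Batch mean of ||p - e_y||^2 * ||h||^2 (head weights only, no bias)."""
    total = 0.0
    for z, h, y in zip(logits, feats, targets):
        p = softmax(z)
        resid_sq = sum((p_k - (1.0 if k == y else 0.0)) ** 2
                       for k, p_k in enumerate(p))
        feat_sq = sum(v * v for v in h)
        total += resid_sq * feat_sq
    return total / len(targets)

def fisher_trace_explicit(W, feats, targets):
    """Same quantity, built from the full per-sample gradient matrix."""
    total = 0.0
    for h, y in zip(feats, targets):
        z = [sum(w * v for w, v in zip(row, h)) for row in W]
        p = softmax(z)
        # dL/dW[k][j] = (p_k - 1[k == y]) * h_j for cross-entropy on a linear head
        grad = [[(p_k - (1.0 if k == y else 0.0)) * h_j for h_j in h]
                for k, p_k in enumerate(p)]
        total += sum(g * g for row in grad for g in row)
    return total / len(targets)

# Toy head: 2 classes, 3 features, batch of 2 (all values are made up).
W = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
feats = [[1.0, 2.0, -1.0], [0.5, 0.0, 1.5]]
targets = [0, 1]
logits = [[sum(w * v for w, v in zip(row, x)) for row in W] for x in feats]

closed = fisher_trace_closed_form(logits, feats, targets)
explicit = fisher_trace_explicit(W, feats, targets)
```

The closed-form version only touches the logits and the head inputs, which the forward pass has already produced, so no additional backward pass is needed to evaluate it.
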
| Name | Inspired by | What it does |
|---|---|---|
| SASP | SAM / XSAM | flattens curvature using a closed-form empirical-Fisher trace, no second backward pass |
| CVP | Cautious Weight Decay | weight decay that only fires when it is safe (sigmoid-gated, stop-grad) |
| NGER | NTKMTL + excess risk | multi-task weights that ignore irreducible noise |
| SBMP | NEFTune / SymNoise | input robustness via a consistency penalty |
| DMTA | AdEMAMix | aligns the current gradient with a slow long-term direction |
The shared mathematical backbone: for an optimizer update u = -η (g + c(θ)), the
discrete trajectory is — to first order — gradient flow on L(θ) + Φ(θ) with ∇Φ = c.
Each toploss penalty supplies exactly that c, so the loss now carries the
topological bias that previously lived in the optimizer.
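
As a small worked case of this correspondence, take the correction c(θ) = λθ, i.e. L2-style shrinkage applied inside the update. Then Φ(θ) = (λ/2)‖θ‖² satisfies ∇Φ = c, and plain gradient descent on L + Φ produces the same iterates. The sketch below checks this on a toy quadratic loss in plain Python. The loss, the constants, and the function names are illustrative assumptions, and for this particular c the match is exact, not only first-order.

```python
def grad_L(theta):
    # Toy loss L(theta) = 0.5 * sum((theta_i - t_i)^2) with a fixed target t.
    target = [1.0, -2.0, 0.5]
    return [th - t for th, t in zip(theta, target)]

def optimizer_side_step(theta, lr, lam):
    # Update u = -lr * (g + c(theta)) with correction c(theta) = lam * theta.
    g = grad_L(theta)
    return [th - lr * (g_i + lam * th) for th, g_i in zip(theta, g)]

def loss_side_step(theta, lr, lam):
    # Plain gradient descent on L + Phi, with Phi = 0.5 * lam * ||theta||^2.
    g = grad_L(theta)
    grad_phi = [lam * th for th in theta]
    return [th - lr * (g_i + p_i) for th, g_i, p_i in zip(theta, g, grad_phi)]

theta_opt = [0.0, 0.0, 0.0]
theta_loss = [0.0, 0.0, 0.0]
for _ in range(200):
    theta_opt = optimizer_side_step(theta_opt, lr=0.1, lam=0.5)
    theta_loss = loss_side_step(theta_loss, lr=0.1, lam=0.5)

# Largest coordinate-wise difference between the two trajectories' endpoints.
max_gap = max(abs(a - b) for a, b in zip(theta_opt, theta_loss))
```

Both runs settle at target / (1 + λ), the minimizer of L + Φ. Corrections that are not the gradient of any scalar function do not fit this pattern directly, which is where the stop-gradient trick mentioned earlier comes in.
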
```bash
pip install toploss    # from PyPI
```

From a clone of this repository:

```bash
pip install ./toploss    # builds and installs in one step
```

Or build the wheel yourself, then install it:

```bash
cd toploss && python -m build
pip install dist/toploss-0.1.0-py3-none-any.whl
```

The only runtime dependency is torch>=1.12.

```python
import torch.nn.functional as F
from toploss import TopologicalLoss

crit = TopologicalLoss(lambda_sasp=1e-2, lambda_cvp=1e-2)

logits, feats = model.forward_with_features(x)  # head logits + head inputs
base = F.cross_entropy(logits, y)
loss = crit(base_loss=base, logits=logits, features=feats,
            targets=y, params=model.parameters())
loss.backward()
optimizer.step()  # any optimizer
```

Each regularizer is also available on its own: SASPLoss, CVPRegularizer, NGERWeighter, SBMPLoss, DMTATracker. The helper apply_cvp_ injects cautious decay straight into .grad after backward() for zero overhead:

```python
loss.backward()
apply_cvp_(model.parameters(), lam=1e-2, beta=50.0)
optimizer.step()
```

A runnable end-to-end script lives in toploss/examples/quickstart.py.

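
To make the "only fires when it is safe" idea concrete, here is a minimal sketch of a sigmoid-gated decay applied to gradients, in plain Python. It assumes the gate is sigmoid(beta * w * g), so decay is close to full strength when the gradient step already moves the weight toward zero (w and g share a sign) and close to zero otherwise. That gating rule and the name gated_decay_grad are assumptions for illustration only; the actual apply_cvp_ in toploss may use a different rule.

```python
import math

def gated_decay_grad(weights, grads, lam=1e-2, beta=50.0):
    """Return grads with a soft-gated decay term added (hypothetical rule)."""
    out = []
    for w, g in zip(weights, grads):
        # Clamp the gate input so exp() stays in a safe numeric range.
        x = max(-60.0, min(60.0, beta * w * g))
        # The gate is treated as a constant (stop-gradient): it scales the
        # decay term but is not itself differentiated.
        gate = 1.0 / (1.0 + math.exp(-x))
        out.append(g + lam * gate * w)
    return out

weights = [1.0, 1.0, -2.0]
grads = [0.5, -0.5, 0.0]
new_grads = gated_decay_grad(weights, grads)
# Coordinate 0: w and g agree in sign, so nearly the full decay lam * w is added.
# Coordinate 1: w and g disagree, so the decay is almost entirely switched off.
# Coordinate 2: zero gradient gives gate = 0.5, i.e. half-strength decay.
```

Writing the result back into each parameter's .grad before optimizer.step() is the pattern the apply_cvp_ call above follows, which is why it works with any optimizer.
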
All numbers below are produced by experiments/run_experiments.py
and stored in experiments/results.json.
| Result | Number |
|---|---|
| SASP: Hessian trace at convergence | down 41% (170.8 → 100.0) |
| SASP: test cross-entropy | down 18% (0.817 → 0.673) |
| SASP wall-clock overhead | +15% (vs SAM's +107%) |
| CVP | robust across the decay coefficient where uniform L2 overshrinks |
| NGER: clean-signal recovery on a noisy co-task | 1.8× better (0.251 → 0.139 MSE) |
| Composite (SASP + CVP): test accuracy | 70.2% → 75.7% |
Everything is CPU-only, seeded, and uses no downloaded data.

Composite — SASP and CVP stack cleanly: 70.2% → 75.7% test accuracy.
```
toploss/
├── Topological_Loss_Engineering.pdf   # the paper (14 pages, rendered)
├── Topological_Loss_Engineering.tex   # paper LaTeX source (same as paper/main.tex)
├── README.md                          # this file
├── CITATION.cff                       # GitHub "Cite this repository" metadata
├── assets/                            # README figures (PNG, rendered from paper/figs)
├── .github/workflows/                 # CI tests + PyPI publish on release
│
├── toploss/                           # 📦 the pip package
│   ├── src/toploss/                   # source: functional.py, modules.py, py.typed
│   ├── tests/                         # 11 tests verifying the math identities
│   ├── examples/                      # runnable quickstart
│   ├── pyproject.toml                 # package metadata
│   └── LICENSE                        # MIT
│
├── paper/                             # 📄 compilable LaTeX bundle (main.tex + figs/)
└── experiments/                       # 🧪 scripts that produce every figure + results.json
```

Note: build artifacts (`toploss/dist/`, `*.egg-info`, LaTeX `*.aux`/`*.log`, etc.) are intentionally excluded by `.gitignore`. The wheel and sdist are produced on demand by `python -m build`, or automatically by the release workflow.

```bash
# 1. run the tests (verify the math identities)
cd toploss && PYTHONPATH=src pytest tests/ -q
# 2. regenerate all figures and results.json
cd ../experiments && python run_experiments.py && python make_schematics.py
# 3. rebuild the paper (run twice to resolve cross-references)
cd ../paper && pdflatex main.tex && pdflatex main.tex
```

Steps 1–2 need torch, numpy, and matplotlib; step 3 needs a LaTeX distribution.

```bibtex
@misc{patil2026tle,
  author    = {Patil, Rishabh A.},
  title     = {Topological Loss Engineering: Embedding Optimizer-Side
               Geometric Constraints Directly Into the Objective Function},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20480620},
  url       = {https://doi.org/10.5281/zenodo.20480620}
}
```

Code is released under the MIT License (see toploss/LICENSE). The paper is shared under CC BY 4.0 via Zenodo.

Author: Rishabh A. Patil · ORCID 0009-0007-0868-9673.




