Topological Loss Engineering (TLE) · `toploss`

A research paper and a working PyTorch package that take the recent (2024–2026) wave of optimizer breakthroughs and rebuild them as plain penalties you add to the loss.

Paper — Topological_Loss_Engineering.pdf (14 pages: full math, proofs, figures, references)
Package — toploss/ — install it and use the five regularizers as drop-in modules
DOI — 10.5281/zenodo.20480620

pip install toploss

Figure: The TLE recipe. Five 2024–2026 optimizer breakthroughs (top) become five differentiable loss penalties (middle), composed into one objective that vanilla SGD or Adam minimizes, with no custom optimizer.

What we invented, in plain words

Modern training tricks usually live inside the optimizer, the piece of code that decides how to update the model's weights each step. Methods like SAM, Cautious Weight Decay, and AdEMAMix all change the optimizer so the model lands in a "flatter," better-behaved spot, which tends to generalize better. The downside is that each one is a custom optimizer with extra memory and special update rules, and they are hard to mix and match.

Our core idea is a swap. Instead of changing the optimizer, we change the loss, the single number the model tries to make small. We show that the useful behavior of those optimizers can be rewritten as an extra term added to the loss. Once you do that, an ordinary optimizer like Adam reproduces the same effect to first order, with no special machinery. We call the general recipe Topological Loss Engineering, and we prove when it works in a result we call the Optimizer–Regularizer Correspondence Principle.

From that principle we built five new penalties. Here is each one in one sentence.

SASP makes the model prefer flat, stable solutions. We found an exact formula for it that adds about 15% wall-clock time in our experiments, while the usual method (SAM) roughly doubles training time (+107%).
CVP is a smarter weight decay. Normal weight decay shrinks every weight blindly and can ruin a good solution. CVP only shrinks a weight when doing so is safe.
NGER balances training across several tasks at once. It ignores noise it cannot fix and spends effort where the model can still improve.
SBMP makes the model robust to small changes in its input by gently penalizing inconsistency, instead of hacking noise into the network by hand.
DMTA keeps the training trajectory pointed in a consistent long-term direction, copying the memory benefit of AdEMAMix without its extra optimizer state.
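To make the SBMP idea concrete, here is a minimal, dependency-free sketch of a consistency penalty: the mean squared change in a model's output when its input is jittered. The tiny tanh model, the uniform noise, and the names `model` and `consistency_penalty` are illustrative assumptions, not the package's `SBMPLoss` implementation.

```python
import math
import random


def model(W, x):
    """Tiny stand-in network: one linear layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]


def consistency_penalty(W, inputs, noise_scale, seed=0):
    """Mean squared output change when each input coordinate is jittered.

    Noise is uniform in [-noise_scale, noise_scale] (an assumed choice).
    """
    rng = random.Random(seed)
    total = 0.0
    for x in inputs:
        x_noisy = [v + noise_scale * rng.uniform(-1.0, 1.0) for v in x]
        clean = model(W, x)
        noisy = model(W, x_noisy)
        total += sum((a - b) ** 2 for a, b in zip(clean, noisy))
    return total / len(inputs)


W = [[0.5, -0.3], [0.1, 0.8]]
inputs = [[1.0, 2.0], [-1.0, 0.5], [0.3, -0.7]]

p_none = consistency_penalty(W, inputs, noise_scale=0.0)
p_small = consistency_penalty(W, inputs, noise_scale=0.01)
p_large = consistency_penalty(W, inputs, noise_scale=0.1)
print(p_none, p_small, p_large)
```

In a training loop a term like this would be multiplied by a small coefficient and added to the base loss, so the optimizer itself is left unchanged.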

What is genuinely new here:

The correspondence principle itself: a clean statement (with proof) of when an optimizer trick can be moved into the loss, and a stop-gradient trick for the cases that are not straightforward.
The exact closed-form formula for SASP, which needs no second backward pass. Sharpness-aware training normally needs a second pass through the network. We show that for a standard classifier head it reduces to a short formula in quantities you have already computed.
Five concrete, tested penalties plus a single composite loss, all packaged so anyone can use them with one import.
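The closed-form claim for SASP can be illustrated with a small sketch. It assumes a bias-free linear head followed by softmax cross-entropy, for which the per-sample gradient with respect to the head weights is the outer product (p - onehot(y)) hᵀ, so the empirical-Fisher trace is the batch mean of ||p - onehot(y)||² · ||h||². The function names are hypothetical, and the formula in the paper and in `SASPLoss` may include further terms.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def head_logits(W, h):
    """Bias-free linear head: z = W h."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def closed_form_trace(W, feats, targets):
    """Batch mean of ||p - onehot(y)||^2 * ||h||^2 (forward quantities only)."""
    total = 0.0
    for h, y in zip(feats, targets):
        p = softmax(head_logits(W, h))
        resid_sq = sum((pk - (1.0 if k == y else 0.0)) ** 2
                       for k, pk in enumerate(p))
        total += resid_sq * sum(v * v for v in h)
    return total / len(targets)


def brute_force_trace(W, feats, targets, eps=1e-5):
    """Same quantity from finite-difference per-sample gradients (check only)."""
    def ce(h, y):
        return -math.log(softmax(head_logits(W, h))[y])

    total = 0.0
    for h, y in zip(feats, targets):
        for i in range(len(W)):
            for j in range(len(W[0])):
                orig = W[i][j]
                W[i][j] = orig + eps
                up = ce(h, y)
                W[i][j] = orig - eps
                down = ce(h, y)
                W[i][j] = orig  # restore the weight
                total += ((up - down) / (2.0 * eps)) ** 2
    return total / len(targets)


W = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]   # 3 classes, 2 features
feats = [[1.0, 2.0], [-0.5, 0.5]]
targets = [0, 2]

closed = closed_form_trace(W, feats, targets)
brute = brute_force_trace(W, feats, targets)
print(closed, brute)
```

The closed form uses only the softmax probabilities and the head inputs from the forward pass, which is the sense in which no second backward pass is needed under these assumptions.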

The five regularizers

| Name | Inspired by | What it does |
|------|-------------|--------------|
| SASP | SAM / XSAM | flattens curvature using a closed-form empirical-Fisher trace, no second backward pass |
| CVP | Cautious Weight Decay | weight decay that only fires when it is safe (sigmoid-gated, stop-grad) |
| NGER | NTKMTL + excess risk | multi-task weights that ignore irreducible noise |
| SBMP | NEFTune / SymNoise | input robustness via a consistency penalty |
| DMTA | AdEMAMix | aligns the current gradient with a slow long-term direction |
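As an illustration of the DMTA row, the sketch below keeps a slow exponential moving average of past gradients and measures how well a new gradient aligns with it by cosine similarity. The class name `SlowDirection`, the decay value, and the use of cosine similarity are assumptions made for this example; how `DMTATracker` turns such a signal into a differentiable penalty is not shown here.

```python
import math


class SlowDirection:
    """Slow EMA of gradients plus a cosine-alignment score (illustrative only)."""

    def __init__(self, dim, beta=0.9):
        self.beta = beta
        self.m = [0.0] * dim  # long-term direction, starts at zero

    def update(self, grad):
        """m <- beta * m + (1 - beta) * grad."""
        self.m = [self.beta * m + (1.0 - self.beta) * g
                  for m, g in zip(self.m, grad)]

    def alignment(self, grad, eps=1e-12):
        """Cosine similarity between grad and the slow direction."""
        dot = sum(g * m for g, m in zip(grad, self.m))
        norm_g = math.sqrt(sum(g * g for g in grad))
        norm_m = math.sqrt(sum(m * m for m in self.m))
        return dot / (norm_g * norm_m + eps)


tracker = SlowDirection(dim=2)
for _ in range(10):
    tracker.update([1.0, 0.0])   # a consistent history pointing along the first axis

aligned = tracker.alignment([1.0, 0.0])
orthogonal = tracker.alignment([0.0, 1.0])
opposed = tracker.alignment([-1.0, 0.0])
print(aligned, orthogonal, opposed)
```

A penalty of the form 1 - alignment would be small when the current step agrees with the long-term direction and large when it reverses it, which is one plausible way to encode the "consistent direction" bias described above.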

The shared mathematical backbone: for an optimizer update u = -η (g + c(θ)), the discrete trajectory is, to first order, gradient flow on L(θ) + Φ(θ) with ∇Φ = c. Each toploss penalty supplies a Φ whose gradient is exactly that c, so the loss now carries the topological bias that previously lived in the optimizer.
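As a quick numerical check of this statement, the sketch below runs two plain gradient-descent loops on a one-dimensional toy problem. One applies the modified update u = -η (g + c(θ)); the other applies vanilla gradient descent to L(θ) + Φ(θ) with ∇Φ = c, using a finite-difference gradient so that it only ever sees the composite loss. The toy loss, the choice c(θ) = λθ (so Φ(θ) = λθ²/2), and all names are illustrative assumptions, not part of toploss.

```python
def grad_L(theta):
    """Gradient of the toy base loss L(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


def c(theta, lam):
    """Optimizer-side correction; here plain weight decay, c = lam * theta."""
    return lam * theta


def composite_loss(theta, lam):
    """L(theta) + Phi(theta), with Phi = lam * theta^2 / 2 so dPhi/dtheta = c."""
    return (theta - 3.0) ** 2 + 0.5 * lam * theta ** 2


def numeric_grad(f, theta, eps=1e-6):
    """Central finite difference, standing in for autograd on the composite loss."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


def run(steps=200, eta=0.1, lam=0.5, theta0=0.0):
    theta_opt = theta0    # "custom optimizer": update uses g + c
    theta_loss = theta0   # vanilla descent on the composite loss
    for _ in range(steps):
        theta_opt -= eta * (grad_L(theta_opt) + c(theta_opt, lam))
        theta_loss -= eta * numeric_grad(lambda t: composite_loss(t, lam), theta_loss)
    return theta_opt, theta_loss


theta_opt, theta_loss = run()
print(theta_opt, theta_loss)
```

Because this c is the exact gradient of a scalar Φ, the two trajectories coincide up to finite-difference error and both settle at the stationary point 6 / (2 + λ). Corrections that are not the gradient of any scalar are the less straightforward cases.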

Install

pip install toploss                 # from PyPI

From a clone of this repository:

pip install ./toploss               # builds and installs in one step

Or build the wheel yourself, then install it:

cd toploss && python -m build
pip install dist/toploss-0.1.0-py3-none-any.whl

The only runtime dependency is torch>=1.12.

Quick start

import torch.nn.functional as F
from toploss import TopologicalLoss

# `model`, `optimizer`, and the batch `(x, y)` are assumed to come from your training code.
crit = TopologicalLoss(lambda_sasp=1e-2, lambda_cvp=1e-2)

optimizer.zero_grad()                            # clear gradients from the previous step
logits, feats = model.forward_with_features(x)   # head logits + head inputs
base = F.cross_entropy(logits, y)
loss = crit(base_loss=base, logits=logits, features=feats,
            targets=y, params=model.parameters())
loss.backward()
optimizer.step()        # any optimizer

Each regularizer is also available on its own: SASPLoss, CVPRegularizer, NGERWeighter, SBMPLoss, DMTATracker. The helper apply_cvp_ injects cautious decay straight into .grad after backward() for zero overhead:

loss.backward()
apply_cvp_(model.parameters(), lam=1e-2, beta=50.0)
optimizer.step()

A runnable end-to-end script lives in toploss/examples/quickstart.py.
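For intuition about what a cautious-decay step like `apply_cvp_` might do to the gradients, here is a dependency-free sketch. It assumes the gate is sigmoid(beta * grad * weight), so decay is added only where the loss gradient already pushes a weight toward zero, and it treats the gate as a constant (the stop-grad part). The function `cautious_decay_` and this particular gate are assumptions for illustration; the package's actual rule may differ.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cautious_decay_(params, grads, lam=1e-2, beta=50.0):
    """In place: grads[i] += lam * gate_i * params[i].

    gate_i = sigmoid(beta * grads[i] * params[i]) is close to 1 when the
    gradient step already shrinks |params[i]| (decay is "safe") and close
    to 0 when the loss wants the weight to grow. The gate is computed from
    the current values and not differentiated through.
    """
    for i, (w, g) in enumerate(zip(params, grads)):
        gate = sigmoid(beta * g * w)
        grads[i] = g + lam * gate * w
    return grads


params = [1.0, 1.0, -2.0]
grads = [0.5, -0.5, -0.3]   # pretend these came from loss.backward()

new_grads = cautious_decay_(params, list(grads))
print(new_grads)
```

The first and third weights receive almost the full decay term because their gradients already point toward zero; the second is left essentially untouched because decay would fight the loss gradient there.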

Results (real, three-seed experiments)

All numbers below are produced by experiments/run_experiments.py and stored in experiments/results.json.

| Result | Number |
|--------|--------|
| SASP: Hessian trace at convergence | down 41% (170.8 → 100.0) |
| SASP: test cross-entropy | down 18% (0.817 → 0.673) |
| SASP wall-clock overhead | +15% (vs SAM's +107%) |
| CVP | robust across the decay coefficient where uniform L2 overshrinks |
| NGER: clean-signal recovery on a noisy co-task | 1.8× better (0.251 → 0.139 MSE) |
| Composite (SASP + CVP): test accuracy | 70.2% → 75.7% |

All experiments run on CPU, are seeded, and use no downloaded data.

Figure: SASP. Lower test cross-entropy and a 41% smaller Hessian trace at convergence.
Figure: CVP. Preserves parameter volume and stays robust across decay strengths where uniform L2 overshrinks.
Figure: NGER. Recovers a noisy co-task's clean signal instead of chasing irreducible noise.
Figure: SBMP. Higher test accuracy under growing input perturbation.
Figure: Composite. SASP and CVP stack cleanly: 70.2% → 75.7% test accuracy.

Repository layout

toploss/
├── Topological_Loss_Engineering.pdf   # the paper (14 pages, rendered)
├── Topological_Loss_Engineering.tex   # paper LaTeX source (same as paper/main.tex)
├── README.md                          # this file
├── CITATION.cff                       # GitHub "Cite this repository" metadata
├── assets/                            # README figures (PNG, rendered from paper/figs)
├── .github/workflows/                 # CI tests + PyPI publish on release
│
├── toploss/                           # 📦 the pip package
│   ├── src/toploss/                   #    source: functional.py, modules.py, py.typed
│   ├── tests/                         #    11 tests verifying the math identities
│   ├── examples/                      #    runnable quickstart
│   ├── pyproject.toml                 #    package metadata
│   └── LICENSE                        #    MIT
│
├── paper/                             # 📄 compilable LaTeX bundle (main.tex + figs/)
└── experiments/                       # 🧪 scripts that produce every figure + results.json

Note: build artifacts (toploss/dist/, *.egg-info, LaTeX *.aux/*.log, etc.) are intentionally excluded by .gitignore. The wheel and sdist are produced on demand by python -m build, or automatically by the release workflow.

Reproduce everything

# 1. run the tests (verify the math identities)
cd toploss && PYTHONPATH=src pytest tests/ -q

# 2. regenerate all figures and results.json
cd ../experiments && python run_experiments.py && python make_schematics.py

# 3. rebuild the paper (run twice to resolve cross-references)
cd ../paper && pdflatex main.tex && pdflatex main.tex

Steps 1–2 need torch, numpy, and matplotlib (step 1 also needs pytest); step 3 needs a LaTeX distribution.

Cite

@misc{patil2026tle,
  author    = {Patil, Rishabh A.},
  title     = {Topological Loss Engineering: Embedding Optimizer-Side
               Geometric Constraints Directly Into the Objective Function},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20480620},
  url       = {https://doi.org/10.5281/zenodo.20480620}
}

License

Code is released under the MIT License (see toploss/LICENSE). The paper is shared under CC BY 4.0 via Zenodo.

Author: Rishabh A. Patil · ORCID 0009-0007-0868-9673.
