
MrRobotop/toploss


Topological Loss Engineering (TLE)  ·  toploss


A research paper and a working PyTorch package that take the recent (2024–2026) wave of optimizer breakthroughs and rebuild them as plain penalties you add to the loss.

pip install toploss

From optimizer-side breakthroughs to loss-side regularizers

The TLE recipe: five 2024–2026 optimizer breakthroughs (top) become five differentiable loss penalties (middle), composed into one objective that plain SGD or Adam minimizes, with no custom optimizer.


What we invented, in plain words

Modern training tricks usually live inside the optimizer, the piece of code that decides how to update the model's weights each step. Methods like SAM, Cautious Weight Decay, and AdEMAMix all change the optimizer so the model lands in a "flatter," better-behaved spot, which tends to generalize better. The downside is that each one is a custom optimizer with extra memory and special update rules, and they are hard to mix and match.

Our core idea is a swap. Instead of changing the optimizer, we change the loss, the single number the model tries to make small. We show that the useful behavior of those optimizers can be rewritten as an extra term added to the loss. Once you do that, an ordinary optimizer like Adam reproduces the same effect, with no special machinery. We call the general recipe Topological Loss Engineering; an Optimizer–Regularizer Correspondence Principle, which we prove, states when it works.

From that principle we built five new penalties. Here is each one in one sentence.

  • SASP makes the model prefer flat, stable solutions. For a standard classifier head we found an exact formula for it that adds only about 15% to training time in our experiments, while the usual method (SAM) roughly doubles it (+107%).
  • CVP is a smarter weight decay. Normal weight decay shrinks every weight blindly and can ruin a good solution. CVP only shrinks a weight when doing so is safe.
  • NGER balances training across several tasks at once. It ignores noise it cannot fix and spends effort where the model can still improve.
  • SBMP makes the model robust to small changes in its input by gently penalizing inconsistency, instead of hacking noise into the network by hand.
  • DMTA keeps the training trajectory pointed in a consistent long-term direction, copying the memory benefit of AdEMAMix without its extra optimizer state.
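To make the NGER bullet concrete, here is a minimal sketch of excess-risk task weighting. It assumes each task's irreducible loss floor is known or estimated separately (for example from the label-noise variance). The function name `excess_risk_weights` and the normalization are illustrative assumptions, not the package's `NGERWeighter` API.

```python
import torch

def excess_risk_weights(task_losses, noise_floors, eps=1e-8):
    """Hypothetical NGER-style weights, proportional to each task's excess risk.

    task_losses:  1-D tensor of current per-task losses L_k
    noise_floors: 1-D tensor of estimated irreducible losses L_k* (assumed known)
    """
    # Excess risk = the part of the loss that can still be reduced. Clamping
    # gives zero weight to tasks already sitting at (or below) their noise floor.
    excess = (task_losses.detach() - noise_floors).clamp(min=0.0)
    # Normalize to sum to 1; eps guards against every task being at its floor.
    return excess / (excess.sum() + eps)

losses = torch.tensor([0.50, 0.30], requires_grad=True)  # task A, noisy task B
floors = torch.tensor([0.10, 0.29])                      # B is almost pure noise
w = excess_risk_weights(losses, floors)
total = (w * losses).sum()  # weights are detached, so they act as constants
```

Because the weights are detached, the gradient reaching each task loss is just its weight, so a task stuck at its noise floor stops pulling on the shared parameters.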

What is genuinely new here:

  1. The correspondence principle itself: a clean statement (with proof) of when an optimizer trick can be moved into the loss, and a stop-gradient trick for the cases that are not straightforward.
  2. The exact, no-extra-cost formula for SASP. Sharpness-aware training normally needs a second pass through the network. We show that for a standard classifier head it reduces to a short formula in quantities you already computed.
  3. Five concrete, tested penalties plus a single composite loss, all packaged so anyone can use them with one import.
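Here is a minimal sketch of why a head-restricted calculation needs no second pass. It assumes SASP's curvature proxy is the empirical-Fisher trace of a linear softmax head z = W h + b trained with cross-entropy; the package's exact formula and scaling may differ. For one example, dl/dW = (p - e_y) hᵀ and dl/db = p - e_y, so the squared gradient norm is ‖p - e_y‖² (‖h‖² + 1). Both factors come from the forward pass: the probabilities and the head inputs. The function name `head_fisher_trace` is hypothetical.

```python
import torch
import torch.nn.functional as F

def head_fisher_trace(logits, feats, targets, include_bias=True):
    """Closed-form empirical-Fisher trace of a linear softmax head (sketch).

    Uses ||dl/dW||^2 + ||dl/db||^2 = ||p - e_y||^2 * (||h||^2 + 1) for
    cross-entropy on z = W h + b, averaged over the batch.
    """
    probs = logits.softmax(dim=-1)
    onehot = F.one_hot(targets, num_classes=logits.shape[-1]).to(probs.dtype)
    residual_sq = (probs - onehot).pow(2).sum(dim=-1)  # ||p - e_y||^2
    feat_sq = feats.pow(2).sum(dim=-1)                  # ||h||^2
    if include_bias:
        feat_sq = feat_sq + 1.0                         # bias = constant feature 1
    return (residual_sq * feat_sq).mean()

# Check against explicit per-sample gradients (the costly route SASP avoids).
torch.manual_seed(0)
head = torch.nn.Linear(8, 4)
feats = torch.randn(5, 8)
targets = torch.randint(0, 4, (5,))
closed = head_fisher_trace(head(feats), feats, targets)

brute = 0.0
for h, y in zip(feats, targets):
    loss = F.cross_entropy(head(h.unsqueeze(0)), y.unsqueeze(0))
    gW, gb = torch.autograd.grad(loss, (head.weight, head.bias))
    brute += (gW.pow(2).sum() + gb.pow(2).sum()).item()
brute /= len(feats)
```

Adding `lam * head_fisher_trace(...)` to the loss keeps training at one forward and one backward pass, because the penalty is built from tensors that the forward pass already produced.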

The five regularizers

| Name | Inspired by | What it does |
|---|---|---|
| SASP | SAM / XSAM | flattens curvature using a closed-form empirical-Fisher trace, with no second backward pass |
| CVP | Cautious Weight Decay | weight decay that fires only when it is safe (sigmoid-gated, stop-grad) |
| NGER | NTKMTL + excess risk | multi-task weights that ignore irreducible noise |
| SBMP | NEFTune / SymNoise | input robustness via a consistency penalty |
| DMTA | AdEMAMix | aligns the current gradient with a slow long-term direction |
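For the DMTA row, one way to carry an AdEMAMix-style slow term inside the loss is the stop-gradient trick mentioned above. AdEMAMix mixes a slow exponential moving average m of past gradients into its update. If m is kept outside autograd, the linear term α⟨sg(m), θ⟩ has gradient exactly α·m. This is a minimal sketch under that assumption, not necessarily the package's DMTA penalty. The class `SlowDirection` and the values of `beta` and `alpha` are hypothetical and are not the `DMTATracker` API.

```python
import torch

class SlowDirection:
    """Hypothetical slow gradient EMA, kept outside the autograd graph."""

    def __init__(self, params, beta=0.999):
        self.beta = beta
        self.m = [torch.zeros_like(p) for p in params]

    @torch.no_grad()
    def update(self, params):
        # Call after backward(): fold the current gradient into the slow average.
        for m, p in zip(self.m, params):
            if p.grad is not None:
                m.mul_(self.beta).add_(p.grad, alpha=1.0 - self.beta)

    def surrogate(self, params, alpha=1.0):
        # Linear in theta with a detached coefficient, so d/dtheta = alpha * m.
        return alpha * sum((m.detach() * p).sum() for m, p in zip(self.m, params))

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
slow = SlowDirection([w], beta=0.9)
w.grad = torch.tensor([0.5, 0.5])
slow.update([w])              # m = 0.1 * grad = [0.05, 0.05]
w.grad = None
slow.surrogate([w], alpha=2.0).backward()
```

In a training loop you would call `slow.update(...)` after each `backward()` and add `slow.surrogate(...)` to the next step's loss. Plain SGD then steps along the current gradient plus α times the slow direction.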

The shared mathematical backbone: consider an optimizer update u = -η (g + c(θ)), where g = ∇L(θ) is the loss gradient and c(θ) is the optimizer's extra term. To first order in the step size η, the discrete trajectory follows gradient flow on L(θ) + Φ(θ) whenever ∇Φ = c. Each toploss penalty supplies a Φ whose gradient is that c, so the loss now carries the topological bias that previously lived in the optimizer.
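Here is a small numerical check of this backbone in the simplest case, L2 weight decay. The correction c(θ) = λθ is the gradient of Φ(θ) = (λ/2)‖θ‖². So one step of an optimizer that adds c to the gradient should land exactly where plain gradient descent on L + Φ lands. The quadratic loss and the values of `lam` and `eta` are arbitrary illustrative choices.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4)
theta0 = torch.randn(4)
lam, eta = 0.1, 0.05

def loss_fn(theta):
    return 0.5 * (A @ theta).pow(2).sum()  # any differentiable L(theta)

# Optimizer side: u = -eta * (g + c(theta)) with c(theta) = lam * theta.
t1 = theta0.clone().requires_grad_(True)
g, = torch.autograd.grad(loss_fn(t1), t1)
step_optimizer = t1.detach() - eta * (g + lam * t1.detach())

# Loss side: plain gradient descent on L(theta) + Phi(theta), Phi = lam/2 ||theta||^2.
t2 = theta0.clone().requires_grad_(True)
total = loss_fn(t2) + 0.5 * lam * t2.pow(2).sum()
g_total, = torch.autograd.grad(total, t2)
step_loss = t2.detach() - eta * g_total
```

Both routes produce the same parameters up to floating-point rounding, because ∇(L + Φ) = g + c by construction.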


Install

pip install toploss                 # from PyPI

From a clone of this repository:

pip install ./toploss               # builds and installs in one step

Or build the wheel yourself, then install it:

cd toploss && python -m build
pip install dist/toploss-0.1.0-py3-none-any.whl

The only runtime dependency is torch>=1.12.

Quick start

import torch.nn.functional as F
from toploss import TopologicalLoss

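crit = TopologicalLoss(lambda_sasp=1e-2, lambda_cvp=1e-2)

# model, optimizer, x and y come from your own training loop;
# model.forward_with_features(x) must return (logits, head inputs).
optimizer.zero_grad()
logits, feats = model.forward_with_features(x)   # head logits + head inputs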
base = F.cross_entropy(logits, y)
loss = crit(base_loss=base, logits=logits, features=feats,
            targets=y, params=model.parameters())
loss.backward()
optimizer.step()        # any optimizer
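The quick start assumes your model exposes a `forward_with_features(x)` method that returns the head logits together with the features entering the final linear layer. `forward_with_features` is not part of `torch.nn.Module`; presumably you define it on your own model. Below is a minimal plain-PyTorch sketch of such a model. The class name, layer sizes, and method body are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Illustrative model exposing (logits, head inputs) for the quick start."""

    def __init__(self, in_dim=20, hidden=64, num_classes=3):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def forward_with_features(self, x):
        feats = self.body(x)            # inputs to the final linear head
        return self.head(feats), feats  # (logits, features)

    def forward(self, x):
        return self.forward_with_features(x)[0]

model = TinyClassifier()
x = torch.randn(32, 20)
logits, feats = model.forward_with_features(x)
```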

Each regularizer is also available on its own: SASPLoss, CVPRegularizer, NGERWeighter, SBMPLoss, DMTATracker. The helper apply_cvp_ injects cautious decay straight into .grad after backward(), so it needs no extra forward or backward pass:

loss.backward()
apply_cvp_(model.parameters(), lam=1e-2, beta=50.0)
optimizer.step()
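The internals of `apply_cvp_` are not shown here. Below is one plausible reading of "sigmoid-gated, stop-grad" cautious decay, offered as an assumption rather than the package's code. It adds λ·s·θ to each gradient, where the gate s = σ(β·θ·g) is computed without tracking gradients. The gate is close to 1 only where the loss gradient g already pushes θ toward zero (θ·g > 0), which is a smooth version of the sign-agreement test in Cautious Weight Decay. The name `cautious_decay_` is hypothetical; `lam` and `beta` mirror the call above.

```python
import torch

@torch.no_grad()
def cautious_decay_(params, lam=1e-2, beta=50.0):
    """Hypothetical sigmoid-gated cautious decay, written into .grad in place.

    Call after loss.backward() and before optimizer.step().
    """
    for p in params:
        if p.grad is None:
            continue
        # Gate near 1 where theta * grad > 0 (descent already shrinks |theta|),
        # near 0 where decay would fight the loss; no_grad keeps it out of autograd.
        gate = torch.sigmoid(beta * p * p.grad)
        p.grad.add_(lam * gate * p)

w = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
w.grad = torch.tensor([0.5, -0.5])  # coordinate 0 agrees with decay, 1 does not
cautious_decay_([w], lam=0.1, beta=50.0)
```

Here the `no_grad` decorator plays the role of the stop-gradient. The gate is treated as a constant, so the added term equals the gradient of (λ/2)·s·θ² with s held fixed.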

A runnable end-to-end script lives in toploss/examples/quickstart.py.

Results (real, three-seed experiments)

All numbers below are produced by experiments/run_experiments.py and stored in experiments/results.json.

| Result | Value |
|---|---|
| SASP: Hessian trace at convergence | down 41% (170.8 → 100.0) |
| SASP: test cross-entropy | down 18% (0.817 → 0.673) |
| SASP: wall-clock overhead | +15% (vs. SAM's +107%) |
| CVP | robust across the decay coefficient, where uniform L2 overshrinks |
| NGER: clean-signal recovery on a noisy co-task (MSE) | 1.8× better (0.251 → 0.139) |
| Composite (SASP + CVP): test accuracy | 70.2% → 75.7% |

Everything is CPU-only, seeded, and uses no downloaded data.

SASP: generalization and curvature
SASP — lower test cross-entropy and a 41% smaller Hessian trace at convergence.
CVP: volume trajectory and robustness
CVP — preserves parameter volume and stays robust across decay strengths where uniform L2 overshrinks.
NGER: clean-signal recovery
NGER — recovers a noisy co-task's clean signal instead of chasing irreducible noise.
SBMP: robustness to input noise
SBMP — higher test accuracy under growing input perturbation.
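Here is a minimal sketch of a consistency penalty in the spirit of SBMP. It assumes Gaussian input noise and uses a KL divergence between the clean and perturbed predictive distributions. The noise model, `sigma`, and the name `consistency_penalty` are illustrative assumptions, not the `SBMPLoss` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_penalty(model, x, sigma=0.1):
    """KL(p(y|x) || p(y|x + noise)), averaged over the batch (sketch)."""
    clean_log_p = F.log_softmax(model(x), dim=-1)
    noisy_log_p = F.log_softmax(model(x + sigma * torch.randn_like(x)), dim=-1)
    # kl_div(input, target) computes KL(target || input); both are log-probs here.
    return F.kl_div(noisy_log_p, clean_log_p, log_target=True, reduction="batchmean")

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
x = torch.randn(16, 10)
y = torch.randint(0, 3, (16,))
loss = F.cross_entropy(model(x), y) + 0.5 * consistency_penalty(model, x)
loss.backward()
```

With `sigma=0` the two distributions coincide and the penalty vanishes. As the noise grows, the penalty rewards predictions that stay consistent under perturbation.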

Composite ablation
Composite — SASP and CVP stack cleanly: 70.2% → 75.7% test accuracy.

Repository layout

toploss/
├── Topological_Loss_Engineering.pdf   # the paper (14 pages, rendered)
├── Topological_Loss_Engineering.tex   # paper LaTeX source (same as paper/main.tex)
├── README.md                          # this file
├── CITATION.cff                       # GitHub "Cite this repository" metadata
├── assets/                            # README figures (PNG, rendered from paper/figs)
├── .github/workflows/                 # CI tests + PyPI publish on release
│
├── toploss/                           # 📦 the pip package
│   ├── src/toploss/                   #    source: functional.py, modules.py, py.typed
│   ├── tests/                         #    11 tests verifying the math identities
│   ├── examples/                      #    runnable quickstart
│   ├── pyproject.toml                 #    package metadata
│   └── LICENSE                        #    MIT
│
├── paper/                             # 📄 compilable LaTeX bundle (main.tex + figs/)
└── experiments/                       # 🧪 scripts that produce every figure + results.json

Note: build artifacts (toploss/dist/, *.egg-info, LaTeX *.aux/*.log, etc.) are intentionally excluded by .gitignore. The wheel and sdist are produced on demand by python -m build, or automatically by the release workflow.

Reproduce everything

# 1. run the tests (verify the math identities)
cd toploss && PYTHONPATH=src pytest tests/ -q

# 2. regenerate all figures and results.json
cd ../experiments && python run_experiments.py && python make_schematics.py

# 3. rebuild the paper (run twice to resolve cross-references)
cd ../paper && pdflatex main.tex && pdflatex main.tex

Steps 1–2 need torch, numpy, and matplotlib (step 1 also needs pytest); step 3 needs a LaTeX distribution.

Cite

@misc{patil2026tle,
  author    = {Patil, Rishabh A.},
  title     = {Topological Loss Engineering: Embedding Optimizer-Side
               Geometric Constraints Directly Into the Objective Function},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20480620},
  url       = {https://doi.org/10.5281/zenodo.20480620}
}

License

Code is released under the MIT License (see toploss/LICENSE). The paper is shared under CC BY 4.0 via Zenodo.

Author: Rishabh A. Patil · ORCID 0009-0007-0868-9673.

About

Topological Loss Engineering: optimizer-free PyTorch regularizers (SASP, CVP, NGER, SBMP, DMTA) that embed 2024-2026 optimizer breakthroughs directly into the loss.
