Problem
.git/ currently weighs ~259 MB on disk (88 MB packed, 166 MB loose + pack),
dominated by blobs that are no longer in HEAD. Recent additions are clean
(no sqlite added to HEAD in the last 90 days — only one PNG in that window),
so the bloat is entirely legacy.
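The "recent additions are clean" claim can be re-checked with a short shell function. This is a sketch: the function name and the extension list are illustrative choices, not part of the repo's tooling.

```shell
# Files *added* (not merely modified) in HEAD's history during the last N days,
# limited to binary-ish extensions. The extension list is an assumption.
recent_binary_additions() {
  local days="${1:-90}"
  # --diff-filter=A keeps additions only; --pretty=format: drops commit headers
  # so only file names remain. In git pathspecs '*' also matches '/'.
  git log HEAD --since="${days} days ago" --diff-filter=A \
      --name-only --pretty=format: -- \
      '*.sqlite' '*.png' '*.lp' '*.exe' '*.dll' |
    sed '/^$/d' | sort -u
}

# Usage: recent_binary_additions 90
```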
Biggest blobs observed in history by
    git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'

| Blob (history only, not in HEAD) | Size (per blob) | Occurrences | Notes |
|---|---|---|---|
| flextool.lp | 310 MB | 1× | One-off solver LP dump, never intended to be committed. Pure accident. |
| Input_data.sqlite | 22 MB | 20+ | Old test/dev data, superseded by fixtures in HEAD. |
| .spinetoolbox/items/flextool3_test_data/FlexTool3_data.sqlite | 22 MB | 2× | Same story: old test data. |
| notebooks/RETO | 5.7 MB | 2× | Binary never intended for git. |
| notebooks/.ipynb_checkpoints/RETO | 5.7 MB | 1× | Checkpoint cache of the above. |
| Older highs.exe / libstdc++-6.dll versions | 18 MB + 28 MB each | a few | Solver binaries that were rolled over when new versions landed. |
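A sketch that regroups the same rev-list | cat-file output by path, which is one way to reproduce the Size and Occurrences columns. Sizes are uncompressed blob sizes (what %(objectsize) reports), so they overstate on-disk pack usage; "occurrences" here means distinct blob versions stored under a path. The function name is hypothetical.

```shell
# Blob catalogue grouped by path: largest version (MB, uncompressed),
# number of distinct blob versions in history, and the path. Biggest first.
blob_catalogue() {
  local top="${1:-20}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
           path = $0
           sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)   # keep everything after the size
           if (path == "") next                  # blob not reached through a tree path
           n[path]++
           if (!(path in max) || $3 + 0 > max[path]) max[path] = $3 + 0
         }
         END { for (p in n) printf "%.1f\t%d\t%s\n", max[p] / 1048576, n[p], p }' |
    sort -rn |
    head -n "$top"
}

# Usage: blob_catalogue 10
```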
Goals
- Remove from history the blobs that were never part of any build contract
(pure accidents, clear to cut).
- Decide whether to also drop legacy solver binaries from history. This
is a soft break for old-commit checkouts: running flextool at those
commits would require providing a solver externally.
- Do not break anything currently in HEAD.
Scope
Must remove from history
- flextool.lp
- notebooks/RETO
- notebooks/.ipynb_checkpoints/RETO
These were never load-bearing. Zero break risk.
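The zero-break-risk claim can be checked against every local branch tip, not just HEAD. A sketch with a hypothetical helper name; run it from the repository root, since git ls-tree resolves paths relative to the current directory.

```shell
# For every local branch tip, print "branch<TAB>path" for each file matching
# one of the given paths. Empty output means no branch tip carries them.
paths_in_branch_tips() {
  local ref path
  git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
      git ls-tree -r --name-only "$ref" -- "$@" |
        while IFS= read -r path; do
          printf '%s\t%s\n' "${ref#refs/heads/}" "$path"
        done
    done
}

# Usage:
#   paths_in_branch_tips flextool.lp notebooks/RETO notebooks/.ipynb_checkpoints/RETO
```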
Probably also remove
- All 20+ historical Input_data.sqlite blobs (22 MB each).
- Both .spinetoolbox/items/flextool3_test_data/FlexTool3_data.sqlite blobs.
These are old test/dev snapshots. Removing them from history cannot affect
current HEAD or any running tests — the fixtures those tests need live at
new paths under HEAD. The only impact is that someone doing git archaeology
on very old commits won't find them.
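git filter-repo's --path takes paths relative to the repository root, so before fixing the path list it helps to see every location where these file names appeared in history. A sketch with a hypothetical helper name:

```shell
# Every path under which the given file names ever appeared in any ref's
# history, with the number of commits that touched it (most-touched first).
historical_paths_named() {
  local name
  for name in "$@"; do
    # "$name" matches the file at the repo root; "*/$name" matches it in any
    # subdirectory, because '*' in a git pathspec also matches '/'.
    git log --all --pretty=format: --name-only -- "$name" "*/$name"
  done | sed '/^$/d' | sort | uniq -c | sort -rn
}

# Usage: historical_paths_named Input_data.sqlite FlexTool3_data.sqlite
```

More than one distinct path in the output means a single --path entry will not catch every copy.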
Deliberately leave alone (for this pass)
- Older bin/highs.exe, libstdc++-6.dll, libopenblas*, etc. versions
  that are no longer in HEAD but were bundled with earlier flextool
  releases. The tradeoff:
  - Pro: removing them saves ~90 MB from history.
  - Con: git checkout <old-hash> won't have the solver binary in the
    tree anymore, so running flextool at that commit requires
    pip-installing highspy or dropping in a current binary manually.
    Any tag/release built strictly from old commits would become
    incomplete, and git bisect through those old commits may fail to run.
  - Verdict: solver binaries are external tools, so having to skip those
    commits during a bisect (git bisect skip) is usually acceptable. But
    bundle this into a separate, announced repo-maintenance pass rather
    than folding it into this cleanup, so the scope is obvious to
    collaborators.
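The ~90 MB figure can be re-estimated before the separate pass is scheduled. A sketch, assuming the solver binaries can be recognised by file name alone (the regex below is an illustration built from the names listed above). It sums uncompressed sizes of blob versions not present in HEAD, so the actual pack saving after delta compression will differ.

```shell
# Sum of uncompressed sizes of blob versions that are NOT in HEAD and whose
# file name matches an extended regex (passed via the environment so that
# awk does not reinterpret backslashes).
legacy_blob_bytes() {
  local ids
  ids=$(mktemp)
  # Blob ids present in the current HEAD tree (third column of ls-tree).
  git ls-tree -r HEAD | awk '$2 == "blob" {print $3}' > "$ids"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    RE="$1" awk -v ids="$ids" '
      FILENAME == ids { in_head[$1] = 1; next }   # first input: HEAD blob ids
      $1 == "blob" && !($2 in in_head) {
        path = $0
        sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)
        n = split(path, parts, "/")               # parts[n] is the file name
        if (n > 0 && parts[n] ~ ENVIRON["RE"]) total += $3
      }
      END { printf "%.0f\n", total }' "$ids" -
  rm -f "$ids"
}

# Usage:
#   legacy_blob_bytes '^(highs\.exe|libstdc\+\+-6\.dll|libopenblas.*)$'
```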
Also in scope for a later revisit
- how to example databases/*.sqlite (8 files, ~6.2 MB total in HEAD).
- templates/examples.sqlite, templates/time_settings_only.sqlite
  (~1.5 MB in HEAD).
- User has green-lit keeping these for now, flagged them for future
re-consideration once alternatives are in place (e.g. generating them
on demand from rivendell-to-flextool-style helper packages, matching
the pattern now used for the continental benchmark).
Prerequisites
- git-filter-repo is not currently installed:
    pip install git-filter-repo
    # or: apt install git-filter-repo
- Remote is shared (origin: git@github.com:irena-flextool/flextool.git),
  so the rewrite requires a coordinated force-push window.
- Current local branches that will be rewritten (all refs):
  bind-intraperiod-blocks, constraint-capacity-coeffs,
  db-api-use-fixing, dc-power-flow, delay-fix,
  new-outputs (current), etc. Each rewritten branch gets new SHAs.
- Any open PRs on GitHub will be invalidated; they must be closed and
  reopened, or rebased against the new HEAD after the push.
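The prerequisites can be checked in one read-only pass. A sketch: FILTER_REPO_BIN is a hypothetical override that exists only to make the check testable; by default it looks for the real git-filter-repo command, which git runs as "git filter-repo".

```shell
# Read-only pre-flight check before the rewrite.
preflight() {
  # FILTER_REPO_BIN: hypothetical override, defaults to the real command name.
  local tool="${FILTER_REPO_BIN:-git-filter-repo}"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool not found: pip install git-filter-repo" >&2
    return 1
  fi
  # Refuse to proceed with staged or unstaged changes to tracked files.
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "uncommitted changes in the working tree" >&2
    return 1
  fi
  # Every local branch the rewrite will touch, with its current commit,
  # so the list can be compared after the rewrite.
  git for-each-ref --format='%(objectname:short) %(refname:short)' refs/heads
}

# Usage: preflight > branches-before.txt
```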
Plan
- Back up .git/ before touching anything:
    cp -r .git .git.backup-$(date -u +%Y%m%d)
- Install the tool and take an analysis snapshot:
    pip install git-filter-repo
    git filter-repo --analyze   # writes .git/filter-repo/analysis/
  Confirm the blob list matches expectations.
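- Rewrite history (dry run first, then execute). Keep the scope tight:
    # Dry run: --dry-run leaves the repository unchanged and saves the
    # original and filtered fast-export streams for comparison.
    # Re-run without --dry-run to execute.
    git filter-repo --dry-run --invert-paths \
        --path flextool.lp \
        --path notebooks/RETO \
        --path notebooks/.ipynb_checkpoints/RETO \
        --path Input_data.sqlite \
        --path .spinetoolbox/items/flextool3_test_data/FlexTool3_data.sqlite
  --path matches an exact file or directory relative to the repo root, so
  Input_data.sqlite copies stored under other directories need their own
  --path (or a --path-glob). git filter-repo also refuses to rewrite a repo
  that does not look like a fresh clone: run it in a fresh clone, or pass
  --force here, with the .git.backup copy as the safety net.
  (Note: git filter-repo is not reversible without the .git.backup.
  Don't run it for real before the --analyze and --dry-run checks.)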
- Verify HEAD is unchanged: diff git ls-files and checksums of the top
  tracked files against a fresh clone of the old origin.
- Verify the size reduction:
    git gc --aggressive --prune=now
    git count-objects -vH
  Expect the pack size to drop noticeably (on the order of ~50-100 MB,
  based on the current blob catalogue).
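- Announce the force-push window. git filter-repo removes the origin
  remote after a rewrite, so re-add it, then push each branch:
    git remote add origin git@github.com:irena-flextool/flextool.git
    git push --force-with-lease=<branch>:<old-sha> origin <branch>
  where <old-sha> is the branch's SHA on origin before the rewrite (from
  git ls-remote --heads origin); the old remote-tracking refs are gone, so
  the lease needs an explicit expected value.
  All collaborators must git fetch && git reset --hard origin/<branch>
  or re-clone. Any open PRs need to be rebased on the new HEAD.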
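The per-branch push with explicit leases can be scripted. A sketch with hypothetical helper names: record the remote's branch SHAs with git ls-remote before running git filter-repo, then force-push each branch only if the remote still points where it did, so a colleague's push during the window is not silently overwritten.

```shell
# Before the rewrite: save what every branch on the remote points at.
record_remote_heads() {        # usage: record_remote_heads <remote> <file>
  git ls-remote --heads "$1" > "$2"
}

# After the rewrite (remote re-added): force-push each recorded branch that
# also exists locally, but only if the remote still points at the recorded
# commit. Returns non-zero if any push was rejected.
push_with_recorded_leases() {  # usage: push_with_recorded_leases <remote> <file>
  local remote="$1" file="$2" sha ref branch rc=0
  # Read the list on fd 3 so nothing inside the loop can consume it.
  while read -r sha ref <&3; do
    branch="${ref#refs/heads/}"
    git show-ref --verify --quiet "refs/heads/$branch" || continue
    if ! git push --force-with-lease="$branch:$sha" "$remote" \
        "refs/heads/$branch:refs/heads/$branch"; then
      echo "rejected: $branch (remote moved since it was recorded?)" >&2
      rc=1
    fi
  done 3< "$file"
  return "$rc"
}
```

A rejected branch is left untouched on the remote and can be inspected before retrying.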
Done when
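- .git/ size-pack is noticeably smaller (target: under 50 MB).
- git log --all --oneline | wc -l is unchanged, except for commits that
  touched nothing but the removed paths: git filter-repo prunes commits
  that become empty (--prune-empty auto is the default; pass
  --prune-empty never to keep them and make the count match exactly).
  No other commits are lost; only blob contents are removed.
- git diff <old-HEAD> <new-HEAD> is empty for every branch (HEAD
  trees unchanged).
- tests/ pass on the rewritten HEAD.
- All open PRs rebased; all collaborators on the new refs.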
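Because git trees are content-addressed, a branch tip whose files are unchanged keeps the same tree id even though its commit id changes. Comparing tree ids before and after therefore checks the "HEAD trees unchanged" criterion without needing the old objects in the rewritten repo. A sketch with hypothetical helper names:

```shell
# "branch tree-id" for every local branch. Run before the rewrite (saving the
# output) and after; identical output means every tip has the same files.
snapshot_trees() {
  local ref
  git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
      printf '%s %s\n' "${ref#refs/heads/}" "$(git rev-parse "$ref^{tree}")"
    done
}

# Packed size in MiB ("git count-objects -v" reports size-pack in KiB).
size_pack_mib() {
  git count-objects -v | awk '$1 == "size-pack:" { printf "%.1f\n", $2 / 1024 }'
}

# Usage:
#   snapshot_trees > /tmp/trees-before.txt       # before git filter-repo
#   snapshot_trees | diff /tmp/trees-before.txt - && echo "tips unchanged"
#   git rev-list --all --count                   # compare with the pre-rewrite count
#   size_pack_mib                                # target: under 50
```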
Owner
@jkiviluo (coordinated force-push window, collaborator notifications).
Related
- Standalone generator repo Rivendell_to_FlexTool is now the canonical
  location for the Rivendell generator; flextool/rivendell/ was
  removed from the working tree in the same session that produced this
  spec.
- benchmarks/scaling/scenarios/continental/generate.py now rebuilds
  its input.sqlite from rivendell-to-flextool on demand into
  ~/.cache/rivendell_to_flextool/, an example of the pattern for
  keeping dev sqlites out of the flextool tree going forward.