Add timm_finetune module: iNaturalist-pretrained backbone fine-tuning (ConvNeXt-L / EVA-02) #51
Open
trgardos wants to merge 1 commit into
…ning

New self-contained module finetuning/timm_finetune/ for the Tier 1.1 lever: fine-tune iNaturalist-pretrained backbones instead of ImageNet-22k SWIN. The SWIN trainer and its configs are untouched.

train.py is forked from SWIN_finetuning_advanced.py with a timm backbone path added (build_backbone with hf-hub support, generalized MultiTaskModel wrapper, timm-derived preprocessing, backbone metadata saved to config.json); the backbone-agnostic machinery (mixup collator, MixupTrainer, EMA, eval loop, metrics, balanced softmax, full-species multi-task heads) is reused unchanged. Backbone is selected via model.backbone_type: hf|timm. Full fine-tune only; timm + single-task/ArcFace intentionally raise a clear error.

Configs (multi-task on full 15.5k scientificNameEncoded == Kaggle metric, balanced softmax, EMA, medium aug, TTA wired, 2-GPU eff batch 128):
- convnext_large_inat_384_2gpu.yml (default): timm/convnext_large_mlp.laion2b_ft_augreg_inat21 @384. NB the plan's convnextv2_base.inat21_384 has no loadable repo (404); this ConvNeXt-L iNat21 is the closest established one.
- eva02_large_inat_336_2gpu.yml: timm/eva02_large_patch14_clip_336.merged2b_ft_inat21 @336.

Plus train.sh / submit.sh (mirror the SWIN launchers) and README.

Validated: py_compile, import, both configs parse, and an end-to-end main() smoke (real ConvNeXt-L weights -> 272 families / 2564 genera / 15500 species, train + eval + EMA ran; only the checkpoint write failed on /tmp disk space, not a code path).

Needs `pip install --user timm`. timm inference/submission (SWIN prediction.py is SWIN-specific) is a documented follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First run is ongoing and recorded at https://wandb.ai/gardoslab/herbdl/runs/convnext_l_inat_384_seed0
What
A self-contained module, `finetuning/timm_finetune/`, to fine-tune iNaturalist-pretrained (timm) backbones: the Tier 1.1 "domain backbone" lever, the single highest-leverage untaken item toward higher Kaggle-2022 macro-F1. No changes to `SWIN_finetuning_advanced.py` or its configs (per request: separate folder, own configs/scripts).

`train.py` is forked from the SWIN trainer with a timm path added; the backbone-agnostic machinery (mixup collator, `MixupTrainer`, EMA, custom eval loop, metrics, balanced softmax, and the full-species multi-task heads) is reused unchanged. Backbone is chosen via `model.backbone_type: hf|timm`.

Key design
- `build_backbone()`: `timm.create_model(name, num_classes=0)` (pooled features), with hf-hub support so iNat21 checkpoints not in timm's registry load by repo id; HF SWIN path preserved.
- `MultiTaskModel` (renamed from `MultiTaskSwinModel`): `_pooled()` returns the pooled vector for either backbone; grad-checkpointing routes to timm `set_grad_checkpointing`.
- Preprocessing: `timm.data.resolve_model_data_config` for timm; backbone metadata saved into `config.json` for rebuild.
- Full fine-tune only; timm + single-task/ArcFace raise a clear error.
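The backbone selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in `train.py`: `BackboneSpec`, `validate_spec`, `timm_model_name`, and `build_timm_backbone` are hypothetical names, the `task` values are assumed, and treating any name containing `/` as a hub repo id is an assumption. Only `timm.create_model`, `timm.data.resolve_model_data_config`, `timm.data.create_transform`, and `model.num_features` are real timm APIs.

```python
from dataclasses import dataclass


@dataclass
class BackboneSpec:
    """Hypothetical container for the `model` section of a config."""
    backbone_type: str        # "hf" or "timm"
    name: str                 # timm model name or Hugging Face repo id
    task: str = "multitask"   # assumed values: "multitask", "single", "arcface"


def validate_spec(spec: BackboneSpec) -> None:
    """Reject unsupported combinations early, with a clear message."""
    if spec.backbone_type not in ("hf", "timm"):
        raise ValueError(
            f"model.backbone_type must be 'hf' or 'timm', got {spec.backbone_type!r}"
        )
    if spec.backbone_type == "timm" and spec.task != "multitask":
        raise ValueError(
            f"backbone_type='timm' supports only multi-task training, got task={spec.task!r}"
        )


def timm_model_name(name: str) -> str:
    """Prefix a hub repo id with 'hf-hub:' so timm fetches it from the hub.

    Assumption: a name containing '/' is a repo id; anything else is looked
    up in timm's own registry.
    """
    if name.startswith("hf-hub:"):
        return name
    return f"hf-hub:{name}" if "/" in name else name


def build_timm_backbone(spec: BackboneSpec, pretrained: bool = True):
    """Create a headless timm model and its eval transform.

    Downloads weights when pretrained=True. timm is imported lazily so the
    validation helpers above work without timm installed.
    """
    import timm
    from timm.data import create_transform, resolve_model_data_config

    validate_spec(spec)
    # num_classes=0 removes the classifier, so forward() returns pooled features.
    model = timm.create_model(
        timm_model_name(spec.name), pretrained=pretrained, num_classes=0
    )
    data_config = resolve_model_data_config(model)
    transform = create_transform(**data_config, is_training=False)
    return model, transform, model.num_features
```

Validating before any weights are downloaded means a misconfigured run fails immediately rather than after a multi-gigabyte fetch.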
Backbones (what actually loads)
The plan's `timm/convnextv2_base.inat21_384` has no loadable repo (`BBracke/...` → 404). The established iNat21 timm checkpoints, verified to load via hf-hub:

- `convnext_large_inat_384_2gpu.yml` (default): `timm/convnext_large_mlp.laion2b_ft_augreg_inat21` @384 (ConvNeXt-L; closest to the ConvNeXt/384 intent).
- `eva02_large_inat_336_2gpu.yml`: `timm/eva02_large_patch14_clip_336.merged2b_ft_inat21` @336.

Both: multi-task on the full 15.5k `scientificNameEncoded` species (== leaderboard metric), balanced softmax, EMA, medium aug, TTA wired, 2-GPU effective batch 128.
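For readers unfamiliar with balanced softmax, the idea can be illustrated for a single example in pure Python. This is a sketch of the general technique (adding the log of each class's training count to its logit before cross-entropy), not the reused implementation from the SWIN trainer; `balanced_softmax_loss` and its arguments are hypothetical names.

```python
import math


def balanced_softmax_loss(logits, target, class_counts):
    """Cross-entropy on logits shifted by the log class prior (one example).

    logits:       raw scores, one per class
    target:       index of the true class
    class_counts: number of training samples per class (all > 0)
    """
    # Shift each logit by log(count): frequent classes get a head start, so
    # the network itself does not need to learn the class imbalance.
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[target]
```

With equal class counts the shift cancels and this reduces to ordinary cross-entropy. In the usual formulation the shift is applied only during training, and prediction uses the unshifted logits.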
Validation
`py_compile` + import; both configs parse; end-to-end `main()` smoke with real ConvNeXt-L weights: `backbone_type=timm` → loaded (1536-d) → 272 families / 2564 genera / 15500 species → train + eval + EMA ran; the only failure was the checkpoint write hitting /tmp disk space (real runs write to `/projectnb`), not a code path.

Setup / notes
- `pip install --user timm` (timm 1.0.27 used).
- Launch: `EMAIL=tgardos@bu.edu NGPUS=2 RUN_PREFIX=CONVNEXT_L_INAT_384 CONFIG=configs/convnext_large_inat_384_2gpu.yml SEEDS="0" bash submit.sh`
- timm inference/submission is not included (`prediction.py` is SWIN-specific); a small timm predictor using the saved `config.json` is a follow-up.
- Currently based on `multitask-full-species` (PR #50, "Multi-task: species head targets full Kaggle species (15.5k), not epithet (6.9k)"); retarget to `main` after that merges.

🤖 Generated with Claude Code
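As a rough illustration of how the follow-up predictor could rebuild the backbone from the saved `config.json`, the sketch below round-trips backbone metadata through that file. The `"backbone"` key, its field names, and both helper functions are hypothetical; the actual layout written by `train.py` is not shown in this PR description.

```python
import json
from pathlib import Path


def save_backbone_metadata(output_dir, backbone_type, name, image_size, num_features):
    """Merge backbone metadata into <output_dir>/config.json (hypothetical layout)."""
    path = Path(output_dir) / "config.json"
    # Preserve whatever the trainer already wrote to the file.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backbone"] = {
        "backbone_type": backbone_type,
        "name": name,
        "image_size": image_size,
        "num_features": num_features,
    }
    path.write_text(json.dumps(config, indent=2))
    return path


def load_backbone_metadata(output_dir):
    """Read the metadata back so a predictor can recreate the same backbone."""
    config = json.loads((Path(output_dir) / "config.json").read_text())
    if "backbone" not in config:
        raise KeyError("config.json has no 'backbone' section; cannot rebuild the model")
    return config["backbone"]
```

A predictor would then pass the loaded `name` to the same model-construction path used in training and load the fine-tuned weights on top.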