This document explains how GloViTa handles training from precomputed HDF5
feature files and how glovita_extract_features produces those files.
Training from precomputed features uses:
- dataset config: precomputed_features
- encoder config: precomputed
- a standard head such as classification
- or the MIL clam head for bag data
The precomputed encoder is effectively an identity backbone. This lets the
rest of the normal model stack keep working:
- model config
- head selection
- PEFT reconstruction
- trainer logic
Each HDF5 file must contain:
- features
- labels
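The current loader supports three shapes.

Flat features, one label per feature vector:

features: (N, D)
labels: (N,)

Dense bags, one label per bag:

features: (B, N, D)
labels: (B,)

Ragged bags, stored as M concatenated instance rows with per-bag boundaries:

features: (M, D)
labels: (B,)
bag_ptr: (B + 1,)

or

features: (M, D)
labels: (B,)
bag_lengths: (B,)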
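
The three layouts can be told apart from the dataset shapes alone. The sketch below is a hypothetical helper, not GloViTa's loader: `classify_layout` and the layout names it returns are invented for illustration, and it takes plain shape tuples so it runs without an HDF5 file.

```python
from typing import Optional, Tuple

Shape = Tuple[int, ...]


def classify_layout(
    features: Shape,
    labels: Shape,
    bag_ptr: Optional[Shape] = None,
    bag_lengths: Optional[Shape] = None,
) -> str:
    """Return an illustrative layout name for the given dataset shapes.

    Hypothetical helper: the returned names are not GloViTa identifiers.
    """
    if len(labels) != 1:
        raise ValueError(f"labels must be 1-D, got shape {labels}")

    if len(features) == 3:
        # features: (B, N, D), labels: (B,)
        if features[0] != labels[0]:
            raise ValueError("need one label per bag")
        return "dense_bags"

    if len(features) != 2:
        raise ValueError(f"features must be 2-D or 3-D, got shape {features}")

    if bag_ptr is not None:
        # features: (M, D), labels: (B,), bag_ptr: (B + 1,)
        if bag_ptr != (labels[0] + 1,):
            raise ValueError("bag_ptr must have B + 1 entries")
        return "ragged_bags"

    if bag_lengths is not None:
        # features: (M, D), labels: (B,), bag_lengths: (B,)
        if bag_lengths != (labels[0],):
            raise ValueError("bag_lengths must have B entries")
        return "ragged_bags"

    # features: (N, D), labels: (N,)
    if features[0] != labels[0]:
        raise ValueError("need one label per feature vector")
    return "flat"


print(classify_layout((50000, 1536), (50000,)))
print(classify_layout((200, 64, 1024), (200,)))
print(classify_layout((12800, 1024), (200,), bag_ptr=(201,)))
print(classify_layout((12800, 1024), (200,), bag_lengths=(200,)))
```

A check like this is cheap to run before training, since a 2-D features array with fewer labels than rows is only valid when bag boundaries are present.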
The active loader lives in:
The dataset factory integrates it through:
For bag-style data, the dataloader pads bags within a batch and returns:
- features: padded tensor (B, N_max, D)
- mask: boolean tensor (B, N_max)
This is intended for bag-aware heads such as clam.
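
A minimal pure-Python sketch of that padding step follows. It assumes that bag_ptr holds cumulative offsets (bag i spans rows bag_ptr[i] to bag_ptr[i + 1]) and that True in the mask marks real instances. Neither convention is stated above, and `lengths_to_ptr` and `pad_bags` are hypothetical names. Real code would work on tensors rather than nested lists.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Vector = List[float]


def lengths_to_ptr(bag_lengths: Sequence[int]) -> List[int]:
    """Turn per-bag lengths (B,) into cumulative offsets (B + 1,)."""
    return [0, *accumulate(bag_lengths)]


def pad_bags(
    features: Sequence[Vector], bag_ptr: Sequence[int]
) -> Tuple[List[List[Vector]], List[List[bool]]]:
    """Pad every bag to the longest bag in the batch.

    Returns (padded, mask) shaped (B, N_max, D) and (B, N_max).
    """
    dim = len(features[0])
    # Slice the concatenated rows back into one list per bag.
    bags = [list(features[s:e]) for s, e in zip(bag_ptr[:-1], bag_ptr[1:])]
    n_max = max(len(bag) for bag in bags)
    padded, mask = [], []
    for bag in bags:
        n_pad = n_max - len(bag)
        padded.append(bag + [[0.0] * dim for _ in range(n_pad)])
        mask.append([True] * len(bag) + [False] * n_pad)
    return padded, mask


# Five instances with D = 2, split into bags of length 3 and 2.
features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
bag_ptr = lengths_to_ptr([3, 2])
padded, mask = pad_bags(features, bag_ptr)

print(bag_ptr)  # [0, 3, 5]
print(mask)     # [[True, True, True], [True, True, False]]
```

The mask lets an attention-based head ignore the zero rows, so padding does not change a bag's prediction.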
glovita_train \
--data.dataset precomputed_features \
--data.data_root_dir . \
--data.num_classes 1000 \
--data.train_features_file /path/to/train_features.h5 \
--data.val_features_file /path/to/val_features.h5 \
--model.encoder.encoder_type precomputed \
--model.encoder.feature_dim 1536 \
--model.head.head_type classification \
--dataloading.batch_size 512

glovita_train \
--data.dataset precomputed_features \
--data.data_root_dir . \
--data.num_classes 2 \
--data.train_features_file /path/to/train_bags.h5 \
--data.val_features_file /path/to/val_bags.h5 \
--model.encoder.encoder_type precomputed \
--model.encoder.feature_dim 1024 \
--model.head.head_type clam \
--model.head.variant sb \
--model.head.instance_eval \
--dataloading.batch_size 8

If a separate test file exists:

--data.test_features_file /path/to/test_features.h5

Notes:
- data.data_root_dir is part of the shared data schema but is not used for feature loading in the same way as image datasets
- data.num_classes must be set explicitly
- model.encoder.feature_dim must match the stored feature dimension
- augmentations are not used for precomputed_features
- clam consumes raw bag features directly, so model.feature_aggregation_method is ignored for that head

glovita_extract_features writes the same HDF5 format.
It supports:
- explicit config mode
- checkpoint reconstruction mode
glovita_extract_features \
--method joint \
--output_dir ./precomputed_features \
--data.dataset cifar10 \
--data.data_root_dir ./data \
--model.encoder.encoder_type timm \
--model.encoder.type vit_base_patch16_224 \
--model.head.head_type classification \
--dataloading.batch_size 128

glovita_extract_features \
--checkpoint_path ./experiments/cifar10/my_run/0/checkpoints/last.pt \
--output_dir ./precomputed_features \
--output_filename "{checkpoint}_{dataset}_{split}_{method}.h5"

In checkpoint mode, the script:
- finds config.json next to the checkpoint run directory
- reconstructs the saved model and PEFT configuration
- loads checkpoint weights
- extracts features from the selected split(s)
Important extraction fields:
- method
  - cls_token
  - avg
  - sum
  - mean_all
  - joint
- split
  - train
  - val
  - test
  - or all when unset
- precision
- compression
- output_dir
- output_filename
- use_eval_transform_for_train
Notes:
- joint corresponds to concatenated CLS-token + average patch-token features
- peft defaults to full_finetuning
- by default, extraction uses evaluation transforms for the train split as well, so train feature extraction is deterministic
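
To make the joint method concrete, here is a small sketch. It assumes the encoder returns a token sequence whose first row is the CLS token and whose remaining rows are patch tokens, which is the usual ViT layout but is not stated above. `joint_features` is a hypothetical name, not GloViTa's function.

```python
from typing import List, Sequence


def joint_features(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the CLS token with the mean of the patch tokens.

    `tokens` has shape (1 + P, D); the result has length 2 * D.
    """
    cls_token, patches = tokens[0], tokens[1:]
    if not patches:
        raise ValueError("need at least one patch token")
    dim = len(cls_token)
    # Average each feature dimension over the patch tokens only.
    patch_mean = [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
    return list(cls_token) + patch_mean


tokens = [
    [9.0, 9.0],  # CLS token (assumed to be row 0)
    [1.0, 2.0],  # patch token
    [3.0, 4.0],  # patch token
]
print(joint_features(tokens))  # [9.0, 9.0, 2.0, 3.0]
```

Under this reading the output is twice the token width, so a 768-dimensional encoder such as ViT-Base would yield 1536-dimensional joint features, and model.encoder.feature_dim would need to be set to 1536 when training on them.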
If --output_filename is unset, the script uses:
agg_{method}_{model}_{dataset}_{split}_size{imgsize}_float{precision}.h5
Supported placeholders:
- method
- model
- dataset
- split
- imgsize
- precision
- checkpoint

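
The default template can be reproduced with Python's str.format, assuming the placeholders are substituted like ordinary format fields. How the script derives each value (for example the model string or the precision number) is not specified here, so the values below are illustrative only.

```python
DEFAULT_TEMPLATE = (
    "agg_{method}_{model}_{dataset}_{split}_size{imgsize}_float{precision}.h5"
)

# Illustrative values only; the real script fills these from its config.
fields = {
    "method": "joint",
    "model": "vit_base_patch16_224",
    "dataset": "cifar10",
    "split": "train",
    "imgsize": 224,
    "precision": 16,
    "checkpoint": "last",
}

# str.format ignores unused keyword arguments, so `checkpoint` is harmless
# when the template does not reference it.
default_name = DEFAULT_TEMPLATE.format(**fields)
custom_name = "{checkpoint}_{dataset}_{split}_{method}.h5".format(**fields)

print(default_name)
print(custom_name)
```

Including split in a custom template keeps the train, val, and test files from overwriting each other when all splits are extracted in one run.
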
- mil.md: MIL/CLAM path for bag features
- ../README.md: main config and CLI model