- Name: Jay Chiehen Liao (廖傑恩)
- ID: R13922210
- E-mail: r13922210@ntu.edu.tw
This repo reproduces key findings from *Masked Autoencoders Are Scalable Vision Learners* (MAE) on CIFAR-10: self-supervised pretraining improves downstream classification over training from scratch, and we study how decoder depth and decoder width affect MAE pretraining and downstream results.
- Code is adapted from a simplified MAE implementation using ViT blocks from `timm`.
- Dataset is CIFAR-10 (auto-downloaded via `torchvision`).
- Install required packages listed in `requirements.txt`.
- You may create a conda env:

  ```shell
  conda create --name mae python=3.12
  conda activate mae
  pip install -r requirements.txt
  ```
- `model.py` defines the MAE encoder/decoder and a `ViT_Classifier` built on the pretrained encoder. The encoder masks patches; the decoder reconstructs them. The classifier reuses the encoder weights and adds a linear head.
- `mae_pretrain.py` runs self-supervised pretraining on CIFAR-10 with a cosine-decayed LR, warmup, and random masking. The script also logs loss to a CSV file and periodically saves image grids under `images/epoch_XXXX/`.
- `train_classifier.py` trains a classifier on CIFAR-10, either from scratch or loading a pretrained encoder. It supports linear probing (with `--linear_probe`) or full fine-tuning.
- `visualize_pred.py` produces side-by-side prediction figures for pretrained vs. scratch classifiers on the same set of test images.
- `utils.py` provides seeding and a tiny CSV logger.
- `metrics.ipynb` reads accuracy scores and losses from `metrics.csv` and visualizes them.
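The random masking that the encoder applies can be sketched as follows. This is a minimal illustration of MAE-style per-sample masking (shuffle patch tokens, keep a random subset), not the repo's exact implementation; the function name and default mask ratio of 0.75 are assumptions:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens per sample (MAE-style masking).

    patches: (batch, num_patches, dim)
    Returns the visible patches and the indices needed to undo the shuffle
    (used later to place decoder outputs back in their original positions).
    """
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))

    # Independent random permutation per sample; keep the first num_keep tokens.
    noise = torch.rand(B, N)
    shuffle_idx = noise.argsort(dim=1)
    restore_idx = shuffle_idx.argsort(dim=1)

    keep_idx = shuffle_idx[:, :num_keep]
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, restore_idx

x = torch.randn(2, 16, 8)        # e.g. 16 patches of dim 8
vis, restore = random_masking(x)
print(vis.shape)                 # only 25% of the patches stay visible
```

The encoder then processes only the visible tokens, which is what makes high mask ratios cheap to pretrain with.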
Outputs are written under:

```
outputs/<EXP_NAME>/
  mae-pretrain/      # pretraining (checkpoint, images/, tensorboard/)
  pretrain-cls/      # classifier from pretrained encoder
  pretrain-cls-lin/  # linear probe from pretrained encoder
  scratch-cls/       # classifier trained from scratch
```
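The difference between the linear-probe and fine-tuning runs above comes down to whether the encoder is frozen. A minimal sketch, with illustrative names (the embed dim of 192 is assumed to match ViT-Tiny; this is not the repo's exact class):

```python
import torch
import torch.nn as nn

EMBED_DIM = 192  # ViT-Tiny width (assumption)

def build_classifier(encoder: nn.Module, num_classes: int = 10,
                     linear_probe: bool = True) -> nn.Module:
    """Attach a linear head; freeze the encoder when linear probing."""
    if linear_probe:
        for p in encoder.parameters():
            p.requires_grad = False  # linear probe: only the head trains
    return nn.Sequential(encoder, nn.Linear(EMBED_DIM, num_classes))

# Stand-in encoder for illustration; the real one is the pretrained MAE encoder.
encoder = nn.Linear(EMBED_DIM, EMBED_DIM)
model = build_classifier(encoder, linear_probe=True)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the head's weight and bias remain trainable
```

With `linear_probe=False`, both the encoder and the head receive gradients, which is full fine-tuning.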
- Version of 600 epochs (you may modify the shell script to adjust the number of epochs):

  ```shell
  bash commands/mae_pretrain.sh
  ```

  This writes the checkpoint to `outputs/main_exp/mae-pretrain/vit-t-mae.pt`.

- Version of 150 epochs:

  ```shell
  bash commands/mae_pretrain_150.sh
  ```

  This writes the checkpoint to `outputs/main_exp_150_50/mae-pretrain/vit-t-mae.pt`.
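Pretraining uses a cosine-decayed learning rate with warmup; the schedule can be sketched as follows. The warmup length and base LR here are illustrative, not the repo's exact values:

```python
import math
import torch

def warmup_cosine_lr(optimizer, epoch, total_epochs,
                     warmup_epochs=10, base_lr=1.5e-4):
    """Linear warmup for warmup_epochs, then cosine decay toward zero."""
    if epoch < warmup_epochs:
        lr = base_lr * (epoch + 1) / warmup_epochs
    else:
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr

params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.SGD(params, lr=1.5e-4)
print(warmup_cosine_lr(opt, 0, 600))   # small LR early in warmup
print(warmup_cosine_lr(opt, 10, 600))  # full base LR once warmup ends
```

Called once per epoch before the training loop, this keeps early updates small while the randomly initialized decoder stabilizes.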
- Version of 600-100 epochs:

  ```shell
  bash commands/scratch_cls.sh
  ```

- Version of 150-50 epochs:

  ```shell
  bash commands/scratch_cls_50.sh
  ```
- Version of 600-100 epochs:

  ```shell
  bash commands/pretrain_cls.sh
  ```

- Version of 150-50 epochs:

  ```shell
  bash commands/pretrain_cls_50.sh
  ```
Before running, please ensure the main experiment (`main_exp`) has completed.

```shell
bash visualize_pred.sh
```

This repo explores decoder depth and decoder width during MAE pretraining:

`--decoder_layer {2,(4),6,8}`, where depth=4 is the default setting.
- MAE pretraining:

  ```shell
  bash commands/mae_pretrain_depth.sh
  ```

- Fine-tuning. Before running the following commands, ensure `commands/mae_pretrain_depth.sh` and experiment `main_exp_150_50` have completed:

  ```shell
  bash commands/ft_depth.sh
  bash commands/ft_default_d4_w192.sh
  ```
- Linear probing. Before running the following commands, ensure `commands/mae_pretrain_depth.sh` and experiment `main_exp_150_50` have completed:

  ```shell
  bash commands/lin_depth.sh
  bash commands/lin_default_d4_w192.sh
  ```
`--decoder_dim {64,128,(192),256,512}`, where decoder_dim=192 is the default setting.
- MAE pretraining:

  ```shell
  bash commands/mae_pretrain_width.sh
  ```

- Fine-tuning. Before running the following command, ensure `commands/mae_pretrain_width.sh` has completed:

  ```shell
  bash commands/ft_width.sh
  ```
- Linear probing. Before running the following command, ensure `commands/mae_pretrain_width.sh` has completed:

  ```shell
  bash commands/lin_width.sh
  ```
- CSV logs: per-epoch metrics are written as `metrics.csv` in each run folder.
- Please run all cells in `metrics.ipynb` to get accuracy scores and training-loss curves. Note that you may need to change the file paths if you modify the shell scripts mentioned above.
Set `--seed` (default 42). CuDNN is set to deterministic mode.
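A typical seeding helper along these lines makes runs reproducible; the function name is illustrative and may differ from the one in `utils.py`:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch, and make cuDNN deterministic."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)          # no-op without a GPU
    torch.backends.cudnn.deterministic = True  # pick deterministic kernels
    torch.backends.cudnn.benchmark = False     # disable autotuning

seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
print(torch.equal(a, b))  # True: the same seed reproduces the same draws
```

Note that deterministic cuDNN kernels can be somewhat slower, which is the usual reproducibility/speed trade-off.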
- Re-running with the same `--exp_name` for pretraining will error on existing directories. Change `--exp_name` or delete the old folder.
- Path assertion: the classifier asserts that the pretrained checkpoint path contains `mae-pretrain`. Use the generated path structure, or relax the assertion if you prefer.
- He, Kaiming, et al. "Masked Autoencoders Are Scalable Vision Learners." CVPR, 2022.
- Krizhevsky, Alex. "Learning Multiple Layers of Features from Tiny Images." Tech report, 2009 (CIFAR-10).