Aggregating seemingly different latent spaces.
Set up the development environment:
git clone git@github.com:crisostomi/latent-aggregation.git
cd latent-aggregation
conda env create -f env.yaml
conda activate la
pre-commit install
Run the tests:
pre-commit run --all-files
We use HuggingFace Datasets throughout the project. Assuming you already have a HF account (create one if you don't), you will have to log in via
huggingface-cli login
which will prompt you to either create a new token or paste an existing one.
Re-install the project in editable mode:
pip install -e '.[dev]'
Each experiment exp_name in part_shared_part_novel, same_classes_disj_samples, totally_disjoint has three scripts:
prepare_data_${exp_name}.py divides the data into tasks according to what the experiment expects;
run_${exp_name}.py trains the task-specific models and uses them to embed the data for each task;
analyze_${exp_name}.py obtains the results for the experiment.
Each script has a corresponding conf file in conf/ with the same name.
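The naming convention above can be sketched in Python as follows. This is only an illustration: the helper script_and_conf is hypothetical and not part of the repository, and it assumes that every script lives in src/la/scripts/ and that every conf file uses the .yaml extension, as is the case for the part_shared_part_novel example in this README.

```python
from pathlib import Path
from typing import Tuple

# Experiment names as listed in this README.
EXPERIMENTS = ["part_shared_part_novel", "same_classes_disj_samples", "totally_disjoint"]
# The three stages, in the order they are meant to be run.
STAGES = ["prepare_data", "run", "analyze"]


def script_and_conf(stage: str, exp_name: str) -> Tuple[Path, Path]:
    """Return (script path, conf path) for one stage of one experiment.

    Assumption: scripts sit in src/la/scripts/ and confs are .yaml files in conf/.
    """
    name = f"{stage}_{exp_name}"
    return Path("src/la/scripts") / f"{name}.py", Path("conf") / f"{name}.yaml"


# Print every script next to the conf file that is expected to configure it.
for exp in EXPERIMENTS:
    for stage in STAGES:
        script, conf = script_and_conf(stage, exp)
        print(f"{script.as_posix()}  <-  {conf.as_posix()}")
```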
For example, to run the part_shared_part_novel experiment, first configure it in conf/prepare_data_part_shared_part_novel.yaml. In this case, you have to choose a value for num_shared_classes and num_novel_classes_per_task. Then prepare the data via
python src/la/scripts/prepare_data_part_shared_part_novel.py
This will populate the data/${dataset_name}/part_shared_part_novel/ folder. Then embed the data by running
python src/la/scripts/run_part_shared_part_novel.py
The encoded data will then be in data/${dataset_name}/part_shared_part_novel/S${num_shared_classes}_N${num_novel_classes_per_task}.
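To check where the encoded data should end up for a given configuration, the path pattern above can be rebuilt with a small helper. This is a sketch, not code from the repository: encoded_data_dir is a hypothetical name, and the dataset name and class counts in the usage lines are made-up values chosen only for illustration.

```python
from pathlib import Path


def encoded_data_dir(dataset_name: str, num_shared_classes: int, num_novel_classes_per_task: int) -> Path:
    """Build data/<dataset>/part_shared_part_novel/S<shared>_N<novel>, relative to the repo root."""
    subdir = f"S{num_shared_classes}_N{num_novel_classes_per_task}"
    return Path("data") / dataset_name / "part_shared_part_novel" / subdir


# Hypothetical values; use the ones from your own conf file.
out_dir = encoded_data_dir("cifar100", 80, 5)
print(out_dir.as_posix(), "exists:", out_dir.exists())
```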
Having all the latent spaces, you can now run the actual experiment and collect the results by running
python src/la/scripts/analyze_part_shared_part_novel.py
The results can now be found in results/part_shared_part_novel.
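The three steps can also be chained from a single Python driver. The following is a minimal sketch, assuming it is run from the repository root with the la environment active and that each script reads its settings from its conf file without extra command-line arguments. The functions build_commands and run_experiment are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from typing import List

# Stages in the order described above: prepare the data, embed it, analyze it.
STAGES = ["prepare_data", "run", "analyze"]


def build_commands(exp_name: str) -> List[List[str]]:
    """Return one command per stage, using the current Python interpreter."""
    return [[sys.executable, f"src/la/scripts/{stage}_{exp_name}.py"] for stage in STAGES]


def run_experiment(exp_name: str) -> None:
    """Run the stages one after another, stopping at the first failure."""
    for cmd in build_commands(exp_name):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if a script exits with a non-zero code.
        subprocess.run(cmd, check=True)
```

Calling run_experiment("part_shared_part_novel") would then execute the same three commands shown above, in the same order.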