This project was my exam submission for the Advanced Topics in NLP course at UCPH. It explores how the Transformer architecture compares to older architectures, such as RNNs, in terms of systematic generalization. My work reproduces the experiments presented in the paper *Generalization without Systematicity* (Lake & Baroni, 2018), which originally investigated the degree to which RNNs can generalize to new or rarely seen words in novel contexts.
The primary motivation behind my project is to test the hypothesis that the Transformer architecture will generally outperform RNNs on these tasks. I also sought to evaluate whether a pretrained model like BART, despite its power, might be hindered by the specific, limited vocabulary of the target dataset compared to a standard "vanilla" Transformer.
To conduct this investigation, I employed two distinct Transformer-based approaches:
- Vanilla Models: standard Transformer architectures implemented from scratch, without any pretraining on data outside the scope of the experiments.
- BART Models: the standard BART architecture with weights pretrained by Facebook, which I then fine-tuned using Parameter-Efficient Fine-Tuning (PEFT). This included a language head and three different adapter types: LoRA blocks, IA3, and bottleneck adapters.
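As a concrete illustration of one of these adapter types, the low-rank update that LoRA adds to a frozen pretrained weight can be sketched in plain PyTorch. The dimensions and hyperparameters below are illustrative assumptions, not the project's actual settings:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: a frozen base layer plus a trainable
    low-rank update, y = W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A is small-random, B is zero, so the adapter starts as a no-op
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrapping a 512x512 projection: only the two low-rank factors train
layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
```

In a full setup, wrappers like this (or a library such as HuggingFace `peft`) would be applied to the attention projections of the pretrained BART model, so that only the small adapter matrices are updated during fine-tuning.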
For the BART models, I introduced a novel language head to avoid the memory overhead of BART's standard 50,000-token vocabulary: a smaller, specialized vocabulary consisting only of the byte-level tokens needed for the specific dataset, which significantly reduced the dimensionality of the language head.
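A minimal sketch of this idea, using a word-level SCAN vocabulary for readability (the project itself kept the byte-level tokens required by the dataset), shows how much smaller the output projection becomes:

```python
import torch.nn as nn

# Illustrative SCAN-only vocabulary: input words, output actions, and
# a few special tokens. The real project used byte-level tokens instead.
scan_tokens = ["<s>", "</s>", "<pad>",
               "jump", "walk", "run", "look", "turn",
               "left", "right", "twice", "thrice",
               "and", "after", "around", "opposite",
               "I_JUMP", "I_WALK", "I_RUN", "I_LOOK",
               "I_TURN_LEFT", "I_TURN_RIGHT"]
tok2id = {t: i for i, t in enumerate(scan_tokens)}

d_model = 768  # hidden size of bart-base
full_head = nn.Linear(d_model, 50265)              # standard BART lm_head
small_head = nn.Linear(d_model, len(scan_tokens))  # reduced language head

full_params = sum(p.numel() for p in full_head.parameters())
small_params = sum(p.numel() for p in small_head.parameters())
```

With roughly 50k output tokens replaced by a couple of dozen, the head shrinks by three orders of magnitude, which also removes the corresponding softmax and gradient memory during fine-tuning.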
I utilized the SCAN dataset, which consists of roughly 20,000 fully labeled input and target sequence pairs. The dataset is structured so that input sequences combine primitive commands (like "turn left") with modifiers (like "twice"), which translate into sequences of output actions. My project reproduces three specific experiments from the original study:
- Experiment 1 (E1): Testing generalization to random subsets of commands by training the models on varying percentages of the full dataset (from 1% to 64%).
- Experiment 2 (E2): Testing generalization to longer action sequences, where the training set contains sequences of up to 22 actions, while the test set requires generating sequences of up to 48 actions.
- Experiment 3 (E3): Testing compositional generalization across specific primitive commands, such as "turn left" and "jump," by varying the amount of exposure the model has to these primitives in complex contexts during training.