End-of-semester project for the course Laboratory of Computational Physics Mod. B in the academic year 2023/2024.

Contributors:
- Muhammad Usama Qasim
- Maryam Gholami Shiri
- Laura Schulze
- Savina Tsichli
The project consists of the following files and folders:
- Next_Token_final.ipynb: Notebook for analysis of the next generated token and the system's entropies.
- Token_distances_final.ipynb: Notebook for analysis of token embedding distances.
- DistFuncs.py: File with helper functions for the token distance analysis.
- prompts: Folder containing prompt text files.
- Principal_Component_Analysis: 3D PCA of GPT-2 hidden states from large datasets, used to visualize and analyze model representations.
- Intrinsic_dimension.ipynb: Notebook for analysing the intrinsic dimension of intermediate token representations.
- Transformer Presentation: PowerPoint presentation of the project.