🔮 Project: Large Language Model Efficiency Challenge #3

WHY
Accessing, fine-tuning, and querying foundation models for new tasks is costly. These costs have gated performant LLMs behind expensive, often proprietary training hardware, putting them out of reach for anyone without substantial resources. This project explores the latest innovations in adapting LLMs to specific tasks under tight GPU constraints while maintaining performance quality.

HOW
The challenge is defined by a specific constraint and an ambitious goal:

  • Constraint: Adapt a foundation model to specific tasks by fine-tuning on a single GPU (A100) within 24 hours.
  • Goal: Maintain high accuracy for the desired tasks.

Techniques to be explored and analyzed include:

  1. Low-Rank Adaptation (LoRA):
    Designing adapters as the product of two low-rank matrices added on top of the frozen pre-trained weights.
    Building on the insight that pre-trained language models can learn efficiently in a much smaller subspace (see the first sketch after this list).

  2. QLoRA:
    Building on LoRA with a 4-bit quantized base model.
    Innovations include the 4-bit NormalFloat data type, double quantization, and paged optimizers (see the loading sketch after this list).

  3. Lightning/FlashAttention/DeepSpeed/FairScale:
    Utilizing external tools/plugins to improve data handling, training efficiency, and model quality (see the single-GPU setup sketch after this list).

  4. Advanced topic - Blackbox LoRA:
    Current optimization methods rely on backpropagating through the whole model. Blackbox optimization instead tunes the small set of adapter weights using only forward evaluations, with no backprop (see the last sketch after this list). Contact Valentin for more details about the theory.
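
To make item 1 concrete, here is a minimal LoRA sketch in PyTorch. It is not the project's implementation: the class name and the r/alpha defaults are illustrative. A frozen linear layer is augmented with a trainable low-rank update W + (alpha/r) * B A.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pre-trained weights
        self.scaling = alpha / r
        # A starts small and random, B starts at zero, so the adapter
        # is initially a no-op (as in the LoRA paper).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Only lora_A and lora_B receive gradients, so optimizer state and checkpoints shrink to a small fraction of the full model.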
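
For item 2, the sketch below shows how QLoRA-style loading typically looks with the Hugging Face transformers, bitsandbytes, and peft libraries. The model name and the LoRA hyperparameters are placeholders, not project decisions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA's storage recipe: 4-bit NormalFloat plus double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",         # placeholder model
    quantization_config=bnb_config,
)

# Trainable LoRA adapters sit on top of the frozen 4-bit base model.
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The third QLoRA innovation, paged optimizers, is available through
# transformers' Trainer via TrainingArguments(optim="paged_adamw_32bit").
```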
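
For item 3, one possible single-GPU setup combines Lightning Fabric with FlashAttention kernels. This assumes a recent transformers release that accepts the attn_implementation argument, an installed flash-attn package, and an Ampere-class GPU such as the A100; the model name is again a placeholder.

```python
import torch
from lightning.fabric import Fabric
from transformers import AutoModelForCausalLM

# Fabric handles device placement and mixed precision without a full Trainer.
fabric = Fabric(accelerator="cuda", devices=1, precision="bf16-mixed")
fabric.launch()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                 # placeholder model
    torch_dtype=torch.bfloat16,                 # FlashAttention needs fp16/bf16
    attn_implementation="flash_attention_2",    # use FlashAttention kernels
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Wrap model and optimizer so the training loop runs on the chosen device.
model, optimizer = fabric.setup(model, optimizer)
```

DeepSpeed and FairScale address the same bottlenecks through different means, e.g. ZeRO-style optimizer-state sharding and CPU offloading.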
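
Finally, for item 4, a naive sketch of the blackbox idea: because the adapter weights form a small set, they can be tuned with forward passes alone. The function below is hypothetical and shows plain random local search; practical blackbox methods (e.g., evolution strategies) are far more sample-efficient.

```python
import torch

@torch.no_grad()
def random_search(lora_params, eval_loss, steps=100, sigma=1e-3):
    """Hypothetical gradient-free tuner: perturb the LoRA weights and keep
    any perturbation that lowers a held-out loss. No backprop anywhere."""
    best = eval_loss()  # eval_loss runs forward passes on validation data
    for _ in range(steps):
        noise = [torch.randn_like(p) * sigma for p in lora_params]
        for p, n in zip(lora_params, noise):
            p.add_(n)                    # try a random step
        loss = eval_loss()
        if loss < best:
            best = loss                  # keep the improvement
        else:
            for p, n in zip(lora_params, noise):
                p.sub_(n)                # revert the step
    return best
```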

WHAT
This project will produce:

  • Insights and Lessons: A distilled set of well-documented steps and easy-to-follow tutorials that capture what was learned during the challenge.
  • Innovation in Efficiency: New techniques and methods that can significantly change how VOD-trained models are adapted and fine-tuned.

References

  • Low-Rank Adaptation (LoRA): Hu et al., 2021, arXiv:2106.09685
  • QLoRA: Dettmers et al., 2023, arXiv:2305.14314
  • FlashAttention: Dao et al., 2022, arXiv:2205.14135
  • Lightning Fabric: https://lightning.ai/docs/fabric
  • DeepSpeed: https://github.com/microsoft/DeepSpeed
  • FairScale: https://github.com/facebookresearch/fairscale
  • NeurIPS LLM Efficiency Challenge: https://llm-efficiency-challenge.github.io
