🔮 Project: Large Language Model Efficiency Challenge #3

WHY
Accessing, fine-tuning, and querying foundation models for new tasks is costly. These costs have gated performant LLMs behind expensive, often proprietary training hardware, putting them out of reach for anyone without substantial resources. This project explores the latest innovations in adapting LLMs to specific tasks under tight GPU constraints while maintaining performance quality.

HOW
The challenge is defined by a specific constraint and an ambitious goal:

  • Constraint: Adapt a foundation model to specific tasks by fine-tuning on a single GPU (A100) within 24 hours.
  • Goal: Maintain high accuracy for the desired tasks.

Techniques to be explored and analyzed include:

  1. Low-Rank Adaptation (LoRA):
    Designing adapters as the product of two low-rank matrices added on top of the frozen pre-trained weights.
    Building on the insight that pre-trained language models can learn efficiently in a much smaller subspace (see the first sketch after this list).

  2. QLoRA:
    Building on LoRA with a 4-bit quantized base model.
    Innovations include the 4-bit NormalFloat data type, double quantization, and paged optimizers (see the loading sketch after this list).

  3. Lightning/FlashAttention/DeepSpeed/FairScale:
    Utilizing external tools/plugins to improve data handling, training efficiency, and model quality (see the single-GPU setup sketch after this list).

  4. Advanced topic - Blackbox LoRA:
    Current optimization methods rely on backpropagating through the whole model. Blackbox optimization instead tunes the small set of adapter weights using only forward evaluations, with no backprop (see the last sketch after this list). Contact Valentin for more details about the theory.
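
To make item 1 concrete, here is a minimal LoRA sketch in PyTorch. It is not the project's implementation: the class name and the r/alpha defaults are illustrative. A frozen linear layer is augmented with a trainable low-rank update W + (alpha/r) * B A.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pre-trained weights
        self.scaling = alpha / r
        # A starts small and random, B starts at zero, so the adapter
        # is initially a no-op (as in the LoRA paper).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Only lora_A and lora_B receive gradients, so optimizer state and checkpoints shrink to a small fraction of the full model.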
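
For item 2, the sketch below shows how QLoRA-style loading typically looks with the Hugging Face transformers, bitsandbytes, and peft libraries. The model name and the LoRA hyperparameters are placeholders, not project decisions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA's storage recipe: 4-bit NormalFloat plus double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",         # placeholder model
    quantization_config=bnb_config,
)

# Trainable LoRA adapters sit on top of the frozen 4-bit base model.
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The third QLoRA innovation, paged optimizers, is available through
# transformers' Trainer via TrainingArguments(optim="paged_adamw_32bit").
```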
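
For item 3, one possible single-GPU setup combines Lightning Fabric with FlashAttention kernels. This assumes a recent transformers release that accepts the attn_implementation argument, an installed flash-attn package, and an Ampere-class GPU such as the A100; the model name is again a placeholder.

```python
import torch
from lightning.fabric import Fabric
from transformers import AutoModelForCausalLM

# Fabric handles device placement and mixed precision without a full Trainer.
fabric = Fabric(accelerator="cuda", devices=1, precision="bf16-mixed")
fabric.launch()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                 # placeholder model
    torch_dtype=torch.bfloat16,                 # FlashAttention needs fp16/bf16
    attn_implementation="flash_attention_2",    # use FlashAttention kernels
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Wrap model and optimizer so the training loop runs on the chosen device.
model, optimizer = fabric.setup(model, optimizer)
```

DeepSpeed and FairScale address the same bottlenecks through different means, e.g. ZeRO-style optimizer-state sharding and CPU offloading.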
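
Finally, for item 4, a naive sketch of the blackbox idea: because the adapter weights form a small set, they can be tuned with forward passes alone. The function below is hypothetical and shows plain random local search; practical blackbox methods (e.g., evolution strategies) are far more sample-efficient.

```python
import torch

@torch.no_grad()
def random_search(lora_params, eval_loss, steps=100, sigma=1e-3):
    """Hypothetical gradient-free tuner: perturb the LoRA weights and keep
    any perturbation that lowers a held-out loss. No backprop anywhere."""
    best = eval_loss()  # eval_loss runs forward passes on validation data
    for _ in range(steps):
        noise = [torch.randn_like(p) * sigma for p in lora_params]
        for p, n in zip(lora_params, noise):
            p.add_(n)                    # try a random step
        loss = eval_loss()
        if loss < best:
            best = loss                  # keep the improvement
        else:
            for p, n in zip(lora_params, noise):
                p.sub_(n)                # revert the step
    return best
```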

WHAT
This project will produce:

  • Insights and Lessons: A distilled set of well-documented steps and easy-to-follow tutorials that capture what was learned during the challenge.
  • Innovation in Efficiency: New techniques and methods that can significantly change how VOD-trained models are adapted and fine-tuned.

References

  • Low-Rank Adaptation (LoRA): Hu et al., 2021, arXiv:2106.09685
  • QLoRA: Dettmers et al., 2023, arXiv:2305.14314
  • FlashAttention: Dao et al., 2022, arXiv:2205.14135
  • Lightning Fabric: https://lightning.ai/docs/fabric
  • DeepSpeed: https://github.com/microsoft/DeepSpeed
  • FairScale: https://github.com/facebookresearch/fairscale
  • NeurIPS LLM Efficiency Challenge: https://llm-efficiency-challenge.github.io
