
Add GGUF model registry entries for Flare engine #296

@sauravpanda

Description


Summary

Add GGUF model entries to BrowserAI's model registry so users can load models via the Flare engine without specifying URLs.

Proposed models

Tier 1 — Small (instant load, <200MB)

| Model ID | GGUF File | Size | Notes |
| --- | --- | --- | --- |
| smollm2-135m-flare | SmolLM2-135M-Instruct Q8_0 | 138MB | Fastest load, good for demos |
| smollm2-135m-flare-q4 | SmolLM2-135M-Instruct Q4_K_M | ~75MB | Smaller download |

Tier 2 — Medium (good quality, <500MB)

| Model ID | GGUF File | Size | Notes |
| --- | --- | --- | --- |
| smollm2-360m-flare | SmolLM2-360M-Instruct Q8_0 | ~350MB | Better quality |
| qwen2.5-0.5b-flare | Qwen2.5-0.5B-Instruct Q4_K_M | ~350MB | Multilingual |

Tier 3 — Large (best quality, ~1GB)

| Model ID | GGUF File | Size | Notes |
| --- | --- | --- | --- |
| llama-3.2-1b-flare | Llama-3.2-1B-Instruct Q8_0 | 1.2GB | Best quality |
| llama-3.2-1b-flare-q4 | Llama-3.2-1B-Instruct Q4_K_M | ~600MB | Balanced |

Registry format

{
    "smollm2-135m-flare": {
        "engine": "flare",
        "url": "https://huggingface.co/Qwen/SmolLM2-135M-Instruct-GGUF/resolve/main/smollm2-135m-instruct-q8_0.gguf",
        "architecture": "llama",
        "contextLength": 2048,
        "quantization": "Q8_0",
        "size": "138MB",
        "features": ["chat", "instruction-following"]
    }
}
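If the registry lives in a TypeScript source file, the entry shape could be typed. A minimal sketch; the interface and function names below are illustrative assumptions, not BrowserAI's actual API:

```typescript
// Hypothetical shape of a Flare registry entry. The interface and
// function names are illustrative, not BrowserAI's actual API.
interface FlareModelEntry {
  engine: "flare";
  url: string;
  architecture: string;
  contextLength: number;
  quantization: string;
  size: string;
  features: string[];
}

const flareModels: Record<string, FlareModelEntry> = {
  "smollm2-135m-flare": {
    engine: "flare",
    url: "https://huggingface.co/Qwen/SmolLM2-135M-Instruct-GGUF/resolve/main/smollm2-135m-instruct-q8_0.gguf",
    architecture: "llama",
    contextLength: 2048,
    quantization: "Q8_0",
    size: "138MB",
    features: ["chat", "instruction-following"],
  },
};

// Look up an entry by model ID; unknown IDs return undefined so the
// caller can decide how to handle strings that are not registry keys.
function getFlareModel(id: string): FlareModelEntry | undefined {
  return flareModels[id];
}
```

Typing the registry lets the compiler catch a missing field (e.g. a forgotten `quantization`) when new entries are added.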

Advantages over MLC models

GGUF is a standard format, and files are already available on HuggingFace with no conversion step. Users can also load custom GGUF files that are not in the registry.
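Since custom GGUF files should load alongside registry entries, the lookup could fall back to treating unknown strings as direct URLs. A hedged sketch; `resolveGgufUrl` and the registry variable are assumptions, not BrowserAI's real API:

```typescript
// Hypothetical resolver: accepts either a registry model ID or a raw
// .gguf URL. Names here are illustrative, not BrowserAI's real API.
const registryUrls: Record<string, string> = {
  "smollm2-135m-flare":
    "https://huggingface.co/Qwen/SmolLM2-135M-Instruct-GGUF/resolve/main/smollm2-135m-instruct-q8_0.gguf",
};

function resolveGgufUrl(idOrUrl: string): string {
  // Direct URLs to .gguf files bypass the registry entirely.
  if (idOrUrl.startsWith("https://") && idOrUrl.endsWith(".gguf")) {
    return idOrUrl;
  }
  const url = registryUrls[idOrUrl];
  if (url === undefined) {
    throw new Error(`Unknown Flare model ID: ${idOrUrl}`);
  }
  return url;
}
```

This keeps the registry a convenience layer rather than a gatekeeper: any hosted GGUF file remains loadable by URL.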

Tasks

- [ ] Find/verify GGUF download URLs on HuggingFace
- [ ] Add registry entries
- [ ] Test each model loads and generates correctly
- [ ] Add model cards to documentation
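For the URL verification task, HuggingFace direct downloads follow the `/resolve/<revision>/<file>` pattern, so candidate URLs can be built consistently from the repos and filenames in the tables above. The helper name is illustrative:

```typescript
// Build a HuggingFace direct-download URL for a GGUF file.
// Pattern: https://huggingface.co/<repo>/resolve/<revision>/<file>
function hfGgufUrl(repo: string, file: string, revision = "main"): string {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}
```

Each generated URL can then be checked with an HTTP HEAD request before the entry is added to the registry.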


Metadata

Labels: flare-integration (Flare WASM inference engine integration)