ONNX export and inference tools for LFM2 models.
| Family | Quant Formats |
|---|---|
| LFM2.5, LFM2 | fp32, fp16, q4, q8 |
| LFM2.5-VL, LFM2-VL | fp32, fp16, q4, q8 |
| LFM2-MoE | fp32, fp16, q4, q4f16 |
| LFM2.5-Audio | fp32, fp16, q4, q8 |
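Each precision in the table is exported as a separate ONNX graph. As a rough sketch of how precisions map to file names (the `model_<precision>.onnx` pattern follows the inference examples below; treating the fp32 graph as plain `model.onnx` is an assumption):

```python
# Hypothetical helper: pick the ONNX file name for a requested precision.
# The model_<precision>.onnx pattern matches the paths used in the inference
# examples below; "model.onnx" for fp32 is an assumption, not a guarantee.
PRECISION_TO_FILENAME = {
    "fp32": "model.onnx",
    "fp16": "model_fp16.onnx",
    "q4": "model_q4.onnx",
    "q8": "model_q8.onnx",
    "q4f16": "model_q4f16.onnx",
}

def onnx_filename(precision: str) -> str:
    try:
        return PRECISION_TO_FILENAME[precision]
    except KeyError:
        raise ValueError(f"Unknown precision: {precision}") from None

print(onnx_filename("q4"))  # model_q4.onnx
```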
git clone https://github.com/Liquid4All/onnx-export.git
cd onnx-export
uv sync
# For GPU inference support
uv sync --extra gpu
# For development (testing, benchmarking)
uv sync --extra dev
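After syncing, a quick sanity check confirms that onnxruntime is importable and, if you installed the `gpu` extra, that the CUDA execution provider is visible. This is a verification sketch, not part of the project's CLI:

```python
# Verify the environment: print the onnxruntime version and the available
# execution providers ("CUDAExecutionProvider" shows up with a working GPU setup).
import onnxruntime as ort

print("onnxruntime", ort.__version__)
print("available providers:", ort.get_available_providers())
```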
# All precisions
uv run lfm2-export LiquidAI/LFM2.5-1.2B-Instruct --precision
uv run lfm2-vl-export LiquidAI/LFM2.5-VL-1.6B --precision
# Conv2d vision format (alternative to default tiled)
uv run lfm2-vl-export LiquidAI/LFM2.5-VL-1.6B --vision-format conv2d

# All precisions
uv run lfm2-moe-export LiquidAI/LFM2-MoE-8B-A1B --precision

All inference commands provide interactive multi-turn chat with streaming output. They automatically detect CUDA availability and fall back to CPU if needed.
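In onnxruntime terms, that detection amounts to picking execution providers. A minimal sketch of the selection logic, assuming the standard provider names (illustrative, not the project's exact code):

```python
# Prefer CUDA when onnxruntime reports it as available, otherwise fall back to
# CPU; --cpu-style flags simply force the CPU provider.
import onnxruntime as ort

def make_session(model_path: str, force_cpu: bool = False) -> ort.InferenceSession:
    providers = ["CPUExecutionProvider"]
    if not force_cpu and "CUDAExecutionProvider" in ort.get_available_providers():
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ort.InferenceSession(model_path, providers=providers)

# Hypothetical usage with a path produced by the export step above:
# session = make_session("./exports/LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx")
```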
# Interactive chat (starts conversation loop)
uv run lfm2-infer --model ./exports/LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx
# Single prompt (non-interactive)
uv run lfm2-infer --model ./exports/LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx \
--prompt "Explain quantum computing"
# Force CPU execution
uv run lfm2-infer --model ./exports/LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx --cpu

# Single image analysis
uv run lfm2-vl-infer --model ./exports/LFM2.5-VL-1.6B-ONNX \
--images photo.jpg \
--prompt "What do you see in this image?"
# Multi-image comparison (up to 2 images)
uv run lfm2-vl-infer --model ./exports/LFM2.5-VL-1.6B-ONNX \
--images image1.jpg image2.jpg \
--prompt "Compare these two images"
# Text-only (no images)
uv run lfm2-vl-infer --model ./exports/LFM2.5-VL-1.6B-ONNX \
--prompt "Hello, how are you?"Note: VL inference requires the model directory path (not a single .onnx file) since it loads multiple components:
embed_tokens.onnx,embed_images.onnx, anddecoder.onnx.
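Loading those components with plain onnxruntime looks roughly like the sketch below. The directory layout is an assumption (the files may sit under an onnx/ subfolder or carry precision suffixes), and the CLI handles the actual wiring of token and image embeddings into the decoder:

```python
# Sketch: one InferenceSession per exported VL component, printing each
# component's input names. Paths and file names are assumptions based on the
# note above, not the project's guaranteed layout.
from pathlib import Path
import onnxruntime as ort

model_dir = Path("./exports/LFM2.5-VL-1.6B-ONNX")
components = ["embed_tokens.onnx", "embed_images.onnx", "decoder.onnx"]

sessions = {}
for name in components:
    sessions[name] = ort.InferenceSession(str(model_dir / name))
    print(name, "inputs:", [i.name for i in sessions[name].get_inputs()])
```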
# Interactive chat
uv run lfm2-moe-infer --model ./exports/LFM2-MoE-8B-A1B-ONNX/onnx/model_q4.onnx
# Force CPU (when the model does not fit in VRAM)
uv run lfm2-moe-infer --model ./exports/LFM2-MoE-8B-A1B-ONNX/onnx/model_q4.onnx --cpu

LFM2.5-Audio is a multimodal audio-language model supporting three modes:
- ASR (Automatic Speech Recognition): Transcribe audio to text
- TTS (Text-to-Speech): Generate audio from text
- Interleaved: Mixed text and audio input/output for conversational audio
The model uses 5 ONNX components:
- `decoder.onnx` - LFM2 language model backbone
- `audio_encoder.onnx` - Conformer encoder for ASR input
- `audio_embedding.onnx` - Audio code embeddings for TTS/interleaved
- `audio_detokenizer.onnx` - Converts audio codes to STFT features
- `vocoder_depthformer.onnx` - Autoregressive audio codebook prediction
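As a rough orientation, the components each `--mode` is likely to exercise can be sketched as follows. This mapping is inferred from the component roles listed above, not taken from the project's code:

```python
# Inferred (not authoritative) mapping of inference mode -> ONNX components used.
MODE_COMPONENTS = {
    "asr": ["audio_encoder.onnx", "decoder.onnx"],
    "tts": ["decoder.onnx", "audio_embedding.onnx",
            "vocoder_depthformer.onnx", "audio_detokenizer.onnx"],
    "interleaved": ["audio_encoder.onnx", "decoder.onnx", "audio_embedding.onnx",
                    "vocoder_depthformer.onnx", "audio_detokenizer.onnx"],
}

def components_for(mode: str) -> list[str]:
    """Return the ONNX files a given --mode is expected to load."""
    return MODE_COMPONENTS[mode]

print(components_for("asr"))  # ['audio_encoder.onnx', 'decoder.onnx']
```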
# ASR: Transcribe audio to text
uv run lfm2-audio-infer LFM2.5-Audio-1.5B-ONNX --mode asr \
--audio input.wav --precision q4
# TTS: Generate speech from text
uv run lfm2-audio-infer LFM2.5-Audio-1.5B-ONNX --mode tts \
--prompt "Hello, how are you today?" \
--system "Perform TTS. Use the UK female voice." \
--output output.wav --precision q4
# Interleaved: Audio input with text+audio response
uv run lfm2-audio-infer LFM2.5-Audio-1.5B-ONNX --mode interleaved \
--audio question.wav --output response.wav --precision q4
# Interactive chat mode (multi-turn with stateful KV cache)
uv run lfm2-audio-infer LFM2.5-Audio-1.5B-ONNX --mode interleaved --chat \
--output output.wav --precision q4
# Commands in chat mode:
# /audio <file> [text] - Send audio with optional text
# <text> - Send text message
# reset - Clear conversation state
# quit - Exit

Note: Audio inference requires the model directory path (not a single .onnx file) since it loads multiple components. Use `--precision` to select the quantization level (fp16, q4, q8).
Tests verify ONNX exports against PyTorch reference models.
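Concretely, the comparison amounts to running the same inputs through the PyTorch reference and the exported graph, then checking that the outputs are numerically close. A generic sketch of that kind of check (tolerances are illustrative, not the suite's actual thresholds):

```python
# Generic parity check: compare a PyTorch reference output against the ONNX
# output for numerical closeness. Tolerances here are illustrative only.
import numpy as np

def assert_close(reference: np.ndarray, onnx_out: np.ndarray,
                 rtol: float = 1e-3, atol: float = 1e-3) -> None:
    max_abs = float(np.max(np.abs(reference - onnx_out)))
    print(f"max abs diff: {max_abs:.6f}")
    np.testing.assert_allclose(onnx_out, reference, rtol=rtol, atol=atol)

# Hypothetical usage: assert_close(torch_logits.numpy(), onnx_logits)
```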
# Install dev dependencies
uv sync --extra dev
# LFM2 text model tests
uv run pytest tests/test_lfm2/test_decoder.py -v -k "q4"
# LFM2-VL vision-language tests
uv run pytest tests/test_lfm2_vl/test_decoder.py -v -k "450M"
uv run pytest tests/test_lfm2_vl/test_vision_encoder.py -v
# LFM2-MoE tests
uv run pytest tests/test_lfm2_moe/test_decoder.py -v

Benchmarking compares the CPU performance of the ONNX export against the original PyTorch model.
# Text model benchmark
uv run lfm2-bench --model LiquidAI/LFM2.5-1.2B-Instruct \
--onnx ./exports/LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx

Text models:
- LiquidAI/LFM2.5-1.2B-Base-ONNX
- LiquidAI/LFM2.5-1.2B-Instruct-ONNX
- LiquidAI/LFM2.5-1.2B-JP-ONNX
- LiquidAI/LFM2-2.6B-Transcript-ONNX
Vision-Language:
Audio:
Text models:
- onnx-community/LFM2-350M-ONNX
- onnx-community/LFM2-700M-ONNX
- onnx-community/LFM2-1.2B-ONNX
- onnx-community/LFM2-2.6B-ONNX
- onnx-community/LFM2-2.6B-Exp-ONNX
Specialized:
- onnx-community/LFM2-350M-ENJP-MT-ONNX — translation
- onnx-community/LFM2-350M-Extract-ONNX
- onnx-community/LFM2-350M-Math-ONNX
- onnx-community/LFM2-1.2B-Extract-ONNX
- onnx-community/LFM2-1.2B-RAG-ONNX
- onnx-community/LFM2-1.2B-Tool-ONNX
Vision-Language:
MoE:
Note: The onnx-community models are exported using Transformers.js tooling with a different export pipeline. This project aims to produce compatible graph structures and file naming conventions to ensure interoperability with Transformers.js and other ONNX consumers.
Special thanks to Joshua Lochner for his work on Transformers.js and the onnx-community models, which inspired and informed this project's ONNX export approach.
See LICENSE for details.