Private AI image generation that runs entirely on your hardware. No cloud. No logs. No one watching.
Every major AI image service logs your prompts, filters your outputs, and owns your data. Inferno doesn't. It runs locally on your GPU — your prompts never leave your machine.
- Fully local — models run on your Metal/CUDA GPU, nothing is sent to any server
- No content filters — generate whatever you want, no corporate policy gatekeeping
- Model marketplace — subscribe to access curated LoRA packs and fine-tuned models, downloaded and run locally
- One-click setup — download models from HuggingFace or point to your own weights
- macOS (Metal) or Linux (CUDA)
- Rust toolchain (`rustup`, `cargo`)
- ~5GB disk space per model (FP16 weights)
```sh
cargo run
```

On first launch:
- Click "Image" on the home screen
- Go to the "Models available to ingest" tab
- Click a model and choose "Download from HuggingFace", or provide a local path to existing weights
- Configure generation parameters and start creating
Config is stored at `~/.config/inferno/config.toml`. Model weights go to `~/.local/share/inferno/models/`.
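As a rough sketch, the TOML config might look something like the following; the table and key names here are illustrative guesses, not Inferno's actual schema:

```toml
# Hypothetical layout — actual keys may differ.
[paths]
models_dir = "~/.local/share/inferno/models"

[models.sdxl-turbo]
installed = true

# User-tweaked generation parameters, persisted between sessions.
[models.sdxl-turbo.params]
steps = 4
width = 1024
height = 1024
```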
```
inferno
├── inferno-core   # model registry, backend thread, config, ingest, GPU abstraction
├── inferno-app    # iced application, views, message handling
└── inferno-gui    # theme system, reusable UI components
```
- Backend thread — long-lived thread owning the GPU device and loaded model, communicates with the UI via channels
- Model registry — `ModelId` enum as the source of truth for all known models, with metadata (defaults, HF repo, required files)
- Config system — TOML-based config tracking installed models and user-tweaked parameters
- Ingest pipeline — register local weights or download from HuggingFace Hub
| Model | Type | Weights | Status |
|---|---|---|---|
| SDXL-Turbo | Local | FP16 (~4.3GB) | Working — 1-4 step generation via Metal/CUDA |
| Flux | Local | — | Planned |
| DALL-E | Remote API | — | Planned |
SDXL-Turbo components: CLIP text encoder 1, CLIP text encoder 2, UNet, VAE, plus tokenizers (auto-downloaded on first run).
- Rust + iced 0.13 (native GPU-accelerated GUI, tokio async runtime)
- candle (ML inference — safetensors loading, Metal/CUDA device abstraction)
- hf-hub (model downloads from HuggingFace Hub)
- tokenizers (HuggingFace tokenizers for CLIP text encoding)
- image (PNG encoding of generated outputs)
- serde + toml (config persistence)