A lightweight FastAPI service that scrapes NVIDIA's free preview models page and exposes the result as an OpenAI-compatible /v1/models API.
Designed for use with Bifrost or any OpenAI-compatible client.
- Docker & Docker Compose
# Create env file
cp env.example .env
# Build the service
docker compose build
# Start the service
docker compose up -d
The API will be available at http://localhost:8101 (or your custom port).
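
Once the service is running, a client can query it like any other OpenAI-compatible endpoint. The sketch below uses only the Python standard library. The helper names `extract_model_ids` and `list_models` are hypothetical and not part of this project; the `/v1/models` path and port 8101 come from the description above, and the request is assumed to be a plain unauthenticated GET.

```python
import json
import urllib.request

# Default address from this README; change it if APP_PORT is customized.
BASE_URL = "http://localhost:8101"


def extract_model_ids(payload: dict) -> list:
    """Return the model IDs from an OpenAI-style model list payload."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 30.0) -> list:
    """Fetch /v1/models from the running service and return the model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

Calling `list_models()` against a running container should return a list of IDs such as `nvidia/cosmos3-nano`; it raises `urllib.error.URLError` if the service is not reachable.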
# Install dependencies
poetry install
# Run with auto-reload
poetry run uvicorn src.service.api.app:app --host 0.0.0.0 --port 8101 --reload

Settings are managed via settings.toml and environment variables:
| Variable | Description | Default |
|---|---|---|
| APP_PORT | Host port for the service | 8101 |
| free_models_url | NVIDIA models page URL | https://build.nvidia.com/models?filters=nimType%3Anim_type_preview&pageSize=96 |
| cache_ttl_minutes | Cache duration for model list | 30 |
| fetch_timeout_seconds | HTTP request timeout | 30 |
| user_agent | User-Agent header for requests | Chrome-based UA |
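
To illustrate what cache_ttl_minutes controls, here is a minimal sketch of a time-based cache that refetches the model list only after the TTL has elapsed. The `TTLCache` class and its `fetch` and `clock` parameters are hypothetical names chosen for this example; the service's actual caching code may be structured differently.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hold one fetched value and refetch it once ttl_seconds have elapsed."""

    def __init__(
        self,
        fetch: Callable[[], list],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # function that produces a fresh model list
        self._ttl = ttl_seconds      # e.g. cache_ttl_minutes * 60
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[list] = None
        self._expires_at = 0.0

    def get(self) -> list:
        """Return the cached list, refetching if it is missing or expired."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

With the default of 30 minutes, such a cache would be built as `TTLCache(fetch_models, ttl_seconds=30 * 60)`, where `fetch_models` stands in for whatever function scrapes the NVIDIA page.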
GET /v1/models returns a list of available free preview models in OpenAI format.
Example response:
{
"object": "list",
"data": [
{
"id": "nvidia/cosmos3-nano",
"object": "model",
"created": 1780395340,
"owned_by": "nvidia"
}
]
}

Health check endpoint for Docker Compose.
Response:
{
"status": "ok"
}

src/
service/api/ # FastAPI application
cases/nvidia_models/ # Scraping logic
data/logs/ # Application logs
compose.yml # Docker Compose configuration
settings.toml # Default settings
pyproject.toml # Python dependencies
- The scraper relies on HTML structure (data-nvtrack-nav-object="artifact-card"). If NVIDIA changes their page layout, the parser may need updates.
- For production use, prefer an official NVIDIA API if one becomes available.
- The owned_by field is derived from the model ID prefix (e.g., meta/llama-3.1 → owned_by: "meta").
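
The owned_by derivation in the last note can be sketched as follows. This is an illustration rather than the project's actual code: the function names are hypothetical, the "unknown" fallback for IDs without a prefix is an assumption, and so is using the current time for `created` when no timestamp is supplied.

```python
import time
from typing import Optional


def owner_from_model_id(model_id: str, default: str = "unknown") -> str:
    """Take the part of the model ID before the first '/' as the owner."""
    prefix, sep, _ = model_id.partition("/")
    # Fall back to the default (an assumption) when there is no usable prefix.
    return prefix if sep and prefix else default


def to_openai_model(model_id: str, created: Optional[int] = None) -> dict:
    """Build one OpenAI-style model entry for the given model ID."""
    return {
        "id": model_id,
        "object": "model",
        "created": int(time.time()) if created is None else created,
        "owned_by": owner_from_model_id(model_id),
    }


def to_openai_list(model_ids: list, created: Optional[int] = None) -> dict:
    """Wrap model entries in the OpenAI list envelope."""
    return {
        "object": "list",
        "data": [to_openai_model(m, created) for m in model_ids],
    }
```

For example, `owner_from_model_id("meta/llama-3.1")` returns `"meta"`, matching the note above.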
MIT