A lightweight proxy server that translates Claude Code's Anthropic API calls into NVIDIA NIM, OpenRouter, or LM Studio format. Get 40 free requests/min on NVIDIA NIM, access hundreds of models on OpenRouter, or run fully local with LM Studio.
Features · Quick Start · How It Works · Discord Bot · Configuration
| Feature | Description |
|---|---|
| Zero Cost | 40 req/min free on NVIDIA NIM. Free models on OpenRouter. Fully local with LM Studio |
| Drop-in Replacement | Set 2 env vars — no modifications to Claude Code CLI or VSCode extension needed |
| 3 Providers | NVIDIA NIM, OpenRouter (hundreds of models), LM Studio (local & offline) |
| Thinking Token Support | Parses `<think>` tags and `reasoning_content` into native Claude thinking blocks |
| Heuristic Tool Parser | Models outputting tool calls as text are auto-parsed into structured tool use |
| Request Optimization | 5 categories of trivial API calls intercepted locally — saves quota and latency |
| Discord Bot | Remote autonomous coding with tree-based threading, session persistence, and live progress (Telegram also supported) |
| Smart Rate Limiting | Proactive rolling-window throttle + reactive 429 exponential backoff across all providers |
| Subagent Control | Task tool interception forces run_in_background=False — no runaway subagents |
| Extensible | Clean BaseProvider and MessagingPlatform ABCs — add new providers or platforms easily |
- Get an API key (or use LM Studio locally):
- NVIDIA NIM: build.nvidia.com/settings/api-keys
- OpenRouter: openrouter.ai/keys
- LM Studio: No API key needed — run locally with LM Studio
- Install Claude Code
- Install uv
```shell
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code
cp .env.example .env
```

Choose your provider and edit `.env`:
**NVIDIA NIM** (recommended — 40 req/min free)

```
PROVIDER_TYPE=nvidia_nim
NVIDIA_NIM_API_KEY=nvapi-your-key-here
MODEL=stepfun-ai/step-3.5-flash
```

**OpenRouter** (hundreds of models)
```
PROVIDER_TYPE=open_router
OPENROUTER_API_KEY=sk-or-your-key-here
MODEL=stepfun/step-3.5-flash:free
```

**LM Studio** (fully local, no API key)
```
PROVIDER_TYPE=lmstudio
MODEL=lmstudio-community/qwen2.5-7b-instruct
```

**Terminal 1** — Start the proxy server:

```shell
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

**Terminal 2** — Run Claude Code:
```shell
ANTHROPIC_AUTH_TOKEN=freecc ANTHROPIC_BASE_URL=http://localhost:8082 claude
```

That's it! Claude Code now uses your configured provider for free.
VSCode Extension Setup
- Start the proxy server (same as above).
- Open Settings (`Ctrl` + `,`) and search for `claude-code.environmentVariables`.
- Click **Edit in settings.json** and add:

```json
"claude-code.environmentVariables": [
  { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
  { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
]
```

- Reload extensions.
- If you see the login screen ("How do you want to log in?"): Click Anthropic Console, then authorize. The extension will start working. You may be redirected to buy credits in the browser — ignore that; the extension already works.
To switch back to Anthropic models, comment out the added block and reload extensions.
```
┌─────────────────┐        ┌──────────────────────┐        ┌──────────────────┐
│  Claude Code    │───────>│   Free Claude Code   │───────>│   LLM Provider   │
│  CLI / VSCode   │<───────│    Proxy (:8082)     │<───────│  NIM / OR / LMS  │
└─────────────────┘        └──────────────────────┘        └──────────────────┘
     Anthropic API                    │               OpenAI-compatible
     format (SSE)             ┌───────┴────────┐         format (SSE)
                              │ Optimizations  │
                              ├────────────────┤
                              │ Quota probes   │
                              │ Title gen skip │
                              │ Prefix detect  │
                              │ Suggestion skip│
                              │ Filepath mock  │
                              └────────────────┘
```
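The "Optimizations" box intercepts trivial requests before they spend provider quota. A rough sketch of how such classification might look — the heuristics and the `classify_request` name are illustrative assumptions, not the proxy's actual detection logic:

```python
# Illustrative sketch: classify trivial Anthropic-format requests so they
# can be answered locally instead of spending provider quota.
def classify_request(body: dict) -> str:
    messages = body.get("messages", [])
    text = " ".join(
        m["content"] for m in messages if isinstance(m.get("content"), str)
    )
    # Quota probes: tiny single-token pings with no real content.
    if body.get("max_tokens", 0) <= 1:
        return "quota_probe"
    # Title generation: Claude Code asks for a short conversation title.
    if "word title" in text.lower():
        return "title_generation"
    return "real_request"

category = classify_request({"max_tokens": 1, "messages": []})  # quota_probe
```

Each intercepted category gets a canned local response; only `real_request` traffic is forwarded to the provider.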
- Transparent proxy — Claude Code sends standard Anthropic API requests to the proxy server
- Request optimization — 5 categories of trivial requests (quota probes, title generation, prefix detection, suggestions, filepath extraction) are intercepted and responded to instantly without using API quota
- Format translation — Real requests are translated from Anthropic format to the provider's OpenAI-compatible format and streamed back
- Thinking tokens — `<think>` tags and `reasoning_content` fields are converted into native Claude thinking blocks so Claude Code renders them correctly
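For example, a model reply such as `<think>reasoning</think>answer` needs to be split before it reaches Claude Code. A minimal non-streaming sketch of that conversion — the function name is illustrative, and the real proxy does this incrementally over SSE rather than on completed text:

```python
import re

# Illustrative sketch: split a completed model reply into Anthropic-style
# content blocks, turning <think>...</think> spans into thinking blocks.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def to_content_blocks(text: str) -> list[dict]:
    blocks, pos = [], 0
    for m in THINK.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            blocks.append({"type": "text", "text": before})
        blocks.append({"type": "thinking", "thinking": m.group(1).strip()})
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        blocks.append({"type": "text", "text": tail})
    return blocks

blocks = to_content_blocks("<think>Check the file first.</think>The bug is in line 3.")
```

Providers that return a separate `reasoning_content` field skip the tag parsing and map that field to a thinking block directly.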
| Provider | Cost | Rate Limit | Models | Best For |
|---|---|---|---|---|
| NVIDIA NIM | Free | 40 req/min | Kimi K2, GLM5, Devstral, MiniMax | Daily driver — generous free tier |
| OpenRouter | Free / Pay | Varies | 200+ (GPT-4o, Claude, Step, etc.) | Model variety, fallback options |
| LM Studio | Free (local) | Unlimited | Any GGUF model | Privacy, offline use, no rate limits |
Switch providers by changing `PROVIDER_TYPE` in `.env`:

| Provider | `PROVIDER_TYPE` | API Key Variable | Base URL |
|---|---|---|---|
| NVIDIA NIM | `nvidia_nim` | `NVIDIA_NIM_API_KEY` | integrate.api.nvidia.com/v1 |
| OpenRouter | `open_router` | `OPENROUTER_API_KEY` | openrouter.ai/api/v1 |
| LM Studio | `lmstudio` | (none) | localhost:1234/v1 |
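The table above amounts to a lookup from `PROVIDER_TYPE` to connection settings. A sketch of how such a registry might look — the names and structure are illustrative, not the project's actual code:

```python
# Illustrative provider registry keyed by PROVIDER_TYPE.
PROVIDERS = {
    "nvidia_nim": {
        "base_url": "https://integrate.api.nvidia.com/v1",
        "key_env": "NVIDIA_NIM_API_KEY",
    },
    "open_router": {
        "base_url": "https://openrouter.ai/api/v1",
        "key_env": "OPENROUTER_API_KEY",
    },
    "lmstudio": {
        "base_url": "http://localhost:1234/v1",
        "key_env": None,  # local server, no API key needed
    },
}

def provider_settings(provider_type: str) -> dict:
    try:
        return PROVIDERS[provider_type]
    except KeyError:
        raise ValueError(f"Unknown PROVIDER_TYPE: {provider_type}") from None

settings = provider_settings("lmstudio")
```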
OpenRouter gives access to hundreds of models (StepFun, OpenAI, Anthropic, etc.) through a single API. Set `MODEL` to any OpenRouter model ID.
LM Studio runs locally — start the server in LM Studio's Developer tab or via `lms server start`, load a model, and set `MODEL` to the model identifier.
Control Claude Code remotely from Discord. Send tasks, watch live progress, and manage multiple concurrent sessions. Discord is the default messaging platform; Telegram is also supported.
Capabilities:
- Tree-based message threading — reply to messages to fork conversations
- Session persistence across server restarts
- Live streaming of thinking tokens, tool calls, and results
- Up to 10 concurrent Claude CLI sessions
- Commands: `/stop` (cancel tasks), `/clear` (reset all sessions), `/stats`

To set up the bot:

- Create a Discord bot — go to the Discord Developer Portal, create an application, add a bot, and copy the token. Enable Message Content Intent under Bot settings.
- Edit `.env`:

```
MESSAGING_PLATFORM=discord
DISCORD_BOT_TOKEN=your_discord_bot_token
ALLOWED_DISCORD_CHANNELS=123456789,987654321
```

Enable Developer Mode in Discord (Settings → Advanced), then right-click a channel and choose "Copy ID" to get channel IDs. Comma-separate multiple channels. If empty, no channels are allowed.
- Configure the workspace (where Claude will operate):

```
CLAUDE_WORKSPACE=./agent_workspace
ALLOWED_DIR=C:/Users/yourname/projects
```

- Start the server:

```shell
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

- Invite the bot to your server (OAuth2 → URL Generator, scopes: `bot`; permissions: Read Messages, Send Messages, Manage Messages, Read Message History).

Send a message in an allowed channel with a task. Claude responds with thinking tokens, tool calls as they execute, and the final result. Reply `/stop` to a running task to cancel it.
To use Telegram instead, set `MESSAGING_PLATFORM=telegram` and configure:

```
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
ALLOWED_TELEGRAM_USER_ID=your_telegram_user_id
```

Get a token from @BotFather; find your user ID via @userinfobot.
NVIDIA NIM
Full list in nvidia_nim_models.json.
Popular models:
- `z-ai/glm5`
- `stepfun-ai/step-3.5-flash`
- `moonshotai/kimi-k2.5`
- `minimaxai/minimax-m2.1`
- `mistralai/devstral-2-123b-instruct-2512`
Browse: build.nvidia.com
Update model list:

```shell
curl "https://integrate.api.nvidia.com/v1/models" > nvidia_nim_models.json
```

OpenRouter
Hundreds of models from StepFun, OpenAI, Anthropic, Google, and more.
Popular models:
- `stepfun/step-3.5-flash:free`
- `deepseek/deepseek-r1-0528:free`
- `openai/gpt-oss-120b:free`
Browse: openrouter.ai/models
Browse free models: https://openrouter.ai/collections/free-models
LM Studio
Run models locally with LM Studio. Load a model in the Chat or Developer tab, then set MODEL to its identifier.
Examples (native tool-use support):
- `lmstudio-community/qwen2.5-7b-instruct`
- `lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF`
- `bartowski/Ministral-8B-Instruct-2410-GGUF`
Browse: model.lmstudio.ai
| Variable | Description | Default |
|---|---|---|
| `PROVIDER_TYPE` | Provider: `nvidia_nim`, `open_router`, or `lmstudio` | `nvidia_nim` |
| `MODEL` | Model to use for all requests | `stepfun-ai/step-3.5-flash` |
| `NVIDIA_NIM_API_KEY` | NVIDIA API key (NIM provider) | required |
| `OPENROUTER_API_KEY` | OpenRouter API key (OpenRouter provider) | required |
| `LM_STUDIO_BASE_URL` | LM Studio server URL | `http://localhost:1234/v1` |
| `PROVIDER_RATE_LIMIT` | LLM API requests per window | `40` |
| `PROVIDER_RATE_WINDOW` | Rate limit window (seconds) | `60` |
| `HTTP_READ_TIMEOUT` | Read timeout for provider API requests (seconds) | `300` |
| `HTTP_WRITE_TIMEOUT` | Write timeout for provider API requests (seconds) | `10` |
| `HTTP_CONNECT_TIMEOUT` | Connect timeout for provider API requests (seconds) | `2` |
| `FAST_PREFIX_DETECTION` | Enable fast prefix detection | `true` |
| `ENABLE_NETWORK_PROBE_MOCK` | Enable network probe mock | `true` |
| `ENABLE_TITLE_GENERATION_SKIP` | Skip title generation | `true` |
| `ENABLE_SUGGESTION_MODE_SKIP` | Skip suggestion mode | `true` |
| `ENABLE_FILEPATH_EXTRACTION_MOCK` | Enable filepath extraction mock | `true` |
| `MESSAGING_PLATFORM` | Messaging platform: `discord` or `telegram` | `discord` |
| `DISCORD_BOT_TOKEN` | Discord bot token | `""` |
| `ALLOWED_DISCORD_CHANNELS` | Comma-separated channel IDs (empty = none allowed) | `""` |
| `TELEGRAM_BOT_TOKEN` | Telegram bot token | `""` |
| `ALLOWED_TELEGRAM_USER_ID` | Allowed Telegram user ID | `""` |
| `MESSAGING_RATE_LIMIT` | Messaging messages per window | `1` |
| `MESSAGING_RATE_WINDOW` | Messaging rate window (seconds) | `1` |
| `CLAUDE_WORKSPACE` | Directory for agent workspace | `./agent_workspace` |
| `ALLOWED_DIR` | Allowed directories for agent | `""` |
| `MAX_CLI_SESSIONS` | Max concurrent CLI sessions | `10` |
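`PROVIDER_RATE_LIMIT` and `PROVIDER_RATE_WINDOW` describe a proactive rolling-window throttle. A simplified sketch of the idea — illustrative only, not the proxy's implementation:

```python
import time
from collections import deque

# Illustrative rolling-window rate limiter: allow at most `limit` requests
# in any `window`-second span, matching the semantics of
# PROVIDER_RATE_LIMIT / PROVIDER_RATE_WINDOW.
class RollingWindowLimiter:
    def __init__(self, limit: int = 40, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.timestamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False  # caller should wait or back off
        self.timestamps.append(now)
        return True
```

The reactive layer described in the features table would additionally sleep with exponential backoff whenever the provider returns HTTP 429.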
See .env.example for all supported parameters.
```
free-claude-code/
├── server.py      # Entry point
├── api/           # FastAPI routes, request detection, optimization handlers
├── providers/     # BaseProvider ABC + NVIDIA NIM, OpenRouter, LM Studio
├── messaging/     # MessagingPlatform ABC + Discord/Telegram bots, session management
├── config/        # Settings, NIM config, logging
├── cli/           # CLI session and process management
├── utils/         # Text utilities
└── tests/         # Pytest test suite
```
```shell
uv run pytest        # Run tests
uv run ty check      # Type checking
uv run ruff check    # Code style checking
uv run ruff format   # Code formatting
```

Extend `BaseProvider` in `providers/` to add support for other APIs:
```python
from providers.base import BaseProvider, ProviderConfig

class MyProvider(BaseProvider):
    async def stream_response(self, request, input_tokens=0, *, request_id=None):
        # Yield Anthropic SSE format events
        ...
```

Extend `MessagingPlatform` in `messaging/` to add Slack or other platforms:
```python
from messaging.base import MessagingPlatform

class MyPlatform(MessagingPlatform):
    async def start(self):
        # Initialize connection
        ...

    async def stop(self):
        # Cleanup
        ...

    async def send_message(self, chat_id, text, reply_to=None, parse_mode=None):
        # Send a message
        ...

    async def edit_message(self, chat_id, message_id, text, parse_mode=None):
        # Edit an existing message
        ...

    def on_message(self, handler):
        # Register callback for incoming messages
        ...
```

Contributions are welcome! Here are some ways to help:
- Report bugs or suggest features via Issues
- Add new LLM providers (Groq, Together AI, etc.)
- Add new messaging platforms (Slack, etc.)
- Improve test coverage
```shell
# Fork the repo, then:
git checkout -b my-feature
# Make your changes
uv run pytest && uv run ty check && uv run ruff check && uv run ruff format --check
# Open a pull request
```

This project is licensed under the MIT License — see the LICENSE file for details.
Built with FastAPI, OpenAI Python SDK, discord.py, and python-telegram-bot.
