Overview

Use Claude Code CLI and VSCode for free, with no Anthropic API key required, by routing Anthropic API calls to free or low-cost LLM providers.

free-claude-code is a lightweight proxy that intercepts Claude Code’s API calls and routes them to alternative LLM providers. You keep the same Claude Code interface, commands, and workflow—just with different backends that cost less or nothing at all.

How it works

┌─────────────────┐        ┌──────────────────────┐        ┌──────────────────┐
│  Claude Code    │───────>│  free-claude-code    │───────>│  LLM Provider    │
│  CLI / VSCode   │<───────│  Proxy (:8082)       │<───────│  NIM / OR / LMS  │
└─────────────────┘        └──────────────────────┘        └──────────────────┘
   Anthropic API                                             Native Anthropic
   format (SSE)                                             or OpenAI chat SSE

Claude Code sends standard Anthropic API requests. The proxy forwards them to your configured provider, translates responses back into Anthropic format, and streams results to Claude Code in real time.
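As a rough illustration of the translation step, here is a minimal sketch (not the project's actual code) of mapping one OpenAI-style chat-completion SSE chunk onto an Anthropic-style `content_block_delta` event; the function name and the simplified single-block handling are assumptions:

```python
def to_anthropic_delta(openai_chunk: dict):
    """Map one OpenAI chat-completion streaming chunk onto an
    Anthropic-format content_block_delta event (simplified sketch:
    a single text block, no tool-use or stop-reason handling)."""
    choices = openai_chunk.get("choices", [])
    if not choices:
        return None
    text = choices[0].get("delta", {}).get("content")
    if text is None:
        return None
    return {
        "type": "content_block_delta",
        "index": 0,
        "delta": {"type": "text_delta", "text": text},
    }

# An upstream OpenAI-style chunk becomes an Anthropic-style event:
chunk = {"choices": [{"delta": {"content": "Hello"}}]}
event = to_anthropic_delta(chunk)
```

A real proxy would also emit the surrounding `message_start` / `content_block_start` / `message_stop` events and re-serialize each event as an SSE frame before streaming it back.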

What you can do with it

  • Run Claude Code for free: NVIDIA NIM gives you 40 requests per minute at no cost. OpenRouter has hundreds of free models. LM Studio and llama.cpp run locally with zero API costs.
  • Mix providers per model: Route Opus requests to one provider, Sonnet to another, Haiku to a third. Use the best model for each task without switching tools.
  • Use local models for privacy: Run everything offline with LM Studio or llama.cpp—no data leaves your machine.
  • Control costs explicitly: Set rate limits, concurrency caps, and per-provider timeouts to avoid surprise bills.
  • Deploy as a Discord or Telegram bot: Remote autonomous coding with tree-based threading, session persistence, and live progress streaming.
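The per-model routing above can be pictured as a small lookup table. The sketch below is illustrative only; the provider names, model IDs, and matching logic are assumptions, not the project's actual configuration format:

```python
# Hypothetical mapping: route each Claude model family to a different
# backend (provider, model) pair. All names here are illustrative.
MODEL_MAP = {
    "opus":   ("nvidia_nim", "deepseek-ai/deepseek-r1"),
    "sonnet": ("openrouter", "qwen/qwen-2.5-coder-32b-instruct:free"),
    "haiku":  ("lm_studio",  "local-small-model"),
}

def route(requested_model: str):
    """Pick a (provider, backend_model) pair by substring match on the
    Anthropic model name, e.g. 'claude-3-5-sonnet-20241022' -> sonnet."""
    for family, target in MODEL_MAP.items():
        if family in requested_model:
            return target
    return MODEL_MAP["sonnet"]  # fallback for unrecognized names

provider, model = route("claude-3-5-haiku-20241022")
```

Claude Code keeps sending its usual model names; only the proxy's table decides which backend actually serves each request.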

Supported providers

Provider     Cost           Rate Limit   Best For
-----------  -------------  -----------  ---------------------------------------
NVIDIA NIM   Free           40 req/min   Daily driver, generous free tier
OpenRouter   Free / Paid    Varies       Model variety, fallback options
DeepSeek     Usage-based    Varies       Direct access to DeepSeek chat/reasoner
LM Studio    Free (local)   Unlimited    Privacy, offline use, no rate limits
llama.cpp    Free (local)   Unlimited    Lightweight local inference engine
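Whichever provider you pick, pointing Claude Code at the proxy is typically just a matter of overriding its API endpoint. The variable names below are assumptions based on Claude Code's standard environment overrides; check the install guide for the exact names this project expects:

```shell
# Assumed variable names; the proxy listens on :8082 per the diagram above.
export ANTHROPIC_BASE_URL="http://localhost:8082"  # send Claude Code traffic to the proxy
export ANTHROPIC_API_KEY="not-used"                # placeholder; the proxy holds the real provider keys
```

After exporting these, launch Claude Code as usual; no changes to the CLI or the VSCode extension are needed.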

Key features

Feature                 What it does
----------------------  --------------------------------------------------------------------------------
Zero Cost               40 req/min free on NVIDIA NIM. Free models on OpenRouter. Fully local with LM Studio.
Drop-in Replacement     Set 2 environment variables. No modifications to Claude Code CLI or VSCode extension needed.
Per-Model Mapping       Route Opus / Sonnet / Haiku to different models and providers. Mix freely.
Thinking Token Support  Parses <thinking> tags and reasoning_content into native Claude thinking blocks.
Request Optimization    5 categories of trivial API calls intercepted locally, saving quota and latency.
Smart Rate Limiting     Proactive rolling-window throttle plus reactive 429 exponential backoff.
Voice Notes             Send voice messages on Discord or Telegram; transcribed and processed as prompts.
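To make the "Smart Rate Limiting" row concrete, here is a minimal sketch of how a proactive rolling-window throttle combined with reactive 429 backoff might work. This is not the project's implementation; the class name, the 60 s cap, and the 1 s initial backoff are assumptions:

```python
from collections import deque

class RollingWindowLimiter:
    """Sketch: allow at most `limit` requests per `window` seconds
    (proactive), and back off exponentially after a 429 (reactive)."""

    def __init__(self, limit: int = 40, window: float = 60.0):
        self.limit, self.window = limit, window
        self.sent = deque()   # timestamps of recent requests
        self.backoff = 0.0    # extra wait imposed by the last 429

    def delay_before_send(self, now: float) -> float:
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        wait = self.backoff
        if len(self.sent) >= self.limit:
            # Wait until the oldest request leaves the window.
            wait = max(wait, self.window - (now - self.sent[0]))
        return wait

    def record_send(self, now: float):
        self.sent.append(now)
        self.backoff = 0.0  # a successful send resets the backoff

    def record_429(self):
        # Reactive exponential backoff: 1s, 2s, 4s, ... capped at 60s.
        self.backoff = min(max(1.0, self.backoff * 2), 60.0)
```

The proactive window keeps you under the provider's quota (e.g. NIM's 40 req/min) before a 429 ever happens; the reactive backoff handles the provider's signal when you hit the limit anyway.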

Start here

If you are new to free-claude-code, start with the shortest useful path:

  1. Quick start: Get the proxy running with NVIDIA NIM in under 5 minutes.
  2. Install guide: Set up uv, clone the repo, and configure your first provider.
  3. Provider guides: Learn the specific setup for NVIDIA NIM, OpenRouter, or local models.
  4. Environment variables: Reference for all configuration options.
  5. Troubleshooting: Fix common connection, authentication, and model mapping issues.