freelm is a free, open-source LLM gateway for Python and Node.js that pools OpenRouter, Google AI Studio (Gemini), NVIDIA NIM, Groq, Cerebras, and Mistral behind a single OpenAI-compatible call — with key rotation, automatic failover, circuit breaking, and live model discovery. Build resilient AI apps without a credit card.
Install in one command: pip install freelm (Python) or npm install freelm (Node.js).
import freelm
# Reads provider keys from environment
llm = freelm.FreeLLM.from_env()
print(llm.text("Explain black holes."))

import { FreeLLM } from "freelm";
// Reads provider keys from environment
const llm = FreeLLM.fromEnv();
console.log(await llm.text("Explain black holes."));

# from openai import OpenAI  ← swap this
from freelm.compat import OpenAI
client = OpenAI()
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

// import OpenAI from "openai";  ← swap this
import { OpenAI } from "freelm/compat";
const client = new OpenAI();
const r = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hi" }],
});
console.log(r.choices[0].message.content);

freelm pools all of these behind one call. Set whichever keys you have and it routes automatically. Free tier limits verified June 2026.
OpenRouter: ~50 req/day; 500+ free models via :free
Google AI Studio (Gemini): generous free tier; Gemini Flash & Pro
NVIDIA NIM: free build credits; Llama, Mistral, Mixtral
Groq: 30 RPM / 14,400 req/day; ultra-fast inference, no card
Cerebras: ~30 RPM / 1M tokens/day; fast, 8K context cap
Mistral: 2 RPM / 1B tokens/mo; Experiment tier, no card
Pools OpenRouter, Google AI Studio (Gemini), NVIDIA NIM, Groq, Cerebras, and Mistral behind one fault-tolerant client. Each provider's free tier is used concurrently, so their individual quotas add up to a much larger total free budget.
On a rate limit (429), dead key (401/403), or server error (5xx), freelm rotates keys and retries with the next available provider — interleaved across all providers, not one at a time.
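The following minimal Python sketch shows that decision. It assumes the per-status behavior described in the failover FAQ on this page: 429 cools the key, 401/403 disables it, and a 5xx or a timeout counts toward the circuit breaker. `Action` and `classify` are hypothetical names, not part of freelm's API. Treating every other status as non-retryable is also an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """What a failover loop might do after a failed request (hypothetical names)."""
    COOL_KEY = "cool_key"              # 429: rest this key, move to the next key or provider
    DISABLE_KEY = "disable_key"        # 401/403: the key is bad, stop using it
    RECORD_FAILURE = "record_failure"  # 5xx or timeout: count toward the circuit breaker
    RAISE = "raise"                    # anything else: not treated as a retryable provider fault


def classify(status: Optional[int]) -> Action:
    """Map an HTTP status code (None meaning a timeout) to a failover action."""
    if status is None or 500 <= status <= 599:
        return Action.RECORD_FAILURE
    if status == 429:
        return Action.COOL_KEY
    if status in (401, 403):
        return Action.DISABLE_KEY
    return Action.RAISE
```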
No code rewrites. Change one import — `from freelm.compat import OpenAI` (Python) or `import { OpenAI } from 'freelm/compat'` (Node.js) — and keep your existing OpenAI SDK calls unchanged.
A per-key circuit breaker opens after repeated failures and half-opens after cooldown. A token-bucket quota guard skips keys predicted to be exhausted before you waste a request.
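Here is a small Python sketch of a per-key circuit breaker with three states. The breaker stays closed until `threshold` consecutive failures occur. It then stays open for `cooldown` seconds. After that it half-opens and lets exactly one trial request through. The class name, the threshold of 3, and the 30-second cooldown are illustrative assumptions, not freelm's actual defaults. The injectable `clock` exists only to make the example testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """closed -> open after `threshold` consecutive failures;
    open -> half_open once `cooldown` seconds have passed; one trial then decides."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed
        self.trial_in_flight = False

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """Return True if a request may use this key right now."""
        state = self.state
        if state == "closed":
            return True
        if state == "half_open" and not self.trial_in_flight:
            self.trial_in_flight = True  # let exactly one test request through
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
        self.trial_in_flight = False

    def record_failure(self) -> None:
        self.failures += 1
        if self.trial_in_flight or self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
        self.trial_in_flight = False
```

A successful trial closes the breaker again. A failed trial reopens it and restarts the cooldown.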
Free model IDs change frequently. freelm queries each provider's /models API on first use, caches the results to disk (~/.cache/freelm), and falls back to hardcoded lists when offline, so model resolution keeps working without network access.
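The sketch below shows roughly how discovery with fallback could work, under several assumptions. `discover_models`, the one-JSON-file-per-provider layout, the 24-hour freshness window, and reusing a stale cache before the hardcoded list are all hypothetical choices. `fetch` stands in for whatever call hits a provider's /models endpoint. In practice, `cache_dir` would point somewhere like ~/.cache/freelm.

```python
import json
import time
from pathlib import Path
from typing import Callable, List


def discover_models(provider: str, fetch: Callable[[], List[str]], fallback: List[str],
                    cache_dir: Path, ttl: float = 24 * 3600) -> List[str]:
    """Return model IDs for one provider.

    Order of preference: fresh disk cache, live fetch (written back to the cache),
    stale cache if the fetch fails, and finally the hardcoded fallback list.
    """
    cache_file = cache_dir / f"{provider}.json"
    cached = None
    try:
        cached = json.loads(cache_file.read_text())
    except (OSError, ValueError):
        pass  # a missing or corrupt cache file is treated as "no cache"
    if isinstance(cached, dict) and time.time() - cached.get("fetched_at", 0) < ttl:
        return list(cached.get("models", []))
    try:
        models = fetch()  # e.g. a GET request to the provider's /models endpoint
    except OSError:
        # Offline or provider unreachable (urllib's URLError is an OSError subclass)
        if isinstance(cached, dict) and cached.get("models"):
            return list(cached["models"])
        return list(fallback)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"fetched_at": time.time(), "models": models}))
    return models
```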
Choose priority (deterministic ordering), round_robin (even load spread), quota_aware (ranks by remaining headroom), or latency (prefers fastest observed provider). Switch per client instance.
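You can think of the four strategies as different orderings of the same provider list for a single call. In this hypothetical sketch, `ProviderStats` and `order_providers` are illustrative names. `remaining_ratio` stands in for whatever headroom estimate quota_aware uses. `ewma_latency_ms` stands in for the observed latency that the latency strategy prefers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderStats:
    name: str
    priority: int           # lower number = tried first under "priority"
    remaining_ratio: float  # fraction of quota left, 0.0-1.0 (used by "quota_aware")
    ewma_latency_ms: float  # smoothed observed latency (used by "latency")


def order_providers(providers: List[ProviderStats], strategy: str,
                    call_index: int = 0) -> List[str]:
    """Return provider names in the order a single call should try them."""
    base = sorted(providers, key=lambda p: p.priority)
    if strategy == "priority":
        ranked = base
    elif strategy == "round_robin":
        shift = call_index % len(base) if base else 0
        ranked = base[shift:] + base[:shift]  # rotate which provider goes first each call
    elif strategy == "quota_aware":
        ranked = sorted(base, key=lambda p: -p.remaining_ratio)  # most headroom first
    elif strategy == "latency":
        ranked = sorted(base, key=lambda p: p.ewma_latency_ms)  # fastest first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [p.name for p in ranked]
```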
freelm is a free, open-source LLM client and gateway for Python (pip install freelm) and Node.js/TypeScript (npm install freelm). It pools six free-tier LLM providers — OpenRouter, Google AI Studio (Gemini), NVIDIA NIM, Groq, Cerebras, and Mistral — behind a single OpenAI-compatible call. It handles API-key rotation, cross-provider failover, circuit breaking, quota-aware routing, and live free-model discovery automatically. You supply whichever free keys you have and freelm keeps your application talking to an LLM even when one provider rate-limits or goes down. It is MIT-licensed and maintained by Shahriar Labs.
Yes, freelm is free. The package itself is MIT-licensed and costs nothing. It routes requests exclusively to the free tiers of supported providers: OpenRouter (:free models), Google AI Studio (free quota), NVIDIA NIM (build credits), Groq (30 RPM / 14,400 req/day), Cerebras (~1M tokens/day), and Mistral (Experiment tier: 2 RPM, 1B tokens/month). None of these providers requires a credit card for its free tier. Your only cost is your own compute and bandwidth; freelm does not proxy requests through any paid service. Rate limit numbers are defaults as of June 2026 and may change; freelm lets you override rpm and rpd per provider.
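The rpm and rpd limits listed above are the inputs a token-bucket quota guard, of the kind described earlier on this page, would work from. The Python sketch below makes two assumptions. First, a per-minute bucket of `rpm` tokens is refilled continuously at rpm/60 tokens per second. Second, a per-day counter uses a rolling 24-hour window. `QuotaGuard` and `try_acquire` are hypothetical names, and real providers may reset daily quotas at a fixed clock time instead.

```python
import time
from typing import Callable


class QuotaGuard:
    """Predicts whether a key can take another request under rpm/rpd limits.
    Per-minute: token bucket of size `rpm`, refilled at rpm/60 tokens per second.
    Per-day: a counter reset every 86,400 seconds (an assumed rolling window)."""

    def __init__(self, rpm: float, rpd: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock
        self.tokens = float(rpm)  # start with a full bucket
        self.last = clock()
        self.day_start = self.last
        self.used_today = 0

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.rpm, self.tokens + (now - self.last) * self.rpm / 60.0)
        self.last = now
        if now - self.day_start >= 86_400:
            self.day_start = now
            self.used_today = 0

    def try_acquire(self) -> bool:
        """Take one request slot if both budgets allow it; otherwise the key is skipped."""
        self._refill()
        if self.tokens < 1.0 or self.used_today >= self.rpd:
            return False
        self.tokens -= 1.0
        self.used_today += 1
        return True
```

With Groq's listed defaults (rpm=30, rpd=14400), a fresh guard admits 30 back-to-back requests, then roughly one more every two seconds.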
Install freelm from PyPI with: pip install freelm (requires Python 3.8 or later). Then set one or more provider API keys as environment variables — for example OPENROUTER_API_KEY, GEMINI_API_KEY, GROQ_API_KEY — and call freelm.FreeLLM.from_env() to create a client. The client auto-detects which keys are present and builds a pool from them. You can also pass providers explicitly: FreeLLM([OpenRouter('sk-or-...'), GoogleAIStudio('AIza...')]) for full control over which providers are used and in what order. No further configuration is required to start making chat completions.
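The auto-detection step amounts to checking which key variables are set. This hypothetical sketch is not freelm's code. It covers only the three variable names given above, because the variable names for NVIDIA NIM, Cerebras, and Mistral are not listed on this page. The provider labels are made up for illustration.

```python
import os
from typing import Dict, List, Mapping, Optional

# Only the variable names stated on this page; labels on the right are illustrative.
KNOWN_KEYS: Dict[str, str] = {
    "OPENROUTER_API_KEY": "openrouter",
    "GEMINI_API_KEY": "google_ai_studio",
    "GROQ_API_KEY": "groq",
}


def detect_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for var, name in KNOWN_KEYS.items() if env.get(var, "").strip()]
```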
Install the npm package with: npm install freelm (requires Node.js 18 or later). Import and use it as: import { FreeLLM } from 'freelm'; const llm = FreeLLM.fromEnv(); console.log(await llm.text('Hello')). The Node.js port has zero runtime dependencies — it uses the built-in fetch API. It supports the same API surface as the Python version: chat(), text(), stream(), health(), and the drop-in OpenAI shim via import { OpenAI } from 'freelm/compat'. TypeScript types are bundled. The same environment variables (OPENROUTER_API_KEY, GEMINI_API_KEY, etc.) are used for key loading.
freelm ships a compatibility shim that mirrors the OpenAI Python SDK and JS SDK interfaces. In Python, replace 'from openai import OpenAI' with 'from freelm.compat import OpenAI' and your existing client.chat.completions.create(...) calls work unchanged, backed by FreeLLM.from_env() instead of the OpenAI API. In Node.js, replace 'import OpenAI from "openai"' with 'import { OpenAI } from "freelm/compat"'. Use model='auto' (or any virtual model alias) to let freelm pick the best available free model. No other code changes are required, so migration is a one-line change.
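A shim like this works because the OpenAI SDK's call path is just nested attributes ending in a `create` method. The sketch below shows that adapter pattern in Python. `OpenAICompatShim` is a hypothetical name, and the `backend` callable stands in for FreeLLM.from_env(). Only the response fields used in the quick-start example (`choices[0].message.content`) are mimicked.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


class OpenAICompatShim:
    """Exposes client.chat.completions.create(...) and returns an object shaped
    like an OpenAI chat completion, delegating the actual call to `backend`."""

    def __init__(self, backend: Callable[[str, List[Dict[str, str]]], str]) -> None:
        self._backend = backend
        # Nested namespaces reproduce the attribute path used with the OpenAI SDK
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, *, model: str, messages: List[Dict[str, str]],
                **_ignored: Any) -> SimpleNamespace:
        text = self._backend(model, messages)  # extra kwargs are accepted and ignored
        message = SimpleNamespace(role="assistant", content=text)
        choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
        return SimpleNamespace(model=model, choices=[choice])
```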
When freelm receives a 429 rate-limit response, it cools that key and rotates to the next available key or provider. A 5xx server error or timeout counts against that key's circuit breaker. Once the breaker opens, requests skip the key, and after a cooldown it half-opens to let one test request through. A 401/403 auth error disables the key for the rest of the session. Failover is interleaved across providers: the best model of every provider is tried before any provider's second model. This ensures every provider is reached quickly instead of one provider's model list being exhausted first. The max_attempts parameter (default 12) caps the total number of tries per call.
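The following short Python function illustrates the interleaved ordering. It flattens each provider's ranked model list round by round and stops at `max_attempts`. `interleave` is a hypothetical name, and the input format (provider name mapped to its ranked model IDs) is an assumption. A real failover loop would also skip keys whose breaker is open or whose quota is exhausted, which this sketch leaves out.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

_SKIP = object()  # filler for providers that have run out of models


def interleave(candidates: Dict[str, List[str]], max_attempts: int = 12) -> List[Tuple[str, str]]:
    """Order attempts as every provider's first (best) model, then every
    provider's second model, and so on, capped at max_attempts."""
    order: List[Tuple[str, str]] = []
    providers = list(candidates)  # dict order = provider ranking chosen by the strategy
    for round_models in zip_longest(*(candidates[p] for p in providers), fillvalue=_SKIP):
        for provider, model in zip(providers, round_models):
            if model is _SKIP:
                continue  # this provider has no model left at this rank
            order.append((provider, model))
            if len(order) == max_attempts:
                return order
    return order
```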
Because model names differ across providers (e.g. 'llama-3.3-70b-instruct:free' on OpenRouter vs 'llama3-70b-8192' on Groq), freelm lets you request a model by intent rather than by exact ID. The built-in aliases are 'auto' or 'chat' (any available chat model), 'chat:large' or 'large' (a larger, stronger model), 'chat:fast' or 'fast' (a fast, cheap model), and 'chat:small' or 'small' (the smallest model). Any other 'vendor/model-id' string is passed through as-is for exact control. freelm resolves each alias to a concrete model per provider. Free model IDs change constantly, so it discovers them live via each provider's /models API and caches the results to disk.
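Alias resolution can be sketched as two lookups. First, normalize the requested name to a canonical tier. Then look that tier up in a per-provider table. The per-provider table is hypothetical, since freelm builds its tables from live discovery. `resolve` is an illustrative name, and returning None to mean "skip this provider" is an assumption.

```python
from typing import Dict, Optional

# The aliases listed above, each normalized to one canonical tier.
ALIASES: Dict[str, str] = {
    "auto": "chat", "chat": "chat",
    "chat:large": "chat:large", "large": "chat:large",
    "chat:fast": "chat:fast", "fast": "chat:fast",
    "chat:small": "chat:small", "small": "chat:small",
}


def resolve(requested: str, provider_tiers: Dict[str, str]) -> Optional[str]:
    """Map a requested model to a concrete ID for one provider.

    `provider_tiers` maps canonical tiers to that provider's model ID.
    Returns None when the provider has no model for the requested tier."""
    tier = ALIASES.get(requested)
    if tier is None:
        return requested  # not an alias: pass the 'vendor/model-id' through unchanged
    return provider_tiers.get(tier)
```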
freelm supports four routing strategies, set at client construction. priority tries providers in ascending order of their priority integer, deterministically. round_robin rotates which provider goes first on each call, spreading load evenly. quota_aware ranks providers by current remaining quota headroom: keys nearing their daily limit score lower, pushing traffic to less-used providers. latency prefers the provider with the lowest observed EWMA latency, measured per call. The default is priority, and the strategy is chosen per FreeLLM instance. Within any strategy, failover is always interleaved across providers so no single provider monopolizes retries.
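The latency strategy's EWMA is a one-line update: ewma = alpha * sample + (1 - alpha) * ewma. This Python sketch keeps one average per provider. The smoothing factor alpha = 0.3 and the choice to seed the average with the first sample are assumptions, not documented freelm behavior. `LatencyTracker` is a hypothetical name.

```python
from typing import Dict, Optional


class LatencyTracker:
    """Per-provider exponentially weighted moving average (EWMA) of call latency."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.ewma: Dict[str, float] = {}

    def observe(self, provider: str, latency_ms: float) -> None:
        """Fold one measured call latency into the provider's running average."""
        prev = self.ewma.get(provider)
        if prev is None:
            self.ewma[provider] = latency_ms  # the first sample seeds the average
        else:
            self.ewma[provider] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def fastest(self) -> Optional[str]:
        """Provider with the lowest smoothed latency, or None before any observation."""
        return min(self.ewma, key=self.ewma.__getitem__) if self.ewma else None
```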
No credit card. No configuration. Just set free API keys and call freelm.FreeLLM.from_env().