freelm Documentation

freelm is a free, always-up LLM client for Python and Node.js/TypeScript. It pools six free-tier providers — OpenRouter, Google AI Studio (Gemini), NVIDIA NIM, Groq, Cerebras, and Mistral — behind one OpenAI-compatible call with automatic key rotation, cross-provider failover, circuit breaking, and live model discovery.


Installation

Install freelm from the package manager for your language. Python 3.8+ and Node.js 18+ are supported. The Node.js package has zero runtime dependencies and uses the built-in fetch API.

Python (3.8+)

bash
pip install freelm

Node.js / TypeScript (18+)

bash
npm install freelm

Environment Variables

Set whichever keys you have; freelm detects them automatically. To supply multiple keys for one provider, separate them with commas. freelm rotates through them round-robin within that provider, which gives extra rate-limit headroom.

Provider          Key Variables                                         Tier Variable
OpenRouter        OPENROUTER_API_KEY / FREELM_OPENROUTER_KEYS           FREELM_OPENROUTER_TIER (free | credit)
Google AI Studio  GEMINI_API_KEY / GOOGLE_API_KEY / FREELM_GOOGLE_KEYS  FREELM_GOOGLE_TIER (free | tier1)
NVIDIA NIM        NVIDIA_API_KEY / NIM_API_KEY / FREELM_NIM_KEYS        FREELM_NIM_TIER (free)
Groq              GROQ_API_KEY / FREELM_GROQ_KEYS                       FREELM_GROQ_TIER (free)
Cerebras          CEREBRAS_API_KEY / FREELM_CEREBRAS_KEYS               FREELM_CEREBRAS_TIER (free)
Mistral           MISTRAL_API_KEY / FREELM_MISTRAL_KEYS                 FREELM_MISTRAL_TIER (free)
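
To make the comma-separated key behaviour concrete, here is a minimal standalone sketch of how such keys could be split and rotated. It does not use freelm and is not its actual implementation: the load_keys helper is hypothetical, the key values are placeholders, and whether freelm merges keys across the alternative variable names this way is an assumption.

```python
import itertools
import os


def load_keys(*var_names, env=None):
    """Collect API keys from every listed variable, splitting on commas.

    Hypothetical helper for illustration only; not part of freelm's API.
    """
    env = os.environ if env is None else env
    keys = []
    for name in var_names:
        for key in env.get(name, "").split(","):
            key = key.strip()
            if key and key not in keys:  # drop blanks and duplicates
                keys.append(key)
    return keys


# Example environment with placeholder values (not real keys).
example_env = {"FREELM_GROQ_KEYS": "gsk_a, gsk_b", "GROQ_API_KEY": "gsk_c"}

groq_keys = load_keys("GROQ_API_KEY", "FREELM_GROQ_KEYS", env=example_env)
rotation = itertools.cycle(groq_keys)  # round-robin over the collected keys

# Four consecutive requests wrap around after the third key.
picked = [next(rotation) for _ in range(4)]
print(picked)
```

Passing env explicitly keeps the example deterministic; omitting it reads from the real process environment instead.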

Quick Start

The fastest path is to call from_env() in Python or fromEnv() in Node.js. It reads all supported keys from your environment and builds the provider pool automatically.

Python

Python
import freelm

llm = freelm.FreeLLM.from_env()
print(llm.text("Explain black holes in one sentence."))

Node.js / TypeScript

TypeScript
import { FreeLLM } from "freelm";

const llm = FreeLLM.fromEnv();
console.log(await llm.text("Explain black holes in one sentence."));

Explicit provider config (Python)

For full control over which providers are used, their priority, and routing strategy:

Python
from freelm import FreeLLM, OpenRouter, GoogleAIStudio, NIM, Groq, Cerebras

llm = FreeLLM(
    providers=[
        OpenRouter("sk-or-...", priority=0),       # tried first
        GoogleAIStudio("AIza...", priority=1),
        Groq("gsk_...", priority=2),
        Cerebras("csk-...", priority=3),
        NIM("nvapi-...", priority=4),
    ],
    strategy="quota_aware",   # priority | round_robin | quota_aware | latency
)

resp = llm.chat(
    [{"role": "user", "content": "Write a haiku about failover."}],
    model="chat:fast",
)
print(resp.text, "via", resp.provider)

Explicit provider config (Node.js)

TypeScript
import { FreeLLM, OpenRouter, GoogleAIStudio, Groq, Cerebras, Mistral, NIM } from "freelm";

const llm = new FreeLLM(
  [
    new OpenRouter("sk-or-..."),
    new GoogleAIStudio("AIza..."),
    new Groq("gsk_..."),
    new Cerebras("csk-..."),
    new Mistral("..."),
    new NIM("nvapi-..."),
  ],
  { strategy: "quota_aware" }
);

const r = await llm.chat(
  [{ role: "user", content: "Write a haiku about failover." }],
  { model: "chat:fast" }
);
console.log(r.text, "via", r.provider);

Streaming

Token streaming works across all providers through the same failover mechanism. freelm fails over between providers before the first token. Once tokens start flowing, it stays on that provider for the rest of the response — no mid-stream switching.

Python (sync)

Python
import freelm

llm = freelm.FreeLLM.from_env()
for chunk in llm.stream("Write a haiku about failover."):
    print(chunk, end="", flush=True)

Python (async)

Python
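import asyncio
from freelm import AsyncFreeLLM

async def main():
    # "async with" is only valid inside a coroutine, so wrap it in main()
    async with AsyncFreeLLM.from_env() as llm:
        async for chunk in llm.astream("Stream me some tokens"):
            print(chunk, end="", flush=True)

asyncio.run(main())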

Node.js

TypeScript
for await (const chunk of llm.stream("Stream me some tokens")) {
  process.stdout.write(chunk);
}
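
The "fail over before the first token, then stay put" rule described at the top of this section can be sketched without freelm at all. The version below is an illustration under stated assumptions, not freelm's internals: stream_with_failover, AllProvidersFailed, and the two fake providers are hypothetical names, and each provider is modelled as a plain function that returns an iterable of text chunks.

```python
from typing import Callable, Iterable, Iterator, List


class AllProvidersFailed(Exception):
    """Hypothetical error raised when no provider yields a first token."""


def stream_with_failover(
    providers: List[Callable[[str], Iterable[str]]], prompt: str
) -> Iterator[str]:
    errors = []
    for open_stream in providers:
        try:
            chunks = iter(open_stream(prompt))
            first = next(chunks)  # a failure here is still recoverable
        except StopIteration:
            return  # provider answered with an empty stream
        except Exception as exc:
            errors.append(exc)
            continue  # nothing emitted yet, so try the next provider
        yield first  # from here on we are committed to this provider
        yield from chunks  # later errors propagate; no mid-stream switching
        return
    raise AllProvidersFailed(errors)


def failing_provider(prompt: str) -> Iterable[str]:
    raise ConnectionError("simulated outage before any token")


def working_provider(prompt: str) -> Iterable[str]:
    yield from ["fail", "over", "!"]


result = "".join(stream_with_failover([failing_provider, working_provider], "hi"))
print(result)
```

Committing after the first chunk avoids stitching together partial answers from two different models, at the cost of surfacing any error that happens later in the stream.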

Drop-in OpenAI Shim

Change one import line. Your existing OpenAI SDK code works unchanged — backed by free providers instead of a paid API.

Python

Python
# Before:  from openai import OpenAI
# After:
from freelm.compat import OpenAI

client = OpenAI()   # backed by FreeLLM.from_env()
r = client.chat.completions.create(
    model="auto",   # virtual model: picks best available
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

Node.js / TypeScript

TypeScript
// Before:  import OpenAI from "openai";
// After:
import { OpenAI } from "freelm/compat";

const client = new OpenAI();   // backed by FreeLLM.fromEnv()
const r = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hi" }],
});
console.log(r.choices[0].message.content);

Virtual Models

Model names differ across providers. Use intent-based aliases and freelm resolves each to a concrete model. Free model IDs change constantly, so freelm discovers them live from each provider's /models endpoint and caches results to ~/.cache/freelm.

Alias               Meaning
auto / chat         Any available chat model (registry order)
chat:large / large  A larger, stronger model
chat:fast / fast    A fast, cheap model (e.g. Groq or Cerebras)
chat:small / small  The smallest available model
vendor/model-id     Exact passthrough: use precisely this model ID
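
As a rough picture of how the aliases in this table might map to concrete models, the sketch below resolves an alias against a small in-memory registry. Everything in it is an assumption made for illustration: the Model record (modelled on the id, tags, and ctx fields printed in the listing example), the tag names, the example/... model IDs, and the rule that any unknown name is passed through unchanged. freelm's real resolver may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Model:
    id: str
    tags: Tuple[str, ...]
    ctx: int


# Alias -> required tag; None means "any chat model". Hypothetical mapping.
ALIAS_TO_TAG = {
    "auto": None, "chat": None,
    "chat:large": "large", "large": "large",
    "chat:fast": "fast", "fast": "fast",
    "chat:small": "small", "small": "small",
}


def resolve(alias: str, registry: List[Model]) -> str:
    if alias not in ALIAS_TO_TAG:
        return alias  # assumed passthrough for exact IDs such as vendor/model-id
    tag: Optional[str] = ALIAS_TO_TAG[alias]
    for model in registry:  # registry order decides ties
        if tag is None or tag in model.tags:
            return model.id
    raise LookupError(f"no model available for alias {alias!r}")


# Placeholder registry; these are not real model IDs.
registry = [
    Model("example/big-model", ("chat", "large"), 131072),
    Model("example/fast-model", ("chat", "fast"), 32768),
    Model("example/tiny-model", ("chat", "small"), 8192),
]

print(resolve("chat:fast", registry))
print(resolve("auto", registry))
print(resolve("example/custom-model", registry))
```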

List currently available free models

Python

from freelm import list_free_models

for m in list_free_models()[:5]:
    print(m.id, m.tags, m.ctx)

Node.js

import { listFreeModels } from "freelm";

const models = await listFreeModels();
for (const m of models.slice(0, 5)) {
  console.log(m.id, m.tags);
}

Routing Strategies

Set the strategy parameter when constructing FreeLLM. The default is priority. Whatever the strategy, failover attempts are interleaved across providers so that no single provider monopolizes retries.
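
One plausible reading of "interleaved" is that retry candidates are (provider, key) pairs taken one per provider in turn, rather than exhausting every key of one provider first. The sketch below shows that ordering. The interleave function and the pool contents are hypothetical, and freelm's actual candidate ordering may differ.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def interleave(keys_by_provider: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Order (provider, key) candidates so consecutive attempts alternate providers."""
    columns = [
        [(provider, key) for key in keys]
        for provider, keys in keys_by_provider.items()
    ]
    order = []
    # Each round takes at most one candidate from every provider.
    for round_ in zip_longest(*columns):
        order.extend(candidate for candidate in round_ if candidate is not None)
    return order


# Placeholder pool: provider name -> key labels (not real keys).
pool = {"groq": ["g1", "g2", "g3"], "cerebras": ["c1"], "mistral": ["m1", "m2"]}
attempts = interleave(pool)
print(attempts)
```

With this ordering, a provider that holds many keys still yields to the others after each attempt.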

priority

Providers are tried in ascending order of their priority integer, so priority 0 goes first. Deterministic and predictable.

round_robin

Rotates which provider goes first each call, spreading load evenly across all keys.

quota_aware

Ranks by remaining quota headroom. Keys nearing daily limits score lower.

latency

Prefers the provider with the lowest observed round-trip latency, tracked as an exponentially weighted moving average (EWMA).
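
The four strategies above differ only in how they order the candidate list before the first attempt. The following standalone sketch illustrates that idea. ProviderState, order_providers, update_ewma, the smoothing factor, and all the numbers are assumptions chosen for the example; they are not freelm's data structures or scoring formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderState:
    name: str
    priority: int      # lower is tried first under "priority"
    quota_left: float  # assumed: fraction of daily quota remaining, 0.0 to 1.0
    ewma_ms: float     # smoothed round-trip latency in milliseconds


def update_ewma(previous_ms: float, sample_ms: float, alpha: float = 0.25) -> float:
    """Standard EWMA update: recent samples weigh alpha, history weighs 1 - alpha."""
    return alpha * sample_ms + (1 - alpha) * previous_ms


def order_providers(
    providers: List[ProviderState], strategy: str, call_index: int = 0
) -> List[ProviderState]:
    if strategy == "priority":
        return sorted(providers, key=lambda p: p.priority)
    if strategy == "round_robin":
        start = call_index % len(providers)  # rotate the starting provider per call
        return providers[start:] + providers[:start]
    if strategy == "quota_aware":
        return sorted(providers, key=lambda p: p.quota_left, reverse=True)
    if strategy == "latency":
        return sorted(providers, key=lambda p: p.ewma_ms)
    raise ValueError(f"unknown strategy: {strategy}")


providers = [
    ProviderState("openrouter", 0, 0.10, 900.0),
    ProviderState("groq", 1, 0.80, 250.0),
    ProviderState("cerebras", 2, 0.50, 180.0),
]

for strategy in ("priority", "round_robin", "quota_aware", "latency"):
    names = [p.name for p in order_providers(providers, strategy, call_index=1)]
    print(strategy, names)

print(update_ewma(200.0, 300.0))
```

A small alpha makes the latency estimate stable but slow to react; a larger one follows recent samples more closely.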

Error Handling (Python)

Retryable errors (RateLimited, Transient) are handled internally and only surface inside NoProvidersAvailable when all providers are exhausted. Non-retryable errors (AuthError, caller 4xx) surface immediately as ProviderError.

Python
from freelm import NoProvidersAvailable, ProviderError

try:
    resp = llm.chat([{"role": "user", "content": "hi"}])
except NoProvidersAvailable as e:
    print("All providers exhausted:", e.attempts)
    # e.attempts = [(candidate, exception), ...]
except ProviderError as e:
    print(e.provider, e.status, e.retryable)
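
The split between retryable and non-retryable errors can be pictured as a simple failover loop. The sketch below is self-contained and does not import freelm: the two exception classes are local stand-ins that only mirror the attributes shown above (attempts, provider, status, retryable), and call_with_failover plus the fake senders are hypothetical.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Local stand-in mirroring the documented attributes."""

    def __init__(self, provider: str, status: int, retryable: bool):
        super().__init__(f"{provider} returned {status}")
        self.provider, self.status, self.retryable = provider, status, retryable


class NoProvidersAvailable(Exception):
    """Local stand-in carrying the list of (candidate, exception) attempts."""

    def __init__(self, attempts: List[Tuple[str, Exception]]):
        super().__init__(f"all {len(attempts)} candidates failed")
        self.attempts = attempts


def call_with_failover(candidates: List[Tuple[str, Callable[[str], str]]], request: str) -> str:
    attempts = []
    for name, send in candidates:
        try:
            return send(request)
        except ProviderError as exc:
            if not exc.retryable:
                raise  # e.g. a bad key: retrying elsewhere would hide the bug
            attempts.append((name, exc))  # remember it and move on
    raise NoProvidersAvailable(attempts)


def rate_limited(request: str) -> str:
    raise ProviderError("groq", 429, retryable=True)


def bad_key(request: str) -> str:
    raise ProviderError("mistral", 401, retryable=False)


def healthy(request: str) -> str:
    return f"echo: {request}"


result = call_with_failover([("groq", rate_limited), ("cerebras", healthy)], "hi")
print(result)
```

Retryable failures are swallowed and recorded, so the caller only ever sees them bundled inside the final exhaustion error.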

Health & Introspection

Call llm.health() at any time to inspect the live state of every key in the pool.

Python
for row in llm.health():
    print(row)
# → provider, key (masked), ready, breaker state, rpd_used, last_error, ewma_latency_ms

Each response object also exposes: r.text, r.provider, r.model, r.usage, r.latency_ms, and r.raw.
