freelm Documentation
freelm is a free, always-up LLM client for Python and Node.js/TypeScript. It pools six free-tier providers — OpenRouter, Google AI Studio (Gemini), NVIDIA NIM, Groq, Cerebras, and Mistral — behind one OpenAI-compatible call with automatic key rotation, cross-provider failover, circuit breaking, and live model discovery.
Installation
Install freelm from the package manager for your language. Python 3.8+ and Node.js 18+ are supported. The Node.js package has zero runtime dependencies and uses the built-in fetch API.
Environment Variables
Set whichever keys you have. freelm auto-detects them. Supply multiple keys per provider by comma-separating them — they are round-robined within that provider for additional rate limit headroom.
| Provider | Key Variables | Tier Variable |
|---|---|---|
| OpenRouter | OPENROUTER_API_KEY / FREELM_OPENROUTER_KEYS | FREELM_OPENROUTER_TIER (free or credit) |
| Google AI Studio | GEMINI_API_KEY / GOOGLE_API_KEY / FREELM_GOOGLE_KEYS | FREELM_GOOGLE_TIER (free or tier1) |
| NVIDIA NIM | NVIDIA_API_KEY / NIM_API_KEY / FREELM_NIM_KEYS | FREELM_NIM_TIER (free) |
| Groq | GROQ_API_KEY / FREELM_GROQ_KEYS | FREELM_GROQ_TIER (free) |
| Cerebras | CEREBRAS_API_KEY / FREELM_CEREBRAS_KEYS | FREELM_CEREBRAS_TIER (free) |
| Mistral | MISTRAL_API_KEY / FREELM_MISTRAL_KEYS | FREELM_MISTRAL_TIER (free) |
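To picture how comma-separated keys can be detected and rotated, here is a minimal sketch. It is not freelm's internal code: the helpers collect_keys and KeyRing are hypothetical, and only the environment variable names come from the table above.

```python
import itertools
import os

# Variable names taken from the table above; the helpers below are hypothetical.
GROQ_VARS = ("GROQ_API_KEY", "FREELM_GROQ_KEYS")


def collect_keys(var_names, environ=None):
    """Gather keys from several variables, splitting comma-separated values."""
    environ = os.environ if environ is None else environ
    keys = []
    for name in var_names:
        for key in environ.get(name, "").split(","):
            key = key.strip()
            if key and key not in keys:  # skip blanks and duplicates
                keys.append(key)
    return keys


class KeyRing:
    """Hand out keys in round-robin order."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one key is required")
        self._cycle = itertools.cycle(keys)

    def next_key(self):
        return next(self._cycle)


# A fake environment keeps the example independent of your real shell.
env = {"GROQ_API_KEY": "gsk_a", "FREELM_GROQ_KEYS": "gsk_b, gsk_c"}
ring = KeyRing(collect_keys(GROQ_VARS, env))
picked = [ring.next_key() for _ in range(4)]
print(picked)
```

With three keys, the fourth request wraps around to the first key again, which is what spreads requests across each key's rate limit.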
Quick Start
The fastest path: call from_env() / fromEnv(). It reads all supported keys from your environment and builds the provider pool automatically.
Python
import freelm
llm = freelm.FreeLLM.from_env()
print(llm.text("Explain black holes in one sentence."))
Node.js / TypeScript
import { FreeLLM } from "freelm";
const llm = FreeLLM.fromEnv();
console.log(await llm.text("Explain black holes in one sentence."));
Explicit provider config (Python)
For full control over which providers are used, their priority, and routing strategy:
from freelm import FreeLLM, OpenRouter, GoogleAIStudio, NIM, Groq, Cerebras
llm = FreeLLM(
    providers=[
        OpenRouter("sk-or-...", priority=0),  # tried first
        GoogleAIStudio("AIza...", priority=1),
        Groq("gsk_...", priority=2),
        Cerebras("csk-...", priority=3),
        NIM("nvapi-...", priority=4),
    ],
    strategy="quota_aware",  # priority | round_robin | quota_aware | latency
)
resp = llm.chat(
    [{"role": "user", "content": "Write a haiku about failover."}],
    model="chat:fast",
)
print(resp.text, "via", resp.provider)
Explicit provider config (Node.js)
import { FreeLLM, OpenRouter, GoogleAIStudio, Groq, Cerebras, Mistral, NIM } from "freelm";
const llm = new FreeLLM(
  [
    new OpenRouter("sk-or-..."),
    new GoogleAIStudio("AIza..."),
    new Groq("gsk_..."),
    new Cerebras("csk-..."),
    new Mistral("..."),
    new NIM("nvapi-..."),
  ],
  { strategy: "quota_aware" }
);
const r = await llm.chat(
  [{ role: "user", content: "Write a haiku about failover." }],
  { model: "chat:fast" }
);
console.log(r.text, "via", r.provider);
Streaming
Token streaming works across all providers through the same failover mechanism. freelm fails over between providers before the first token. Once tokens start flowing, it stays on that provider for the rest of the response — no mid-stream switching.
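The "fail over before the first token, then commit" rule can be sketched as follows. This is an illustration under stated assumptions, not freelm's implementation: TransientError, stream_with_failover, and the two fake providers are hypothetical names made up for this example.

```python
class TransientError(Exception):
    """Stand-in for a retryable provider failure (hypothetical)."""


def stream_with_failover(providers, prompt):
    """Yield chunks from the first provider that produces a first token.

    `providers` is a list of (name, stream_fn) pairs, where stream_fn(prompt)
    returns an iterable of text chunks.
    """
    errors = []
    for name, stream_fn in providers:
        chunks = iter(stream_fn(prompt))
        try:
            first = next(chunks)  # a failure here can still fail over
        except StopIteration:
            return  # empty but successful response
        except TransientError as exc:
            errors.append((name, exc))
            continue
        yield first
        # Committed to this provider: later errors propagate to the caller.
        yield from chunks
        return
    raise RuntimeError(f"all providers failed before the first token: {errors}")


def flaky(prompt):
    raise TransientError("rate limited")
    yield  # unreachable; makes this a generator function


def healthy(prompt):
    yield from ["fail", "over", " works"]


out = "".join(stream_with_failover([("a", flaky), ("b", healthy)], "hi"))
print(out)
```

The first provider fails before emitting anything, so the sketch moves on silently. An error raised after the first chunk is deliberately not caught, because switching providers mid-response would splice two different completions together.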
Python (sync)
import freelm
llm = freelm.FreeLLM.from_env()
for chunk in llm.stream("Write a haiku about failover."):
    print(chunk, end="", flush=True)
Python (async)
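import asyncio
from freelm import AsyncFreeLLM
async def main():
    # async with / async for must run inside a coroutine
    async with AsyncFreeLLM.from_env() as llm:
        async for chunk in llm.astream("Stream me some tokens"):
            print(chunk, end="", flush=True)
asyncio.run(main())
Node.js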
for await (const chunk of llm.stream("Stream me some tokens")) {
  process.stdout.write(chunk);
}
Drop-in OpenAI Shim
Change one import line. Your existing OpenAI SDK code works unchanged — backed by free providers instead of a paid API.
Python
# Before: from openai import OpenAI
# After:
from freelm.compat import OpenAI
client = OpenAI()  # backed by FreeLLM.from_env()
r = client.chat.completions.create(
    model="auto",  # virtual model: picks best available
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)
Node.js / TypeScript
// Before: import OpenAI from "openai";
// After:
import { OpenAI } from "freelm/compat";
const client = new OpenAI(); // backed by FreeLLM.fromEnv()
const r = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hi" }],
});
console.log(r.choices[0].message.content);
Virtual Models
Model names differ across providers. Use intent-based aliases and freelm resolves each to a concrete model. Free model IDs change constantly, so freelm discovers them live from each provider's /models endpoint and caches results to ~/.cache/freelm.
| Alias | Meaning |
|---|---|
| auto / chat | Any available chat model (registry order) |
| chat:large / large | A larger, stronger model |
| chat:fast / fast | A fast, cheap model (e.g. Groq or Cerebras) |
| chat:small / small | The smallest available model |
| vendor/model-id | Exact passthrough — use precisely this model ID |
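As a rough illustration of how an alias could map to a concrete model, the sketch below resolves names against a small in-memory registry. Everything here is an assumption for illustration: the Model class, the resolve function, the tag names, and the model IDs are made up, and freelm's real resolver may rank models differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    id: str
    tags: frozenset
    ctx: int


# Alias -> required tag; None means "any chat model". Tag names are assumptions.
ALIASES = {
    "auto": None, "chat": None,
    "chat:large": "large", "large": "large",
    "chat:fast": "fast", "fast": "fast",
    "chat:small": "small", "small": "small",
}


def resolve(name, registry):
    """Map an alias to a concrete model ID; pass unknown names through."""
    if name not in ALIASES:
        return name  # exact passthrough, e.g. "vendor/model-id"
    tag = ALIASES[name]
    for model in registry:  # first match wins, i.e. registry order
        if tag is None or tag in model.tags:
            return model.id
    raise LookupError(f"no model available for alias {name!r}")


# Hypothetical registry; real IDs come from live discovery.
registry = [
    Model("vendor-a/big-70b", frozenset({"chat", "large"}), 131072),
    Model("vendor-b/quick-8b", frozenset({"chat", "fast", "small"}), 8192),
]

for alias in ("auto", "chat:fast", "vendor-c/exact-model"):
    print(alias, "->", resolve(alias, registry))
```

Keeping the alias table separate from the registry means the registry can be refreshed from discovery without touching the names your code uses.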
List currently available free models
Python
from freelm import list_free_models
for m in list_free_models()[:5]:
    print(m.id, m.tags, m.ctx)
Node.js
import { listFreeModels } from "freelm";
const models = await listFreeModels();
for (const m of models.slice(0, 5)) {
  console.log(m.id, m.tags);
}
Routing Strategies
Set the strategy parameter when constructing FreeLLM; the default is priority. Regardless of strategy, failover is always interleaved across providers, so no single provider monopolizes retries.
priority: Providers are tried in ascending order of their priority integer. Deterministic and predictable.
round_robin: Rotates which provider goes first on each call, spreading load evenly across all keys.
quota_aware: Ranks by remaining quota headroom. Keys nearing their daily limits score lower.
latency: Prefers the provider with the lowest observed EWMA round-trip latency.
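The four strategies differ only in how candidates are ordered before a call. The sketch below shows one plausible ordering rule for each. It is an assumption-laden illustration, not freelm's scoring code: Candidate, order, update_ewma, the call_index argument, and the smoothing factor alpha are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    priority: int
    quota_left: float  # fraction of the daily quota remaining, 0.0 to 1.0
    ewma_ms: float     # smoothed round-trip latency in milliseconds


def update_ewma(previous, sample, alpha=0.2):
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    return alpha * sample + (1 - alpha) * previous


def order(candidates, strategy="priority", call_index=0):
    """Return candidates in the order they would be tried."""
    if not candidates:
        return []
    if strategy == "priority":
        return sorted(candidates, key=lambda c: c.priority)
    if strategy == "round_robin":
        shift = call_index % len(candidates)  # rotate the starting point per call
        return candidates[shift:] + candidates[:shift]
    if strategy == "quota_aware":
        return sorted(candidates, key=lambda c: c.quota_left, reverse=True)
    if strategy == "latency":
        return sorted(candidates, key=lambda c: c.ewma_ms)
    raise ValueError(f"unknown strategy: {strategy}")


pool = [
    Candidate("groq", priority=2, quota_left=0.10, ewma_ms=120.0),
    Candidate("cerebras", priority=3, quota_left=0.90, ewma_ms=200.0),
    Candidate("openrouter", priority=0, quota_left=0.50, ewma_ms=450.0),
]

for strategy in ("priority", "quota_aware", "latency"):
    print(strategy, [c.name for c in order(pool, strategy)])
```

In each case the rest of the list still serves as the failover order, so a strategy changes who is tried first rather than who is eligible.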
Error Handling (Python)
Retryable errors (RateLimited, Transient) are handled internally and only surface inside NoProvidersAvailable when all providers are exhausted. Non-retryable errors (AuthError, caller 4xx) surface immediately as ProviderError.
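The split between retryable and non-retryable errors can be pictured as a simple loop. The classes below are local stand-ins that borrow the exception names and attributes mentioned in this section; they are not imported from freelm, and call_with_failover is a hypothetical helper.

```python
class ProviderError(Exception):
    """Local stand-in carrying the attributes this section describes."""

    def __init__(self, provider, status, retryable):
        super().__init__(f"{provider} returned {status}")
        self.provider = provider
        self.status = status
        self.retryable = retryable


class NoProvidersAvailable(Exception):
    """Raised once every candidate has failed with a retryable error."""

    def __init__(self, attempts):
        super().__init__(f"{len(attempts)} attempt(s) failed")
        self.attempts = attempts  # [(candidate, exception), ...]


def call_with_failover(candidates, request):
    """Try each (name, send) pair in turn; stop early on non-retryable errors."""
    attempts = []
    for name, send in candidates:
        try:
            return send(request)
        except ProviderError as exc:
            if not exc.retryable:
                raise  # e.g. a bad key or malformed request: retrying cannot help
            attempts.append((name, exc))
    raise NoProvidersAvailable(attempts)


def rate_limited(request):
    raise ProviderError("groq", 429, retryable=True)


def bad_key(request):
    raise ProviderError("mistral", 401, retryable=False)


def ok(request):
    return f"echo: {request}"


result = call_with_failover([("groq", rate_limited), ("cerebras", ok)], "hi")
print(result)
```

The rate-limited provider is skipped without the caller noticing. With freelm itself, catching the two outcomes looks like this: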
from freelm import NoProvidersAvailable, ProviderError
try:
    resp = llm.chat("hi")
except NoProvidersAvailable as e:
    print("All providers exhausted:", e.attempts)
    # e.attempts = [(candidate, exception), ...]
except ProviderError as e:
    print(e.provider, e.status, e.retryable)
Health & Introspection
Call llm.health() at any time to inspect the live state of every key in the pool.
for row in llm.health():
    print(row)
# → provider, key (masked), ready, breaker state, rpd_used, last_error, ewma_latency_ms
Each response object also exposes: r.text, r.provider, r.model, r.usage, r.latency_ms, and r.raw.
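The breaker state reported by llm.health() refers to circuit breaking. For readers unfamiliar with the pattern, here is a minimal sketch of a per-key breaker. The class, its state names, and the threshold and cooldown defaults are assumptions for illustration, not freelm's actual values.

```python
import time


class CircuitBreaker:
    """Stop using a key after repeated failures, then retry after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock  # injectable so the example can fake time
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"  # healthy: requests flow normally
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half_open"  # cooldown elapsed: allow a trial request
        return "open"  # tripped: skip this key

    @property
    def ready(self):
        return self.state != "open"

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()  # trip, or re-trip after a failed trial


now = [0.0]  # fake clock, advanced by hand below
breaker = CircuitBreaker(max_failures=2, cooldown_s=10.0, clock=lambda: now[0])

breaker.record_failure()
state_after_one = breaker.state
breaker.record_failure()
state_after_two = breaker.state
now[0] = 10.0
state_after_cooldown = breaker.state
breaker.record_success()
state_after_success = breaker.state

print(state_after_one, state_after_two, state_after_cooldown, state_after_success)
```

A key whose breaker is open is skipped during routing, which keeps a failing key from consuming retries that a healthy one could serve.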