Deploy Always-On AI Agents on AWS for ~$17/mo
TL;DR
Run a private always-on AI agent on AWS for about $17/month using free OpenRouter models and Terraform. Here is the exact recipe and cost breakdown.
You can run a private always-on AI agent on AWS for about $17 per month using free OpenRouter models and Terraform. This is the same recipe behind hermes-agent-aws — accessible via Telegram or SSH, always reachable, no GPU bill. Here is the exact build.
What you get
- An AWS-hosted agent reachable from Telegram and SSH
- LLM inference via free OpenRouter models (DeepSeek R1, Qwen 3, Gemma 3, Llama 3.1)
- Persistent memory via common-knowledge
- Terraform-managed infrastructure — one
terraform applyfrom zero to live
The cost breakdown
| Item | Cost / month |
|---|---|
| t4g.small (ARM, 2 vCPU, 2 GB RAM) | ~$11 |
| 30 GB EBS gp3 | ~$2.50 |
| Public IPv4 + minor egress | ~$3.50 |
| LLM inference | $0 (free OpenRouter tier) |
| Total | ~$17 |
The trick is the free OpenRouter tier via openrouter-free-infer. Cheap calls go to free models; premium calls would push the bill higher but are not required for most agent tasks.
The recipe
1. Provision
terraform apply
This provisions an ARM-based EC2 instance, a security group allowing Telegram webhook traffic and SSH, an EBS volume, and a static IP.
2. Install
The user-data script installs:
- Docker
- The agent binary (Go, single static binary)
- openrouter-free-infer for LLM routing
- common-knowledge for memory
3. Configure
hermes init
hermes telegram --token=<your_bot_token>
hermes start
4. Test
Send a message to the bot from Telegram. The agent picks it up, routes through OpenRouter, returns the response.
Why ARM (Graviton)
ARM instances (t4g family) are ~20% cheaper than equivalent x86 (t3) and run Go binaries natively. Memory footprint of a Go agent is ~30 MB; 2 GB is plenty.
Why free OpenRouter tier
DeepSeek R1, Qwen 3, Gemma 3, Llama 3.1 are all available free via OpenRouter. They handle 80% of agent tasks well. openrouter-free-infer routes between them with fallback.
For sensitive tasks where you want premium quality, swap in a paid model. Cost goes up; everything else is the same.
What this is good for
- Personal AI assistant accessible from anywhere
- Background research agent that polls and reports
- Side projects where you want an always-on LLM without a GPU bill
What it isn't
- A scalable production service (one instance, no auto-scale)
- Suitable for latency-critical workloads (free OpenRouter models can be slow)
- A replacement for hosted offerings if you need 99.99% uptime
I covered the broader skills layer in My AI Agent Skills Stack.
FAQ
Q: Can I run multiple agents on one instance? A: Yes — they share the LLM router and memory layer.
Q: How do I add new capabilities? A: Drop a new skill into the agent. The skills layer is open-source and pluggable.
Q: Does it work without OpenRouter? A: Yes — swap the router for any OpenAI-compatible endpoint. Costs change accordingly.
Written by Shihab Shahriar Antor. See my projects or hire me at Shahriar Labs.
Written by
Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Creator of LetX, QuantumSketch, and more.