You can run a private, always-on AI agent on AWS for about $17 per month using free OpenRouter models and Terraform. This is the same recipe behind hermes-agent-aws: reachable at any time via Telegram or SSH, with no GPU bill. Here is the exact build.

What you get

An AWS-hosted agent reachable from Telegram and SSH
LLM inference via free OpenRouter models (DeepSeek R1, Qwen 3, Gemma 3, Llama 3.1)
Persistent memory via common-knowledge
Terraform-managed infrastructure — one terraform apply from zero to live

The cost breakdown

Item	Cost / month
t4g.small (ARM, 2 vCPU, 2 GB RAM)	~$11
30 GB EBS gp3	~$2.50
Public IPv4 + minor egress	~$3.50
LLM inference	$0 (free OpenRouter tier)
Total	~$17
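
The total is just the sum of the rows. A small sketch like this makes it easy to re-run the arithmetic with your own region's prices. The line items are the table's estimates, and monthlyFromHourly assumes the 730-hour month commonly used for AWS monthly estimates; both helper names are illustrative.

```go
package main

import "fmt"

// hoursPerMonth is the 730-hour month commonly used for AWS monthly estimates.
const hoursPerMonth = 730

// lineItem is one row of the cost table: a resource and its approximate monthly cost in USD.
type lineItem struct {
	name string
	usd  float64
}

// monthlyFromHourly converts an hourly on-demand rate into a monthly estimate,
// useful for re-pricing a row from the current AWS price list.
func monthlyFromHourly(ratePerHour float64) float64 {
	return ratePerHour * hoursPerMonth
}

// monthlyTotal sums the monthly cost of every line item.
func monthlyTotal(items []lineItem) float64 {
	total := 0.0
	for _, it := range items {
		total += it.usd
	}
	return total
}

func main() {
	// The table's estimates; re-price them for your region before relying on the total.
	items := []lineItem{
		{"t4g.small (ARM, 2 vCPU, 2 GB RAM)", 11.00},
		{"30 GB EBS gp3", 2.50},
		{"Public IPv4 + minor egress", 3.50},
		{"LLM inference (free OpenRouter tier)", 0.00},
	}
	fmt.Printf("Total: ~$%.2f/month\n", monthlyTotal(items)) // Total: ~$17.00/month
}
```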

The trick is the free OpenRouter tier, reached through openrouter-free-infer. Routine calls go to free models; calls to premium models would push the bill higher, but most agent tasks don't need them.

The recipe

1. Provision

terraform init
terraform apply

This provisions an ARM-based EC2 instance, a security group allowing Telegram webhook traffic and SSH, an EBS volume, and a static IP.
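
The post doesn't say which webhook port the security group opens. Telegram's Bot API only delivers webhooks to ports 443, 80, 88, or 8443, so a quick pre-apply sanity check on the planned inbound ports might look like this. checkIngress and the port lists are hypothetical and not part of the actual Terraform config.

```go
package main

import "fmt"

// telegramWebhookPorts lists the ports the Telegram Bot API accepts in a
// webhook URL, per its setWebhook documentation.
var telegramWebhookPorts = map[int]bool{443: true, 80: true, 88: true, 8443: true}

// checkIngress is a hypothetical check on the inbound ports planned for the
// security group. It returns one message per missing requirement.
func checkIngress(openPorts []int) []string {
	hasSSH, hasWebhook := false, false
	for _, p := range openPorts {
		if p == 22 {
			hasSSH = true
		}
		if telegramWebhookPorts[p] {
			hasWebhook = true
		}
	}
	var problems []string
	if !hasSSH {
		problems = append(problems, "port 22 (SSH) is not open")
	}
	if !hasWebhook {
		problems = append(problems, "no Telegram webhook port (443, 80, 88 or 8443) is open")
	}
	return problems
}

func main() {
	// SSH plus HTTPS on 443 satisfies both requirements, so this prints []
	fmt.Println(checkIngress([]int{22, 443}))
	// Telegram does not accept 8080, so this reports one problem.
	fmt.Println(checkIngress([]int{22, 8080}))
}
```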

2. Install

The user-data script installs:

Docker
The agent binary (Go, single static binary)
openrouter-free-infer for LLM routing
common-knowledge for memory

3. Configure

hermes init
hermes telegram --token=<your_bot_token>
hermes start

4. Test

Send a message to the bot from Telegram. The agent picks it up, routes it through OpenRouter, and returns the response.
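
The post doesn't show the message loop itself. Assuming the agent receives standard Bot API Update objects (via webhook or getUpdates), the round trip boils down to: read message.chat.id and message.text, ask the model, and answer with a sendMessage body. handleUpdate and the ask callback are hypothetical names; the real agent's internals may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// update holds the subset of a Telegram Bot API Update this sketch needs.
type update struct {
	UpdateID int64 `json:"update_id"`
	Message  *struct {
		Chat struct {
			ID int64 `json:"id"`
		} `json:"chat"`
		Text string `json:"text"`
	} `json:"message"`
}

// sendMessage is the JSON body for the Bot API sendMessage method.
type sendMessage struct {
	ChatID int64  `json:"chat_id"`
	Text   string `json:"text"`
}

// handleUpdate turns a raw update into a reply by passing the user's text to
// ask (the LLM router, injected here). The bool is false for updates that
// carry no text message, such as edits or stickers.
func handleUpdate(raw []byte, ask func(prompt string) (string, error)) (sendMessage, bool, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return sendMessage{}, false, err
	}
	if u.Message == nil || u.Message.Text == "" {
		return sendMessage{}, false, nil
	}
	answer, err := ask(u.Message.Text)
	if err != nil {
		return sendMessage{}, false, err
	}
	return sendMessage{ChatID: u.Message.Chat.ID, Text: answer}, true, nil
}

func main() {
	raw := []byte(`{"update_id":1,"message":{"message_id":7,"chat":{"id":42},"text":"ping"}}`)
	// Stand-in for the OpenRouter call.
	echo := func(prompt string) (string, error) { return "echo: " + prompt, nil }
	reply, ok, err := handleUpdate(raw, echo)
	if err != nil || !ok {
		fmt.Println("no reply:", err)
		return
	}
	body, _ := json.Marshal(reply)
	fmt.Println(string(body)) // {"chat_id":42,"text":"echo: ping"}
}
```

In a live deployment, that JSON body would be POSTed to https://api.telegram.org/bot<token>/sendMessage.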

Why ARM (Graviton)

Graviton-based t4g instances cost about 20% less than equivalent x86 t3 instances, and Go compiles natively for arm64. A Go agent's memory footprint is around 30 MB, so 2 GB is plenty.
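
To check the footprint figure against your own build, the Go runtime can report its architecture and how much memory it has obtained from the OS. This is a minimal sketch you could log at startup; MemStats.Sys counts only Go-managed memory and is not the same as RSS, so treat it as a rough proxy for the ~30 MB figure.

```go
package main

import (
	"fmt"
	"runtime"
)

// footprintMB returns the memory the Go runtime has obtained from the OS, in MiB.
// It is a rough proxy for process footprint, not an exact resident-set size.
func footprintMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Sys) / (1 << 20)
}

func main() {
	// On a t4g instance this reports linux/arm64. To build a static arm64
	// binary from another machine: GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build
	fmt.Printf("%s/%s, runtime memory: %.1f MiB\n", runtime.GOOS, runtime.GOARCH, footprintMB())
}
```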

Why free OpenRouter tier

DeepSeek R1, Qwen 3, Gemma 3, and Llama 3.1 are all available for free through OpenRouter, and they handle roughly 80% of agent tasks well. openrouter-free-infer routes requests between them and falls back to another model when one fails.

For sensitive tasks where you want premium quality, swap in a paid model. Cost goes up; everything else is the same.

What this is good for

Personal AI assistant accessible from anywhere
Background research agent that polls and reports
Side projects where you want an always-on LLM without a GPU bill

What it isn't

A scalable production service (one instance, no auto-scale)
A fit for latency-critical workloads (free OpenRouter models can be slow)
A replacement for hosted offerings if you need 99.99% uptime

I covered the broader skills layer in My AI Agent Skills Stack.

FAQ

Q: Can I run multiple agents on one instance? A: Yes — they share the LLM router and memory layer.

Q: How do I add new capabilities? A: Drop a new skill into the agent. The skills layer is open-source and pluggable.
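
The post doesn't show the skills interface, so this is only a sketch of what "pluggable" typically means in Go: a small interface plus a registry, where adding a capability is one more Register call. Skill, registry, and the toy upper skill are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Skill is a hypothetical plug-in interface: a name plus a handler.
type Skill interface {
	Name() string
	Run(input string) (string, error)
}

// registry maps skill names to implementations.
type registry map[string]Skill

// Register adds (or replaces) a skill under its own name.
func (r registry) Register(s Skill) { r[s.Name()] = s }

// Run dispatches input to the named skill.
func (r registry) Run(name, input string) (string, error) {
	s, ok := r[name]
	if !ok {
		return "", fmt.Errorf("unknown skill %q", name)
	}
	return s.Run(input)
}

// upper is a toy skill that upper-cases its input.
type upper struct{}

func (upper) Name() string                    { return "upper" }
func (upper) Run(in string) (string, error) { return strings.ToUpper(in), nil }

func main() {
	r := registry{}
	r.Register(upper{})
	out, err := r.Run("upper", "hello")
	fmt.Println(out, err) // HELLO <nil>
}
```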

Q: Does it work without OpenRouter? A: Yes — swap the router for any OpenAI-compatible endpoint. Costs change accordingly.

Written by Shihab Shahriar Antor. See my projects or hire me at Shahriar Labs.

Deploy Always-On AI Agents on AWS for ~$17/mo