AWS
AI Agent
Terraform
How-To

Deploy Always-On AI Agents on AWS for ~$17/mo

Shihab Shahriar Antor
8 min read

TL;DR

Run a private always-on AI agent on AWS for about $17/month using free OpenRouter models and Terraform. Here is the exact recipe and cost breakdown.

You can run a private always-on AI agent on AWS for about $17 per month using free OpenRouter models and Terraform. This is the same recipe behind hermes-agent-aws — accessible via Telegram or SSH, always reachable, no GPU bill. Here is the exact build.

What you get

  • An AWS-hosted agent reachable from Telegram and SSH
  • LLM inference via free OpenRouter models (DeepSeek R1, Qwen 3, Gemma 3, Llama 3.1)
  • Persistent memory via common-knowledge
  • Terraform-managed infrastructure — one terraform apply from zero to live

The cost breakdown

ItemCost / month
t4g.small (ARM, 2 vCPU, 2 GB RAM)~$11
30 GB EBS gp3~$2.50
Public IPv4 + minor egress~$3.50
LLM inference$0 (free OpenRouter tier)
Total~$17

The trick is the free OpenRouter tier via openrouter-free-infer. Cheap calls go to free models; premium calls would push the bill higher but are not required for most agent tasks.

The recipe

1. Provision

terraform apply

This provisions an ARM-based EC2 instance, a security group allowing Telegram webhook traffic and SSH, an EBS volume, and a static IP.

2. Install

The user-data script installs:

3. Configure

hermes init
hermes telegram --token=<your_bot_token>
hermes start

4. Test

Send a message to the bot from Telegram. The agent picks it up, routes through OpenRouter, returns the response.

Why ARM (Graviton)

ARM instances (t4g family) are ~20% cheaper than equivalent x86 (t3) and run Go binaries natively. Memory footprint of a Go agent is ~30 MB; 2 GB is plenty.

Why free OpenRouter tier

DeepSeek R1, Qwen 3, Gemma 3, Llama 3.1 are all available free via OpenRouter. They handle 80% of agent tasks well. openrouter-free-infer routes between them with fallback.

For sensitive tasks where you want premium quality, swap in a paid model. Cost goes up; everything else is the same.

What this is good for

  • Personal AI assistant accessible from anywhere
  • Background research agent that polls and reports
  • Side projects where you want an always-on LLM without a GPU bill

What it isn't

  • A scalable production service (one instance, no auto-scale)
  • Suitable for latency-critical workloads (free OpenRouter models can be slow)
  • A replacement for hosted offerings if you need 99.99% uptime

I covered the broader skills layer in My AI Agent Skills Stack.

FAQ

Q: Can I run multiple agents on one instance? A: Yes — they share the LLM router and memory layer.

Q: How do I add new capabilities? A: Drop a new skill into the agent. The skills layer is open-source and pluggable.

Q: Does it work without OpenRouter? A: Yes — swap the router for any OpenAI-compatible endpoint. Costs change accordingly.


Written by Shihab Shahriar Antor. See my projects or hire me at Shahriar Labs.

Written by

Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Creator of LetX, QuantumSketch, and more.

Share this mission log