Building Context-Heavy: Knowledge Graph API
TL;DR
Context-Heavy is a multi-tenant knowledge-graph API in Go (pgvector + recursive CTEs) that gives AI agents persistent context. Here is the architecture.
Context-Heavy is a multi-tenant REST API I built in Go that stores, traverses, and queries context for AI agents as a knowledge graph. It is backed by PostgreSQL + pgvector for semantic search, Redis for caching, and recursive CTEs for graph traversal. This post walks through the architecture in detail.
Why a knowledge graph (not just a vector DB)
Vector DBs solve similarity. They don't solve structure. "Find me docs related to OAuth" is similarity. "Find me people connected to Alice through 2 hops of authoring history" is structure.
Most AI agent memory products are vector-only. That works until your agent needs to reason about relationships — at which point a graph is the right abstraction.
I covered the design pattern in Advanced RAG: Semantic Caching + Knowledge Graphs.
The stack
| Layer | Tech | Why |
|---|---|---|
| API | Go + Gin | Lean, concurrent, low memory |
| Auth | JWT + Google OAuth | Multi-tenant from day one |
| Storage | PostgreSQL 16 + pgvector | One DB, semantic + graph |
| Cache | Redis | Hot-path embeddings + query cache |
| Infra | AWS ECS Fargate + Terraform | Reproducible deploys |
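The cache row above mentions hot-path embeddings. As a rough illustration of how such a cache could be keyed, here is a minimal Go sketch using only the standard library. The `emb:` prefix, the key layout, and the function name `EmbeddingCacheKey` are assumptions made for this example, not the service's actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// EmbeddingCacheKey builds a Redis key for a cached embedding.
// The "emb:" prefix and the key layout are illustrative assumptions.
func EmbeddingCacheKey(tenantID, model, text string) string {
	// Collapse runs of whitespace so trivially different inputs share one entry.
	normalized := strings.Join(strings.Fields(text), " ")
	// Hash model and text together; the NUL byte keeps the two fields apart.
	sum := sha256.Sum256([]byte(model + "\x00" + normalized))
	// Keep the tenant in the key so tenants never share cache entries.
	return "emb:" + tenantID + ":" + hex.EncodeToString(sum[:])
}
```

Putting the tenant ID in the key rather than in the hash makes it easy to inspect or evict one tenant's entries by prefix.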
One Postgres, both vector and graph
pgvector gives you cosine similarity in SQL. Graph traversal is a recursive CTE in the same SQL dialect:
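WITH RECURSIVE graph AS (
    -- Anchor: the start node ($1), scoped to the caller's tenant ($2)
    SELECT id, edges, 0 AS depth
    FROM nodes
    WHERE id = $1 AND tenant_id = $2
    UNION ALL
    -- Recursive step: follow edges, at most 3 hops, same tenant only
    SELECT n.id, n.edges, g.depth + 1
    FROM nodes n
    JOIN graph g ON n.id = ANY(g.edges)
    WHERE n.tenant_id = $2 AND g.depth < 3
)
SELECT id, MIN(depth) AS depth
FROM graph GROUP BY id ORDER BY depth LIMIT 100;
That gives us "all nodes within three hops of $1, inside tenant $2" in one query. The depth bound also guarantees the recursion terminates on cyclic graphs, which an unbounded UNION ALL does not. Combine it with a vector similarity filter and you get hybrid retrieval: "find related concepts, restricted to a tenant's graph."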
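To make the hybrid retrieval idea concrete, here is a small in-memory Go sketch of what such a query computes: a hop-bounded, tenant-scoped traversal, followed by a cosine-similarity ranking of the nodes it reached. In the real service this work would happen in SQL (pgvector exposes cosine distance through its `<=>` operator); the `Node` type, the function names, and the `maxDepth` and `k` parameters are hypothetical and exist only for this illustration.

```go
package main

import (
	"math"
	"sort"
)

// Node is a hypothetical in-memory mirror of one row in the nodes table.
type Node struct {
	ID        int
	TenantID  string
	Edges     []int     // IDs of directly connected nodes
	Embedding []float64 // the vector pgvector would store
}

// reachable returns the hop distance of every node within maxDepth hops of
// start, never leaving the given tenant. Visited nodes are skipped, so
// cycles cannot cause an endless loop.
func reachable(nodes map[int]Node, tenant string, start, maxDepth int) map[int]int {
	depth := map[int]int{}
	if s, ok := nodes[start]; !ok || s.TenantID != tenant {
		return depth
	}
	depth[start] = 0
	frontier := []int{start}
	for d := 1; d <= maxDepth && len(frontier) > 0; d++ {
		var next []int
		for _, id := range frontier {
			for _, nb := range nodes[id].Edges {
				n, ok := nodes[nb]
				if !ok || n.TenantID != tenant {
					continue // dangling or cross-tenant edge
				}
				if _, seen := depth[nb]; seen {
					continue
				}
				depth[nb] = d
				next = append(next, nb)
			}
		}
		frontier = next
	}
	return depth
}

// cosine returns the cosine similarity of a and b, or 0 if they cannot be compared.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// HybridSearch returns up to k node IDs reachable from start, ordered by
// descending similarity to query (ties broken by ascending ID).
func HybridSearch(nodes map[int]Node, tenant string, start, maxDepth, k int, query []float64) []int {
	ids := []int{}
	for id := range reachable(nodes, tenant, start, maxDepth) {
		if id != start {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool {
		si := cosine(nodes[ids[i]].Embedding, query)
		sj := cosine(nodes[ids[j]].Embedding, query)
		if si != sj {
			return si > sj
		}
		return ids[i] < ids[j]
	})
	if k < 0 {
		k = 0
	}
	if len(ids) > k {
		ids = ids[:k]
	}
	return ids
}
```

The traversal runs first and the ranking second, so similarity is only ever computed over nodes the tenant's graph actually connects to the starting point.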
Multi-tenant from day one
Every row has a tenant_id column. Every query is filtered by it. The auth layer rejects mismatched JWT/tenant pairs at the gateway. This is boring and crucial — multi-tenant added later means rewriting half the API.
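A minimal Go sketch of the two rules above: reject a request whose token belongs to a different tenant, and make the tenant ID the first argument of every query. The `Claims` struct, `AuthorizeTenant`, and `ScopedArgs` are hypothetical names for this illustration, and the sketch assumes the JWT signature has already been verified elsewhere.

```go
package main

import "errors"

// Claims is a hypothetical, already-verified JWT payload.
type Claims struct {
	Subject  string
	TenantID string
}

// ErrTenantMismatch is returned when a token and a request name different tenants.
var ErrTenantMismatch = errors.New("token tenant does not match requested tenant")

// AuthorizeTenant rejects requests whose token was issued for a different
// tenant than the one the request targets. An empty tenant never matches.
func AuthorizeTenant(c Claims, requestedTenant string) error {
	if c.TenantID == "" || c.TenantID != requestedTenant {
		return ErrTenantMismatch
	}
	return nil
}

// ScopedArgs prepends the caller's tenant ID to a query's arguments, so a
// statement written with "tenant_id = $1" always receives it first.
func ScopedArgs(c Claims, args ...any) []any {
	return append([]any{c.TenantID}, args...)
}
```

Routing every query through a helper like `ScopedArgs` makes a missing tenant filter a visible omission in code review rather than a silent data leak.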
Performance
The interesting query is "vector + 2-hop traversal." Typical latency:
- p50: 35 ms
- p95: 110 ms
- p99: 240 ms
The slow path is recursive CTE depth: each extra hop widens the set of rows the next step has to join against. Bounding depth at 3 hops keeps p99 sane.
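For readers who want to reproduce figures like the percentiles above from their own request logs, here is a short Go sketch of the nearest-rank percentile method. The post does not say how its numbers were computed, so treat the method and the `Percentile` function as an assumption, one common convention among several.

```go
package main

import "sort"

// Percentile returns the p-th percentile (p in 1..100) of latency samples
// in milliseconds, using the nearest-rank method. It returns 0 for empty
// input or an out-of-range p. The input slice is not modified.
func Percentile(samplesMs []float64, p int) float64 {
	n := len(samplesMs)
	if n == 0 || p < 1 || p > 100 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...) // sort a copy
	sort.Float64s(sorted)
	rank := (p*n + 99) / 100 // ceil(p*n/100) using integer math
	return sorted[rank-1]
}
```

Calling `Percentile(samples, 50)`, `Percentile(samples, 95)`, and `Percentile(samples, 99)` on the same sample set yields the three figures in the p50/p95/p99 format used above.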
What's next
A streaming response API for agents that want partial results, and a hosted offering for teams that don't want to run the API themselves.
FAQ
Q: Why not Neo4j? A: Operational complexity. Postgres + pgvector covers 95% of use cases and leaves only one database to operate.
Q: How does this differ from a vector DB like Pinecone? A: Vector DBs answer similarity questions. Context-Heavy answers similarity plus structural questions ("connected to X via Y").
Q: Is it open source? A: The API is currently private; reach out if you want early access.
Built by Shihab Shahriar Antor.
Written by
Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Creator of LetX, QuantumSketch, and more.