Building Context-Heavy: Knowledge Graph API

Context-Heavy is a multi-tenant REST API I built in Go that stores, traverses, and queries context for AI agents as a knowledge graph. It is backed by PostgreSQL + pgvector for semantic search, Redis for caching, and recursive CTEs for graph traversal. This post walks through the architecture in detail.

Why a knowledge graph (not just a vector DB)

Vector DBs solve similarity. They don't solve structure. "Find me docs related to OAuth" is similarity. "Find me people connected to Alice through 2 hops of authoring history" is structure.

Most AI agent memory products are vector-only. That works until your agent needs to reason about relationships, at which point a graph is the right abstraction.

I covered the design pattern in Advanced RAG: Semantic Caching + Knowledge Graphs.

The stack

Layer	Tech	Why
API	Go + Gin	Lean, concurrent, low memory
Auth	JWT + Google OAuth	Multi-tenant from day one
Storage	PostgreSQL 16 + pgvector	One DB, semantic + graph
Cache	Redis	Hot-path embeddings + query cache
Infra	AWS ECS Fargate + Terraform	Reproducible deploys

One Postgres, both vector and graph

pgvector adds vector similarity operators to SQL (for example, <=> returns cosine distance). Graph traversal is a recursive CTE in the same SQL dialect:

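WITH RECURSIVE graph AS (
    -- Seed row: the start node, scoped to the caller's tenant
    SELECT id, edges, 0 AS depth
    FROM nodes
    WHERE id = $1 AND tenant_id = $2
    UNION ALL
    -- Recursive step: follow each edge one hop, staying inside the tenant
    SELECT n.id, n.edges, g.depth + 1
    FROM nodes n
    JOIN graph g ON n.id = ANY(g.edges)
    WHERE n.tenant_id = $2
      AND g.depth < 3  -- hop cap; also stops cycles from recursing forever
)
SELECT id, MIN(depth) AS hops FROM graph
GROUP BY id ORDER BY hops LIMIT 100;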

That gives us "all nodes reachable from $1" in one query. Combine it with a vector similarity filter and you get hybrid retrieval: "find related concepts, restricted to a tenant's graph."

Multi-tenant from day one

Every row has a tenant_id column, and every query filters on it. The auth layer rejects mismatched JWT/tenant pairs at the gateway. This is boring and crucial: retrofitting multi-tenancy later means rewriting half the API.

Performance

The interesting query is "vector + 2-hop traversal." Typical latency:

p50: 35 ms
p95: 110 ms
p99: 240 ms

The slow path is recursive CTE depth, since each extra hop can multiply the rows to expand; bounding depth at 3 hops keeps p99 sane.
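To put numbers on why depth dominates: if every node has d outgoing edges, a traversal that doesn't deduplicate (such as a UNION ALL recursive CTE) emits d + d^2 + ... + d^h rows beyond the seed after h hops. The small Go helper below does that arithmetic; the uniform out-degree of 10 in the example is an assumption, not a measured property of Context-Heavy's graphs. With d = 10, going from 3 to 4 hops takes the count from 1,110 to 11,110 rows.

```go
package main

import "fmt"

// worstCaseRows assumes every node has exactly `degree` outgoing edges.
// A non-deduplicating traversal emits one row per path, i.e.
// degree + degree^2 + ... + degree^maxDepth rows beyond the seed.
// Deduplication can only lower this, so it also bounds distinct nodes reached.
func worstCaseRows(degree, maxDepth int) int {
	total, level := 0, 1
	for h := 1; h <= maxDepth; h++ {
		level *= degree
		total += level
	}
	return total
}

func main() {
	for h := 1; h <= 4; h++ {
		fmt.Printf("depth %d: up to %d rows\n", h, worstCaseRows(10, h))
	}
}
```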

What's next

A streaming response API for agents that want partial results, and a hosted offering for teams that don't want to run the API themselves.

FAQ

Q: Why not Neo4j? A: Operational complexity. Postgres + pgvector covers 95% of use cases, and there's only one database to operate.

Q: How does this differ from a vector DB like Pinecone? A: Vector DBs answer similarity questions. Context-Heavy answers similarity plus structural questions ("connected to X via Y").

Q: Is it open source? A: The API is currently private; reach out if you want early access.

Built by Shihab Shahriar Antor. Related: Building common-knowledge, My AI Agent Skills Stack. Hire me.