Building QuantumSketch: AI + Manim for STEM Video

Shihab Shahriar Antor
8 min read

TL;DR

QuantumSketch turns text prompts into narrated STEM animations using LLMs and ManimGL. Here is the pipeline that takes an idea to a finished video.

QuantumSketch is an AI educational video engine I built at Shahriar Labs that turns a text prompt into a narrated STEM animation rendered with ManimGL. Type "explain the Fourier transform"; get back a real animated explainer with voice-over. This post walks through the pipeline.

Why Manim, not Stable-Diffusion-on-frames

Most AI video tools generate imagery frame by frame. That looks fluid but breaks down for technical content: formulas drift, axes wobble, vectors misalign. For STEM, symbolic correctness matters more than photorealism.

ManimGL is 3Blue1Brown's own version of Manim, the engine behind the channel's videos: scenes are Python code, geometry is exact, and animations are smooth. QuantumSketch trades novelty for trust.

The pipeline

prompt -> outline -> storyboard -> scene code -> narration -> render -> stitch

Each step is a long-running task. A 90-second video can take 10+ minutes end-to-end. That's why the orchestration layer matters.

1. Outline (LLM)

The first LLM call turns the prompt into an outline:

prompt: "explain the Fourier transform"
outline:
  - intro: what problem does it solve
  - signal in time domain
  - decomposition into sinusoids
  - frequency domain visualization
  - applications (audio, MRI)
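The outline should be checked mechanically before it feeds the storyboard step. The sketch below is a minimal example. It assumes the LLM is asked to reply with JSON of a hypothetical shape, {"outline": [...]}. The `Outline` class, the `parse_outline` function, and the section cap are illustrative names and limits, not QuantumSketch's actual code.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Outline:
    prompt: str
    sections: list[str]


def parse_outline(prompt: str, raw: str, max_sections: int = 8) -> Outline:
    """Parse an LLM reply (assumed to be JSON) into an Outline, rejecting bad shapes."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    items = data.get("outline") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError("reply has no 'outline' list")
    # Keep only non-empty string items, trimmed of surrounding whitespace.
    sections = [s.strip() for s in items if isinstance(s, str) and s.strip()]
    if not sections:
        raise ValueError("outline is empty")
    if len(sections) > max_sections:
        raise ValueError(f"outline has {len(sections)} sections; cap is {max_sections}")
    return Outline(prompt=prompt, sections=sections)


reply = json.dumps({"outline": [
    "intro: what problem does it solve",
    "signal in time domain",
    "decomposition into sinusoids",
    "frequency domain visualization",
    "applications (audio, MRI)",
]})
outline = parse_outline("explain the Fourier transform", reply)
print(len(outline.sections))  # 5
```

Because a malformed reply raises `ValueError`, the caller can catch it and re-ask the LLM. Otherwise the failure would only surface two steps later as a broken scene.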

2. Storyboard

Each section becomes a Manim scene description: camera moves, what enters/exits, key formulas. This step is where most subtle bugs live; we constrain it with a structured Pydantic schema.
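A schema is most useful when it rejects storyboards that parse fine but are inconsistent as a sequence of scenes. QuantumSketch uses Pydantic for this. To stay dependency-free, the sketch below swaps in standard-library dataclasses plus one hand-written cross-scene check. The field names and the enter/exit rule are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneSpec:
    """One storyboard entry (a stand-in for the real Pydantic model)."""
    title: str
    duration_s: float
    camera_moves: list[str] = field(default_factory=list)
    enters: list[str] = field(default_factory=list)    # names of objects that appear
    exits: list[str] = field(default_factory=list)     # names of objects that leave
    formulas: list[str] = field(default_factory=list)  # LaTeX source strings


def validate_storyboard(scenes: list[SceneSpec]) -> list[str]:
    """Return human-readable problems; an empty list means the storyboard passed."""
    problems: list[str] = []
    on_screen: set[str] = set()
    for i, s in enumerate(scenes):
        if s.duration_s <= 0:
            problems.append(f"scene {i} ({s.title}): duration must be positive")
        for name in s.enters:
            if name in on_screen:
                problems.append(f"scene {i} ({s.title}): '{name}' enters twice")
            on_screen.add(name)
        # Exits are checked after entries, so an object may enter and exit in one scene.
        for name in s.exits:
            if name not in on_screen:
                problems.append(f"scene {i} ({s.title}): '{name}' exits but never entered")
            on_screen.discard(name)
    return problems


board = [
    SceneSpec("signal in time domain", 10, enters=["axes", "signal"]),
    SceneSpec("decomposition", 12, enters=["sinusoids"], exits=["signal", "spectrum"]),
]
print(validate_storyboard(board))  # flags that 'spectrum' exits without ever entering
```

The returned messages are plain strings, so they can go straight back into a repair prompt.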

3. Scene code

Another LLM call writes the actual Manim Python. Each scene is validated by running it under ManimGL in a sandbox with a short timeout; scenes that crash or time out are sent back for repair.
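A validate-and-repair loop is easy to get subtly wrong, for example with unbounded retries or a scene that hangs forever. The sketch below is a minimal version. It assumes the repair step is a callable you supply; `repair` is hypothetical here and in QuantumSketch would presumably be an LLM call. A subprocess with a timeout covers only the timeout half of a sandbox: real isolation, such as containers or resource limits, is not shown. `python_cmd` runs plain Python so the sketch works without ManimGL installed. A real command builder would write the code to a file and invoke `manimgl` on it.

```python
import subprocess
import sys
from typing import Callable, List, Tuple


def run_checked(cmd: List[str], timeout_s: float) -> Tuple[bool, str]:
    """Run cmd in a child process; return (ok, error_text). Timeouts count as failures."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:  # subprocess.run kills the child before raising
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr[-2000:]  # tail of the traceback, for the repair prompt
    return True, ""


def python_cmd(code: str) -> List[str]:
    """Stand-in command builder: run `code` with the current Python interpreter."""
    return [sys.executable, "-c", code]


def validate_with_repair(
    code: str,
    build_cmd: Callable[[str], List[str]],
    repair: Callable[[str, str], str],  # hypothetical: (code, error) -> revised code
    timeout_s: float = 30.0,
    max_attempts: int = 3,
) -> str:
    """Return the first version of `code` that runs cleanly, or raise after max_attempts."""
    err = ""
    for attempt in range(1, max_attempts + 1):
        ok, err = run_checked(build_cmd(code), timeout_s)
        if ok:
            return code
        if attempt < max_attempts:  # no point repairing after the final attempt
            code = repair(code, err)
    raise RuntimeError(f"scene still failing after {max_attempts} attempts: {err}")


fixed = validate_with_repair(
    "raise SystemExit(1)",                    # first draft exits with an error
    python_cmd,
    repair=lambda code, err: "print('ok')",   # toy repair that always "fixes" it
)
print(fixed)  # print('ok')
```

Capping attempts keeps a scene the model cannot fix from looping forever. The caller can then drop that scene or escalate it to a human.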

4. Narration

The narration script goes to a text-to-speech (TTS) model. We then align the narration to scene durations with a re-time pass, so the voice never gets ahead of the visuals.
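One way to read "the voice never gets ahead of the visuals" is this: every narration clip starts a beat after its scene begins and finishes before the scene ends. When a clip runs long, the scene holds its final frame rather than the speech being sped up. The sketch below is a minimal example under that interpretation. The `lead_in` delay, the hold strategy, and the `Slot`/`retime` names are assumptions, not QuantumSketch's documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    scene_start: float      # when the scene's visuals begin on the final timeline (s)
    scene_len: float        # scene length after any hold is added (s)
    narration_start: float  # when the narration clip starts (s)
    hold_s: float           # extra time the scene holds its final frame (s)


def retime(scene_lens: List[float], voice_lens: List[float], lead_in: float = 0.25) -> List[Slot]:
    """Lay scenes end to end, start each clip `lead_in` seconds after its scene,
    and stretch any scene that is too short for its narration."""
    if len(scene_lens) != len(voice_lens):
        raise ValueError("need exactly one narration clip per scene")
    t = 0.0
    slots = []
    for scene_len, voice_len in zip(scene_lens, voice_lens):
        # The clip needs lead_in + voice_len seconds; hold the last frame if the scene is shorter.
        hold = max(0.0, lead_in + voice_len - scene_len)
        slots.append(Slot(t, scene_len + hold, t + lead_in, hold))
        t += scene_len + hold
    return slots


slots = retime([10.0, 5.0], [9.0, 6.0], lead_in=0.5)
print(slots[1].hold_s)  # 1.5: the second scene holds its last frame while the voice finishes
```

Holding a frame keeps the TTS output at natural speed. The trade-off is a slightly longer video.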

5. Render

ManimGL renders each scene to MP4. The render farm is just a queue feeding worker processes.
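A queue-plus-workers farm fits in a few lines. The sketch below is a minimal version. It assumes `render_one` (hypothetical) shells out to ManimGL for one scene and returns the MP4 path. Because each render then runs as its own OS process, plain threads are enough to drive them. Failures are collected instead of aborting the batch, so one bad scene does not stop the others, and the error names every scene that failed.

```python
import queue
import threading
from typing import Callable, Dict, List


def render_all(scenes: List[str], render_one: Callable[[str], str], workers: int = 4) -> List[str]:
    """Render every scene via a shared job queue; return output paths in scene order."""
    jobs: "queue.Queue[str]" = queue.Queue()
    for scene in scenes:
        jobs.put(scene)

    results: Dict[str, str] = {}
    errors: Dict[str, BaseException] = {}
    lock = threading.Lock()

    def worker() -> None:
        # All jobs are queued up front, so an empty queue means this worker is done.
        while True:
            try:
                scene = jobs.get_nowait()
            except queue.Empty:
                return
            try:
                out = render_one(scene)
                with lock:
                    results[scene] = out
            except Exception as exc:  # record the failure and keep draining the queue
                with lock:
                    errors[scene] = exc

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if errors:
        raise RuntimeError(f"{len(errors)} scene(s) failed: {sorted(errors)}")
    return [results[scene] for scene in scenes]  # stitch order = scene order
```

Returning paths in the original scene order matters, because the stitch step concatenates them in exactly that order.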

6. Stitch

ffmpeg concatenates the rendered scenes and mixes in the narration track.
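The stitch step comes down to two artifacts: a list file for ffmpeg's concat demuxer and one ffmpeg invocation. The sketch below only builds them; running the command is left to `subprocess.run`. It makes three assumptions:

- the narration is a single, already-assembled audio file;
- every scene was rendered with the same codec, resolution, and frame rate, which is what lets `-c:v copy` skip re-encoding;
- the narration becomes the video's only audio track. Mixing in music or other sound would need an audio filter such as `amix`, which is not shown.

The function names are illustrative.

```python
from typing import List


def concat_list(scene_files: List[str]) -> str:
    """Build a list file for ffmpeg's concat demuxer, one `file '...'` line per scene."""
    lines = []
    for path in scene_files:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def stitch_cmd(list_file: str, narration: str, out: str) -> List[str]:
    """ffmpeg argv: join scenes without re-encoding video, add narration as the audio track."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # input 0: the scene list
        "-i", narration,                                  # input 1: the narration audio
        "-map", "0:v", "-map", "1:a",                     # video from input 0, audio from input 1
        "-c:v", "copy",                                   # identical scene encodings, no re-encode
        "-c:a", "aac",
        out,
    ]
```

`-shortest` is deliberately left out. If the narration ends before the last scene does, that flag would cut off the final visuals.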

Why Temporal.io runs the show

The pipeline is exactly the kind of workflow Temporal was built for: long-running, multi-step, with retries and human checkpoints. A render can fail at step 5 of 6 because Manim hit a memory limit; Temporal lets us resume from that step instead of rerunning the whole pipeline.
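Temporal gets this behavior by recording each activity's result in the workflow's event history. After a failure, completed steps are not re-executed. The toy below is not the Temporal SDK. It is a standard-library sketch of just the resume idea, with a JSON checkpoint file (an assumption) standing in for the event history. Everything Temporal adds on top, such as retry policies, timeouts, and durable timers, is out of scope.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable, List, Tuple


def run_resumable(
    stages: List[Tuple[str, Callable[[Any], Any]]],
    initial: Any,
    checkpoint: Path,
) -> Any:
    """Run stages in order, saving each stage's output; on re-run, skip stages already saved.

    Outputs must be JSON-serializable (e.g. file paths as strings, not open files).
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    value = initial
    for name, fn in stages:
        if name in done:
            value = done[name]  # reuse the recorded result instead of redoing the work
            continue
        value = fn(value)  # may raise, e.g. a render that runs out of memory
        done[name] = value
        checkpoint.write_text(json.dumps(done))  # persist before moving to the next stage
    return value
```

If the render stage raises, calling `run_resumable` again with the same checkpoint reruns only the render and stitch stages.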

I go deeper on this in Temporal.io for Long-Running GenAI Workflows.

What QuantumSketch is good at, and what it isn't

Good at: math, physics, and computer science concepts with clean geometric representations, such as the Fourier transform, gradient descent, graph algorithms, and lattices.

Not good at: organic chemistry, biology illustrations, anything that needs realistic imagery. We don't try to fake it.

Try it

QuantumSketch is in beta. If you teach STEM or build educational content, ping me. I am especially looking for feedback from teachers in low-bandwidth regions, since the rendered MP4s stay small (under 8 MB for a 90-second video).
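For scale, the 8 MB ceiling works out to an average bitrate of roughly 0.7 Mbit/s, taking 1 MB = 10^6 bytes:

```latex
\frac{8 \times 10^{6}\,\text{B} \times 8\,\text{bit/B}}{90\,\text{s}}
  = \frac{64 \times 10^{6}\,\text{bit}}{90\,\text{s}}
  \approx 0.71\,\text{Mbit/s}
```

At that size, a 1 Mbit/s connection downloads a whole 90-second video in about a minute (64 s).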

FAQ

Q: What is QuantumSketch used for? A: Turning text prompts into educational STEM animations. Teachers, tutors and content creators use it to visualize math, physics and computer science.

Q: Why ManimGL instead of generative video? A: ManimGL renders math correctly — formulas stay sharp, axes stay aligned. Generative video drifts on symbolic content.

Q: How long does a video take to render? A: A 90-second video takes 5-15 minutes end-to-end, dominated by Manim render time.

Q: Can I tweak the generated Manim code? A: Yes. We expose the generated scene code so you can edit it and re-render.


Built by Shihab Shahriar Antor. More case studies: How I Built LetX, Building ChessGoddess. Hire me.

Written by

Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Creator of LetX, QuantumSketch, and more.
