Anima

A 2D animation pipeline where AI agents do the volume work.

A pencil-test character-design sketch on cream sketchbook paper: a young warrior in a hooded, tattered cloak grips a tall spear with an amber ribbon tied near its tip, braced in a determined stance atop a rocky outcrop. Three smaller pose studies of the same figure run down the left margin.

Project questions and answers

What is this?

Anima is a 10-phase pipeline for making 2D animated stories, run by a human and a small fleet of named agents: Maya plans the work, Cy builds the character bible, Em critiques every frame, Mo writes the public walkthrough. The human owns timing and taste and makes the call to ship; the agents own the parts that are cheap and repeatable at volume. What keeps it from being a click-to-generate toy is the order of operations: I block the motion in plain shapes before a single frame renders, so the timing is mine, not the model's. The Pencil Test short is the first piece built this way: Act 1 has shipped and Act 2 is in flight. The short is the proof; Anima is the system.

Why this approach?

The real decision was never which model to use, because the model layer is replaceable. It was which working method to use: spend years hand-animating it solo, let AI generate the whole thing fast and accept mixed results, or direct a fleet and keep authorship. I picked the third, a studio-like division of labor without the studio, and built the architecture so the human role isn't tied to any one model.
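One way to keep the human role from being tied to any one model is to have each agent depend on a narrow interface instead of a vendor SDK. The sketch below is illustrative only: `ModelBackend`, `StubBackend`, `Agent`, and `REGISTRY` are hypothetical names, not Anima's actual code, and the stub just echoes its prompt where a real adapter would call a model.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface that every model adapter implements."""

    family: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Hypothetical stand-in adapter; a real one would wrap a vendor SDK or CLI."""

    family: str
    name: str

    def complete(self, prompt: str) -> str:
        # Echo the prompt tagged with the backend name instead of calling a model.
        return f"[{self.name}] {prompt}"


@dataclass
class Agent:
    """An agent persona bound to whichever backend is configured."""

    persona: str
    backend: ModelBackend

    def run(self, task: str) -> str:
        return self.backend.complete(f"{self.persona}: {task}")


# Swapping models means editing this mapping, not the agents or the human's role.
REGISTRY = {
    "orchestrator": StubBackend(family="anthropic", name="sonnet"),
    "critic": StubBackend(family="google", name="gemini"),
}

maya = Agent("Maya", REGISTRY["orchestrator"])
em = Agent("Em", REGISTRY["critic"])
```

Under this design, moving Em onto a different critic is a one-line change to the registry; nothing about the persona or the human's sign-off step depends on which vendor sits behind it.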

What would break?

Three failure modes the architecture guards against by design, not three bugs I'm hoping to dodge. One: if the orchestrator and the critic share a model family, they miss the same problems together, so I pair a Sonnet orchestrator with a Gemini vision critic at the busiest checkpoint, cross-family on purpose. Two: any single phase can make its own output "better" while drifting off the approved brief, so Phase 0 freezes an immutable acceptance_criteria.json that every later critic must cite by ID before it can block. Three: a cheap critic can quietly turn into a rubber stamp, so Em earns trust the hard way, eval-gated against a 50-case hand-ratified corpus where she caught every planted defect at 0.97 precision.
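The first two guards can be made mechanical. Below is a minimal sketch, not the project's implementation: it assumes acceptance_criteria.json is a JSON list of objects with `id` and `text` fields (the real schema isn't given here), uses hypothetical IDs like `AC-01`, and introduces hypothetical `FrozenCriteria`, `Verdict`, `effective_block`, and `check_cross_family` helpers. Phase 0 records a SHA-256 digest so later edits are detectable, a critic's block only counts if every criterion ID it cites exists in the frozen file, and a configuration where orchestrator and critic share a model family is rejected.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenCriteria:
    """Criteria locked at Phase 0 (assumed schema: [{"id": ..., "text": ...}])."""

    ids: frozenset
    digest: str

    @classmethod
    def from_json(cls, raw: str) -> "FrozenCriteria":
        items = json.loads(raw)
        return cls(
            ids=frozenset(item["id"] for item in items),
            digest=hashlib.sha256(raw.encode("utf-8")).hexdigest(),
        )

    def verify(self, raw: str) -> None:
        # Any later edit to the file changes the digest and halts the pipeline.
        if hashlib.sha256(raw.encode("utf-8")).hexdigest() != self.digest:
            raise RuntimeError("acceptance_criteria.json changed after Phase 0")


@dataclass
class Verdict:
    """Hypothetical critic output: whether to block, and which criteria it cites."""

    block: bool
    cited_ids: list = field(default_factory=list)
    note: str = ""


def effective_block(verdict: Verdict, criteria: FrozenCriteria) -> bool:
    """A block counts only if it cites at least one ID and every cited ID exists."""
    if not verdict.block:
        return False
    cited = set(verdict.cited_ids)
    return bool(cited) and cited <= criteria.ids


def check_cross_family(orchestrator_family: str, critic_family: str) -> None:
    """Reject configurations where orchestrator and critic would share blind spots."""
    if orchestrator_family == critic_family:
        raise ValueError("orchestrator and critic must come from different model families")
```

An uncited complaint, or one that cites an ID the brief never contained, is downgraded to a non-blocking note, which is what stops a phase from redefining "better" on its own terms.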

What did I learn?

Validators can't recover taste that wasn't there at generation time, which is why the human authors the timing first. I learned the rest the hard way: my first version of that critic eval looked great until I found 19 of its 23 fixtures were near-duplicate images, so the score was measuring nothing. I threw the corpus out, rebuilt it by hand, and re-baselined. A fleet is only as honest as the eval underneath it.
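Because the flaw in that first eval was in the corpus rather than the scoring, one cheap guard is to check fixtures for near-duplicates before trusting any score. The sketch below swaps in a standard perceptual-hashing technique, the difference hash (dHash), and is not a description of how the corpus was actually audited. It assumes images are already downscaled to 8-row by 9-column grayscale grids; the 10-bit distance threshold and 10% duplicate ceiling are assumptions, and all function names are hypothetical. It also computes precision and recall of a critic's flags against planted defects.

```python
from itertools import combinations

Grid = list[list[int]]  # grayscale pixels, assumed pre-downscaled to 8 rows x 9 columns


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontal neighbor comparison (8 x 8 = 64 bits)."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(fixtures: dict[str, Grid], max_distance: int = 10) -> set[str]:
    """Names of fixtures within max_distance bits of at least one other fixture."""
    hashes = {name: dhash(grid) for name, grid in fixtures.items()}
    flagged: set[str] = set()
    for (a, ha), (b, hb) in combinations(hashes.items(), 2):
        if hamming(ha, hb) <= max_distance:
            flagged.update((a, b))
    return flagged


def corpus_is_usable(fixtures: dict[str, Grid], max_dup_fraction: float = 0.1) -> bool:
    """Refuse to score against a corpus dominated by near-duplicate fixtures."""
    if not fixtures:
        return False
    return len(near_duplicates(fixtures)) / len(fixtures) <= max_dup_fraction


def precision_recall(predicted: set[str], planted: set[str]) -> tuple[float, float]:
    """Precision and recall of flagged defects; empty denominators map to 1.0 by convention."""
    true_pos = len(predicted & planted)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(planted) if planted else 1.0
    return precision, recall
```

Under these assumptions, a corpus where most fixtures collapse into the same few hashes fails `corpus_is_usable` before any precision number gets reported.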

A hand-drawn pencil-test diagram on cream paper. At left, a dark-inked running figure labeled HUMAN, TIMING + TASTE. In the center, five lighter teal sketches of the same figure in progressive running positions, each nudged along by a small pencil-holding hand, labeled FLEET, VOLUME. At right, a finished inked running figure with an amber check mark, labeled SHIP. A numbered timeline from 1 to 10 runs along the bottom, labeled 10-PHASE PIPELINE.
10-phase pipeline
Human owns the keyframes; the fleet draws every in-between.

─ METHODS ─

Tools, agents, and models used on this project
TASK | AGENT / TOOL | MODEL / COST
keyframe stills | Gemini Nano Banana 2 | ~$0.04/frame
motion interpolation | Seedance 2.0 | ~$0.40/clip (Fast tier)
orchestration | Code Brain | Claude Sonnet 4.6 (HybridRouter)
vision critic (T2) | Gemini 3.1 Pro via Anti-Gravity CLI | $0 incremental (subscription)
multi-CLI critic (T3) | Codex CLI + Anti-Gravity CLI in parallel | $0 incremental (absorbed by subscriptions)
planner | Opus 4.7 (Maya persona) + Sonnet 4.6 (adversarial) | per-token billing
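The per-unit rates in the table make rough budgeting simple arithmetic. A minimal sketch, assuming a hypothetical `ShotPlan` with made-up keyframe and clip counts and a regeneration multiplier for critic rejections; it uses the table's approximate rates, counts the subscription-backed critics as $0, and leaves out per-token planner and orchestrator spend, which depends on usage the table doesn't state.

```python
from dataclasses import dataclass

# Approximate per-unit rates from the methods table (USD).
KEYFRAME_RATE = 0.04  # Gemini Nano Banana 2, per keyframe still
CLIP_RATE = 0.40      # Seedance 2.0 Fast tier, per interpolated clip


@dataclass
class ShotPlan:
    """Hypothetical per-shot plan; the counts are illustrative, not Anima's real numbers."""

    keyframes: int
    clips: int
    regen_factor: float = 1.0  # above 1.0 when the critic sends work back for regeneration


def generation_cost(shot: ShotPlan) -> float:
    """Variable generation spend for one shot, excluding per-token planning costs."""
    base = shot.keyframes * KEYFRAME_RATE + shot.clips * CLIP_RATE
    return round(base * shot.regen_factor, 2)
```

For example, a shot with 6 keyframes and 3 clips comes to about $1.44 before any regeneration (6 × $0.04 + 3 × $0.40).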
Pencil-test sketch of Sean walking off the page, looking back with a pencil raised and storyboard sheets under his arm