BOSTON, JUNE 23, 2026 — anima, active
Anima
A 2D animation pipeline where AI agents do the volume work.
Project questions and answers
What is this?
Anima is a 10-phase pipeline for making 2D animated stories, run by a human and a small fleet of named agents: Maya plans the work, Cy builds the character bible, Em critiques every frame, and Mo writes the public walkthrough. The human owns timing and taste and makes the call to ship; the agents own the parts that are cheap and repeatable at volume. What keeps it from being a click-to-generate toy is the order of operations: I block the motion in plain shapes before a single frame renders, so the timing is mine, not the model's. The Pencil Test short is the first piece built this way, with Act 1 shipped and Act 2 in flight. The short is the proof; Anima is the system.
Why this approach?
The real decision was never which model to use, because the model layer is replaceable. It was which working method to adopt: spend years hand-animating a short solo, let AI generate the whole thing fast and get mixed results, or direct a fleet and keep authorship. I picked the third, a studio-like division of labor without the studio, and built the architecture so the human role isn't tied to any one model.
What would break?
These are three failure modes the architecture guards against by design, not three bugs I'm hoping to dodge. One: if the orchestrator and the critic share a model family, they miss the same problems together, so I pair a Sonnet orchestrator with a Gemini vision critic at the busiest checkpoint, cross-family on purpose. Two: any single phase can make its own output "better" while drifting off the approved brief, so Phase 0 freezes an immutable acceptance_criteria.json that every later critic must cite by ID before it can block. Three: a cheap critic can quietly turn into a rubber stamp, so Em earns trust the hard way: she is eval-gated against a 50-case hand-ratified corpus, on which she caught every planted defect at 0.97 precision.
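
The second guard, a critic that can only block by citing a frozen criterion, could be sketched roughly as follows. This is a minimal illustration, not Anima's actual code: the JSON layout, the `AC-001`-style IDs, the `Verdict` type, and `is_valid_block` are all assumptions made for the example, since the real schema of acceptance_criteria.json is not shown here.

```python
import json
from dataclasses import dataclass

# Hypothetical contents of the frozen Phase 0 file. The real schema is not
# given in the text, so these IDs and descriptions are assumptions.
FROZEN_CRITERIA_JSON = """
{
  "AC-001": "Character silhouette matches the approved bible",
  "AC-002": "Motion timing follows the human-authored blocking"
}
"""


@dataclass(frozen=True)
class Verdict:
    """A critic's decision on one frame or clip (hypothetical shape)."""
    critic: str
    block: bool
    cited_ids: tuple
    note: str = ""


def load_criteria(raw: str) -> dict:
    """Parse the frozen criteria into an ID -> description mapping."""
    return json.loads(raw)


def is_valid_block(verdict: Verdict, criteria: dict) -> bool:
    """Accept a block only if it cites at least one ID and every cited ID exists."""
    if not verdict.block or not verdict.cited_ids:
        return False
    return all(cid in criteria for cid in verdict.cited_ids)


criteria = load_criteria(FROZEN_CRITERIA_JSON)
grounded = Verdict("Em", True, ("AC-002",), "timing drifts from the blocking")
ungrounded = Verdict("Em", True, (), "looks off")
print(is_valid_block(grounded, criteria))    # True
print(is_valid_block(ungrounded, criteria))  # False
```

The point of the design is that a critic's opinion alone never stops the pipeline; only a disagreement traceable to the approved brief does.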
What did I learn?
Validators can't recover taste that wasn't there at generation time, which is why the human authors the timing first. I learned the rest the hard way: my first version of that critic eval looked great until I found 19 of its 23 fixtures were near-duplicate images, so the score was measuring nothing. I threw the corpus out, rebuilt it by hand, and re-baselined. A fleet is only as honest as the eval underneath it.
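
One way to catch near-duplicate fixtures before trusting a score is a perceptual-hash pass over the corpus. The sketch below is an assumption about how such a check might look, not a description of how the duplicates were actually found: it uses a simple average hash over small grayscale grids of equal size (plain nested lists here, so it runs without an imaging library), and the function names and the `max_distance` threshold are hypothetical.

```python
from itertools import combinations


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def near_duplicate_pairs(fixtures, max_distance=1):
    """List fixture name pairs whose hashes differ in at most max_distance bits."""
    hashes = {name: average_hash(px) for name, px in fixtures.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]


# Toy 2x2 grayscale fixtures: "b" is "a" with slight noise, "c" is different.
fixtures = {
    "a": [[10, 200], [10, 200]],
    "b": [[12, 198], [11, 201]],
    "c": [[200, 10], [200, 10]],
}
print(near_duplicate_pairs(fixtures))  # [('a', 'b')]
```

In practice the fixtures would be downscaled real images, and a corpus where most pairs fall under the threshold is a sign the eval is measuring one case many times.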
─ METHODS ─
| TASK | AGENT / TOOL | MODEL / COST |
|---|---|---|
| keyframe stills | Gemini Nano Banana 2 | ~$0.04/frame |
| motion interpolation | Seedance 2.0 | ~$0.40/clip (Fast tier) |
| orchestration | Code Brain | Claude Sonnet 4.6 (HybridRouter) |
| vision critic (T2) | Gemini 3.1 Pro via Anti-Gravity CLI | $0 incremental (subscription) |
| multi-CLI critic (T3) | Codex CLI + Anti-Gravity CLI in parallel | $0 incremental (absorbed by subscriptions) |
| planner | Opus 4.7 (Maya persona) + Sonnet 4.6 adversarial | per-token billing |
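
As a rough illustration of how the per-unit prices in the table add up, the sketch below estimates generation spend from the two metered rows. The rates are the approximate figures from the table; the frame and clip counts are hypothetical, and the planner's per-token billing is left out because no rate is given for it.

```python
# Approximate per-unit rates taken from the methods table.
COST_PER_FRAME_USD = 0.04  # keyframe stills
COST_PER_CLIP_USD = 0.40   # motion interpolation, Fast tier


def estimate_generation_cost(keyframes: int, clips: int) -> float:
    """Estimate spend in USD for stills plus interpolated clips."""
    if keyframes < 0 or clips < 0:
        raise ValueError("counts must be non-negative")
    return round(keyframes * COST_PER_FRAME_USD + clips * COST_PER_CLIP_USD, 2)


# Hypothetical act: 120 keyframes and 30 clips.
print(estimate_generation_cost(120, 30))  # 16.8
```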