Intent Engineering MCP

An MCP server that checks an agent's spec before it runs.

A pencil-test illustration on cream paper: a child sits cross-legged in a glowing camping tent at night, drawing in a spiral sketchbook. A small red pixel-art creature hovers off the page as its sketched pixel form takes shape on the paper; a lantern and a pine forest sit beyond the tent flaps.

Project questions and answers

What is this?

Intent Engineering MCP is a tool for Claude Desktop and Cursor that checks the instructions you hand an AI agent before it runs. The premise: a lot of agent failures aren't reasoning failures, they're intent failures, a spec so vague the agent confidently builds the wrong thing, with no clear stop rule or success test. Under the hood it's an MCP server with three tools, audit, scaffold, and triage, that score a spec against a 9-section template. It productizes the intent-spec checklist I'd been running by hand, published on npm and the official MCP registry under a domain-verified namespace. A quick sanity check: point its audit at its own spec and it scores 23 out of 25.

Why this approach?

The real question was why build a tool at all instead of a doc. A checklist you read once gets forgotten by the time you actually write the spec; a tool that runs inside the agent's own client enforces the discipline at the moment of use, not the moment of good intentions. That's why I shipped it as an MCP server, not a CLI or a web app you'd have to remember to open. The one credential call I'd defend in a review: I took the harder domain-verified registry path over the one-command GitHub-handle form, because the registry is about the only provenance signal MCP has, and a tool you hand a prompt to has to prove who built it.

What would break?

The honest answer starts with a hole it actually had: the audit tool would read any file path you handed it, so pointing it at /etc/passwd was a real file-disclosure bug. I caught it and routed every disk read through one guard, with an allowlist of extensions, symlink resolution, a root confinement check, and a size cap. The more telling decision was what I chose not to add. OAuth and sandboxing are the reflexive next step, but this version is stdio-only, with no network transport and no path that executes user code, so I treated the file read as the real risk and fixed that first instead of bolting on controls the threat model didn't call for. The forward risk is the protocol moving under me, the registry schema has already rev'd once, so the server pins a published version and ships an install health check.

What did I learn?

The part I'd reuse first is a rule I set on day one: any scope change had to be written and signed off in the changelog before a line of code got touched. That single constraint shipped it 13 days ahead of the May 25 target, by turning every "just one more feature" into a decision on the record. A hosted transport and anything that widened the surface got pushed to v1, on purpose, on the calendar. Locking scope in writing is the kind of process I'd install on a team.

A hand-drawn pencil-test before-and-after diagram on cream paper. On the left, a vague spec drawn as a note card covered in scribbles and question marks, labeled VAGUE SPEC, NO STOP RULE. An arrow leads to a teal rubber-stamp box labeled INTENT ENGINEERING MCP, with three tool chips reading AUDIT, SCAFFOLD, TRIAGE and a note reading SCORED VS 9-SECTION TEMPLATE. A second arrow leads to a clean structured spec drawn as a checklist with check marks and an amber score reading 23/25, then a small terminal-window agent glyph labeled AGENT RUNS THE RIGHT THING.
A hand-drawn pencil-test before-and-after diagram on cream paper. On the left, a vague spec drawn as a note card covered in scribbles and question marks, labeled VAGUE SPEC, NO STOP RULE. An arrow leads to a teal rubber-stamp box labeled INTENT ENGINEERING MCP, with three tool chips reading AUDIT, SCAFFOLD, TRIAGE and a note reading SCORED VS 9-SECTION TEMPLATE. A second arrow leads to a clean structured spec drawn as a checklist with check marks and an amber score reading 23/25, then a small terminal-window agent glyph labeled AGENT RUNS THE RIGHT THING.
Spec quality score: 7 of 25.
Spec quality
The spec gets audited and scored before the agent runs, so the agent builds the right thing.

─ METHODS ─

Tools, agents, and models used on this project
TASK AGENT / TOOL MODEL / COST
protocol implementation TypeScript MCP SDK open-source / $0
transport stdio local / $0
registry verification DNS-verified MCP registry free
package publish npm free tier
development environment Claude Code per-token billing
Pencil-test sketch of Sean walking off the page, looking back with a pencil raised and storyboard sheets under his arm