Vault as Agent Infrastructure: 5-Test Scorecard

Tools, agents, and models used on this project
TASK	AGENT / TOOL	MODEL / COST
telemetry extraction	scripts/generate_schema.py (stdlib sqlite3)	local / $0
typed-edge store	Code Brain (concept_edges)	Sonnet 4.6 (HybridRouter)
schema enforcement	SQLite CHECK constraints	local / $0
synthetic fixture	hand-authored + line reviewed	$0

What is this?

A five-test scoreboard for agent infrastructure. Nate Jones published five structural tests (persistent state, defined verbs, ownership, permissions, queryable audit history) and this scores four knowledge systems against them: Notion, default Obsidian, Linear, and my vault. The verdict is three passes and two honest losses, every cell backed by live telemetry from vault/.vault-index.db (632 typed edges, six SQL-enforced relations, 15,582 indexed chunks).

Why this approach?

Nate’s framing is the recruiter vocabulary for agent-infrastructure work right now, so scoring against it beats inventing my own rubric: it’s citing a standard instead of grading my own homework. It also makes the next build, vault-knowledge-mcp, self-justifying: the MCP ships as “the wrapper around the only public PM vault that passes the agent-infrastructure tests.” The two losses to Linear are deliberate. Linear’s RBAC and record-ownership model genuinely beat POSIX file permissions, and naming that is what calibrates the whole artifact’s credibility.

What would break?

The numbers are a snapshot, so they go stale as the synthesizer runs nightly: they already moved from 478 edges to 632 in nine days. That’s mitigated by making them regenerable on demand from the live schema rather than hand-maintained. The deeper risk is reading a scorecard I authored about my own system as objective truth; the Linear-wins-here callouts are the structural check against that.

What did I learn?

Most “agent infrastructure” claims fail the persistent-state test on day one: ask whether the structured state survives a crash with no recovery procedure, and most setups turn out to be tool integrations with session-scoped memory. The test discriminates. And the most useful losses to name are the ones whose fix already exists: the closer for the permissions gap (the Judge Layer, a control-plane interceptor with an append-only decision ledger) was already built, which turned that “failure” from a someday into a rollout.

Vault as Agent Infrastructure: 5-Test Scorecard

─ METHODS ─

─ EXPLANATION ─

What is this?

Why this approach?

What would break?

What did I learn?

─ WHAT THIS DOESN'T YET DO ─

─ METHODS ─

─ EXPLANATION ─

What is this?

Why this approach?

What would break?

What did I learn?

─ WHAT THIS DOESN'T YET DO ─

─ RELATED ─