BOSTON, MAY 31, 2026
INFRA SHIPPED
Vault as Agent Infrastructure: 5-Test Scorecard
─ METHODS ─
| TASK | AGENT / TOOL | MODEL / COST |
|---|---|---|
| telemetry extraction | scripts/generate_schema.py (stdlib sqlite3) | local / $0 |
| typed-edge store | Code Brain (concept_edges) | Sonnet 4.6 (HybridRouter) |
| schema enforcement | SQLite CHECK constraints | local / $0 |
| synthetic fixture | hand-authored + line reviewed | $0 |
─ EXPLANATION ─
What is this?
A five-test scoreboard for agent infrastructure. Nate Jones published five structural tests (persistent state, defined verbs, ownership, permissions, queryable audit history) and this scores four knowledge systems against them: Notion, default Obsidian, Linear, and my vault. The verdict is three passes and two honest losses, every cell backed by live telemetry from vault/.vault-index.db (632 typed edges, six SQL-enforced relations, 15,582 indexed chunks).
Why this approach?
Nate’s framing is the recruiter vocabulary for agent-infrastructure work right now, so scoring against it beats inventing my own rubric: it’s citing a standard instead of grading my own homework. It also makes the next build, vault-knowledge-mcp, self-justifying: the MCP ships as “the wrapper around the only public PM vault that passes the agent-infrastructure tests.” The two losses to Linear are deliberate. Linear’s RBAC and record-ownership model genuinely beat POSIX file permissions, and naming that is what calibrates the whole artifact’s credibility.
What would break?
The numbers are a snapshot, so they go stale as the synthesizer runs nightly: they already moved from 478 edges to 632 in nine days. That’s mitigated by making them regenerable on demand from the live schema rather than hand-maintained. The deeper risk is reading a scorecard I authored about my own system as objective truth; the Linear-wins-here callouts are the structural check against that.
What did I learn?
Most “agent infrastructure” claims fail the persistent-state test on day one: ask whether the structured state survives a crash with no recovery procedure, and most setups turn out to be tool integrations with session-scoped memory. The test discriminates. And the most useful losses to name are the ones whose fix already exists: the closer for the permissions gap (the Judge Layer, a control-plane interceptor with an append-only decision ledger) was already built, which turned that “failure” from a someday into a rollout.
─ WHAT THIS DOESN'T YET DO ─
- Ownership is filesystem + git + frontmatter only: no per-record assignee/watcher model. Linear wins this axis; vault-knowledge-mcp is the closer.
- Permissions are POSIX + git-remote only: no per-record RBAC. The Judge Layer (built) is the control-plane primitive underneath the fix; full RBAC is rollout work.
- Scorecard numbers are a dated snapshot; they drift as the nightly synthesizer runs. Regenerable via generate_schema.py rather than hand-maintained.