ARCHITECTURE SHIPPED

Vault as Agent Infrastructure

agent-infrastructure sqlite obsidian linear

Five structural tests. Four knowledge systems. Three passes, two losses to Linear, and one of the two closers is already on disk.

What is this?

A five-test scoreboard that scores knowledge systems on the axes that matter when an agent has to act on the vault, not just read it: Persistent State, Defined Verbs, Ownership, Permissions, and a Queryable Audit History. Notion, Obsidian (default), Linear, and Sean’s vault each get a row. Three passes, two failures, and the two failures are the architectural argument for what ships next. Every cell is backed by live telemetry from vault/.vault-index.db: 632 typed edges across ~110 nodes, six SQL-enforced relations, 15,582 indexed chunks, regenerable with scripts/generate_schema.py.

Why this approach?

Most “PM tools” discourse is about how nice the editing surface feels. That’s a content question. The agent question is structural: does the system have typed edges the agent can query without scanning, defined verbs it can call without a free-form LLM bridge, an audit history a reviewer can replay? The scorecard turns that structural question into a single visible result, and scoring against Nate Jones’s published tests means citing an external standard instead of grading my own homework. Comparison is the discipline: equal presentation across all four systems is the honesty signal. The reader reaches the conclusion from the symbols, not from a highlighted “Sean’s-vault” row.

What would break?

Three named failure modes: (1) the synthetic public vault fixture drifts from the production schema as concept_edges evolves, mitigated by the scripts/generate_schema.py step that re-emits the ER diagram and stats from the live SQLite file (the edge count already moved 478 → 632 in nine days, which is why the numbers are regenerated, not hand-typed); (2) the Notion/Obsidian/Linear rows go stale as those products ship features, mitigated by a quarterly rescore; (3) recruiters expecting a “Notion-killer” pitch bounce because the scoreboard is honest about the axes where Linear wins; that’s the intended cost. The page filters for the audience that values calibration over salesmanship.

What did I learn?

Equality of presentation is the credibility signal. The temptation was to highlight Sean’s-vault row, wash the passing cells in success green, or stack the columns in my order. Resisting that produced a scoreboard a Linear PM could read without rolling their eyes. The two honest-note callouts (Ownership and Permissions, both Linear-better) are the load-bearing part of earning that read. And the most useful loss to name turned out to be the one whose fix already exists: the closer for the permissions gap, the Judge Layer, was already built and tested, which reframed that “failure” from a someday into a rollout.

erDiagram
    Concept ||--o{ ConceptEdge : "from_slug"
    Concept ||--o{ ConceptEdge : "to_slug"
    ConceptEdge {
        int id PK
        text from_slug FK
        text to_slug FK
        text relation "enum: supports | contradicts | evolved_into | supersedes | depends_on | related_to"
        real confidence "0..1, NULL when classifier-undecided"
        text valid_until "ISO 8601, NULL when current"
        text classifier_version
        text source_synth_run
        text created_at "ISO 8601"
    }
    Concept {
        text slug PK
        text title
        text body
    }
concept_edges schema: six relation types enforced by a SQL CHECK constraint, scored against 632 production edges (2026-05-29).

Scoreboard

Comparison of systems across 5 tests.
  STATEVERBSOWNPERMAUDIT
Notion
Obsidian (default)
Linear
Sean's vault

exceeds · passes · partial · fails

─ METHODS ─

Tools, agents, and models used on this project
TASK AGENT / TOOL MODEL / COST
persistent state SQLite + git + JSONL local / $0
typed edges Code Brain (concept_edges) Sonnet 4.6 (HybridRouter)
er diagram gen scripts/generate_schema.py stdlib / $0
fixture author Sean + LLM seed ~$0.10

— EXPLANATION —

What is this?

A five-test scoreboard that scores knowledge systems on the axes that matter when an agent has to act on the vault, not just read it: Persistent State, Defined Verbs, Ownership, Permissions, and a Queryable Audit History. Notion, Obsidian (default), Linear, and Sean’s vault each get a row. Three passes, two failures, and the two failures are the architectural argument for what ships next. Every cell is backed by live telemetry from vault/.vault-index.db: 632 typed edges across ~110 nodes, six SQL-enforced relations, 15,582 indexed chunks, regenerable with scripts/generate_schema.py.

Why this approach?

Most “PM tools” discourse is about how nice the editing surface feels. That’s a content question. The agent question is structural: does the system have typed edges the agent can query without scanning, defined verbs it can call without a free-form LLM bridge, an audit history a reviewer can replay? The scorecard turns that structural question into a single visible result, and scoring against Nate Jones’s published tests means citing an external standard instead of grading my own homework. Comparison is the discipline: equal presentation across all four systems is the honesty signal. The reader reaches the conclusion from the symbols, not from a highlighted “Sean’s-vault” row.

What would break?

Three named failure modes: (1) the synthetic public vault fixture drifts from the production schema as concept_edges evolves, mitigated by the scripts/generate_schema.py step that re-emits the ER diagram and stats from the live SQLite file (the edge count already moved 478 → 632 in nine days, which is why the numbers are regenerated, not hand-typed); (2) the Notion/Obsidian/Linear rows go stale as those products ship features, mitigated by a quarterly rescore; (3) recruiters expecting a “Notion-killer” pitch bounce because the scoreboard is honest about the axes where Linear wins; that’s the intended cost. The page filters for the audience that values calibration over salesmanship.

What did I learn?

Equality of presentation is the credibility signal. The temptation was to highlight Sean’s-vault row, wash the passing cells in success green, or stack the columns in my order. Resisting that produced a scoreboard a Linear PM could read without rolling their eyes. The two honest-note callouts (Ownership and Permissions, both Linear-better) are the load-bearing part of earning that read. And the most useful loss to name turned out to be the one whose fix already exists: the closer for the permissions gap, the Judge Layer, was already built and tested, which reframed that “failure” from a someday into a rollout.