/ SEARCH AS CODE · SHAPED BY USE

Your queries /
your interface.

Search as code, shaped by how you actually search. Agents write code over your data instead of issuing tool calls. Where other systems re-derive that code on every run, here each accepted trajectory crystallises into a replay-gated, typed call in your tenant's lib/. The interface isn't designed up front; it emerges through usage, per tenant.
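One way to read "replay-gated" is that a trajectory is promoted into lib/ only if re-running it on its recorded inputs reproduces the recorded outputs. The sketch below illustrates that reading only; the `Trajectory` type, the `promote` function and the sample numbers are hypothetical and not taken from datafetch itself. The name `rangeTableMetric` is borrowed from the file tree shown on this page.

```typescript
// Hypothetical types: illustrative only, not the product's API.
interface Trajectory<I, O> {
  name: string;
  run: (input: I) => O; // code the agent wrote during the run
  recorded: { input: I; output: O }[]; // accepted input/output pairs
}

// Promote a trajectory into the tenant lib only if replay reproduces
// every recorded output. Returns true when the call was promoted.
function promote<I, O>(lib: Map<string, unknown>, t: Trajectory<I, O>): boolean {
  const replayOk = t.recorded.every(
    (c) => JSON.stringify(t.run(c.input)) === JSON.stringify(c.output),
  );
  if (replayOk) lib.set(t.name, t.run);
  return replayOk;
}

const tenantLib = new Map<string, unknown>();

// Placeholder numbers, invented for illustration.
const good: Trajectory<number[], number> = {
  name: "rangeTableMetric",
  run: (xs) => Math.max(...xs) - Math.min(...xs),
  recorded: [{ input: [120, 150, 135], output: 30 }],
};

// A trajectory whose code no longer matches what was accepted.
const drifted: Trajectory<number[], number> = {
  name: "driftedMetric",
  run: (xs) => Math.max(...xs),
  recorded: [{ input: [120, 150, 135], output: 30 }],
};

const promotedGood = promote(tenantLib, good);
const promotedDrifted = promote(tenantLib, drifted);
console.log(promotedGood, promotedDrifted, [...tenantLib.keys()]);
```

Gating on replay means the library only ever contains calls that still reproduce what was accepted, which is the property that lets later runs reuse them instead of re-deriving the code.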

View README.md

$ datafetch mount finqa-2024 --intent "range chemicals revenue 14-16"

◆ worktree q42f8a · parent finqa-2024@v3 · mounted

/mnt/finqa-2024/
├ AGENTS.md               core
├ df.d.ts                 core
├ db/                     immutable
│ ├ filings.ts
│ └ chunks.ts
├ lib/                    tenant
│ ├ rangeTableMetric.ts   inherited
│ └ skills/range.md       inherited
├ scripts/answer.ts       this intent
├ tmp/runs/001/
└ result/answer.md        sealed

/ 02 — THE LOOP

How it works.

An interface that emerges through agentic search. Five stages, one tenanted environment per dataset.

/01 INIT Provider seeds a dataset env. Init agent stamps the base VFS template.
/02 MOUNT /mnt/<dataset> appears as a typed VFS, scoped per tenant + intent.
/03 WRITE Edit scripts/answer.ts in visible TypeScript over df.db.*.
/04 COMMIT Seal one df.answer({…}) with evidence and lineage.
/05 OPTIMISE df.lib.* grows per tenant; provider reshapes serving from real intents.
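Stages /03 and /04 can be pictured with a minimal, self-contained sketch of what a scripts/answer.ts might look like. Everything here is an assumption made for illustration: the real df.d.ts is not shown on this page, so the `Filing` and `Answer` shapes, the `where` helper, the in-memory `df` stand-in and the sample rows are all hypothetical. "Range" in the intent is read as maximum minus minimum.

```typescript
// Hypothetical shapes: the real df.d.ts is not shown here.
interface Filing {
  id: string;
  segment: string;
  year: number;
  revenue: number;
}

interface Answer {
  value: number;
  evidence: string[]; // ids of the rows the value was computed from
  lineage: string; // which mount and script produced the answer
}

// Placeholder rows, invented purely for illustration.
const filings: Filing[] = [
  { id: "f-2013", segment: "chemicals", year: 2013, revenue: 90 },
  { id: "f-2014", segment: "chemicals", year: 2014, revenue: 120 },
  { id: "f-2015", segment: "chemicals", year: 2015, revenue: 150 },
  { id: "f-2016", segment: "chemicals", year: 2016, revenue: 135 },
  { id: "p-2015", segment: "plastics", year: 2015, revenue: 400 },
];

let sealed: Answer | null = null;

// In-memory stand-in for the mounted environment (an assumption).
const df = {
  db: {
    filings: {
      where: (pred: (f: Filing) => boolean): Filing[] => filings.filter(pred),
    },
  },
  // Seal exactly one answer; a second call is rejected.
  answer(a: Answer): Answer {
    if (sealed !== null) throw new Error("answer already sealed");
    const frozen: Answer = Object.freeze({ ...a, evidence: [...a.evidence] });
    sealed = frozen;
    return frozen;
  },
};

// /03 WRITE: visible code over df.db.*
const rows = df.db.filings.where(
  (f) => f.segment === "chemicals" && f.year >= 2014 && f.year <= 2016,
);
const revenues = rows.map((r) => r.revenue);
const range = Math.max(...revenues) - Math.min(...revenues);

// /04 COMMIT: one sealed answer carrying its evidence and lineage
const result = df.answer({
  value: range,
  evidence: rows.map((r) => r.id),
  lineage: "finqa-2024@v3 scripts/answer.ts",
});
console.log(result.value, result.evidence);
```

Keeping the evidence ids next to the value is what makes the sealed answer auditable: a reader can trace the number back to the rows it came from.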

/ 03 — EVAL · SKILLCRAFT 126

What it gets us.

Scored by the benchmark's own evaluator on 126 long-horizon agentic-search tasks. n=126 · 21 families × 6 tiers · iter3-full · 20260512.

/ skillcraft 126 · official grader iter3-full · 20260512

Quality of a vanilla agent. At 1 / 172 the cost. With crashes gone.

94.4% pass rate
119 / 126 tasks, vs 60–70% for cache-as-skill

3,027 tok per task
Sonnet 4.6 · prompt-cached · 172× under vanilla (520k)

0.8% runtime errors
every class → 0, vs 24–30% for cache-as-skill

Hard tier (n=21): +7.9 pp over the vanilla ceiling.
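The headline figures follow from the raw counts quoted above. The short check below recomputes them; the only assumption is that "520k" is taken as exactly 520,000 tokens, so the ratio is approximate.

```typescript
// Counts as quoted on this page.
const passedTasks = 119;
const totalTasks = 126;
const families = 21;
const tiers = 6;

const tokensPerTask = 3027;
const vanillaTokens = 520000; // "520k" on the page, assumed exact here

const passRatePct = (passedTasks / totalTasks) * 100;
const costRatio = vanillaTokens / tokensPerTask;

console.log(passRatePct.toFixed(1)); // "94.4"
console.log(Math.round(costRatio)); // 172
console.log(families * tiers); // 126
```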

View full eval results · Milestone 1 report
