DATAFETCH/AI · OWN YOUR INTERFACE
/ SEARCH AS CODE · SHAPED BY USE

Your queries /
your interface.

Search as code, shaped by how you actually search. Agents write code over your data instead of issuing tool calls. Where other systems re-derive that code on every run, here each accepted trajectory crystallises into a replay-gated, typed call in your tenant's lib/. The interface isn't designed up front; it emerges through usage, per tenant.
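
One way to picture "replay-gated" promotion is the minimal sketch below. Every name in it (Trajectory, TypedCall, crystallise, the in-memory lib map) is hypothetical and not taken from datafetch; the numbers are made up. It assumes the gate simply re-runs the recorded code on the recorded input and promotes it only if the accepted output is reproduced.

```typescript
// Hypothetical types: none of these names come from datafetch itself.
type Row = { year: number; value: number };

interface Trajectory<I, O> {
  intent: string;
  input: I;              // what the run was given
  acceptedOutput: O;     // the answer that was accepted
  code: (input: I) => O; // the code the agent wrote for that run
}

type TypedCall<I, O> = (input: I) => O;

// Stand-in for a tenant's lib/: name -> promoted call.
const lib = new Map<string, TypedCall<any, any>>();

// Replay gate: re-run the code on the recorded input and promote it
// only if it reproduces the accepted output.
function crystallise<I, O>(name: string, t: Trajectory<I, O>): boolean {
  const replayed = t.code(t.input);
  const matches = JSON.stringify(replayed) === JSON.stringify(t.acceptedOutput);
  if (matches) lib.set(name, t.code);
  return matches;
}

// Illustrative trajectory (made-up numbers): range = max - min.
const accepted: Trajectory<Row[], number> = {
  intent: "range chemicals revenue 14-16",
  input: [
    { year: 2014, value: 100 },
    { year: 2015, value: 130 },
    { year: 2016, value: 120 },
  ],
  acceptedOutput: 30,
  code: (rows) => {
    const values = rows.map((r) => r.value);
    return Math.max(...values) - Math.min(...values);
  },
};

const promoted = crystallise("rangeTableMetric", accepted);

// A later run can call the promoted function instead of re-deriving it.
const reused = lib.get("rangeTableMetric")!([
  { year: 2017, value: 5 },
  { year: 2018, value: 9 },
]);
```

The point of the gate in this sketch is that a trajectory whose code no longer reproduces its accepted answer never reaches lib/, so later runs only reuse calls that have been replayed successfully.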

$ datafetch mount finqa-2024 --intent "range chemicals revenue 14-16"
worktree q42f8a · parent finqa-2024@v3 · mounted
/mnt/finqa-2024/
├ AGENTS.md                  core
├ df.d.ts                    core
├ db/                        immutable
│ ├ filings.ts
│ └ chunks.ts
├ lib/                       tenant
│ ├ rangeTableMetric.ts      inherited
│ └ skills/range.md          inherited
├ scripts/answer.ts          this intent
├ tmp/runs/001/
└ result/answer.md           sealed
  1. /01 INIT · Provider seeds a dataset env. Init agent stamps the base VFS template.
  2. /02 MOUNT · /mnt/<dataset> appears as a typed VFS, scoped per tenant + intent.
  3. /03 WRITE · Edit scripts/answer.ts in visible TypeScript over df.db.*.
  4. /04 COMMIT · Seal one df.answer({…}) with evidence and lineage.
  5. /05 OPTIMISE · df.lib.* grows per tenant; provider reshapes serving from real intents.
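
The WRITE and COMMIT stages might look roughly like the sketch below. The real df.d.ts is not shown in the source, so the Filing and Answer shapes, the where() helper, and the in-memory df object are all assumptions made for illustration, and the rows are made-up data. It also assumes the intent "range chemicals revenue 14-16" means max minus min of chemicals revenue over 2014 to 2016.

```typescript
// Hypothetical shapes: the real df.d.ts is not shown in the source.
interface Filing { segment: string; year: number; revenue: number }
interface Answer { value: number; evidence: Filing[]; lineage: string }

// Made-up rows standing in for the immutable db/ layer.
const FILINGS: Filing[] = [
  { segment: "chemicals", year: 2014, revenue: 100 },
  { segment: "chemicals", year: 2015, revenue: 130 },
  { segment: "chemicals", year: 2016, revenue: 120 },
  { segment: "plastics", year: 2015, revenue: 999 },
];

// In-memory stand-in for the mounted df object (assumed API).
const df = {
  db: {
    filings: {
      where: (pred: (f: Filing) => boolean): Filing[] => FILINGS.filter(pred),
    },
  },
  // Sealing is modelled here as a shallow freeze of the answer object.
  answer: (a: Answer): Readonly<Answer> => Object.freeze({ ...a }),
};

// WRITE: plain, visible TypeScript over df.db.*
const rows = df.db.filings.where(
  (f) => f.segment === "chemicals" && f.year >= 2014 && f.year <= 2016,
);
const revenues = rows.map((f) => f.revenue);

// COMMIT: exactly one sealed answer, carrying its evidence and lineage.
const sealed = df.answer({
  value: Math.max(...revenues) - Math.min(...revenues),
  evidence: rows,
  lineage: "finqa-2024@v3",
});
```

Keeping the query as ordinary code rather than a tool call is what lets the same function later be promoted into lib/ and reused with its types intact.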
/ 02 — THE LOOP

How it works.

An interface that emerges through agentic search. Five stages, one tenanted environment per dataset.

/ 03 — EVAL · SKILLCRAFT 126

What it gets us.

Scored by the benchmark's own evaluator on 126 long-horizon agentic-search tasks. n=126 · 21 families × 6 tiers · iter3-full · 20260512.

/ skillcraft 126 · official grader iter3-full · 20260512

Quality of a vanilla agent. At 1 / 172 the cost. With crashes gone.

  1. 94.4% pass rate
    119 / 126 tasks · vs 60–70% cache-as-skill
  2. 3,027 tok per task
    Sonnet 4.6 · prompt-cached · 172× under vanilla (520k)
  3. 0.8% runtime errors
    every class → 0 · vs 24–30% cache-as-skill

Hard tier (n=21): +7.9 pp over the vanilla ceiling.
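
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; "520k" is taken as 520,000 tokens, and reading 0.8% as one erroring run out of 126 is an inference, not something the source states.

```typescript
// Figures quoted in the eval section above.
const passedTasks = 119;
const totalTasks = 126;
const tokensPerTask = 3027;
const vanillaTokens = 520_000; // "520k", assumed to be a rounded figure

// 21 families x 6 tiers should equal the task count.
const tasksFromGrid = 21 * 6;

// Pass rate as a percentage, to one decimal place.
const passRatePct = ((passedTasks / totalTasks) * 100).toFixed(1);

// Token cost relative to the vanilla agent, rounded to a whole multiple.
const costRatio = Math.round(vanillaTokens / tokensPerTask);

// Inference: a single erroring run out of 126 rounds to 0.8%.
const oneRunErrorPct = ((1 / totalTasks) * 100).toFixed(1);

console.log({ tasksFromGrid, passRatePct, costRatio, oneRunErrorPct });
```

Under those assumptions the grid gives 126 tasks, the pass rate rounds to 94.4%, and the token ratio rounds to 172, matching the stat cards.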