Hyphae — Verifiable provenance for grounded retrieval

01 — The problem & the honest core

Paraphrase breaks the binding to its source.

When a language model paraphrases a retrieved passage, the output can no longer be bound to its source byte-for-byte. The wording drifts; the citation becomes a gesture, not a guarantee. For grounded retrieval — where the whole value is that an answer is traceable to a real, unaltered source — that drift is the failure mode.

Hyphae's answer is deliberately narrow: emit the source verbatim, write it to a hash-chained journal, and let anyone audit the binding independently. The hash chain itself is classical: Haber–Stornetta (1991), Merkle (1988), Certificate Transparency, git. The contribution is the application to grounded retrieval, and the measurement.

The echo control · the paper's spine

A trivial echo baseline — a few lines that just print the retrieved fragment back — ties Hyphae on every correctness and grounding metric. So does echo + journal. That is the point, not a weakness: correctness and grounding are properties of verbatim quotation, not of any one system. Provenance is an addable layer any extractive retriever can adopt. Hyphae is a reference realization of it.

A paraphrase can't be bound to its source byte-for-byte. A verbatim quotation can, and a hash chain makes that binding independently auditable.

Hash-Chained Verbatim Quotation · the contribution in one line

02 — The provenance layer

Five mechanisms, each independently falsifiable.

Verifiable provenance is built from five layered mechanisms. Each can be tested on its own, and each can fail without invalidating the others. Together they make every answer span auditable back to a named, unaltered source — and they hold against an attacker who knows the chain is there, rewrites it, rolls it back, withholds entries, or steals a retired key.

Mechanism 01 · Emission

Byte-identical quotation.

Every answer span is a verbatim copy of a stored fragment — no paraphrase, no rewording. The output is bindable to its source byte-for-byte, which is exactly what an LLM's paraphrastic generation destroys.

Falsifiable — any non-verbatim span breaks it
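
The falsification test for emission can be sketched in a few lines. This is a minimal illustration, not Hyphae's realizer: the function name `is_verbatim` is hypothetical, and it assumes a span may be any contiguous byte range of a stored fragment.

```python
import unicodedata

def is_verbatim(span: bytes, fragment: bytes) -> bool:
    """True only if `span` occurs byte-for-byte inside `fragment`."""
    return len(span) > 0 and span in fragment

fragment = "The chain is classical; the application is the point.".encode("utf-8")
quoted = "the application is the point".encode("utf-8")
paraphrase = "the application is what matters".encode("utf-8")

print(is_verbatim(quoted, fragment))      # True
print(is_verbatim(paraphrase, fragment))  # False

# Same visible text, different bytes: NFC and NFD forms of "é" do not match.
nfc = unicodedata.normalize("NFC", "Gutiérrez").encode("utf-8")
nfd = unicodedata.normalize("NFD", "Gutiérrez").encode("utf-8")
print(is_verbatim(nfd, nfc))              # False
```

The last case shows why the check is on bytes rather than on rendered text: a silently re-normalized string looks identical but no longer binds to its source.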

Mechanism 02 · Journal

SHA-256 hash-chained log.

Fragments and emissions are written to an append-only journal where each entry commits to the previous entry's hash. Store-only tampering across ten modes is detected and localized. The chain is classical; the application is the point.

Falsifiable — a missed store-only edit breaks it
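
A hash-chained journal of this kind can be sketched as follows. The entry layout, the all-zero genesis value and the function names are assumptions made for illustration; the source only specifies SHA-256 and that each entry commits to the previous entry's hash.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed placeholder for "no previous entry"

def entry_hash(prev_hash: bytes, payload: bytes) -> bytes:
    # Length-prefix the payload so field boundaries are unambiguous.
    data = prev_hash + len(payload).to_bytes(8, "big") + payload
    return hashlib.sha256(data).digest()

def append(journal: list, payload: bytes) -> None:
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append({"prev": prev, "payload": payload,
                    "hash": entry_hash(prev, payload)})

def first_bad_index(journal: list):
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(journal):
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return i
        prev = entry["hash"]
    return None

journal = []
for payload in [b"fragment: alpha", b"fragment: beta", b"emission: alpha"]:
    append(journal, payload)

print(first_bad_index(journal))             # None
journal[1]["payload"] = b"fragment: BETA"   # store-only edit, hashes left untouched
print(first_bad_index(journal))             # 1
```

Returning an index rather than a boolean is what "localized" means here: the verifier reports where the chain first breaks, not only that it broke.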

Mechanism 03 · Anchor

External Ed25519 head anchor.

A bare chain falls to a chain-aware adversary who rewrites history and re-hashes. An Ed25519 signature over the chain head, with the key held outside the store process, closes that gap: the rewritten head no longer matches the signed one, so the rewrite is caught.

Falsifiable — an undetected chain rewrite breaks it
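
The gap and its closure can be sketched as below. One loud assumption: Python's standard library has no Ed25519, so HMAC-SHA256 stands in for the signature purely to keep the example self-contained. A real anchor would use a public-key signature so that verifiers never hold the signing key. The key value and function names are hypothetical.

```python
import hashlib
import hmac

def chain_head(payloads):
    """Fold payloads into a single head hash, as a bare chain would."""
    head = b"\x00" * 32
    for payload in payloads:
        head = hashlib.sha256(head + payload).digest()
    return head

# STAND-IN for Ed25519: a keyed MAC, with the key held outside the store process.
ANCHOR_KEY = b"held-outside-the-store-process"  # hypothetical key

def sign_head(head: bytes) -> bytes:
    return hmac.new(ANCHOR_KEY, head, hashlib.sha256).digest()

def anchor_ok(head: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign_head(head), signature)

honest = [b"alpha", b"beta", b"gamma"]
anchor = sign_head(chain_head(honest))

# Chain-aware adversary: edits an entry and re-hashes everything after it.
rewritten = [b"alpha", b"BETA", b"gamma"]
forged_head = chain_head(rewritten)  # internally consistent, so a chain check alone passes

print(anchor_ok(chain_head(honest), anchor))  # True
print(anchor_ok(forged_head, anchor))         # False
```

The adversary can recompute every hash but cannot produce a valid signature over the new head without the anchor key.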

Mechanism 04 · Ledger + witness

Append-only, witnessed.

Heads are published to an append-only, hash-chained ledger — so a rollback replaying a stale anchor is rejected (freshness) and forked histories are caught (non-equivocation). An independent witness of the ledger tail catches a store that withholds entries.

Falsifiable — an accepted rollback or withheld entry breaks it
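
The witness check can be sketched under simplifying assumptions: the ledger is modeled as a plain list of head hashes, signatures on the heads are omitted, and the witness remembers only the length and tail hash it last observed. All names are hypothetical.

```python
import hashlib

def ledger_tail(heads):
    """Hash-chain the published heads; return (length, tail_hash)."""
    tail = b"\x00" * 32
    for head in heads:
        tail = hashlib.sha256(tail + head).digest()
    return len(heads), tail

def check_against_witness(presented, witnessed):
    """Compare a ledger presented by the store with the witness's record."""
    w_len, w_tail = witnessed
    if len(presented) < w_len:
        return "withheld-or-rolled-back"   # fewer entries than the witness saw
    if ledger_tail(presented[:w_len]) != (w_len, w_tail):
        return "fork"                      # history differs from what was witnessed
    return "ok"                            # presented ledger extends the witnessed one

heads = [hashlib.sha256(b"head-%d" % i).digest() for i in range(4)]
witnessed = ledger_tail(heads[:3])

print(check_against_witness(heads, witnessed))       # ok
print(check_against_witness(heads[:2], witnessed))   # withheld-or-rolled-back
forked = [heads[0], hashlib.sha256(b"other").digest(), heads[2]]
print(check_against_witness(forked, witnessed))      # fork
```

Freshness and non-equivocation both reduce to the same comparison: whatever the store presents must be an extension of what an independent party has already seen.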

Mechanism 05 · Keyring

Signed key rotation.

A keyring rotates the anchor key — each successor authorized by its predecessor from a root trusted out-of-band. A ledger spanning rotations verifies under the per-epoch key, and a retired key can sign no new history, so key compromise is recoverable, not fatal.

Falsifiable — a forged successor or retired-key forgery breaks it
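
The per-epoch rule can be illustrated without any cryptography. The sketch below models only which key is allowed to sign at a given ledger position. It assumes, hypothetically, that each epoch begins at a known ledger sequence number, and it leaves out the predecessor-signed authorization of each successor.

```python
from bisect import bisect_right

# Hypothetical keyring: (first ledger sequence number of the epoch, key id).
KEYRING = [(0, "key-A"), (5, "key-B"), (9, "key-C")]

def key_for(seq: int) -> str:
    """Return the key id whose epoch covers ledger entry `seq`."""
    if seq < 0:
        raise ValueError("sequence numbers start at 0")
    starts = [start for start, _ in KEYRING]
    return KEYRING[bisect_right(starts, seq) - 1][1]

def signer_allowed(seq: int, signer: str) -> bool:
    """A signature counts only if it comes from the key of that entry's epoch."""
    return key_for(seq) == signer

print(signer_allowed(3, "key-A"))   # True: entry 3 belongs to key-A's epoch
print(signer_allowed(7, "key-A"))   # False: key-A is retired by entry 5
print(signer_allowed(7, "key-B"))   # True
```

Old entries still verify under the key of their own epoch, while a stolen retired key cannot extend the ledger.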

03 — Architecture (the substrate)

The retrieval & runtime substrate.

Hyphae runs on a coordinated set of subsystems modeled on mammalian brain regions, communicating through typed pathways. They handle retrieval, state and runtime coordination — fetching candidate fragments, tracking conversation, managing the journal.

They are not what makes a quoted answer correct — verbatim emission is. The echo control proves it: a few lines that print the retrieved fragment back score the same. Read this section as the substrate that retrieves and persists, not as the source of the answer's quality.

S01 · Thalamus

Input gating

First filter on incoming signal. Decides what reaches the substrate.

S02 · Hippocampus

Episodic memory

Stores fragments and runs pattern completion at initial retrieval.

S03 · Entorhinal Cortex

Binding & conductivity

Binds fragments and computes the conductivity weights on the retrieval network.

S04 · Amygdala

Valence

Assigns affective valence to fragments at write time and on recall.

S05 · Locus Coeruleus

Precision modulation

Modulates precision and gates consolidation alongside the BNST.

S06 · Frontal Cortex

Working set

Bounded working memory (7 fragments). Controlled inhibitory filtering of retrieval results.
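
A bounded working set can be sketched as a top-k selection. This treats the filtering as "keep the seven highest-scoring candidates", which is an assumption; the scores and fragment ids below are invented for the example.

```python
import heapq

WORKING_SET_CAP = 7  # the bound stated in the text

def working_set(scored, cap=WORKING_SET_CAP):
    """Keep the `cap` highest-scoring (score, fragment_id) pairs, best first."""
    return heapq.nlargest(cap, scored)

candidates = [(i / 10, f"frag-{i}") for i in range(10)]
kept = working_set(candidates)

print(len(kept))    # 7
print(kept[0][1])   # frag-9
print(kept[-1][1])  # frag-3
```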

S07 · ACC

Conflict & metacognition

Detects conflict, tracks threads, open questions and conversation chronology.

S08 · Cerebellum

Cascade activation

Predictive coding and spreading activation through the retrieval network.

S09 · Mammillary

Temporal indexing

Indexes fragments and threads on the temporal axis.

S10 · BNST

Consolidation gating

Decides which fragments graduate from working memory to long-term storage.

S11 · Olfactory Bulb

Salience gating

Salience model for input streams. Future foundation for multimodal extension.

S12 · Dopaminergic Midbrain

Curiosity firing

Decides when to fire a curiosity operation against external grounding.

S13 · Insula

Interoception

Tracks substrate-internal signals — load, latency, integrity, drift.

S14 · Striatum

Action selection

Selects among candidate emissions and tool invocations under reward.

S15 · Lateral PFC

Schema selection

Picks the emission schema (DialogueReply, GroundedAssertion, etc.) for the intent at hand.

S16 · Realizer

Verbatim emission

Byte-identical fragment quotation + minimal connective tissue. Slot binding with depth-two backtracking.

S17 · Journal

Audit chain

SHA-256 hash chain over every significant event, with external Ed25519 head anchor. Verified during Recovery state.

04 — What makes it verifiable

Properties, not optimizations.

The properties below are what make a Hyphae answer independently auditable. They are claims about verifiability, not about answer quality — the echo control already settled the quality question.

P01

Byte-level bindability.

Every answer span is a byte-identical copy of a stored fragment, so the output binds to its source exactly. This is the property paraphrastic LLM generation destroys — once wording drifts, no span can be matched to a source byte-for-byte.

P02

Independent auditability.

Each emitted span carries provenance back to a named, unaltered source fragment. A third party can verify the binding without trusting the system — the audit does not depend on Hyphae attesting to its own honesty.

P03

Tamper-evidence (store-only).

The SHA-256 hash chain over the journal detects and localizes store-only tampering across the benchmark's ten modes — edit, delete, insert, reorder, bit-flip, truncate, duplicate, timestamp-skew, rollback, batch. The chain is classical (Haber–Stornetta, Merkle, Certificate Transparency); the contribution is applying it to grounded retrieval and measuring it.

P04

Tamper-evidence (chain-aware).

A bare chain falls to an adversary who rewrites history and re-hashes the whole chain. An external Ed25519 signature over the head — key held outside the store process — closes that gap and catches the rewrite.

P05

Defense in depth, witnessed.

Above the bare chain: an append-only Ed25519 ledger of signed heads gives freshness (a rolled-back head with a stale anchor is rejected) and non-equivocation; an independent witness of the ledger tail catches a store that withholds entries; and a signed keyring rotates the anchor key so compromise is recoverable, not fatal. Each layer is measured.

P06

No LLM in the path · general hardware.

The cognition path is deterministic over structured inputs — no model inference between query and answer. Rust, CPU-only, single binary; fits on commodity hardware. LLMs appear only as external comparators in the evaluation, never in the emission path.

05 — Current state

Published, and measured end to end.

The preprint is public and citable (Zenodo DOI). The provenance stack is built and verified in the open: a hash-chained journal, an external Ed25519 head anchor, an append-only ledger (freshness + non-equivocation), an external witness (against entry withholding), and a signed keyring (key rotation). A community-scale provenance benchmark measures the whole stack, including a defense-escalation experiment where each attack is caught by exactly the next layer up. All in the public repository, CI-gated.

05 — What the evaluation found

The echo control, in the table.

The evaluation spans 255 queries, twelve metrics and 18 LLM-based configurations — six models × three retrieval modes — across two corpora, plus a dedicated tamper-detection benchmark. The headline finding is below, and it is deliberately not a win: on correctness and grounding, a trivial echo baseline matches Hyphae. The metric that actually separates systems is provenance.

Scope · queries · 255

Two corpora

255 queries evaluated across two distinct corpora, twelve metrics each, with explicit failure thresholds per metric.

Scope · LLM configs · 18

6 models × 3 modes

Eighteen LLM-based comparator configurations: six models, three retrieval modes each. LLMs appear only as comparators — never in Hyphae's emission path.

Scope · controls · echo

Echo & echo + journal

Two trivial baselines that print the retrieved fragment back. They tie Hyphae on correctness and grounding — that is the contribution, stated honestly.

Scope · tampering · 10 × 3

Ten modes × three adversaries

The provenance benchmark (provbench v2) crosses ten tampering modes with three adversary profiles, plus a defense-escalation experiment: each attack — in-place edit, chain-aware rewrite, rollback-with-stale-anchor, withholding — is caught by exactly the next layer up (chain → anchor → ledger → witness).

Metric family            Hyphae    Echo    Echo + journal
Correctness              tie       tie     tie
Grounding                tie       tie     tie
Byte-level bindability   yes       yes     yes
Tamper-evidence          full      none    store-only

Correctness and grounding are properties of verbatim quotation — every quoting system ties. Provenance (the chain + the external anchor) is what Hyphae adds on top.

06 — Open questions

What the preprint does not settle.

The honest position includes its own limitations. These are the genuine open questions carried in the paper's future-work section — stated plainly rather than buried.

OPEN · 01

Generalization beyond extractive tasks.

Verbatim quotation fits questions whose answer is a contiguous span in some source; it cannot synthesize across sources. We have begun mapping this boundary with a multi-hop harness: a single-span system silently fails on multi-hop by default, and graceful degradation (abstaining) is achievable but requires an explicit abstention signal the realizer must implement. The live multi-hop column against an LLM comparator is the next measurement.

OPEN · 02

Source-ingestion boundary.

From the journal onward the threat model is now closed: the chain catches store-only edits, the anchor the chain-aware rewrite, the ledger adds freshness and non-equivocation, a witness catches withholding, and a signed keyring makes key compromise recoverable — measured across ten modes and three adversaries. What remains genuinely open is the ingestion boundary: attesting that fragments entered the journal faithfully in the first place. Provenance from the journal forward is closed; provenance into it is not.

OPEN · 03

arXiv endorsement pending.

The work is public now on Zenodo with a citable DOI. An arXiv version is forthcoming, pending category endorsement — we will link it here when it lands. Until then, the Zenodo record and the public repository are the authoritative artifacts.

07 — Deployment

One binary, three runtimes.

Hyphae compiles to a single Rust binary. The deployment configuration is what differentiates a portable offline-only runtime from a learning runtime that absorbs new fragments through curiosity — and from an edge variant tuned for constrained hardware.

Configuration · 01

Portable binary.

Operates entirely on the fragment store loaded at startup. Curiosity disabled, all budgets capped to zero, no Vertex credentials, no network access.

Use cases: offline operation, sensitive contexts, compliance-restricted environments
Network: disabled
Curiosity: off · learning off
RAM: 1–2 GB

Configuration · 02

Learning binary.

Same code, different runtime configuration. Curiosity continuously active, Vertex grounding online, learning loop refining weights from feedback, web agency enabled.

Use cases: project lead's personal cognitive infrastructure · VPS dogfooding · multi-user
Grounding: Vertex AI · gemini-3.5-flash + Playwright browser
Learning: on · auditable weight updates
RAM: 4–16 GB

Configuration · 03

Edge / IoT variant.

Minimal substrate for embedded devices and companion modes. Reduced subsystems, pre-trained fragment store, read-only runtime — the smallest shape of Hyphae that still preserves the substrate's contract.

Use cases: embedded devices · companion mode · constrained hardware
Mode: read-only · pre-trained store
Curiosity: off · learning off
RAM: 256 MB – 1 GB

08 — Validation & failure criteria

Twelve metrics. Explicit thresholds for failure.

The evaluation specifies twelve typed metrics with explicit floors — the points at which a provenance claim is empirically falsified. The provenance benchmark (provbench v2) runs ten tampering modes against three adversaries, plus a defense-escalation experiment across the full stack — chain → anchor → ledger → witness.

Failure is information, not termination. If a threshold trips, it is reported honestly — the echo control is itself an example of a result that contradicted the original framing and was published rather than hidden.

verbatim fidelity

any non-byte-identical span → emission falsified.

If an emitted span is not a byte-identical copy of its source fragment, the core guarantee — bindability to source — is broken.

provenance coverage

below 1.0 → an unattributed span shipped.

Every emitted span must resolve to a named source fragment. A single span without provenance is a coverage failure.
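
The coverage metric can be computed as the fraction of emitted spans that resolve to a stored fragment. The sketch below assumes spans are `(fragment_id, bytes)` pairs and that an empty emission counts as fully covered. Both are illustrative choices, and the fragment ids are invented.

```python
def provenance_coverage(spans, store):
    """Fraction of emitted (fragment_id, text) spans that resolve to a stored fragment."""
    if not spans:
        return 1.0  # assumption: an empty emission has nothing unattributed
    resolved = sum(1 for frag_id, text in spans
                   if text and text in store.get(frag_id, b""))
    return resolved / len(spans)

store = {
    "doc1#3": b"Heads are published to an append-only ledger.",
    "doc2#1": b"A retired key can sign no new history.",
}
good = [("doc1#3", b"append-only ledger"), ("doc2#1", b"A retired key")]
bad = good + [("doc9#9", b"append-only ledger")]  # names a fragment that is not stored

print(provenance_coverage(good, store))        # 1.0
print(provenance_coverage(bad, store) < 1.0)   # True: one unattributed span shipped
```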

store-only tamper detection

below 1.0 across 4 modes → chain insufficient.

Edit, delete, insert and reorder against a store-only adversary must each be detected and localized by the hash chain.
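
The four modes can be exercised against a toy chain as follows. This is a sketch of the idea, not provbench itself: the journal layout and the `tamper` helper are hypothetical, and the store-only adversary is modeled as one who changes stored entries without recomputing any hashes.

```python
import copy
import hashlib

GENESIS = b"\x00" * 32

def link(prev: bytes, payload: bytes) -> bytes:
    return hashlib.sha256(prev + payload).digest()

def build(payloads):
    journal, prev = [], GENESIS
    for payload in payloads:
        cur = link(prev, payload)
        journal.append({"prev": prev, "payload": payload, "hash": cur})
        prev = cur
    return journal

def first_bad_index(journal):
    """Index of the first entry that breaks the chain, or None if intact."""
    prev = GENESIS
    for i, e in enumerate(journal):
        if e["prev"] != prev or e["hash"] != link(e["prev"], e["payload"]):
            return i
        prev = e["hash"]
    return None

def tamper(journal, mode):
    """Apply one store-only tampering mode at position 2 of a copy."""
    j = copy.deepcopy(journal)
    if mode == "edit":
        j[2]["payload"] = b"edited"
    elif mode == "delete":
        del j[2]
    elif mode == "insert":
        j.insert(2, {"prev": j[1]["hash"], "payload": b"forged", "hash": b"\x11" * 32})
    elif mode == "reorder":
        j[2], j[3] = j[3], j[2]
    return j

clean = build([b"e0", b"e1", b"e2", b"e3", b"e4"])
results = {m: first_bad_index(tamper(clean, m))
           for m in ["edit", "delete", "insert", "reorder"]}
print(results)  # {'edit': 2, 'delete': 2, 'insert': 2, 'reorder': 2}
```

Every mode is both detected and localized to the position where it was applied. In this sketch, removing the final entry would go unnoticed unless the verifier also held the expected head or length, so the example does not model how the benchmark handles truncation.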

chain-aware tamper detection

a missed rewrite → external anchor insufficient.

A chain-aware adversary who re-hashes the whole journal must be caught by the external Ed25519 head anchor; a miss falsifies the anchor.

echo parity

a quality gap vs echo → over-claim risk.

If Hyphae claimed to beat echo on correctness or grounding, that would be the over-claim the paper retracted. Parity with echo is the expected, honest result.

determinism

non-reproducible emission → audit invalid.

The cognition path must be deterministic over its inputs; a non-reproducible emission undermines the auditability the whole layer depends on.

10 — Architecture flow

How a query becomes an audited answer.

The retrieval-and-emission path, end to end: input gating, fragment retrieval, working-set assembly, verbatim emission, and the journal write that makes the result auditable. No model inference sits between the query and the answer.

Runtime topology · provenance path

User · multilingual query + feedback

Perceptual Input Layer
in/1 · Text queries · canonical input
in/2 · Web navigation · Playwright agent
in/3 · Vision · audio · future multimodal

Thalamus
input gating · normalize · route · state-check

Initial activation · 3-way fan-out
S8b · ACC: thread tracking · topic switch · conversation state · open questions
Hippocampus: saliency-weighted retrieval · pattern completion
Amygdala: valence assignment · emotional context
→ direct_fragments[]

Cascade activation · spreading through the causal graph
S3b · Entorhinal cortex: conductivity graph · boundary metadata · cross-fragment binding
→ adjacency + weights
Cerebellum: cascade propagation (Collins-Loftus + spreadr) · predictive coding · multi-hop with decay
→ activated_set[]

S8b + S4 · ACC + Amygdala · coherence evaluation
conflict detection (lexical + NLI) · valence-modulated filtering
→ working_set[] + conflicts[]

Frontal Cortex · Executive Composition Orchestrator
working memory 7±2 cap · intent classification · schema selection · composite scoring (activation + relevance + saliency + recency) · conversation-state-aware composition

Hyphae-Surface · realization, no LLM in path
reg · Schema registry: DialogueReply · GroundedAssertion · Introspective · NarrativeArc · Comparative · SyntheticSummary
trig · Honest-limitation triggers: EmptyWorkingSet · HighConfabRisk · ShallowCascade · Contradictions · SourceLanguageMismatch
bind · Slot binding · fragment quotation · connective tissue: greedy + backtrack · verbatim from source · cross-lingual framing (lexicon · 9+ rules)
out · ComposedResponse: body · prelude · quoted_fragments[] · language_framings[] · realization_trace[] · provenance_chain[]

User response + feedback signal
feedback → learning loop · see background

Synchronous path · single request · no LLM in cognition
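
The cascade step in this path (multi-hop propagation with decay) can be sketched as a simple spreading-activation loop. This is a simplified variant for illustration, not the Collins-Loftus or spreadr formulation the system cites: it keeps the strongest activation reaching each node rather than summing, and the decay, threshold and hop limit are invented parameters.

```python
def spread(graph, seeds, decay=0.5, threshold=0.1, max_hops=3):
    """Propagate activation from seed nodes along weighted edges.

    graph: {node: {neighbor: weight}}; seeds: {node: initial activation}.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        reached = {}
        for node, level in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = level * weight * decay
                # Drop weak signals and anything that does not improve on a prior visit.
                if passed >= threshold and passed > activation.get(neighbor, 0.0):
                    reached[neighbor] = max(reached.get(neighbor, 0.0), passed)
        if not reached:
            break
        activation.update(reached)
        frontier = reached
    return activation

graph = {"a": {"b": 1.0, "c": 0.4}, "b": {"d": 0.8}, "d": {"e": 0.5}}
activated = spread(graph, {"a": 1.0})
print(sorted(activated))  # ['a', 'b', 'c', 'd']
```

Node `e` is three hops away and would receive 0.05, which is below the threshold, so it never enters the activated set. The decay is what keeps a cascade from flooding the whole graph.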

Background · 01

Learning loop (RFC v0.4)

Feedback signal — explicit and implicit — refines the substrate's parameters, not its structure. Every weight update is journalled with rollback capability.

Conductivity weights — which fragments connect more strongly
Saliency scores — which fragments rise / fall by use
Schema selection thresholds
Cascade decay & threshold per context
Honest-limitation triggers per domain
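
Journalled weight updates with rollback can be sketched as below. The class name, record layout and the choice to record a rollback as a new compensating entry (rather than erasing history) are assumptions for illustration; the source states only that updates are journalled, can be rolled back, and change parameters rather than structure.

```python
import hashlib
import json

class WeightLog:
    """Parameters that can only change through hash-chained, reversible updates."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.entries = []
        self.head = b"\x00" * 32

    def update(self, key, new_value):
        # Raises KeyError for unknown keys: parameters change, structure does not.
        record = {"key": key, "old": self.weights[key], "new": new_value}
        encoded = json.dumps(record, sort_keys=True).encode("utf-8")
        self.head = hashlib.sha256(self.head + encoded).digest()
        self.entries.append(record)
        self.weights[key] = new_value

    def rollback(self):
        """Undo the latest update by journalling its inverse."""
        last = self.entries[-1]
        self.update(last["key"], last["old"])

log = WeightLog({"a->b": 0.4})
log.update("a->b", 0.6)
log.rollback()
print(log.weights["a->b"], len(log.entries))  # 0.4 2
```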

Background · 02

Curiosity (depth-aware)

ACC detects a causal gap → the Dopaminergic Midbrain evaluates depth + relevance + recency → curiosity fires through one of three channels.

Web grounding (Vertex AI) — snippets
Active browsing (Playwright) — full pages
API integrations — structured data
Olfactory Bulb gates absorption (quality + dup filter)
New fragments encoded via Hippocampus

Background · 03

Consolidation (SHY + LC/BNST)

Triggered when LC arousal and BNST valence are both low — Hippocampus replays, episodic → semantic abstraction, SHY proportional decay, conductivity graph compaction.

Sharp-wave-ripple replay
Pattern completion / separation
Old-trace interleaved (anti-catastrophic-forgetting)
Tononi SHY depth-weighted pruning
Mammillary temporal index refresh
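
Proportional decay followed by pruning can be sketched in a few lines. The depth weighting mentioned above is not modeled, and the scaling factor and pruning floor are invented values, so read this as the shape of the operation only.

```python
def shy_downscale(weights, factor=0.8, floor=0.1):
    """Scale every weight by the same factor, then drop edges below the floor."""
    scaled = {edge: w * factor for edge, w in weights.items()}
    return {edge: w for edge, w in scaled.items() if w >= floor}

weights = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.1}
after = shy_downscale(weights)
print(sorted(after))  # [('a', 'b'), ('b', 'c')]
```

Because every weight is scaled by the same factor, the relative ordering of surviving connections is preserved; only the weakest fall away.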

~/.hyphae/
├── journal/     fjall LSM-tree + SHA-256 hash chain (immutable history)
├── state/       redb (state machine + counters)
├── fragments/   postcard binary (sharded, conductivity-indexed)
├── lexicon/     postcard binary (multilingual, in-memory cached)
├── learning/    postcard binary (weight updates with rollback chain)
├── decisions/   ADRs in markdown (architectural decision records)
└── exports/     JSON on-demand (debugging, migration, audit)

Prop 01

No context window

Persistent memory accessed by retrieval, not by inclusion in a prompt.

Prop 02

No LLM in the cognition path

Composition is deterministic over structured inputs; LLMs are consulted only for external grounding.

Prop 03

Persistent conversational metacognition

Threads, open questions and pending follow-ups tracked as first-class system state.

Prop 04

Explicit causal grounding

Every fragment carries provenance with an explicit confabulation_risk.

Prop 05

Cryptographic audit

Journal hash chain — tampering is detectable, recovery state verifies integrity.

Prop 06

General-purpose hardware

CPU + RAM. No GPU dependence at any tier — including edge / IoT.

Prop 07

Verbatim source quotation

Fragments preserve original content. The realizer only generates connective tissue.

Prop 08

Auditable learning

Every weight update is journalled with a rollback chain. Structure is never mutated.

Prop 09

Honest limitation

Explicit acknowledgment when working material is insufficient — typed triggers, not a generic apology.

11 — Read it. Run it. Cite it.

Rust substrate, the LLM+RAG comparator, every result envelope, the tamper-detection experiment, and the full preprint — all public, dual-licensed Apache-2.0 / CC-BY-4.0.

Cite the preprint: DOI 10.5281/zenodo.20436643

Gutiérrez, M. (2026). Hash-Chained Verbatim Quotation: A Verifiable Provenance Layer for Grounded Retrieval. Zenodo. https://doi.org/10.5281/zenodo.20436643

@misc{gutierrez2026hyphae,
  author       = {Guti{\'e}rrez, Mario},
  title        = {{Hash-Chained Verbatim Quotation: A Verifiable
                  Provenance Layer for Grounded Retrieval}},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20436643},
  url          = {https://doi.org/10.5281/zenodo.20436643},
  note         = {Hyphae v2. Code, corpora, result envelopes, and
                  the preprint. \url{https://github.com/terrizoaguimor/hyphae-v2}}
}

Apache-2.0 code · CC-BY-4.0 docs · arXiv version forthcoming
