A provenance layer, not a brain. Hyphae answers by emitting byte-identical quotations of stored fragments over a SHA-256 hash-chained journal — so every span is auditable back to a named, unaltered source. An external Ed25519 anchor closes the tampering gap. No large language model sits in the cognition path.
When a language model paraphrases a retrieved passage, the output can no longer be bound to its source byte-for-byte. The wording drifts; the citation becomes a gesture, not a guarantee. For grounded retrieval — where the whole value is that an answer is traceable to a real, unaltered source — that drift is the failure mode.
Hyphae's answer is deliberately narrow: emit the source verbatim, write it over a hash-chained journal, and let anyone audit the binding independently. The hash chain itself is classical — Haber–Stornetta (1991), Merkle (1988), Certificate Transparency, git. The contribution is the application to grounded retrieval, and the measurement.
A trivial echo baseline — a few lines that just print the retrieved fragment back — ties Hyphae on every correctness and grounding metric. So does echo + journal. That is the point, not a weakness: correctness and grounding are properties of verbatim quotation, not of any one system. Provenance is an addable layer any extractive retriever can adopt. Hyphae is a reference realization of it.
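To make the echo control concrete, here is a minimal Python sketch of the two baselines described above: plain echo, and echo plus a journal. It is an illustration, not Hyphae's Rust code. The names `echo`, `append_entry` and `echo_with_journal`, the all-zero genesis hash, and the `prev || payload` hashing layout are all assumptions made for this example.

```python
import hashlib

GENESIS = "0" * 64  # assumption: all-zero "previous hash" for the first entry


def echo(fragment: bytes) -> bytes:
    """The trivial baseline: return the retrieved fragment unchanged."""
    return fragment


def append_entry(journal: list, payload: bytes) -> dict:
    """Append an entry whose hash commits to the previous entry's hash."""
    prev = journal[-1]["hash"] if journal else GENESIS
    digest = hashlib.sha256(bytes.fromhex(prev) + payload).hexdigest()
    entry = {"prev": prev, "payload": payload, "hash": digest}
    journal.append(entry)
    return entry


def echo_with_journal(journal: list, fragment: bytes) -> bytes:
    """Echo plus provenance: journal the emission, then return it verbatim."""
    append_entry(journal, fragment)
    return fragment


journal = []
fragment = b"The chain is classical; the application is the point."
answer = echo_with_journal(journal, fragment)
```

The answer bytes are identical in both baselines. The only difference is that the second leaves an auditable journal entry behind, which is the sense in which provenance is an addable layer.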
A paraphrase can't be bound to its source byte-for-byte. A verbatim quotation can, and a hash chain makes that binding independently auditable.
Hash-Chained Verbatim Quotation · the contribution in one line
Verifiable provenance is built from five layered mechanisms. Each can be tested on its own, and each can fail without invalidating the others. Together they make every answer span auditable back to a named, unaltered source — and they hold against an attacker who knows the chain is there, rewrites it, rolls it back, withholds entries, or steals a retired key.
Every answer span is a verbatim copy of a stored fragment — no paraphrase, no rewording. The output is bindable to its source byte-for-byte, which is exactly what an LLM's paraphrastic generation destroys.
Fragments and emissions are written to an append-only journal where each entry commits to the previous entry's hash. Store-only tampering across ten modes is detected and localized. The chain is classical; the application is the point.
A bare chain falls to a chain-aware adversary who rewrites history and re-hashes. An Ed25519 signature over the chain head, with the key held outside the store process, closes that gap, so the rewrite is caught when it is attempted.
Heads are published to an append-only, hash-chained ledger — so a rollback replaying a stale anchor is rejected (freshness) and forked histories are caught (non-equivocation). An independent witness of the ledger tail catches a store that withholds entries.
A keyring rotates the anchor key — each successor authorized by its predecessor from a root trusted out-of-band. A ledger spanning rotations verifies under the per-epoch key, and a retired key can sign no new history, so key compromise is recoverable, not fatal.
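The hash-chained journal in the second mechanism above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hyphae's journal format: the entry layout, the all-zero genesis hash, and the helper names `entry_hash`, `append` and `first_broken_index` are hypothetical. It shows how a store-only edit is both detected and localized to an entry index.

```python
import hashlib

GENESIS = "0" * 64  # assumption: all-zero "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: bytes) -> str:
    """Hash of an entry: SHA-256 over the previous hash followed by the payload."""
    return hashlib.sha256(bytes.fromhex(prev_hash) + payload).hexdigest()


def append(journal: list, payload: bytes) -> None:
    """Append-only write: each entry commits to its predecessor's hash."""
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append({"prev": prev, "payload": payload,
                    "hash": entry_hash(prev, payload)})


def first_broken_index(journal: list):
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(journal):
        link_ok = entry["prev"] == prev
        hash_ok = entry["hash"] == entry_hash(prev, entry["payload"])
        if not (link_ok and hash_ok):
            return i
        prev = entry["hash"]
    return None


journal = []
for payload in (b"fragment A", b"fragment B", b"fragment C"):
    append(journal, payload)

clean_result = first_broken_index(journal)      # None: chain verifies

# Store-only adversary: edits a payload in place without re-hashing.
journal[1]["payload"] = b"fragment B (edited)"
tampered_at = first_broken_index(journal)       # 1: the edited entry
```

An adversary who also recomputes every later hash would pass this check, which is why the bare chain needs the external anchor described in the third mechanism.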
Hyphae runs on a coordinated set of subsystems modeled on mammalian brain regions, communicating through typed pathways. They handle retrieval, state and runtime coordination — fetching candidate fragments, tracking conversation, managing the journal.
They are not what makes a quoted answer correct — verbatim emission is. The echo control proves it: a few lines that print the retrieved fragment back score the same. Read this section as the substrate that retrieves and persists, not as the source of the answer's quality.
First filter on incoming signal. Decides what reaches the substrate.
Stores fragments and runs pattern completion at initial retrieval.
Binds fragments and computes the conductivity weights on the retrieval network.
Assigns affective valence to fragments at write time and on recall.
Modulates precision and gates consolidation alongside the BNST.
Bounded working memory (7 fragments). Controlled inhibitory filtering of retrieval results.
Detects conflict, tracks threads, open questions and conversation chronology.
Predictive coding and spreading activation through the retrieval network.
Indexes fragments and threads on the temporal axis.
Decides which fragments graduate from working memory to long-term storage.
Salience model for input streams. Future foundation for multimodal extension.
Decides when to fire a curiosity operation against external grounding.
Tracks substrate-internal signals — load, latency, integrity, drift.
Selects among candidate emissions and tool invocations under reward.
Picks the emission schema (DialogueReply, GroundedAssertion, etc.) for the intent at hand.
Byte-identical fragment quotation + minimal connective tissue. Slot binding with depth-two backtracking.
SHA-256 hash chain over every significant event, with external Ed25519 head anchor. Verified during Recovery state.
The properties below are what make a Hyphae answer independently auditable. They are claims about verifiability, not about answer quality — the echo control already settled the quality question.
Every answer span is a byte-identical copy of a stored fragment, so the output binds to its source exactly. This is the property paraphrastic LLM generation destroys — once wording drifts, no span can be matched to a source byte-for-byte.
Each emitted span carries provenance back to a named, unaltered source fragment. A third party can verify the binding without trusting the system — the audit does not depend on Hyphae attesting to its own honesty.
The SHA-256 hash chain over the journal detects and localizes store-only tampering across the benchmark's ten modes — edit, delete, insert, reorder, bit-flip, truncate, duplicate, timestamp-skew, rollback, batch. The chain is classical (Haber–Stornetta, Merkle, Certificate Transparency); the contribution is applying it to grounded retrieval and measuring it.
A bare chain falls to an adversary who rewrites history and re-hashes the whole chain. An external Ed25519 signature over the head — key held outside the store process — closes that gap and catches the rewrite.
Above the bare chain: an append-only Ed25519 ledger of signed heads gives freshness (a rolled-back head with a stale anchor is rejected) and non-equivocation; an independent witness of the ledger tail catches a store that withholds entries; and a signed keyring rotates the anchor key so compromise is recoverable, not fatal. Each layer is measured.
The cognition path is deterministic over structured inputs — no model inference between query and answer. Rust, CPU-only, single binary; fits on commodity hardware. LLMs appear only as external comparators in the evaluation, never in the emission path.
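The anchor property above (a chain-aware rewrite passes chain verification but fails the external head check) can be illustrated with a short Python sketch. One substitution is made and should be read loudly: Python's standard library has no Ed25519, so HMAC-SHA256 stands in for the signature here. HMAC is symmetric and is not a replacement for an asymmetric anchor; it only shows where the check sits. The key value and all function names are hypothetical.

```python
import hashlib
import hmac

GENESIS = "0" * 64  # assumption: all-zero "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: bytes) -> str:
    return hashlib.sha256(bytes.fromhex(prev_hash) + payload).hexdigest()


def build_chain(payloads) -> list:
    """Build a fully consistent chain over the given payloads."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def chain_is_consistent(chain: list) -> bool:
    """Bare-chain check: every link and every hash recomputes correctly."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


def sign_head(key: bytes, head: str) -> bytes:
    """Stand-in for an Ed25519 signature over the chain head (HMAC, NOT Ed25519)."""
    return hmac.new(key, bytes.fromhex(head), hashlib.sha256).digest()


def anchor_matches(key: bytes, head: str, anchor: bytes) -> bool:
    return hmac.compare_digest(sign_head(key, head), anchor)


ANCHOR_KEY = b"held-outside-the-store-process"  # hypothetical key material

honest = build_chain([b"fragment A", b"fragment B"])
anchor = sign_head(ANCHOR_KEY, honest[-1]["hash"])

# Chain-aware adversary: rewrites history and re-hashes the whole chain.
forged = build_chain([b"fragment A", b"fragment B (rewritten)"])

bare_chain_fooled = chain_is_consistent(forged)
anchor_catches = not anchor_matches(ANCHOR_KEY, forged[-1]["hash"], anchor)
```

Both flags come out true: the forged chain is internally consistent, and only the comparison against a head signed outside the store exposes it.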
The preprint is public and citable (Zenodo DOI). The provenance stack is built and verified in the open: a hash-chained journal, an external Ed25519 head anchor, an append-only ledger (freshness + non-equivocation), an external witness (against entry withholding), and a signed keyring (key rotation). A community-scale provenance benchmark measures the whole stack, including a defense-escalation experiment where each attack is caught by exactly the next layer up. All in the public repository, CI-gated.
The evaluation spans 255 queries, twelve metrics and 18 LLM-based configurations — six models × three retrieval modes — across two corpora, plus a dedicated tamper-detection benchmark. The headline finding is below, and it is deliberately not a win: on correctness and grounding, a trivial echo baseline matches Hyphae. The metric that actually separates systems is provenance.
255 queries evaluated across two distinct corpora, twelve metrics each, with explicit failure thresholds per metric.
Eighteen LLM-based comparator configurations: six models, three retrieval modes each. LLMs appear only as comparators — never in Hyphae's emission path.
Two trivial baselines that print the retrieved fragment back. They tie Hyphae on correctness and grounding — that is the contribution, stated honestly.
The provenance benchmark (provbench v2) crosses ten tampering modes with three adversary profiles, plus a defense-escalation experiment: each attack — in-place edit, chain-aware rewrite, rollback-with-stale-anchor, withholding — is caught by exactly the next layer up (chain → anchor → ledger → witness).
Correctness and grounding are properties of verbatim quotation — every quoting system ties. Provenance (the chain + the external anchor) is what Hyphae adds on top.
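The upper layers of the escalation (ledger and witness) reduce to three small checks, sketched below in Python. This is an assumption-laden illustration: signatures over each published head are omitted entirely, heads are simulated by hashing labels, and the names `ledger_append`, `is_fresh`, `is_prefix` and `witness_accepts` are hypothetical rather than provbench or Hyphae APIs.

```python
import hashlib

GENESIS = "0" * 64  # assumption: all-zero "previous hash" for the first entry


def head_of(label: bytes) -> str:
    """Simulated journal head; a real head would be the journal's last hash."""
    return hashlib.sha256(label).hexdigest()


def ledger_append(ledger: list, head: str) -> None:
    """Publish a head to an append-only, hash-chained ledger."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    digest = hashlib.sha256(bytes.fromhex(prev) + bytes.fromhex(head)).hexdigest()
    ledger.append({"prev": prev, "head": head, "hash": digest})


def is_fresh(ledger: list, presented_head: str) -> bool:
    """Freshness: only the most recently published head is current."""
    return bool(ledger) and ledger[-1]["head"] == presented_head


def is_prefix(shorter: list, longer: list) -> bool:
    """Non-equivocation: two views of one ledger must agree on the shared prefix."""
    return len(shorter) <= len(longer) and all(
        a["hash"] == b["hash"] for a, b in zip(shorter, longer))


def witness_accepts(seen_len: int, seen_tail_hash: str, served: list) -> bool:
    """Witness: the served ledger must still contain the tail the witness saw."""
    if seen_len < 1 or len(served) < seen_len:
        return False
    return served[seen_len - 1]["hash"] == seen_tail_hash


h1, h2, h3 = (head_of(x) for x in (b"head-1", b"head-2", b"head-3"))
ledger = []
for head in (h1, h2, h3):
    ledger_append(ledger, head)

# Rollback replaying a stale head.
rollback_rejected = not is_fresh(ledger, h2)

# Forked history shown to a second observer.
fork = []
for head in (h1, head_of(b"head-2-forked")):
    ledger_append(fork, head)
fork_caught = not is_prefix(fork, ledger)

# Store withholds the latest entry from a reader; the witness saw all three.
withheld_view = ledger[:2]
withholding_caught = not witness_accepts(len(ledger), ledger[-1]["hash"], withheld_view)
```

Each check targets exactly one attack from the escalation: stale heads fail freshness, forks fail the prefix comparison, and a truncated view fails against the witnessed tail.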
The honest position includes its own limitations. These are the genuine open questions carried in the paper's future-work section — stated plainly rather than buried.
Verbatim quotation fits questions whose answer is a contiguous span in some source; it cannot synthesize across sources. We have begun mapping this boundary with a multi-hop harness: a single-span system silently fails on multi-hop by default, and graceful degradation (abstaining) is achievable but requires an explicit abstention signal the realizer must implement. The live multi-hop column against an LLM comparator is the next measurement.
From the journal onward the threat model is now closed: the chain catches store-only edits, the anchor the chain-aware rewrite, the ledger adds freshness and non-equivocation, a witness catches withholding, and a signed keyring makes key compromise recoverable — measured across ten modes and three adversaries. What remains genuinely open is the ingestion boundary: attesting that fragments entered the journal faithfully in the first place. Provenance from the journal forward is closed; provenance into it is not.
The work is public now on Zenodo with a citable DOI. An arXiv version is forthcoming, pending category endorsement — we will link it here when it lands. Until then, the Zenodo record and the public repository are the authoritative artifacts.
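The explicit abstention signal mentioned in the multi-hop limitation above can be pictured as a typed result rather than an empty or partial answer. The Python sketch below is a toy under loud assumptions: the `Quote` and `Abstain` types, the `answer` function, the term-coverage rule and the sample fragments are all invented for illustration and say nothing about how Hyphae's realizer decides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Quote:
    fragment_id: str
    span: bytes


@dataclass(frozen=True)
class Abstain:
    reason: str


def answer(required_terms: list, fragments: dict) -> Union[Quote, Abstain]:
    """Quote a fragment only if a single fragment covers every required term.

    Toy rule (an assumption): a question is answerable by one span when one
    fragment contains all of its key terms. Otherwise abstain explicitly.
    """
    for fragment_id, text in fragments.items():
        if all(term in text for term in required_terms):
            return Quote(fragment_id, text)
    return Abstain("no single fragment covers every required term")


# Hypothetical two-fragment store.
fragments = {
    "f1": b"Alice manages the Lisbon office.",
    "f2": b"The Lisbon office opened in 2019.",
}

single_hop = answer([b"Alice", b"Lisbon"], fragments)   # answerable from f1 alone
multi_hop = answer([b"Alice", b"2019"], fragments)      # needs f1 and f2 together
```

The design point is that the multi-hop case returns a distinct `Abstain` value the caller must handle, instead of silently quoting a fragment that answers only half the question.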
Hyphae compiles to a single Rust binary. The deployment configuration is what differentiates a portable offline-only runtime from a learning runtime that absorbs new fragments through curiosity — and from an edge variant tuned for constrained hardware.
Operates entirely on the fragment store loaded at startup. Curiosity disabled, all budgets capped to zero, no Vertex credentials, no network access.
Same code, different runtime configuration. Curiosity continuously active, Vertex grounding online, learning loop refining weights from feedback, web agency enabled.
Minimal substrate for embedded devices and companion modes. Reduced subsystems, pre-trained fragment store, read-only runtime — the smallest shape of Hyphae that still preserves the substrate's contract.
The evaluation specifies twelve typed metrics with explicit floors — the points at which a provenance claim is empirically falsified. The provenance benchmark (provbench v2) runs ten tampering modes against three adversaries, plus a defense-escalation experiment across the full stack — chain → anchor → ledger → witness.
Failure is information, not termination. If a threshold trips, it is reported honestly — the echo control is itself an example of a result that contradicted the original framing and was published rather than hidden.
If an emitted span is not a byte-identical copy of its source fragment, the core guarantee — bindability to source — is broken.
Every emitted span must resolve to a named source fragment. A single span without provenance is a coverage failure.
Edit, delete, insert and reorder against a store-only adversary must each be detected and localized by the hash chain.
A chain-aware adversary who re-hashes the whole journal must be caught by the external Ed25519 head anchor; a miss falsifies the anchor.
If Hyphae claimed to beat echo on correctness or grounding, that would be the over-claim the paper retracted. Parity with echo is the expected, honest result.
The cognition path must be deterministic over its inputs; a non-reproducible emission undermines the auditability the whole layer depends on.
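Three of the falsification conditions above (byte-identity, provenance coverage, determinism) are mechanical enough to sketch as checks. The Python below is a minimal illustration, not the benchmark's code. The function names, the `(fragment_id, span)` emission shape, and the choice to accept any contiguous slice of a fragment as byte-identical are assumptions; if spans must equal whole fragments, the containment test would become an equality test.

```python
def check_emission(spans: list, store: dict) -> list:
    """Return (index, failure_kind) for every span that breaks a guarantee.

    spans: list of (fragment_id or None, span_bytes)
    store: mapping of fragment_id -> stored fragment bytes
    """
    failures = []
    for i, (fragment_id, span) in enumerate(spans):
        if fragment_id is None or fragment_id not in store:
            failures.append((i, "coverage"))        # span has no named source
        elif span not in store[fragment_id]:
            failures.append((i, "byte-identity"))   # wording drifted from source
    return failures


def is_deterministic(emit, query, runs: int = 3) -> bool:
    """Re-run the emission and require identical output every time."""
    first = emit(query)
    return all(emit(query) == first for _ in range(runs - 1))


# Hypothetical store and emissions.
store = {"f1": b"Hash chains are classical."}
spans = [
    ("f1", b"Hash chains are classical."),        # verbatim: passes
    ("f1", b"Hash chains are a classical idea."), # paraphrase: byte-identity failure
    (None, b"unsourced claim"),                   # no provenance: coverage failure
]

failures = check_emission(spans, store)
reproducible = is_deterministic(lambda q: store["f1"], "what are hash chains?")
```

A single entry in `failures` is enough to trip the corresponding threshold, matching the text's framing that one unbound span is already a failure.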
Architectural decisions and pattern-establishing implementations receive review from deepseek-v4-pro and gemini-3.5-flash before commitment. The triangulation has caught real defects pre-merge. Tests are required for every PR; cargo fmt, clippy with warnings treated as errors, and a workspace-wide build and test must all pass before merge.
The honesty discipline is enforced structurally: the hash-chained journal at the data layer, provenance metadata on every fragment, and a published echo control that contradicted the project's own earlier framing — kept in the paper rather than cut.
Open source under Apache 2.0 for code and CC-BY-4.0 for docs, corpora and the preprint. Code, the LLM+RAG comparator, every result envelope, and the tamper-detection experiment are public — a provenance claim that can't be independently re-run isn't a provenance claim.
The retrieval-and-emission path, end to end: input gating, fragment retrieval, working-set assembly, verbatim emission, and the journal write that makes the result auditable. No model inference sits between the query and the answer.
Runtime topology · provenance path
Synchronous path · single request · no LLM in cognition
Feedback signal — explicit and implicit — refines the substrate's parameters, not its structure. Every weight update is journalled with rollback capability.
ACC detects a causal gap → the Dopaminergic Midbrain evaluates depth + relevance + recency → curiosity fires through one of three channels.
Triggered when LC arousal and BNST valence are both low — Hippocampus replays, episodic → semantic abstraction, SHY proportional decay, conductivity graph compaction.
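The learning-loop claim above, that every weight update is journalled with rollback capability while structure stays fixed, can be sketched as an append-only update log. This Python example is hypothetical throughout: the `WeightLog` class, its entry layout, and the choice to implement rollback as compensating updates (rather than deleting history) are assumptions for illustration, not Hyphae's learning store.

```python
import hashlib
import struct

GENESIS = "0" * 64  # assumption: all-zero "previous hash" for the first entry


class WeightLog:
    """Weights plus a hash-chained log of every change to them."""

    def __init__(self, weights: dict):
        self.weights = dict(weights)
        self.log = []

    def update(self, key: str, new: float) -> None:
        """Change an existing weight and journal the old and new values."""
        if key not in self.weights:
            # Parameters are refined; structure (the key set) is never mutated.
            raise KeyError(key)
        old = self.weights[key]
        prev = self.log[-1]["hash"] if self.log else GENESIS
        payload = key.encode() + struct.pack(">dd", old, new)
        digest = hashlib.sha256(bytes.fromhex(prev) + payload).hexdigest()
        self.log.append({"prev": prev, "key": key, "old": old,
                         "new": new, "hash": digest})
        self.weights[key] = new

    def rollback_to(self, length: int) -> None:
        """Restore weights to their state when the log had `length` entries.

        History is not erased: each undone update is reversed by appending
        a compensating update, so the rollback itself is journalled.
        """
        for entry in reversed(self.log[length:]):
            self.update(entry["key"], entry["old"])


w = WeightLog({"a->b": 0.5})
w.update("a->b", 0.7)
w.update("a->b", 0.9)
w.rollback_to(0)   # weights return to 0.5; the log grows to four entries
```

Recording rollbacks as new entries keeps the log append-only, which is the property the hash chain relies on.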
~/.hyphae/
├── journal/     fjall LSM-tree + SHA-256 hash chain   (immutable history)
├── state/       redb                                  (state machine + counters)
├── fragments/   postcard binary                       (sharded, conductivity-indexed)
├── lexicon/     postcard binary                       (multilingual, in-memory cached)
├── learning/    postcard binary                       (weight updates with rollback chain)
├── decisions/   ADRs in markdown                      (architectural decision records)
└── exports/     JSON on-demand                        (debugging, migration, audit)
Persistent memory accessed by retrieval, not by inclusion in a prompt.
Composition is deterministic over structured inputs; LLMs are consulted only for external grounding.
Threads, open questions and pending follow-ups tracked as first-class system state.
Every fragment carries provenance with an explicit confabulation_risk.
Journal hash chain — tampering is detectable, recovery state verifies integrity.
CPU + RAM. No GPU dependence at any tier — including edge / IoT.
Fragments preserve original content. The realizer only generates connective tissue.
Every weight update is journalled with a rollback chain. Structure is never mutated.
Explicit acknowledgment when working material is insufficient — typed triggers, not a generic apology.
Rust substrate, the LLM+RAG comparator, every result envelope, the tamper-detection experiment, and the full preprint — all public, dual-licensed Apache-2.0 / CC-BY-4.0.
Gutiérrez, M. (2026). Hash-Chained Verbatim Quotation: A Verifiable Provenance Layer for Grounded Retrieval. Zenodo. https://doi.org/10.5281/zenodo.20436643
@misc{gutierrez2026hyphae,
  author    = {Guti{\'e}rrez, Mario},
  title     = {{Hash-Chained Verbatim Quotation: A Verifiable
               Provenance Layer for Grounded Retrieval}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20436643},
  url       = {https://doi.org/10.5281/zenodo.20436643},
  note      = {Hyphae v2. Code, corpora, result envelopes, and
               the preprint. \url{https://github.com/terrizoaguimor/hyphae-v2}}
}