Every item below has a slug. Sponsorship conversations start with the slug, not with a brand deck. The roadmap is honest about what's shipped, what's actively being built, what's paused waiting for compute, and what we will categorically not build — regardless of the cheque.
We don't plan in quarters. We don't write OKRs. Each new piece earns its place by becoming the obvious next part — once the layer underneath proves itself in production, the layer above becomes possible to articulate.
That means the roadmap below is honest about order but agnostic about date. Items move from Active to Shipped when the predecessor stabilizes, not when a sprint closes.
Every item here has a public surface — repository, package, or release — you can verify against the claim.
61 typed MCP tools, hash-chained journal, 4-layer ethics engine, hybrid retrieval over PAD + semantics. Apache 2.0 in full.
Auto-recall, auto-capture, auto-journal, 5-layer ethics gate, subagent lineage, operator dashboard. Running in production on a single-tenant gateway. Pre-1.0, audit-hardened.
First vertical specialty available. Native vocabulary for LLM systems, agent design, retrieval, eval, and the operational layer that turns prototypes into systems people maintain.
Adapter + native, with a paper (DOI). On a frozen Gemma base, under conflict the channel overrides the text 264/265. From scratch, the perpendicular force replicates (88.8 % of 455 held-out pairs, chance 25 %), and the channel leaves the base a better language model than bracketed channel-less controls — the "relief valve". Honest scope: toy scale (110M / 1B tokens), one iteration. GPL-3.0 / CC BY-SA.
RFCs v0.1.2, v0.2, v0.3 shipped. Baseline metrics green on the 20-question smoke corpus: compose 1.000, grammaticality 0.993, honest-limitation 5/5.
If you watch the repos, these are the branches receiving commits. None are committed dates; all are committed work.
4 remaining Bucket 1 tools (journal_introspect, journal_arc, cultivate, bloom); expand eval corpus from 20 → 255 queries; replace mock Vertex grounding with the real provider.
Pre-1.0 is feature-complete and audit-hardened. The publish step is gated on explicit go-ahead and on validating multi-tenant deployment on a non-Celiums gateway.
Plan B — MARS as an adapter on top of Google's frozen 4.6 B base. This is the deployable artifact; tinyMARS was the architectural prototype. Separate forthcoming repo.
The Layer K precedent corpus (~1,857 docs, 1024-d) shipped as a release asset. Active work: making it easy to load into any OpenSearch instance, with SHA-verified bulk indexing.
Specified work waiting for its predecessor to stabilize. Each item has an articulated scope and a known integration shape, not just a name.
AlphaFold-2 incorporated physical constraints into the network; Hyphae specifies the rules of composition but learns no strategies from experience. RFC v0.4 specifies feedback signals, bounded parameter updates, audit trail, rollback if metrics degrade.
One of: medical research, legal analysis, financial modeling, engineering systems, creative writing, personal context. The chosen field is the one where a real partner is willing to ground the build in domain depth.
The MCP surface is language-agnostic but ergonomic Python and Go SDKs would lower the integration cost for teams not already in the Node/TS ecosystem.
Cognition runs proven on a single-tenant gateway. The next milestone is multi-tenant validation: RBAC and AAL behavior under contention, per-tenant ethics modes, cross-tenant isolation under load.
Ideas that have been written down — usually as journal entries pending RFC articulation — but are not yet committed to a build order. They wait for the layer below them to be obvious.
Browser automation (Playwright or equivalent) lets Hyphae read full pages, follow links, extract structured content. The concrete mechanism for the learning loop in R-HY-003.
Visual and auditory input as extensions of the Olfactory Bulb gating model and Thalamus input handling. Deferred until the textual substrate fully validates.
The deepest memory grounding — for the long-term you. Multi-year, single-user, with the strictest privacy posture in the family. Architectural sketch only.
The engine fits on commodity hardware. A profile that fits in single-digit-GB on a phone or edge device, with the same MCP surface and a reduced ethics-corpus footprint, is articulated but unbuilt.
The eval was widened (corpus v2, six capabilities) and the conflict test cleared decisively (channel overrides text 264/265); the native then replicated the perpendicular force from scratch (88.8% of 455 held-out pairs) and revealed the relief valve, with a paper and a DOI. The bottleneck is no longer statistical power — it is scale, and an efficiency-first redesign so a larger model stays affordable to train and can run on commodity hardware.
The from-scratch native proved the property at toy scale (110M / 1B tokens); it is not yet a capable model. Next is scale, but efficiency-first: knowledge retrieved rather than memorized into the weights, and sparse, conditional compute — so the cognitive channels can govern what is computed and recalled, and a larger model trains cheaply and serves on commodity hardware. Each efficiency step is its own small, pre-registered experiment before any scale-up — the same discipline that produced the conflict and relief-valve results.
Read the research log →These are the things we will not build — independently of the cheque size. The list is short on purpose. The longer one is what's on the roadmap, not what's off it.
This is the mirror of the Decline column in the funding section, applied to product, not money.
Any direction that requires hiding the engine, the journal, or the ethics layer. The Ethics Engine in particular is the one component that least deserves to be hidden.
Specialized intelligences for defense, mass surveillance, intelligence collection, or autonomous targeting. The acceptable-use policy is binding on us as builders too, not just on users.
Anything that requires phoning home to function. Users own their memories and their journals. We do not extract them, sell them, or hold them hostage to a managed plan.
We don't compete on raw inference throughput. The bet is on a different shape of cognition (substrate, ethics, journal, specialization) — not on shaving milliseconds off a chat reply.
Wrappers around third-party services that don't add substrate. If a connector doesn't extend Memory, Cognition, or the ethics surface, it doesn't justify being part of Celiums.
Email with the slug in the subject. We respond honestly within a week — including when the answer is no, and including a real cost estimate when the answer is yes. Some items are genuinely compute-bound; others — like tinyMARS now — just need eyes, replication, and collaborators.
[ Honest scope · slug-addressable · costs disclosed ]