
decentweb http bridge

using a browser as a client to the decentweb requires a bridge

DecentWeb HTTP Bridge — Gap Analysis and Design

Version: 0.3

Status: Draft

Purpose: Review the current decentweb implementation as a foundation for an HTTP bridge that any browser can use to read and publish content on the DecentWeb network. Identify gaps, misalignments, and concrete changes needed to produce a usable browser-accessible node.

Companion document: decentweb-design.md (protocol goals and layer definitions).


1. What the Current Implementation Provides

https://github.com/diatribes/decentweb

The implementation is a DHT node daemon (dwd) and a CLI tool (dw), written in C with Bazel. It covers the protocol stack reasonably well at the low level:

| What works | Where |
| --- | --- |
| Ed25519 keypair generation and storage | dw keygen, src/identity |
| btpk magnet address generation and parsing | dw_identity_magnet, dw_identity_parse_magnet |
| BEP-44 mutable item publish and resolve | src/feed, dwd |
| DHT node with routing-table persistence | dwd, libdht |
| Content bundle packing, hashing (SHA1), parsing | src/content/bundle.c |
| Feed manifest packing and parsing | src/content/manifest.c |
| Hash-verified fetch from a content cache | src/content/transport_http.c |
| Full write path: dw publish | apps/dw |
| Full read path: dw get | apps/dw, dwd get mode |
| Unix control socket IPC (RESOLVE, PUBLISH) | dwd |
| Feed liveness: periodic BEP-44 re-announce | dwd re-publish loop |

What the implementation does not provide is any HTTP interface that a browser can talk to. srv serves raw files out of a docroot by hash — it is the local content cache half of the bridge — but there is no request handler that takes a btpk magnet from the URL, resolves it over the DHT, downloads the bundle, and returns sanitized HTML to a browser. There is no feed reader view, no subscription management, and no HTML sanitization pipeline. The gap between “protocol node plus local cache” and “HTTP bridge” is that connecting request-handling layer.


2. The Intended Request Flow

A browser request through the bridge looks like this:

Browser
  │  GET /magnet:?xs=urn:btpk:<pubkey>   (or /<hash>)
Bridge HTTP server (srv + request handler)
  │  check local docroot cache
  │  if miss:
  │    RESOLVE <pubkey> → dwd → DHT → manifest hash
  │    fetch manifest from network → verify hash → store in docroot
  │    fetch bundle from network → verify hash → store in docroot
Content sanitization pass
  │  strip JS, remote resources, disallowed file types
Browser receives clean HTML (zero outbound requests)

The bridge is a local caching proxy between the browser and the DecentWeb network. It fetches and verifies content on behalf of the browser, stores it locally in its docroot, and serves sanitized HTML. The browser only ever talks to the bridge over HTTP; it never touches the DHT or the content network.

Content that has already been fetched is served directly from the local docroot without hitting the network again. The docroot is the bridge’s durable content store: it survives restarts and grows as new content is fetched.
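The cache check at the top of this flow can be sketched as follows. This is a minimal illustration, assuming the docroot stores each object in a file named after the lowercase hex form of its 20-byte hash; that layout and the helper names dw_hex20, dw_docroot_path and dw_docroot_has are hypothetical, not taken from the implementation.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define DW_HASH_LEN 20

/* Write the 40-character lowercase hex form of a 20-byte hash into out. */
static void dw_hex20(const uint8_t hash[DW_HASH_LEN], char out[2 * DW_HASH_LEN + 1])
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < DW_HASH_LEN; i++) {
        out[2 * i]     = digits[hash[i] >> 4];
        out[2 * i + 1] = digits[hash[i] & 0x0f];
    }
    out[2 * DW_HASH_LEN] = '\0';
}

/* Build "<docroot>/<hex hash>" (assumed layout).
 * Returns 0 on success, -1 if buf is too small. */
static int dw_docroot_path(const char *docroot, const uint8_t hash[DW_HASH_LEN],
                           char *buf, size_t buflen)
{
    char hex[2 * DW_HASH_LEN + 1];
    dw_hex20(hash, hex);
    int n = snprintf(buf, buflen, "%s/%s", docroot, hex);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Cache hit test: the object counts as cached if its file opens for reading. */
static int dw_docroot_has(const char *docroot, const uint8_t hash[DW_HASH_LEN])
{
    char path[512];
    if (dw_docroot_path(docroot, hash, path, sizeof path) != 0)
        return 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

A handler would call dw_docroot_has first and only go to dwd and the network on a miss. A cached file should still be re-verified against its hash if the docroot can be written by anything other than the bridge.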

Publishers have no HTTP server requirement in this model. A publisher runs dwd to participate in the DHT and push BEP-44 updates. Readers’ bridges discover and fetch the content. The publisher’s hosting obligation ends at DHT participation.


3. Design Goal Alignment

The following table assesses each non-negotiable design goal from decentweb-design.md against the current state and flags where the bridge must take care.

3.1 Core Properties

| Goal | Current state | Bridge concern |
| --- | --- | --- |
| Zero outbound requests from content | Not enforced. Bundles are written to disk as-is; raw HTML may contain external src/href. | The bridge must sanitize all HTML before serving it to a browser. Without this the goal fails the moment a browser renders a page. |
| No hosting cost for publishers | Met by the model: publishers only need DHT access; the reader’s bridge caches content locally. | The current fetch transport requires an existing HTTP cache to retrieve content from. Until direct DHT/BitTorrent piece transfer is implemented, a bootstrap mirror is still needed. This is an interim implementation gap, not a design flaw. |
| No central index | Met at the protocol layer (DHT). | The bridge’s local SQLite index is per-instance and not shared, which is correct. |
| Content integrity without CAs | Met: each fetch is hash-verified; the feed is Ed25519-signed. | The bridge should surface this to the browser (e.g. a visible “signature verified” indicator on each post). |
| Natural content expiry | Met by the model: DHT liveness and swarm seeder attrition handle expiry. Locally cached content in docroot persists until the operator clears it. | The bridge cache is the operator’s own machine; the operator controls retention. This is the right place for the decision. |
| Format simplicity | Bundle packing accepts any file names; no format enforcement at pack or serve time. | The bridge must refuse to serve, or must transcode, any file outside the allowed set (HTML, CSS, WebP, WebM, WOFF2). |
| No required old-web dependency | Partially violated in the current implementation: dw get requires an HTTP mirror URL to be passed on the CLI. The btpk magnet alone is not sufficient to fetch content today. | Direct DHT/BitTorrent download is the goal. The mirror bootstrap is a known interim step, not a permanent dependency. |

3.2 Secondary Goals

| Goal | Current state | Bridge concern |
| --- | --- | --- |
| Author identity portability | Keypair is a local file; no server-side custody. | A bridge with a publishing flow needs encrypted server-side key storage and a key-export mechanism. |
| Reader anonymity by default | No accounts needed to read. | The bridge server itself observes what content the user fetches. This is inherent to the bridge model (the bridge is the swarm peer, not the browser). It must not log reading behaviour unnecessarily, and this limitation should be surfaced in the UI. |
| Graceful degradation | A feed that goes offline simply fails to resolve. No broken-link equivalent yet. | The bridge should show the last-known feed state with a “last fetched at X” indicator rather than a bare error. |

4. Gaps in the Current Implementation

The following are missing from the current implementation entirely and are required for a usable browser-facing bridge.

4.1 No request handler connecting the browser to the DHT

srv serves files from docroot by hash. dwd resolves and publishes BEP-44 items. Nothing connects them in response to a browser request. The bridge needs a request handler that:

  1. Accepts a btpk magnet or bundle hash in the URL path.
  2. Checks the local docroot cache.
  3. On a cache miss, asks dwd to resolve the feed, fetches the manifest and bundle, verifies hashes, stores results in docroot.
  4. Runs the content sanitization pass.
  5. Returns rendered HTML to the browser.

This is the bridge’s core missing component. Everything else builds on top of it.
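Step 1 of that handler could look like the sketch below. It assumes the raw request target (path plus query) arrives as one string, and that the public key is hex-encoded as in the current implementation; the type and function names (dw_target, dw_parse_path) are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum dw_target_kind { DW_TARGET_INVALID = 0, DW_TARGET_FEED, DW_TARGET_BUNDLE };

struct dw_target {
    enum dw_target_kind kind;
    char hex[65];   /* 64 hex chars for a feed key, 40 for a bundle hash */
};

/* Return 1 if s is exactly n hex digits followed by NUL. */
static int dw_is_hex(const char *s, size_t n)
{
    if (strlen(s) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Classify a request target such as "/magnet:?xs=urn:btpk:<key>" or "/<hash>".
 * Anything else is reported as DW_TARGET_INVALID. */
static struct dw_target dw_parse_path(const char *path)
{
    static const char prefix[] = "/magnet:?xs=urn:btpk:";
    struct dw_target t = { DW_TARGET_INVALID, "" };

    if (strncmp(path, prefix, sizeof prefix - 1) == 0) {
        const char *key = path + sizeof prefix - 1;
        if (dw_is_hex(key, 64)) {          /* 32-byte Ed25519 public key */
            t.kind = DW_TARGET_FEED;
            memcpy(t.hex, key, 65);        /* includes the terminating NUL */
        }
    } else if (path[0] == '/' && dw_is_hex(path + 1, 40)) {
        t.kind = DW_TARGET_BUNDLE;         /* 20-byte bundle hash */
        memcpy(t.hex, path + 1, 41);
    }
    return t;
}
```

Rejecting everything that is not a well-formed key or hash at this point keeps untrusted path text away from both the filesystem lookup and the IPC call to dwd.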

4.2 Single post per feed

The current manifest schema:

{ "dw": "manifest", "v": 1, "content": <20-byte bundle hash>, "mirrors": [...] }

A feed has exactly one content bundle. There is no post history, no title, no timestamp, no follows list. A feed reader model — a list of posts per feed, unread counts, a post view for each — cannot be implemented on top of this schema.

This is the most fundamental schema gap. Nothing above the protocol layer works as a feed reader until a feed can hold multiple posts.

4.3 No manifest metadata

The manifest has no author-provided metadata: no feed title, no post titles, no publication timestamps, no description. The bridge would have no content to display in a feed list or post list beyond raw hashes.

4.4 No subscription persistence or background refresh

There is no subscription list and no background polling loop. dw get is a one-shot command. A browser user has no way to see new content appear without manually running CLI commands. The bridge needs a background thread that periodically resolves each followed feed’s BEP-44 item, fetches new bundles, and updates the local index.

4.5 No local index or search

Layer 4 (discovery) is a stub. There is no SQLite database, no FTS5 index, and no subscription store. Search across fetched content and the feed/post list views both require a persistent local store.

4.6 No content sanitization pipeline

Before serving any bundle HTML to a browser, the bridge must:

  • Strip all <script> tags and event handler attributes (on*)
  • Strip or rewrite all src and href attributes pointing to remote URLs
  • Validate CSS against the permitted property set
  • Reject or strip file types outside the permitted set (HTML, CSS, WebP, WebM, WOFF2)

None of this exists. dw_bundle_extract writes files to disk unchanged. Serving raw bundle HTML to a browser would allow any external src attribute in the content to make outbound requests, directly violating the core design guarantee.
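Two of the checks in that list reduce to small predicates a sanitizer could apply to each attribute it encounters. The sketch below is an assumption about how they might be written, not existing code; it deliberately treats every URL with a scheme (including data: and javascript:) as leaving the bundle, which is stricter than the text requires.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive prefix test. */
static int dw_has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Event handler attributes are those named on<something>: onclick, onload, ... */
static int dw_attr_is_event_handler(const char *name)
{
    return dw_has_prefix_ci(name, "on") && name[2] != '\0';
}

/* Return 1 if a src/href value would leave the bundle: any value with a
 * scheme ("http:", "data:", ...) or a protocol-relative "//host" form.
 * Backslashes are treated like slashes, as browsers do for such URLs. */
static int dw_url_is_remote(const char *url)
{
    while (isspace((unsigned char)*url))   /* leading whitespace is ignored */
        url++;
    if ((url[0] == '/' || url[0] == '\\') && (url[1] == '/' || url[1] == '\\'))
        return 1;
    /* scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"  (RFC 3986) */
    if (isalpha((unsigned char)url[0])) {
        const char *p = url + 1;
        while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
            p++;
        if (*p == ':')
            return 1;
    }
    return 0;
}
```

These predicates only decide what to strip. They still need an HTML parser in front of them, because matching attributes with string search on raw markup is easy to bypass.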

4.7 No key management for the publishing flow

Key management is entirely manual: generate a file, keep it, pass its path on the CLI. For a bridge with a browser-based publish UI, this is not workable. The bridge needs encrypted server-side key storage (a key derived from a user passphrase protecting the Ed25519 seed in SQLite) and a key-export flow so users can back up and migrate their identity.

4.8 No btpk address display or QR code

The current implementation prints the magnet string on stdout. The bridge needs to present the user’s btpk address as a QR code and a copyable link, and to accept a pasted or scanned magnet on a subscribe/discover page. QR generation does not require JavaScript; a server-side generator can emit an inline SVG.

4.9 No multi-mirror fallback in the read path

dw get takes exactly one mirror URL on the command line. The manifest’s mirrors list is used only as a fallback in a limited sense. The bridge should try each mirror in the list in order before reporting failure. When the bridge publishes a feed it should add its own accessible URL to the manifest’s mirrors list so other bridge instances can retrieve cached content from it.


5. Implementation Choices That Do Not Align With the Design Goals

5.1 btpk encoding: hex vs. base32

identity.h (lines 3–8) documents the tension explicitly:

The DecentWeb article says “base32” but also claims BEP-46 compatibility, and BEP-46 uses hex, so we use hex.

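decentweb-design.md (§3, §7) uses magnet:?xs=urn:btpk:<base32-encoded-public-key>. BEP 46 (the draft that introduced btpk), however, writes the magnet form as magnet:?xs=urn:btpk:<public key (hex)>, which is what the identity.h comment relies on. A 64-character hex key and a 52-character base32 key are not the same URL, so a bridge that emits one form cannot exchange addresses with a client that expects the other. As things stand, the implementation matches BEP 46 and the design document does not.

The project should settle on one encoding everywhere. If BEP-46 interoperability is the deciding criterion, that means keeping hex and correcting decentweb-design.md; choosing base32 instead means dropping the BEP-46 compatibility claim. Either way, the comment in identity.h should become the resolved decision record, not an open note.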
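To make the size difference concrete, the sketch below encodes the same 32-byte public key both ways. The helper names are hypothetical, and the base32 variant assumes the RFC 4648 alphabet in upper case without padding; decentweb-design.md may intend a different casing.

```c
#include <stddef.h>
#include <stdint.h>

/* Lowercase hex: 2 output characters per input byte.
 * out needs 2 * len + 1 bytes. */
static void dw_hex_encode(const uint8_t *in, size_t len, char *out)
{
    static const char d[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = d[in[i] >> 4];
        out[2 * i + 1] = d[in[i] & 0x0f];
    }
    out[2 * len] = '\0';
}

/* RFC 4648 base32 without padding: one output character per 5 input bits.
 * out needs (len * 8 + 4) / 5 + 1 bytes. */
static void dw_base32_encode(const uint8_t *in, size_t len, char *out)
{
    static const char a[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    uint32_t acc = 0;   /* bit accumulator */
    int bits = 0;       /* number of not-yet-emitted bits in acc */
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 5) {
            out[o++] = a[(acc >> (bits - 5)) & 0x1f];
            bits -= 5;
        }
    }
    if (bits > 0)       /* left-align the final partial group */
        out[o++] = a[(acc << (5 - bits)) & 0x1f];
    out[o] = '\0';
}
```

For a 32-byte key the hex form is 64 characters and the base32 form is 52, so the two magnets differ in length as well as alphabet. A parser could in principle tell them apart and accept both during a migration.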

5.2 Content fetch requires an existing HTTP cache

The current transport:

BEP-44 DHT → manifest infohash → HTTP GET manifest from cache server
                                → HTTP GET bundle from cache server

The cache server (srv) is the bridge’s own docroot server — there is no publisher hosting burden here. But to do the first fetch of any content that is not already in any accessible cache, the bridge currently has no mechanism: it cannot retrieve content directly from the DHT/swarm when no HTTP cache has it yet. This is the gap that BitTorrent piece transfer fills. Until then, the first fetch of any new content depends on an accessible peer that is already serving it over HTTP.

The transport interface is pluggable (dw_content_fetch_http is one implementation). Adding a BitTorrent transport behind the same seam is the correct next step and is the only path to making the “no required old-web dependency” goal fully true.

5.3 SHA1 as the content hash

Bundle and manifest hashes are SHA1 (20 bytes), matching the width of a BitTorrent infohash. This is not a correctness problem today, but practical SHA1 collision attacks have been demonstrated. A content network whose trust model rests entirely on hash-equality verification should move to a collision-resistant hash (SHA-256, BLAKE3) before the wire format is finalised. This is a breaking change and needs coordination with the spec documents. Truncated SHA-256 (first 20 bytes) keeps the current 20-byte field width; a full 32-byte digest would also fit within the 1000-byte limit that BEP-44 places on a stored value.

5.4 The Unix control socket protocol is too narrow

The current IPC protocol accepts two commands:

  • RESOLVE <pubkey-hex> → OK <hash> / NONE / ERR
  • PUBLISH <keyfile> <hash> → OK / ERR

A bridge request handler needs substantially more from dwd:

  • Subscribe to a feed (persist it, begin polling)
  • Unsubscribe
  • List subscriptions with status
  • Fetch a specific bundle hash from the network into docroot
  • Trigger an immediate feed refresh
  • Report DHT ready state, peer count, uptime

The current protocol is a single line in each direction into a fixed buffer — not framed, not versioned, no structured data. It would need to be replaced with newline-delimited JSON or a small HTTP/1.1 API over the Unix socket before a bridge request handler can drive it.

5.5 Maintained-feeds list is not persisted

dwd keeps a linked list of feeds to re-publish (g_feeds) in memory only. A restart loses it, and any feed that was maintained expires from the DHT roughly two hours later. A bridge serving real users must survive restarts without silently expiring its published feeds from the network.

5.6 Whole bundles held in memory

dw_bundle_parse and dw_manifest_parse read the entire payload into memory and hold it as a bencode tree. For CLI use this is fine. For a bridge serving concurrent requests, large bundles will exhaust memory. The bridge should enforce a maximum bundle size, reject any response whose Content-Length exceeds the limit before reading the body, and stream to disk while hashing rather than buffering the whole payload before verification.

5.7 IPv4-only DHT

dwd creates an IPv4-only UDP socket and skips IPv6 peers. IPv6 DHT participation (BEP-32) is standard on modern public BitTorrent networks. A bridge node that cannot participate in the IPv6 DHT is a second-class peer with reduced reach.


6. HTTP Bridge Architecture

Given the current implementation, the bridge adds one new component (dwh, or an embedded HTTP handler in dwd) between srv/docroot and the browser:

┌───────────────────────────────────────────────────────┐
│  BROWSER (any device, no JS required)                 │
│  HTTP to bridge  (GET /magnet:... or GET /<hash>)     │
└───────────────────┬───────────────────────────────────┘
                    │ HTTP
┌───────────────────▼───────────────────────────────────┐
│  dwh  (new: bridge request handler + HTML server)     │
│                                                       │
│  ┌───────────────────────────────────────────────┐    │
│  │  Request handler                              │    │
│  │  parse btpk magnet or hash from URL           │    │
│  │  check docroot cache                          │    │
│  │  on miss: ask dwd to resolve + fetch          │    │
│  └──────────────────┬────────────────────────────┘    │
│                     │                                 │
│  ┌──────────────────▼────────────────────────────┐    │
│  │  Content sanitization pipeline                │    │
│  │  strip JS / remote resources / bad file types │    │
│  └──────────────────┬────────────────────────────┘    │
│                     │                                 │
│  ┌──────────────────▼────────────────────────────┐    │
│  │  Lua server pages (HTML templating layer)     │    │
│  │  Feed list / Post list / Post view / Search   │    │
│  │  Discover / Publish / Profile + QR code       │    │
│  └──────────────────┬────────────────────────────┘    │
│                     │                                 │
│  ┌──────────────────▼────────────────────────────┐    │
│  │  SQLite                                       │    │
│  │  feeds, posts, fts_index, keypairs, subs      │    │
│  └───────────────────────────────────────────────┘    │
└───────────────────┬───────────────────────────────────┘
                    │ Unix socket (extended IPC)
┌───────────────────▼───────────────────────────────────┐
│  dwd  (existing daemon, extended)                     │
│  DHT node, BEP-44 publish/resolve                     │
│  Feed liveness / re-announce loop                     │
│  Content fetch into docroot (HTTP now, BT swarm next) │
└───────────────────┬───────────────────────────────────┘
┌───────────────────▼───────────────────────────────────┐
│  docroot  (local content cache)                       │
│  srv serves these files by hash to dwd and to dwh     │
└───────────────────┬───────────────────────────────────┘
                    │ DHT / HTTP mirrors / (BitTorrent)
┌───────────────────▼───────────────────────────────────┐
│  DECENTWEB NETWORK                                    │
└───────────────────────────────────────────────────────┘

dwh is a new process that owns the browser-facing HTTP server, SQLite, and the sanitization pipeline. It drives dwd via an extended IPC protocol. dwd remains the DHT peer and is responsible for fetching content into the shared docroot. srv continues to serve the docroot by hash — used both by dwd (as a peer-facing mirror for other bridge instances) and by dwh (as the local cache read path).


7. Prioritised Recommendations

Changes ordered by impact on a browser user’s experience. Items in the same tier can be done in parallel.

Tier 1 — Prerequisites: nothing useful works without these

1a. Extend the manifest schema for multiple posts

The manifest must support a list of posts before a feed reader can exist. Proposed minimal extension (backward-compatible: a manifest without posts falls back to the current single-content behaviour):

{
  "dw":      "manifest",
  "v":       2,
  "title":   "Feed display name",
  "posts": [
    {
      "hash":      <20-byte bundle hash>,
      "title":     "Post title",
      "timestamp": <unix epoch integer>,
      "summary":   "Optional one-line excerpt"
    },
    ...
  ],
  "follows": [ <pubkey-hex>, ... ],
  "mirrors": [ "http://...", ... ]
}

The content field should be kept for v1 read compatibility and deprecated in v2. follows enables the trust-graph discovery path (Layer 4).
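The backward-compatible read path could be sketched as below. The structs are a hypothetical in-memory form of a parsed manifest, not the layout used in src/content/manifest.c; the point is only that a reader treats a v1 manifest as a feed with one untitled post.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DW_HASH_LEN 20

struct dw_post {
    uint8_t hash[DW_HASH_LEN];
    const char *title;      /* NULL when absent */
    int64_t timestamp;      /* Unix epoch seconds, 0 when absent */
};

/* Hypothetical parsed manifest covering both schema versions. */
struct dw_manifest {
    int version;
    int has_content;                 /* v1 "content" field present */
    uint8_t content[DW_HASH_LEN];
    const struct dw_post *posts;     /* v2 "posts" list, may be NULL */
    size_t n_posts;
};

/* Number of posts a reader should list for this manifest. */
static size_t dw_manifest_post_count(const struct dw_manifest *m)
{
    if (m->posts != NULL && m->n_posts > 0)
        return m->n_posts;
    return m->has_content ? 1 : 0;   /* v1 fallback: one untitled post */
}

/* Copy the i-th post into out. Returns 0 on success, -1 if out of range. */
static int dw_manifest_post(const struct dw_manifest *m, size_t i, struct dw_post *out)
{
    if (m->posts != NULL && i < m->n_posts) {
        *out = m->posts[i];
        return 0;
    }
    if ((m->posts == NULL || m->n_posts == 0) && m->has_content && i == 0) {
        memcpy(out->hash, m->content, DW_HASH_LEN);
        out->title = NULL;
        out->timestamp = 0;
        return 0;
    }
    return -1;
}
```

With this shape the feed reader views only ever iterate over posts and never need to branch on the manifest version themselves.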

1b. Decide and commit to a btpk encoding

Pick base32 (as written in decentweb-design.md) or hex (current, and the form BEP 46 uses for btpk magnets) and change everything to match. The comment in identity.h should become the canonical decision record. Recommendation: hex, for interoperability with any client that implements BEP 46 as written, with decentweb-design.md updated to match.

1c. Add a content sanitization pass

Before serving any HTML to a browser, the bridge must:

  1. Parse the HTML (a minimal recursive-descent parser over the permitted element set is sufficient — a full DOM is not needed).
  2. Strip <script>, <iframe>, <object>, <embed>, remote <link rel="stylesheet">, and all on* attributes.
  3. Rewrite <a href="..."> for external destinations to open in a new context, with a visible “external link” marker.
  4. Strip src="http..." attributes on <img>, <video>, <source>.
  5. Reject files with extensions outside {.html, .css, .webp, .webm, .woff2}.

This is the enforcement mechanism for the “zero outbound requests from content” guarantee.
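Step 5 is the simplest of these to pin down. A minimal sketch of the extension allowlist follows, assuming the comparison is case-insensitive and applies to the final path component only; the function names are illustrative.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality. */
static int dw_streq_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return 1 if the file name carries one of the permitted extensions. */
static int dw_ext_allowed(const char *name)
{
    static const char *const allowed[] = {
        ".html", ".css", ".webp", ".webm", ".woff2"
    };
    const char *slash = strrchr(name, '/');
    const char *base = slash ? slash + 1 : name;
    const char *dot = strrchr(base, '.');

    if (dot == NULL || dot == base)   /* no extension, or a dotfile */
        return 0;
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (dw_streq_ci(dot, allowed[i]))
            return 1;
    return 0;
}
```

An extension check alone does not prove what the bytes are. A stricter bridge would also sniff the file header and set the response Content-Type from the allowlist rather than from the bundle.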

Tier 2 — Core UX: a usable feed reader

2a. Add SQLite for state

The bridge needs at minimum: feeds (pubkey, display title, last-fetched), posts (feed key, bundle hash, title, timestamp), fts_index (FTS5 virtual table over post content), and keypairs (for publishing users, encrypted with a passphrase-derived key).

2b. Background feed refresh loop

A thread in dwh that:

  1. Reads the subscribed feed list from SQLite.
  2. On a configurable interval (default 15 minutes), resolves each feed’s BEP-44 item.
  3. Fetches the manifest and any new bundle hashes not yet in the local post store.
  4. Updates the SQLite index.

Until this exists, the bridge shows a static snapshot that never updates.
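The scheduling decision inside that loop can be isolated from the SQLite and DHT work. The sketch below assumes each subscription row carries a last-resolved timestamp; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <time.h>

#define DW_REFRESH_INTERVAL_DEFAULT (15 * 60)   /* seconds */

struct dw_feed_state {
    const char *pubkey;     /* feed key as stored in the subscription table */
    time_t last_resolved;   /* 0 if the feed has never been resolved */
};

/* A feed is due if it has never been resolved, or if at least `interval`
 * seconds have passed since the last attempt. */
static int dw_feed_is_due(const struct dw_feed_state *f, time_t now, time_t interval)
{
    if (f->last_resolved == 0)
        return 1;
    return now - f->last_resolved >= interval;
}

/* Write the indices of all due feeds into out (capacity n).
 * Returns how many were written. */
static size_t dw_feeds_due(const struct dw_feed_state *feeds, size_t n,
                           time_t now, time_t interval, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (dw_feed_is_due(&feeds[i], now, interval))
            out[k++] = i;
    return k;
}
```

The refresh thread would call dw_feeds_due on each wake-up, resolve the returned feeds, and write the new last_resolved values back whether or not the resolve succeeded, so an unreachable feed is not retried in a tight loop.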

2c. Persist the maintained-feeds list in dwd

dwd should write its maintained-feeds list to dwd.state alongside the routing table and restore it on startup. A restart must not silently expire published feeds.

2d. Extend the IPC protocol

Replace the current single-line protocol with newline-delimited JSON. New commands the bridge needs:

| Command | Purpose |
| --- | --- |
| SUBSCRIBE <pubkey> | Add feed to polling list |
| UNSUBSCRIBE <pubkey> | Remove feed |
| LIST_FEEDS | Return subscribed feeds with last-resolve status |
| FETCH_BUNDLE <hash> | Fetch a bundle into docroot by hash |
| STATUS | Return DHT ready state, peer count, uptime |
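One possible wire shape for these commands is a single JSON object per line. The sketch below formats a request; the field names "v", "cmd" and "arg" are assumptions made for illustration and are not defined anywhere in the current protocol.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format one request as a JSON object terminated by '\n'.
 * cmd is expected to be one of the fixed command names above.
 * arg may be NULL for commands without an argument (LIST_FEEDS, STATUS).
 * Only hex strings are accepted as arguments, so no JSON escaping is needed.
 * Returns the number of bytes written, or -1 on bad input or a short buffer. */
static int dw_ipc_format(char *buf, size_t buflen, const char *cmd, const char *arg)
{
    int n;

    if (arg != NULL) {
        for (const char *p = arg; *p; p++)
            if (!isxdigit((unsigned char)*p))
                return -1;
        n = snprintf(buf, buflen, "{\"v\":1,\"cmd\":\"%s\",\"arg\":\"%s\"}\n", cmd, arg);
    } else {
        n = snprintf(buf, buflen, "{\"v\":1,\"cmd\":\"%s\"}\n", cmd);
    }
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Carrying a version field in every message lets dwd reject requests it does not understand instead of misparsing them, which the current single-line protocol cannot do.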

Tier 3 — Quality and completeness

3a. Serve the feed reader as clean, no-JS HTML

All bridge pages must be complete server-rendered HTML. No JavaScript dependency for any core function. Minimum views:

  • Feed list (sidebar): followed feeds with unread post count
  • Post list: most recent posts from selected feed or all feeds
  • Post view: sanitized bundle HTML, served inline
  • Search results: FTS5 query across indexed post content
  • Discover: paste or type a btpk magnet to subscribe
  • Publish: compose a post under a managed keypair
  • Profile: user’s btpk address as QR code and copyable link

3b. btpk address as QR code

A server-side QR generator outputs an inline SVG or WebP. The profile page shows the user’s btpk magnet as a QR code and as selectable text. The discover page accepts a pasted or typed magnet and begins a subscription.

3c. Enforce a bundle size limit

Define a maximum bundle size (suggested: 50 MB) and enforce it at both fetch time (reject before reading the body if Content-Length exceeds the limit) and pack time (dw bundle). This prevents memory exhaustion in both dwd and dwh.
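The fetch-time half of this check might look like the sketch below. It reads the suggested 50 MB as 50 MiB and rejects a missing or malformed Content-Length outright; both are policy assumptions, and the names are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DW_MAX_BUNDLE_BYTES (50ULL * 1024 * 1024)   /* assumed: 50 MiB */

/* Decide from a Content-Length header value whether to read the body.
 * Returns 1 to proceed, 0 to reject. Missing, negative, non-numeric and
 * overflowing values are all rejected. */
static int dw_length_allowed(const char *content_length, uint64_t max_bytes)
{
    if (content_length == NULL || !isdigit((unsigned char)content_length[0]))
        return 0;
    errno = 0;
    char *end;
    unsigned long long n = strtoull(content_length, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return 0;
    return n <= max_bytes;
}
```

A header check is only a first gate, because a peer can send more bytes than it declares. The read loop should still count bytes and abort once the limit is passed.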

3d. Full multi-mirror fallback

When fetching a bundle or manifest, try every mirror in the list before reporting failure. When the bridge publishes a feed, add its own accessible URL to the manifest’s mirrors list so other bridge instances can retrieve cached content from it.
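The fallback itself is a short loop once the transport is hidden behind a callback. In the sketch below, dw_fetch_fn stands in for whatever the pluggable transport exposes and dw_fake_fetch exists only to demonstrate the loop; neither is an existing API.

```c
#include <stddef.h>
#include <string.h>

/* Fetch callback: returns 0 when the object was fetched and its hash
 * verified, non-zero on any failure. */
typedef int (*dw_fetch_fn)(const char *mirror_url, void *ctx);

/* Try each mirror in list order. Returns the index of the first mirror
 * that succeeded, or -1 if every mirror failed. */
static int dw_fetch_any(const char *const *mirrors, size_t n,
                        dw_fetch_fn fetch, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (fetch(mirrors[i], ctx) == 0)
            return (int)i;
    return -1;
}

/* Stand-in fetcher for demonstration: succeeds only for the URL passed as ctx. */
static int dw_fake_fetch(const char *mirror_url, void *ctx)
{
    return strcmp(mirror_url, (const char *)ctx) == 0 ? 0 : 1;
}
```

Because a hash mismatch counts as a failure, a mirror that serves corrupted or substituted content is skipped in the same way as one that is offline.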

Tier 4 — Correctness and reach

4a. BitTorrent piece transfer transport

The HTTP transport is an interim measure that requires a peer already serving the content over HTTP. Direct BitTorrent download removes this dependency and completes the “no required old-web dependency” goal. The pluggable transport seam in content.h already exists for this purpose.

4b. Add IPv6 DHT support

Pass AF_INET6 alongside AF_INET to the libdht node. Create a second UDP socket for IPv6 and run both in the select loop. Necessary for full DHT reach.

4c. Re-evaluate SHA1 as the content hash

Before the manifest schema is finalised, consider moving bundle and manifest hashes to SHA-256 or BLAKE3. This is a breaking wire change and needs coordination across the spec documents.


8. What the Bridge Does Not Change

The following constraints from decentweb-design.md are architectural. The bridge can honour or violate them but cannot alter them:

  • Read-time privacy is a bridge-side concern. The bridge is the swarm peer; it knows what the user fetches. This is inherent to the proxy model. The bridge must not log reading behaviour unnecessarily, and this limitation should be visible in the UI.
  • Publisher identity is non-transferable. The bridge may manage key material on behalf of a user, but the key remains the canonical identity. Key backup and export are required features, not optional.
  • Published is public. Once a bundle hash appears in a signed manifest and reaches the DHT, it cannot be retracted. The publish flow in the bridge UI should make this explicit before submission.
  • No first-contact guarantee. When a user subscribes to a btpk address for the first time, the bridge cannot verify who controls it. The UI should show key continuity (first-seen date, unbroken signature chain) as the available trust signal, not a false “verified” indicator.

9. Open Questions

  1. Language stack: resolved. The bridge is C99 throughout. Lua (via LuaJIT) is the templating layer for dynamic HTML responses — Lua server pages handle feed reader views, post rendering, and form responses, while C handles the protocol, DHT, content fetching, sanitization, and SQLite. No other language is in scope.

  2. Single binary vs. two processes. The bridge could be dwd extended to embed an HTTP server, or a separate dwh that talks to dwd over the socket. Two processes allow independent restarts but add deployment complexity. A single process is simpler to deploy but couples the HTTP server to the DHT node’s restart cycle.

  3. Multi-user vs. single-user bridge. For a self-hosted instance, single-user simplifies key storage and removes isolation concerns between users. The SQLite schema should accommodate multi-user from the start (a users table with FKs into keypairs and subscriptions) even if the first implementation only creates one user.

  4. Markdown vs. HTML authoring. Recommendation: convert Markdown to HTML at publish time so stored bundles are always canonical HTML. Optionally preserve the Markdown source as source.md in the bundle for the bridge’s edit flow.

  5. Key backup prompt timing. When should the bridge prompt the user to download their key backup — on first login, on first publish, or both? A user who never backs up and loses access to the bridge loses their publishing identity permanently.


10. Version History

| Version | Date | Notes |
| --- | --- | --- |
| 0.1 | 2026-06-04 | Initial draft. Gap analysis against design doc; bridge architecture; prioritised recommendations. |
| 0.2 | 2026-06-04 | Removed SaaS stack references; corrected transport framing (bridge cache is reader-side; no publisher hosting burden); added explicit request flow diagram; reframed design goal alignment accordingly. |
| 0.3 | 2026-06-04 | Language stack resolved: C99 core, LuaJIT server pages for HTML templating. No C# or ASP.NET. |