the router and the keeper

Two binaries run the naming layer of this network. One resolves queries. The other moves files. They share a Postgres wire and a WireGuard mesh, and not much else.


pg-router: DNS from the database

pg-router is an authoritative DNS server that reads records directly from PostgreSQL. No zone files. No AXFR. The database is the zone, and every nameserver node holds a streaming replica.

The query path is short:

  1. UDP packet arrives on port 53
  2. Worker checks the sharded LRU cache, keyed on (qname, qtype, geo_region)
  3. Cache miss triggers a Postgres query against dns_geo_records first, then dns_records
  4. Response is built, cached, and sent

let key = CacheKey {
    qname: qname.clone(),
    qtype,
    geo: region,
};

if let Some(cached) = cache.get(&key) {
    return cached.to_wire();
}

let records = sqlx::query_as::<_, DnsRecord>(
    "SELECT * FROM dns_geo_records WHERE name = $1 AND type = $2 AND region = $3"
)
.bind(&qname)
.bind(qtype_str)
.bind(&region)
.fetch_all(&pool)
.await?;

This design — querying a database for every DNS answer — has an obvious problem: you cannot query a database for every DNS packet. Resolvers worldwide may ask the same question thousands of times per second. PostgreSQL was not designed for that.

So pg-router maintains a cache. A sharded LRU, 64 independent pieces, each with its own lock. When a query arrives, pg-router hashes the name and type to determine which shard to check. Two simultaneous queries for different names almost certainly hit different shards and proceed in parallel. The hash function is FNV-1a — not cryptographically secure, and it does not need to be. It is fast and evenly distributed.
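The shard selection above can be sketched in a few lines. This is an illustrative stand-in, not pg-router's actual code: the shard count of 64 is from the text, but the function names and the exact key bytes fed to the hash are assumptions.

```rust
// FNV-1a constants (64-bit variant).
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Map a (qname, qtype) pair onto one of 64 independent cache shards.
/// Hypothetical helper; the real key also carries the geo region.
fn shard_index(qname: &str, qtype: u16) -> usize {
    let mut key = qname.as_bytes().to_vec();
    key.extend_from_slice(&qtype.to_be_bytes());
    (fnv1a(&key) % 64) as usize
}

fn main() {
    let a = shard_index("example.com.", 1); // A record
    let b = shard_index("example.org.", 28); // AAAA record
    assert!(a < 64 && b < 64);
    println!("shards: {a} {b}");
}
```

Because the shard is a pure function of the key, every worker agrees on which lock guards which name without any coordination.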

The most important detail of the cache is its key. Not just (qname, qtype) but (qname, qtype, geo_region). Without the region in the key, a cached US response would be served to European queries, adding two hundred milliseconds of latency to every request.

Cache entries store pre-serialized wire format — the exact bytes of the DNS response packet. On a cache hit, pg-router copies the cached bytes into the response buffer, patches in the two-byte transaction ID, and sends. A memcpy and two byte writes. This is the fast path, and it handles the vast majority of queries.
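The fast path is small enough to show whole. A minimal sketch, assuming the cached value is the complete wire-format packet and that only the leading two-byte transaction ID differs per query; the function name is illustrative.

```rust
/// Copy the pre-serialized response and patch in the query's
/// transaction ID (the first two bytes of any DNS packet).
fn respond_from_cache(cached_wire: &[u8], query_txid: u16) -> Vec<u8> {
    let mut out = cached_wire.to_vec(); // the memcpy
    out[..2].copy_from_slice(&query_txid.to_be_bytes()); // the two byte writes
    out
}

fn main() {
    // Fake cached packet: stale txid 0x0000, then some payload bytes.
    let cached = vec![0x00, 0x00, 0x81, 0x80];
    let resp = respond_from_cache(&cached, 0xABCD);
    assert_eq!(&resp[..2], &[0xAB, 0xCD]);
    assert_eq!(&resp[2..], &cached[2..]); // payload untouched
}
```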

Names have trailing dots in the database. Every single time you forget this, the query returns zero rows and you spend twenty minutes checking replication lag.
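One way to make the trailing-dot trap impossible is to normalize every qname before it reaches SQL. A hypothetical helper, not confirmed to exist in pg-router:

```rust
/// Lowercase the name and ensure it is fully qualified (trailing dot),
/// matching how names are stored in the database.
fn normalize_qname(name: &str) -> String {
    let lower = name.to_ascii_lowercase();
    if lower.ends_with('.') {
        lower
    } else {
        format!("{lower}.")
    }
}

fn main() {
    assert_eq!(normalize_qname("Example.COM"), "example.com.");
    assert_eq!(normalize_qname("example.com."), "example.com.");
}
```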

the geography of names

Three nameservers run pg-router: ns-01 in Toronto, ns-02 in Amsterdam, ns-03 in Santa Clara. Each holds a streaming replica of the primary in Buffalo. Write latency to the replica is typically under 80ms over the mesh.

The geo layer is simple. Each nameserver node has a GEO_REGION environment variable — us, eu, or oc. Geo records are checked first. If none match, the query falls through to the global dns_records table. No MaxMind databases, no client subnet parsing. The GeoDNS is decided by which nameserver the resolver chose, and that is decided by anycast proximity or round-robin luck.

The topology is the routing policy. A resolver in Berlin picks ns-02 in Amsterdam. ns-02 is configured with GEO_REGION=eu. The dns_geo_records query returns the European IP. No GeoIP lookup occurred. The mere act of choosing the nearest nameserver selected the correct geographic response.
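The two-step lookup — region-specific records first, global table as the fallthrough — reduces to an `or_else`. A minimal in-memory sketch; the table shapes and record values are illustrative, not the real schema.

```rust
use std::collections::HashMap;

/// Region-specific records first, then the global table.
fn resolve<'a>(
    geo: &'a HashMap<(String, String), &'a str>, // (name, region) -> ip
    global: &'a HashMap<String, &'a str>,        // name -> ip
    qname: &str,
    region: &str,
) -> Option<&'a str> {
    geo.get(&(qname.to_string(), region.to_string()))
        .or_else(|| global.get(qname))
        .copied()
}

fn main() {
    let mut geo: HashMap<(String, String), &str> = HashMap::new();
    geo.insert(("app.example.".into(), "eu".into()), "203.0.113.10");
    let mut global: HashMap<String, &str> = HashMap::new();
    global.insert("app.example.".to_string(), "198.51.100.1");

    // An eu nameserver serves the eu record; a us one falls through.
    assert_eq!(resolve(&geo, &global, "app.example.", "eu"), Some("203.0.113.10"));
    assert_eq!(resolve(&geo, &global, "app.example.", "us"), Some("198.51.100.1"));
}
```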

four protocols, one binary

pg-router also serves HTTP (/health, /resolve, DNS-over-HTTPS, parked domain pages), forwards SMTP (recipient validation against mail_forwards in the same database), and answers ICMP echo requests via raw socket. The same program that constructs complex DNS responses with CNAME chasing and EDNS option echoing also answers the simplest possible network question: “are you there?” “Yes.”

The UDP listener uses SO_REUSEPORT — multiple sockets on the same address and port, one per two CPU cores. The kernel distributes incoming packets across them. On a cache hit, the worker never blocks. On a cache miss, it spawns an async task for the database query and moves on to the next packet.

One process. One database connection pool. Four protocols.


consort: the immune system

pg-router answers questions. Consort ensures the conditions necessary for pg-router to answer questions continue to exist.

Consort runs on every node in the mesh. It is a single Rust binary that does four things continuously, in parallel, without being asked:

  1. Replicates files between nodes
  2. Corrects configuration drift
  3. Monitors critical system files for unauthorized changes
  4. Restarts services that fail

These are four aspects of a single concern: maintaining the invariant that each machine’s actual state matches its intended state.

content-addressed storage

Every piece of data Consort manages — a website directory, a TLS certificate, a compiled binary, a configuration file — is stored as a blob identified by its BLAKE3 hash with domain separation:

hasher.update(b"registrar.earth/consort/blob/v1\0");
hasher.update(&data);

Blobs are stored with two-level sharding: blobs/a1/b2/a1b2c3d4.... Writes are atomic — data goes to a staging directory first, then an atomic rename moves it into place. A crash during a write leaves a temp file in staging, not a corrupt blob in the store.
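The path layout is mechanical: the first two hex bytes of the hash become directory levels. A sketch with illustrative names; the real store also handles the staging-then-rename dance around this path.

```rust
use std::path::PathBuf;

/// Two-level sharded path for a blob, e.g.
/// "blobs/a1/b2/a1b2c3d4e5f6" for hash a1b2c3d4e5f6.
fn blob_path(root: &str, hex_hash: &str) -> PathBuf {
    PathBuf::from(root)
        .join(&hex_hash[0..2])
        .join(&hex_hash[2..4])
        .join(hex_hash)
}

fn main() {
    let p = blob_path("blobs", "a1b2c3d4e5f6");
    assert_eq!(p.to_str().unwrap(), "blobs/a1/b2/a1b2c3d4e5f6");
}
```

The sharding keeps any single directory from accumulating millions of entries, which keeps directory scans fast on ordinary filesystems.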

peer-to-peer replication

Consort replicates blobs between nodes using Iroh QUIC. Each node derives its identity from the cluster secret and node name:

hasher.update(b"registrar.earth/consort/key/v1\0");
hasher.update(cluster_secret.as_bytes());
hasher.update(node_name.as_bytes());
let key = SecretKey::from(*hasher.finalize().as_bytes());

Deterministic key derivation. Every node can compute every other node’s public key from the shared secret and the name. No key distribution ceremony. No certificate authority. If you know the cluster secret and the node names, you know the entire key infrastructure.

When a blob is created, Consort pushes it to all peers in parallel. A deadman checker runs every five minutes, looking for blobs older than ten minutes that haven’t reached all nodes. For those, it falls back to rsync over SSH — crude but effective, because SSH has decades of workarounds for the ways networks can break.
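The deadman predicate is a two-condition filter. A sketch under stated assumptions: the ten-minute threshold is from the text, but the struct shape and how replication is counted are illustrative.

```rust
use std::time::{Duration, SystemTime};

/// Illustrative replication bookkeeping for one blob.
struct BlobStatus {
    created: SystemTime,
    replicated_to: usize, // peers that have confirmed the blob
}

/// True when the blob is old enough that QUIC push should have
/// finished, but some peers still lack it: fall back to rsync.
fn needs_rsync(blob: &BlobStatus, peer_count: usize, now: SystemTime) -> bool {
    let age = now.duration_since(blob.created).unwrap_or_default();
    age > Duration::from_secs(600) && blob.replicated_to < peer_count
}

fn main() {
    let now = SystemTime::now();
    let stale = BlobStatus { created: now - Duration::from_secs(900), replicated_to: 7 };
    let fresh = BlobStatus { created: now, replicated_to: 0 };
    assert!(needs_rsync(&stale, 9, now)); // old and incomplete: fall back
    assert!(!needs_rsync(&fresh, 9, now)); // young: give QUIC time
}
```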

the convergence engine

Every 60 seconds, Consort reads a per-node TOML manifest and checks every declared resource against the actual state of the machine. If they differ, it corrects the machine.

[node]
name = "buf-01"
roles = ["hub", "pg_primary"]

[[file]]
path = "/etc/caddy/Caddyfile"
blob = "a1b2c3d4..."
owner = "root"
group = "root"
mode = "0644"
on_change = "systemctl reload caddy"

[[service]]
name = "caddy"
enabled = true
running = true
health_port = 443

[[sysctl]]
key = "net.core.somaxconn"
value = "4096"

Seven resource handlers implement the same interface: check current state against the spec, apply a correction if they diverge. The file handler computes a BLAKE3 hash and compares. The UFW handler parses ufw status, adds missing rules, and removes any rule not in the manifest. The sysctl handler reads /proc/sys and writes via sysctl -w. Each handler is small, about a hundred lines, because each does one thing.
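The shared handler shape can be sketched as a trait. Everything here — trait name, methods, the in-memory sysctl stand-in — is an assumption about the design described above, not Consort's actual API.

```rust
/// Check-then-apply interface shared by all resource handlers.
trait Handler {
    fn diverged(&self) -> bool;
    fn apply(&mut self);
}

/// Toy sysctl handler: `current` stands in for reading /proc/sys.
struct SysctlHandler {
    key: &'static str,
    want: String,
    current: String,
}

impl Handler for SysctlHandler {
    fn diverged(&self) -> bool {
        self.current != self.want
    }
    fn apply(&mut self) {
        // The real handler shells out to `sysctl -w`; here we converge in memory.
        self.current = self.want.clone();
    }
}

/// Returns true when a correction (and hence a change record) happened.
fn converge(h: &mut dyn Handler) -> bool {
    let drifted = h.diverged();
    if drifted {
        h.apply();
    }
    drifted
}

fn main() {
    let mut h = SysctlHandler {
        key: "net.core.somaxconn",
        want: "4096".into(),
        current: "128".into(),
    };
    assert!(converge(&mut h)); // drift corrected
    assert!(!converge(&mut h)); // already converged
    println!("{} = {}", h.key, h.current);
}
```

Keeping the interface to two methods is what keeps each handler near a hundred lines: all the scheduling, recording, and replication lives outside the trait.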

Every correction produces a change record that replicates to all peers. Not just logging — distributed awareness.

HIDS: the sentinel

The convergence engine manages manifest resources. HIDS watches everything else that should never change: /etc/passwd, /etc/shadow, /usr/bin/sudo, SSH authorized_keys, systemd unit directories.

Every five minutes, it hashes critical paths with BLAKE3 and compares against a baseline from first run. Any change generates a warning with the path, expected hash, actual hash, and modification time.
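The comparison itself is a map diff. A minimal sketch treating hashes as opaque strings (the real code uses BLAKE3 digests); function and field names are illustrative.

```rust
use std::collections::HashMap;

/// Compare current hashes against the baseline; one warning per
/// changed or missing path.
fn detect_changes(
    baseline: &HashMap<&str, &str>,
    current: &HashMap<&str, &str>,
) -> Vec<String> {
    let mut warnings = Vec::new();
    for (path, expected) in baseline {
        let actual = current.get(path).copied().unwrap_or("<missing>");
        if actual != *expected {
            warnings.push(format!("{path}: expected {expected}, got {actual}"));
        }
    }
    warnings
}

fn main() {
    let baseline = HashMap::from([("/etc/passwd", "h1"), ("/usr/bin/sudo", "h2")]);
    let mut current = baseline.clone();
    current.insert("/usr/bin/sudo", "h3");
    let warnings = detect_changes(&baseline, &current);
    assert_eq!(warnings.len(), 1);
    assert!(warnings[0].contains("/usr/bin/sudo"));
}
```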

HIDS does not auto-correct. An unauthorized change to /usr/bin/sudo is not interesting in itself — what matters is why it changed, who changed it, and what else they might have done. If you auto-revert, you fix the symptom and destroy the evidence.

The convergence engine and HIDS overlap deliberately. A file like sshd_config may appear in both the manifest and the HIDS critical path list. If something modifies it between convergence cycles, HIDS catches it first and alerts. Convergence fixes it on the next cycle. Defense in depth.

self-healing

There is a gap between “the service is running” and “the service is working.” A web server can be active according to systemd but unable to serve requests because its port is occupied.

The healing loop runs every 30 seconds. For each service with a health_port, Consort attempts a TCP connection on localhost. If it fails, the service is broken regardless of what systemd thinks.
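The probe is a plain TCP connect with a short timeout. A sketch; the two-second timeout is an assumption, and the self-test below binds an ephemeral listener to stand in for a healthy service.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// True when something is accepting TCP connections on localhost:port.
fn port_healthy(port: u16) -> bool {
    let addr: SocketAddr = ([127, 0, 0, 1], port).into();
    TcpStream::connect_timeout(&addr, Duration::from_secs(2)).is_ok()
}

fn main() {
    // Bind an ephemeral port to play the part of a working service.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let port = listener.local_addr().unwrap().port();
    assert!(port_healthy(port));
}
```

The point of probing the port rather than asking systemd is exactly the gap described above: systemd tracks the process, not the socket.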

Restart follows a backoff schedule: 5 seconds, 30 seconds, 120 seconds. After three failures, it stops and fires a webhook alert. The counter resets after 10 minutes of healthy uptime.
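The schedule fits in one match. The 5/30/120-second steps and the three-attempt cap are from the text; the function shape is an assumption.

```rust
use std::time::Duration;

/// Delay before the next restart attempt, or None when attempts are
/// exhausted and the webhook alert should fire instead.
fn restart_delay(failures: u32) -> Option<Duration> {
    match failures {
        0 => Some(Duration::from_secs(5)),
        1 => Some(Duration::from_secs(30)),
        2 => Some(Duration::from_secs(120)),
        _ => None, // exhausted: alert, stop restarting
    }
}

fn main() {
    assert_eq!(restart_delay(0), Some(Duration::from_secs(5)));
    assert_eq!(restart_delay(2), Some(Duration::from_secs(120)));
    assert_eq!(restart_delay(3), None);
}
```

The counter reset after ten minutes of healthy uptime lives outside this function: it simply sets `failures` back to zero.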

fleet comparison

Every 15 minutes, each node builds a summary of its resource state and stores it in PostgreSQL. The summaries replicate to all nodes. Any node can compare its summary against peers that share the same roles.

Not auto-corrected. Divergence between nodes may be intentional. But it is visible, and visibility is the prerequisite for every other kind of correctness.


the mesh underneath

Both services depend on a WireGuard mesh routed through fd53::/32. BIRD handles route distribution between peers. The topology is hub-and-spoke with three hubs — Buffalo, Los Angeles, and Lisbon — and leaves that peer through them.

Adding a node means touching every existing node: WireGuard peer config, kernel routes, BIRD neighbor statements. There is no automatic discovery. This is deliberate.

# runtime peer addition (does NOT persist)
wg set wg0 peer <pubkey> \
    allowed-ips fd53:0105:1400::/48 \
    endpoint 172.235.33.38:51820

# kernel route (wg set does NOT do this)
ip -6 route replace fd53:0105:1400::/48 \
    dev wg0 metric 10

BIRD installs unreachable routes at metric 32 that silently override WireGuard routes at the default metric of 1024. You learn this exactly once, at 2 AM, when a new node can ping its hub but nothing beyond it.

The mesh carries all inter-node traffic: Postgres replication, Consort gossip, DNS cache misses on read replicas, and the occasional SSH session when something breaks. Total bandwidth is modest. Reliability is the constraint, not throughput.


the symbiosis

pg-router and Consort depend on each other completely, despite having no awareness of each other in the source code. No import, no shared library, no IPC channel. Related by circumstance.

pg-router depends on PostgreSQL replicas being healthy, its binary being current on every nameserver, its environment file being correct, the WireGuard mesh being functional, firewall rules allowing port 53. Consort provides all of these.

Consort depends on DNS resolution, network connectivity between nodes, the PostgreSQL database for metadata. pg-router provides DNS. WireGuard provides the network. PostgreSQL provides the database. And Consort ensures that all three keep running.

A virtuous circle. Remove either, and the system degrades. Remove both, and you have ten Linux machines with no coordination.


the alert pipeline

Every correction, detection, and comparison produces alerts that flow through a unified pipeline — stored in PostgreSQL, replicated to all peers, optionally pushed to a webhook.

  Alert Type              Severity   Meaning
  drift_corrected         info       A managed resource was auto-corrected
  hids_change             warning    A critical path hash changed
  service_restarted       info       Self-healing restarted a service
  service_unhealthy       warning    Restart attempts exhausted
  cross_node_divergence   info       Role peers have different state

The dashboard renders these alongside verification data: DNS resolution, PostgreSQL replication, HTTP endpoints, TLS certificates, WireGuard handshakes, BIRD BGP peering, and Nuclei security scan results.


Two binaries cross-compiled from the same workspace, deployed by Ansible, connected by encrypted tunnels over commodity VPS nodes in six countries. No Kubernetes. No service mesh. No cloud provider API calls. Just WireGuard, Postgres, and Rust.


March 2026