Scalability & Horizontal Scaling
Diminuendo’s architecture was designed from the outset to make horizontal scaling a natural consequence of its data model, not an afterthought bolted on through distributed consensus protocols or shared-nothing clustering. The key insight is straightforward: if no two gateway instances ever need to write to the same database, then scaling out is simply a matter of routing tenants to instances.

Per-Tenant Data Isolation
Every tenant in Diminuendo receives its own SQLite database file for session metadata, and every session receives its own dedicated database for conversation history, events, and usage records. Tenant acme’s registry physically cannot touch tenant globex’s data: they reside in different files on different filesystem paths. There is no WHERE tenant_id = ? clause to forget, no row-level security policy to misconfigure, no cross-tenant join to accidentally permit.
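An illustrative on-disk layout, assuming one registry database per tenant and one database per session (the data/tenants/ root appears later in this document; the individual file names shown here are assumptions):

```
data/
└── tenants/
    ├── acme/
    │   ├── registry.db          # session metadata for tenant acme
    │   └── sessions/
    │       └── <session-id>/
    │           └── session.db   # history, events, usage for one session
    └── globex/
        └── ...                  # entirely separate files; no shared tables
```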
This isolation extends to deletion semantics. Removing a session means deleting a directory. Removing a tenant means deleting a directory tree. No cascading deletes, no orphaned foreign key references, no vacuum passes over a shared tablespace.
Why This Enables Horizontal Scaling
Since there is no shared state between tenants, multiple Diminuendo instances can serve different tenants independently. A load balancer can route by tenant ID, using sticky sessions or tenant-affinity routing, to ensure all requests for a given tenant reach the same instance. The fundamental invariant is simple: at any point in time, exactly one gateway instance is responsible for a given tenant’s data. This is trivially satisfied by a load balancer that hashes on the tenant_id claim extracted from the JWT.
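A minimal sketch of this hash-based tenant-affinity routing, not Diminuendo’s actual implementation: the FNV-1a hash function and the instance names are assumptions chosen for the example.

```typescript
// Route every request for a tenant to the same gateway instance by
// hashing the JWT's tenant_id claim onto a fixed instance list.

// FNV-1a: a simple, deterministic 32-bit string hash (an arbitrary choice here).
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function routeTenant(tenantId: string, instances: string[]): string {
  // Deterministic: the same tenant always maps to the same instance,
  // satisfying the "exactly one instance per tenant" invariant.
  return instances[fnv1a(tenantId) % instances.length];
}
```

Note that this naive modulo scheme reshuffles many tenants when the instance list changes; a production router would use consistent hashing or an explicit routing table to keep migrations incremental.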
Migrating a tenant between instances is correspondingly simple: copy its data/ directory to the new instance and update the routing table.
Sticky Session Requirement
WebSocket connections are inherently stateful. Each connected client maintains in-memory state on the gateway instance: the ActiveSession record, the ConnectionState tracking authentication and subscriptions, and the event streaming fiber that consumes Podium events and publishes them to session topics.
A client must reconnect to the same instance that holds its session’s in-memory state. If a load balancer routes a reconnecting client to a different instance, that instance will not have the session’s active Podium connection, event fiber, or subscriber registrations.
During rolling deployments, tenants can be redistributed across instances by leveraging the stale session recovery mechanism: when an instance restarts, it queries all non-idle sessions across its known tenants and resets them to inactive. Clients reconnect, receive a state_snapshot reflecting the reset state, and the session activates cleanly on the new instance.
SQLite as Scaling Advantage
The choice of SQLite over PostgreSQL is often perceived as a scalability limitation. In Diminuendo’s architecture, it is precisely the opposite: SQLite enables a scaling model that a shared database would complicate.

No Cluster to Manage
There is no PostgreSQL primary, no read replicas, no connection pooler (PgBouncer/pgcat), no failover orchestrator. Each instance manages its own local files.
Copy-Based Backup
Backing up a tenant means copying a directory. Restoring means placing files. No pg_dump, no WAL archiving, no point-in-time recovery infrastructure.

Per-Session Archival
Completed sessions can be archived independently: compress the session directory, upload to object storage, and delete locally. No DELETE FROM events WHERE session_id = ? on a multi-terabyte table.

WAL Concurrency
WAL mode allows concurrent reads without blocking the writer. The two-worker architecture places reads and writes on separate threads, so a long-running history query never stalls event persistence.
Resource Budget Per Instance
Each Diminuendo instance enforces bounded resource consumption through carefully sized caches and rate limiters:

| Resource | Bound | Eviction Policy |
|---|---|---|
| Writer DB cache | 128 max open handles | LRU eviction |
| Reader DB cache | 64 max open handles | LRU eviction |
| Auth rate limiter | 10,000 IP entries | Periodic cleanup (60s interval) |
| Per-connection dedup buffer | 5,000 events | Per-connection, cleared on disconnect |
| Prepared statement cache | WeakMap per DB handle | GC’d when DB handle is evicted |
| Per-connection rate limit | 60 messages per 10s window | Sliding window, per connection |
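The LRU eviction used by the writer and reader DB caches can be sketched as follows. This is an illustrative implementation, not Diminuendo’s actual code; the onEvict hook is where a real cache would close the evicted SQLite handle.

```typescript
// Minimal LRU cache exploiting the fact that a JavaScript Map iterates
// in insertion order: re-inserting on access keeps the least recently
// used entry at the front, ready for eviction.
class LruCache<K, V> {
  private map = new Map<K, V>();

  constructor(
    private capacity: number,
    private onEvict: (key: K, value: V) => void = () => {},
  ) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Move to the back: mark as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first Map entry is the least recently used one.
      const [oldestKey, oldestValue] = this.map.entries().next().value!;
      this.map.delete(oldestKey);
      this.onEvict(oldestKey, oldestValue); // e.g. close the DB handle
    }
  }
}
```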
Vertical Scaling Limits
Diminuendo runs on Bun’s single-threaded JavaScript event loop, with SQLite I/O offloaded to dedicated Web Workers. The practical bottlenecks for a single instance are:

- CPU for JSON serialization: every WebSocket message is JSON.parse’d on receipt and JSON.stringify’d on send. For high-throughput sessions with rapid text_delta events, this is the dominant CPU cost.
- SQLite write throughput: the writer worker batches commands (50 ms or 100 commands, whichever comes first) and executes them within transactions. This sustains thousands of writes per second, but a single writer is ultimately serialized.
- WebSocket connection count: Bun’s event loop can handle thousands of concurrent WebSocket connections, but each connection consumes a file descriptor and a small amount of memory for its WsData state.
What Would Require Redis or PostgreSQL
The current architecture is designed for tenant-affinity routing, where each tenant is served by exactly one instance. Several capabilities would require shared infrastructure:

Cross-instance event fan-out
If a client connects to instance A but the session’s Podium events arrive on instance B (because the Podium connection was established there), instance A has no way to receive those events. A shared pub/sub layer (Redis Streams, NATS) would be needed to bridge events across instances.
Cross-instance session handoff
Moving an active session from one instance to another — for example, during a rolling deployment — currently requires the session to be deactivated and reactivated. A shared state store would enable live handoff without interrupting the Podium connection.
Global rate limiting
The auth rate limiter tracks attempts per IP address within a single instance. A coordinated attacker distributing attempts across instances would bypass per-instance limits. A shared rate limiter (Redis-backed sliding window) would provide global protection.
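A sliding-window limiter of the kind listed in the resource table (60 messages per 10 s, per connection) can be sketched as below. The injectable clock is an assumption added to keep the example testable; a Redis-backed variant for global limiting would keep the same logic but store the timestamps in a shared sorted set keyed by IP.

```typescript
// Per-connection sliding-window rate limiter: allow a message only if
// fewer than `limit` messages arrived in the trailing `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private limit = 60,
    private windowMs = 10_000,
    private now: () => number = Date.now, // injectable for testing
  ) {}

  allow(): boolean {
    const cutoff = this.now() - this.windowMs;
    // Drop timestamps that have slid out of the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```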
Shared billing ledger

Usage records currently live in each session’s own SQLite database. Producing a consolidated, cross-instance billing view would require aggregating those records into a shared store.

Each of these capabilities would be additive: layer interfaces, not rewrites of the core logic.
Stale Recovery on Restart
When a gateway instance restarts (whether due to deployment, crash, or scaling event), it performs stale session recovery as part of its startup sequence:

1. Enumerate known tenants. The instance queries all known tenant IDs from the data/tenants/ directory, plus the default tenant (dev in dev mode, default otherwise).

2. Query non-idle sessions. For each tenant, the instance queries the registry database for sessions whose status is not inactive; these are the sessions that were active when the previous instance process died.

3. Reset to inactive. Each stale session is reset to inactive. This is safe because Podium connections do not survive process death: the WebSocket to the Podium coordinator was severed when the process exited, and the compute instance has already been reclaimed or timed out.

4. Resume normal operation. When clients reconnect and join these sessions, they receive a state_snapshot showing inactive status. The client can then trigger re-activation, which creates a fresh Podium instance and establishes a new connection.
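The reset step above can be sketched as a pure function over registry rows. The SessionRow shape and the non-inactive status names are assumptions; in Diminuendo this would be an UPDATE against each tenant’s registry database rather than an in-memory pass.

```typescript
// Reset every session that was not idle when the previous process died.
type SessionStatus = "inactive" | "activating" | "active";

interface SessionRow {
  id: string;
  status: SessionStatus;
}

function resetStaleSessions(rows: SessionRow[]): string[] {
  const reset: string[] = [];
  for (const row of rows) {
    if (row.status !== "inactive") {
      // Safe: the Podium connection died with the old process, so no
      // live state is lost by forcing the session back to inactive.
      row.status = "inactive";
      reset.push(row.id);
    }
  }
  return reset; // IDs of the sessions that were stale
}
```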