Design Decisions
Every architecture is the product of trade-offs — choices made under constraints of time, team size, operational capacity, and product requirements. This page documents the significant decisions in Diminuendo’s architecture, the reasoning behind each, and the trade-offs accepted.

Bun over Node.js
Decision: Use Bun as the runtime instead of Node.js.

Rationale: Bun provides three capabilities that are load-bearing in Diminuendo’s architecture:

- Native SQLite via `bun:sqlite` — synchronous, in-process, with prepared statements and WAL mode. No external driver, no native addon compilation, no connection pool.
- Native WebSocket server with built-in pub/sub — `Bun.serve()` supports topic-based publish/subscribe directly, eliminating the need for `ws` plus an external message broker for single-instance deployments.
- Fast startup — the gateway starts in under 200ms, enabling rapid restart cycles and minimal downtime during deployments.
Trade-offs:

- Smaller ecosystem than Node.js. Some npm packages with native addons may not compile cleanly.
- Fewer production battle-scars. Bun has not been subjected to the same breadth of adversarial workloads as Node.js in large-scale deployments.
- Web Worker semantics differ subtly from Node.js `worker_threads`. The multi-worker SQLite architecture required careful attention to Bun-specific behavior.
Effect TS over Raw TypeScript
Decision: Write all business logic using Effect rather than raw TypeScript with Promises.

Rationale: Effect provides four properties that are difficult to achieve with conventional TypeScript:

- Typed errors — every function declares its failure modes in the type signature. `Effect.Effect<SessionMeta, AuthError | RegistryError>` is a type-level contract: the caller must handle both error cases. There are no thrown exceptions.
- Resource management — database connections, WebSocket handles, and timers are acquired and released within scoped Effects. Resources are guaranteed to be cleaned up, even on fiber interruption.
- Structured concurrency — background fibers (e.g., the Podium event streaming fiber) are tracked by the runtime. When the server shuts down, all fibers are interrupted and their resources released.
- Dependency injection — services are composed via `Layer`, not imported as singletons. Every service declares its dependencies in the type signature, and the Layer graph wires them together at composition time.
Trade-offs:

- Steep learning curve. Effect’s programming model is unfamiliar to most TypeScript developers.
- Larger cognitive overhead when reading code. `Effect.gen(function* () { ... })` is more verbose than `async`/`await`.
- Bundle size increase due to the Effect runtime.
- Hiring: the pool of developers fluent in Effect TS is significantly smaller than the general TypeScript pool.
SQLite over PostgreSQL
Decision: Use SQLite exclusively for all persistence. No PostgreSQL, no Redis, no external database.

Rationale: Diminuendo’s data model maps naturally to per-tenant and per-session databases:

- Zero-ops deployment — the gateway binary is the entire backend. Start it, and SQLite files are created on demand. No database provisioning, no connection string management, no schema migration tooling beyond what ships with the gateway.
- Per-tenant isolation — each tenant’s registry is a separate file. A bug or corruption in one tenant’s data cannot propagate to another.
- Trivial backup — copy a directory. Restore by placing files. No `pg_dump`, no WAL archiving.
- No connection pool management — SQLite is in-process. There are no idle connections, no pool exhaustion, no connection timeout tuning.
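The backup story fits in a few lines. The `tenants/<id>/…` layout below is a hypothetical directory structure, not Diminuendo’s actual one, and a truly consistent snapshot would also flush pending writes first and include the `-wal`/`-shm` sidecar files (or use SQLite’s online backup API):

```typescript
import { cpSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Backing up a tenant is a recursive directory copy (paths hypothetical).
function backupTenant(dataDir: string, backupDir: string, tenantId: string): void {
  mkdirSync(backupDir, { recursive: true });
  cpSync(
    join(dataDir, "tenants", tenantId),
    join(backupDir, tenantId),
    { recursive: true },
  );
}
```

Restore is the same operation in reverse: place the files and start the gateway.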
Trade-offs:

- No cross-instance queries. If a session’s data resides on instance A, instance B cannot query it without file transfer.
- WAL mode serializes writes within a single database file. The multi-worker architecture mitigates this by batching writes, but a single writer is still the throughput ceiling.
- No built-in replication. High availability requires external backup and restore mechanisms.
- SQL feature surface is narrower than PostgreSQL — no `jsonb` operators, no lateral joins, no data-modifying CTEs (`INSERT ... RETURNING` inside a `WITH` clause).
Multi-Worker SQLite over Single-Thread
Decision: Separate SQLite reads and writes into dedicated Bun Web Workers communicating via `postMessage`.
Rationale: On the main thread, Bun’s event loop handles WebSocket I/O, JSON parsing, and Effect fiber scheduling. If SQLite writes were performed synchronously on the main thread, a slow transaction (e.g., flushing 100 event inserts) would block event delivery to all connected clients for the duration of the write.
The two-worker architecture separates these hot paths:
- The writer worker batches incoming commands and flushes on a 50ms timer or at 100 commands, whichever comes first. Writes never block the main thread.
- The reader worker handles `SELECT` queries on a separate thread. WAL mode ensures readers never block the writer.
Trade-offs:

- Message passing overhead. Every write command and read response crosses a thread boundary via structured clone.
- Complexity of worker lifecycle management. Both workers must be spawned, configured with the data directory, and gracefully shut down with flush guarantees.
- Fire-and-forget writes mean a crash between `postMessage` and the writer’s flush could lose buffered commands. The `flush()` method provides an explicit synchronization point when needed (e.g., before deleting a session directory).
7-State Machine over Status Strings
Decision: Model session lifecycle as a 7-state finite state machine (inactive, activating, ready, running, waiting, deactivating, error) with an explicit transition guard table.
Rationale: Early prototypes used string status fields ("idle", "running", "error") with ad-hoc updates. This led to invalid states: sessions stuck in "running" after an agent crash, sessions transitioning from "idle" directly to "waiting", sessions that could never recover from "error".
The state machine enforces valid transitions:
`applySessionTransition(current, agentStatus)` is a pure function that returns `null` for invalid transitions, making it trivial to test and reason about. The stale session recovery mechanism relies on the `error -> inactive` transition to reset sessions after a crash.
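A sketch of the guard-table approach. The transition table below is a plausible reconstruction, not the actual table, and the signature is simplified: the real `applySessionTransition` derives the target state from an agent status rather than taking it directly:

```typescript
type SessionState =
  | "inactive" | "activating" | "ready" | "running"
  | "waiting" | "deactivating" | "error";

// Hypothetical guard table: each state lists the states it may move to.
const allowed: Record<SessionState, SessionState[]> = {
  inactive: ["activating"],
  activating: ["ready", "error"],
  ready: ["running", "deactivating", "error"],
  running: ["waiting", "ready", "deactivating", "error"],
  waiting: ["running", "ready", "error"],
  deactivating: ["inactive", "error"],
  error: ["inactive"], // crash recovery: error -> inactive
};

// Pure function: returns the new state, or null for an invalid transition.
function applySessionTransition(
  current: SessionState,
  next: SessionState,
): SessionState | null {
  return allowed[current].includes(next) ? next : null;
}
```

Because the function is pure and total over the table, every invalid path from the prototype era (e.g., `inactive` jumping straight to `waiting`) is rejected by construction rather than by scattered `if` checks.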
Trade-offs:
- More complex state management code. Every state change must go through the transition function.
- Must handle rejected transitions gracefully — if `applySessionTransition` returns `null`, the handler must log the rejected transition and decide whether to force-reset.
- Legacy status values from older protocol versions must be mapped via `migrateLegacyStatus()`.
Handler Decomposition over Monolithic Router
Decision: Decompose the message router into focused handler modules (`interactive.ts`, `terminal.ts`, `thinking.ts`, `tool-lifecycle.ts`, `sandbox.ts`, `message-complete.ts`) rather than a single large router function.
Rationale: The Podium event stream produces 30+ distinct message types, each requiring different downstream actions (state transitions, event persistence, billing settlement, subscriber notification). A single function handling all cases would exceed 500 lines and conflate unrelated concerns.
Each handler module is responsible for:
- Matching on its relevant event types
- Performing the appropriate state transition
- Persisting events to SQLite
- Returning the client-facing events to broadcast
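The dispatch pattern can be sketched as a list of handlers that each claim their own event types; the router tries each in turn. All names, event shapes, and the context interface below are illustrative:

```typescript
// Hypothetical event and context shapes.
type PodiumEvent = { type: string; sessionId: string; payload?: unknown };
type ClientEvent = { type: string };

interface HandlerContext {
  persist: (ev: PodiumEvent) => void;       // enqueue for the writer worker
  transition: (sessionId: string, to: string) => void; // state machine hook
}

// A handler returns null when the event is not its concern,
// or the client-facing events to broadcast when it handled it.
type Handler = (ev: PodiumEvent, ctx: HandlerContext) => ClientEvent[] | null;

const handlers: Handler[] = [
  // sketch of a thinking.ts-style module: handles only thinking deltas
  (ev, ctx) => {
    if (ev.type !== "thinking_delta") return null;
    ctx.persist(ev);
    return [{ type: "thinking_delta" }];
  },
];

function route(ev: PodiumEvent, ctx: HandlerContext): ClientEvent[] {
  for (const h of handlers) {
    const out = h(ev, ctx);
    if (out !== null) return out;
  }
  return []; // unmatched events produce no client-facing output
}
```

Each real module would register several related event types; the first-match loop keeps the router itself a few lines long.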
Trade-offs:

- More files to navigate. Understanding the full event processing pipeline requires reading across 6+ handler modules.
- Context object threading. Each handler receives a context object with the session’s state, the broadcaster, the worker manager, and the billing service. This context must be threaded through every handler invocation.
Per-Session Sequence Numbers over Global
Decision: Assign sequence numbers per session rather than globally across all sessions.

Rationale: Clients use sequence numbers for two purposes: ordering events within a session, and resuming from a known position after reconnection (the `afterSeq` parameter on `join_session`). Per-session counters ensure that replaying events for session A requires no knowledge of session B’s event stream.
Trade-offs:

- Sequence counters are currently maintained in-memory and reset to 0 on process restart. On restart, new events will have sequence numbers that collide with previously persisted events. Clients that reconnect with `afterSeq` from a previous session will receive duplicate events. This is a known gap — the counter should be seeded from the maximum `seq` in the session’s `events` table on first access.
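The proposed fix — seeding the counter from persisted state on first access — might look like the following sketch, where `maxPersistedSeq` stands in for a `SELECT max(seq)` query against the session’s events table (all names hypothetical):

```typescript
// Allocates monotonically increasing per-session sequence numbers,
// seeding each counter from persisted state on first access.
class SequenceAllocator {
  private counters = new Map<string, number>();

  constructor(
    // Stand-in for: SELECT coalesce(max(seq), 0) FROM events (per session DB).
    private readonly maxPersistedSeq: (sessionId: string) => number,
  ) {}

  next(sessionId: string): number {
    const current = this.counters.get(sessionId) ?? this.maxPersistedSeq(sessionId);
    const next = current + 1;
    this.counters.set(sessionId, next);
    return next;
  }
}
```

With this seeding, numbers continue where the persisted stream left off after a restart, so `afterSeq` resumption no longer replays colliding duplicates.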
Gateway Adapter Pattern for Clients
Decision: Define a `GatewayAdapter` interface using Effect types (`Effect.Effect` for commands, `Stream.Stream` for events) that both web and desktop clients implement.
Rationale: The web client connects directly to the gateway via WebSocket (TypeScript SDK). The desktop client connects via Tauri IPC to a Rust backend that holds the WebSocket connection (Rust SDK). The React UI code — components, hooks, stores — should be identical in both cases.
The `GatewayAdapter` interface provides a uniform API:

- `connect(url, token)` returns `Effect.Effect<void, GatewayError>`
- `events` is a `Stream.Stream<GatewayEvent, GatewayError>`
- All 21 protocol methods are declared with Effect return types
The `WebGatewayAdapter` wraps the TypeScript SDK’s Promise-based methods in `Effect.tryPromise`. The `TauriGatewayAdapter` wraps `invoke` from `@tauri-apps/api/core` and `listen` from `@tauri-apps/api/event`.
Trade-offs:
- Abstraction layer adds complexity. Both adapters must be kept in sync with protocol changes.
- The Effect dependency is pulled into the client bundle, increasing its size.
- Testing requires mocking the adapter interface, which is more complex than mocking a raw WebSocket.
Zustand over Redux / MobX
Decision: Use Zustand for client-side state management.

Rationale: Zustand provides a minimal API with direct state updates, no action dispatchers, no reducers, and no boilerplate. State slices (connection, sessions, chat, preferences) are defined as simple objects with update functions. Zustand integrates naturally with Effect streams — the event stream from the `GatewayAdapter` feeds directly into store update functions.
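The shape of that integration can be sketched without the real libraries. `createStore` below is a minimal stand-in for a Zustand-style store (not zustand’s actual API), and the event types are hypothetical:

```typescript
// Minimal stand-in for a Zustand-style store: direct state updates, no reducers.
type UiState = { connected: boolean; readySessions: string[] };

function createStore(initial: UiState) {
  let state = initial;
  const listeners = new Set<(s: UiState) => void>();
  return {
    getState: () => state,
    setState: (partial: Partial<UiState>) => {
      state = { ...state, ...partial };
      for (const l of listeners) l(state);
    },
    subscribe: (l: (s: UiState) => void) => {
      listeners.add(l);
      return () => listeners.delete(l);
    },
  };
}

// Gateway events (shapes hypothetical) feed directly into store updates.
type GatewayEvent =
  | { type: "session_ready"; sessionId: string }
  | { type: "disconnected" };

const store = createStore({ connected: true, readySessions: [] });

function onGatewayEvent(ev: GatewayEvent): void {
  if (ev.type === "session_ready") {
    store.setState({ readySessions: [...store.getState().readySessions, ev.sessionId] });
  } else {
    store.setState({ connected: false });
  }
}
```

In the real client, `onGatewayEvent` would be the sink of the adapter’s Effect stream, and React components would subscribe to the relevant store slices.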
Trade-offs:
- Less structure than Redux for large teams. No enforced action/reducer pattern, no middleware pipeline.
- No saga debugger or time-travel debugging (Redux DevTools).
- Store boundaries must be designed carefully — Zustand does not enforce separation of concerns the way Redux slices do.
Fire-and-Forget Writes with Flush
Decision: The `WorkerManager.write()` method is synchronous and void-returning. The main thread posts the command via `postMessage` and moves on without waiting for acknowledgement. An explicit `flush(sessionId)` method blocks until all pending writes for that session are committed.
Rationale: Event persistence is on the critical path of agent turn processing. Every `text_delta`, `tool_call`, and `usage_update` event needs to be persisted, but the client should not wait for the database write before receiving the event over WebSocket. Fire-and-forget writes decouple event delivery latency from storage latency.
The `flush()` method provides a synchronization point for operations that require write durability — specifically, before deleting a session’s directory from disk.
Trade-offs:
- Writes can be lost if the process crashes between `postMessage` and the writer’s next flush cycle (up to 50ms of data). For a gateway serving AI agent sessions, this is an acceptable risk — the Podium coordinator retains the authoritative event log, and a reconnecting client can replay from the upstream source.
- No per-write error propagation. If a write fails (e.g., disk full), the writer worker logs the error but the main thread is not notified.