Diminuendo
Diminuendo is the client-facing gateway of the iGentAI platform: the sole entry point through which all frontend clients communicate with the AI agent infrastructure. Every WebSocket connection from a web browser, desktop application, or CLI tool terminates at Diminuendo. No client ever speaks directly to an agent orchestrator, an LLM inference service, or a workspace filesystem. Diminuendo is the narrow waist of the entire system.
The Name
In musical notation, diminuendo (or decrescendo) denotes a gradual decrease in volume. The gateway serves an analogous function in the iGentAI architecture: it reduces the complexity of a distributed system (agent orchestration, LLM inference, workspace filesystems, billing, RBAC, session persistence) into a single, coherent WebSocket protocol that any client can consume with a few dozen lines of code.
The Problem
AI coding agents are fundamentally different from traditional request-response APIs. A single user interaction, such as “refactor this module to use dependency injection”, may produce dozens of real-time events over several minutes: thinking blocks as the model reasons, tool calls as it reads and writes files, terminal output as it runs tests, permission requests when it encounters sensitive operations, and finally a structured completion with token usage. All of this must stream to the client with sub-100ms latency for interactive use.
Beyond the wire protocol, a production gateway must solve several orthogonal problems simultaneously:
- Multi-tenant isolation: each organization’s sessions, history, and billing must be strictly separated, with no possibility of cross-tenant data leakage
- Session persistence: users expect to close their laptop, reopen it tomorrow, and resume exactly where they left off, with full conversation history and event replay
- Billing enforcement: credit reservations must be checked before an expensive LLM turn begins, not after
- RBAC: owners, admins, and members have different permissions over sessions, team management, and billing
- Connection resilience: clients disconnect, servers restart, agents crash; the system must recover gracefully from all of these without losing data
- Horizontal scalability: the gateway must scale to thousands of concurrent sessions without requiring shared state between instances
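The billing requirement in the list above (check the reservation before the turn starts, then settle against actual usage) can be sketched roughly as follows. This is an illustration under stated assumptions, not Diminuendo's implementation: the `CreditLedger` class, its method names, and the credit amounts are all hypothetical.

```typescript
// Hypothetical sketch: hold credits before a turn, settle once it finishes.
// None of these names are taken from the Diminuendo codebase.

type ReserveResult =
  | { ok: true; reservationId: number }
  | { ok: false; reason: "insufficient_credits" };

class CreditLedger {
  private balance: number;
  private held = 0; // total credits locked by open reservations
  private nextId = 1;
  private readonly open: Record<number, number> = {};

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  /** Credits that are neither spent nor held by an open reservation. */
  available(): number {
    return this.balance - this.held;
  }

  /** Hold `estimate` credits up front; refuse if the tenant cannot cover it. */
  reserve(estimate: number): ReserveResult {
    if (estimate > this.available()) {
      return { ok: false, reason: "insufficient_credits" };
    }
    const reservationId = this.nextId++;
    this.open[reservationId] = estimate;
    this.held += estimate;
    return { ok: true, reservationId };
  }

  /** Release the hold and charge what the turn actually cost. */
  settle(reservationId: number, actualCost: number): void {
    const heldAmount = this.open[reservationId];
    if (heldAmount === undefined) {
      throw new Error(`unknown reservation ${reservationId}`);
    }
    delete this.open[reservationId];
    this.held -= heldAmount;
    this.balance -= actualCost;
  }
}

const ledger = new CreditLedger(100);
const first = ledger.reserve(80);  // accepted: 80 credits are available
const second = ledger.reserve(30); // rejected: only 20 credits remain unheld
if (first.ok) {
  ledger.settle(first.reservationId, 55); // actual usage was below the estimate
}
```

Rejecting the second reservation before any work starts is the point of the pattern: an expensive turn never begins unless its estimated cost is already covered.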
The Solution
Diminuendo is a lean, single-binary gateway, approximately 8,000 lines of Effect TS, running on Bun. It uses SQLite in WAL mode for zero-ops persistence: no Postgres cluster, no Redis dependency, no external state store. Each tenant gets its own SQLite database for session metadata; each session gets its own database for conversation history and events. Because state is partitioned per tenant, horizontal scaling requires nothing more than routing tenants to different gateway instances. The gateway also includes an automation system for scheduled runs, heartbeats, and background triage, which uses the same event pipeline as interactive turns.
The entire gateway compiles to a single binary with `bun build`. There is no Docker image to pull, no Kubernetes manifest to deploy, and no manual database migration step (migrations are applied automatically at startup). Clone, install, run.
Position in the Platform
Diminuendo sits between frontend clients and three backend services, each responsible for a distinct domain:
| Service | Responsibility |
|---|---|
| Podium | Agent orchestration — creates agent instances, manages their lifecycle, routes messages to running agents, and streams their output back |
| Ensemble | LLM inference — model selection, token accounting, provider failover, cost estimation |
| Chronicle | Workspace filesystem — content-addressed storage, file versioning, bidirectional sync with local filesystems |
Protocol at a Glance
The Diminuendo wire protocol is a typed, JSON-over-WebSocket protocol with 21 client message types and 51 server event types.
21 Client Messages
Session CRUD, turn execution, file access, team management, and connection lifecycle
51 Server Events
Streaming text, tool calls, thinking blocks, terminal output, sandbox lifecycle, billing updates, and more
7-State Machine
Each session transitions through: inactive, activating, ready, running, waiting, deactivating, error
Automation System
Scheduled automations, heartbeats, and inbox-driven background runs
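The seven session states listed above can be modeled as a string-literal union with an explicit transition table. The state names come from this page, but the allowed transitions are an assumption made purely for illustration, since the page does not specify them. `canTransition` and `transition` are hypothetical helpers.

```typescript
// The seven state names are taken from the protocol summary.
type SessionState =
  | "inactive"
  | "activating"
  | "ready"
  | "running"
  | "waiting"
  | "deactivating"
  | "error";

// ASSUMPTION: this table is illustrative only. The real gateway may allow
// a different set of transitions.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  inactive: ["activating"],
  activating: ["ready", "error"],
  ready: ["running", "deactivating"],
  running: ["waiting", "ready", "error"],
  waiting: ["running", "deactivating", "error"],
  deactivating: ["inactive", "error"],
  error: ["inactive"],
};

/** True if the (assumed) table allows moving from `from` to `to`. */
function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

/** Return the new state, or throw if the move is not in the table. */
function transition(from: SessionState, to: SessionState): SessionState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}

// Walk one plausible session lifetime: activate, run a turn that pauses
// for user input, resume, and return to ready.
let state: SessionState = "inactive";
const lifetime = ["activating", "ready", "running", "waiting", "running", "ready"] as const;
for (const next of lifetime) {
  state = transition(state, next);
}
```

Keeping the table as data rather than scattering checks through handlers means a client and a server can share one definition of what counts as a legal move.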
SDKs
Four official client SDKs are available, each providing typed wrappers around the wire protocol:
TypeScript
Zero-dependency, works in browsers and Node.js/Bun. Promise-based with typed event handlers.
Rust
Tokio-based async client with `Stream` for events. Designed for Tauri desktop apps and CLI tools.
Python
Asyncio-based client with callback event handling. Suitable for scripting, testing, and notebook integration.
Swift
Actor-based async/await client for macOS and iOS. Codable events with zero third-party dependencies.
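To illustrate what a typed wrapper around a JSON wire protocol involves, the sketch below decodes incoming frames into a discriminated union and dispatches them to typed handlers. It does not reproduce the Diminuendo SDK API: the event names (`text_delta`, `tool_call`, `turn_complete`), their fields, and the `decodeEvent` and `EventDispatcher` helpers are hypothetical stand-ins for the real message catalogue.

```typescript
// Hypothetical event shapes. The real protocol defines 51 server events,
// and their names and fields are not shown on this page.
type ServerEvent =
  | { type: "text_delta"; sessionId: string; text: string }
  | { type: "tool_call"; sessionId: string; tool: string }
  | { type: "turn_complete"; sessionId: string; totalTokens: number };

type EventType = ServerEvent["type"];
type EventOf<T extends EventType> = Extract<ServerEvent, { type: T }>;
type Handler<T extends EventType> = (event: EventOf<T>) => void;

const KNOWN_TYPES: readonly EventType[] = ["text_delta", "tool_call", "turn_complete"];

/** Parse one JSON frame; return null for malformed or unknown frames. */
function decodeEvent(frame: string): ServerEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const tag = (raw as { type?: unknown }).type;
  if (typeof tag !== "string" || KNOWN_TYPES.indexOf(tag as EventType) === -1) {
    return null;
  }
  // Only the tag is checked here; a real client would validate every field.
  return raw as ServerEvent;
}

class EventDispatcher {
  private readonly handlers: Record<string, Array<(event: ServerEvent) => void>> = {};

  /** Register a handler whose parameter type follows from the event name. */
  on<T extends EventType>(type: T, handler: Handler<T>): void {
    const list = this.handlers[type] || [];
    // The cast is sound because dispatch() only calls a handler with
    // events stored under the same type tag.
    list.push(handler as unknown as (event: ServerEvent) => void);
    this.handlers[type] = list;
  }

  dispatch(event: ServerEvent): void {
    for (const handler of this.handlers[event.type] || []) {
      handler(event);
    }
  }
}

const dispatcher = new EventDispatcher();
let transcript = "";
dispatcher.on("text_delta", (delta) => {
  transcript += delta.text; // `delta` is typed as the text_delta variant
});

const frames = [
  '{"type":"text_delta","sessionId":"s1","text":"Hello"}',
  '{"type":"unknown_event"}',
  "not json",
  '{"type":"text_delta","sessionId":"s1","text":", world"}',
];
for (const frame of frames) {
  const decoded = decodeEvent(frame);
  if (decoded !== null) dispatcher.dispatch(decoded);
}
```

In a browser the frames would arrive through a `WebSocket` `message` listener instead of an array. The decoding and dispatch steps stay the same.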
What’s Next
1. Quickstart: get the gateway running locally in under a minute.
2. Architecture: understand how the gateway’s modules compose, from transport to storage to upstream services.
3. Wire Protocol: explore every client message and server event in detail.