Protocol Overview
The Diminuendo wire protocol defines a structured, bidirectional communication layer between frontend clients and the gateway. It is designed for real-time streaming of AI agent events (thinking blocks, tool calls, terminal output, file mutations) while maintaining the strict ordering and persistence guarantees required for reliable session replay.
Transport and Encoding
The protocol operates exclusively over WebSocket connections (RFC 6455). Every frame is a UTF-8-encoded JSON object. Binary frames are not used. Compression (perMessageDeflate) is disabled to minimize latency on the hot path — text deltas arrive at sub-millisecond intervals during active turns, and decompression overhead is unacceptable at that frequency.
The protocol version is currently 1. The welcome message includes a protocolVersion field that clients should validate. A ProtocolVersionMismatch error is raised if the client and gateway disagree on the version.
Protocol Version
All messages belong to protocol version 1. This version number is transmitted in the initial welcome event and is also available as a constant in every SDK:
| SDK | Constant |
|---|---|
| TypeScript | PROTOCOL_VERSION (= 1) |
| Rust | PROTOCOL_VERSION: u32 (= 1) |
| Python | Implicit in wire format |
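A client-side version check might look like the following minimal sketch. Because the Python SDK exposes no constant, `PROTOCOL_VERSION` is defined locally here, and both the `check_welcome` helper and the local exception class are hypothetical names chosen for illustration rather than part of any SDK.

```python
PROTOCOL_VERSION = 1  # defined locally; mirrors the value the other SDKs expose


class ProtocolVersionMismatch(Exception):
    """Local stand-in for the mismatch error (hypothetical class)."""


def check_welcome(message: dict) -> None:
    """Validate the protocolVersion field of a decoded welcome event."""
    if message.get("type") != "welcome":
        raise ValueError(f"expected a welcome event, got {message.get('type')!r}")
    version = message.get("protocolVersion")
    if version != PROTOCOL_VERSION:
        raise ProtocolVersionMismatch(
            f"gateway speaks version {version!r}, client expects {PROTOCOL_VERSION}"
        )
```

Running this check as soon as the welcome event arrives lets a client fail fast instead of misinterpreting later events.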
Connection Lifecycle
Every WebSocket connection progresses through a deterministic sequence of phases:
1. Connect
   The client opens a WebSocket connection to ws(s)://host:port/ws. The gateway validates the Origin header against its allowlist (bypassed in dev mode) and performs a CSRF check for browser-origin connections.
2. Welcome
   The gateway immediately sends a welcome event containing the protocol version and whether authentication is required. A connected event follows with the assigned clientId and the heartbeat interval.
3. Authenticate
   If requiresAuth is true, the client must send an authenticate message with a valid JWT or API key. The gateway verifies the token (via Auth0 JWKS in production) and responds with authenticated containing the user’s identity. In dev mode, authentication is automatic: the gateway assigns a synthetic identity (developer@example.com) and sends authenticated without requiring a token.
4. Session Interaction
   After authentication, the client may send any of the 21 message types: listing sessions, creating sessions, joining sessions, running turns, and so on. Messages sent before authentication (except authenticate itself) are rejected with a NOT_AUTHENTICATED error.
5. Join Session
   To receive streaming events for a session, the client sends join_session. The gateway responds with a state_snapshot, a complete picture of the session’s current state, and subscribes the client to all future events for that session.
6. Event Stream
   While joined, the client receives all events broadcast to the session topic: text_delta, tool_call, thinking.progress, terminal.stream, and so on. Events carry monotonically increasing sequence numbers for ordering and replay.

Message Format
Every message, both client-to-server and server-to-client, is a JSON object with a type field that serves as the discriminator.
The type field is always a string. Client messages use 21 distinct types; server events use 51. The gateway’s Effect Schema parser validates every inbound message against the full union of client message schemas and rejects anything that does not match with an INVALID_MESSAGE error.
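To illustrate how the type discriminator drives the connection lifecycle described earlier, here is a minimal client-side sketch. The `Lifecycle` class and its phase names are hypothetical. Only the event names (welcome, connected, authenticated) and the requiresAuth field come from the protocol description.

```python
import json


class Lifecycle:
    """Tracks the connection phase by dispatching on each frame's type field."""

    def __init__(self) -> None:
        self.phase = "connecting"   # hypothetical phase names
        self.requires_auth = None   # unknown until the welcome event arrives

    def on_frame(self, raw: str) -> str:
        """Decode one text frame, advance the phase, and return the new phase."""
        message = json.loads(raw)
        kind = message.get("type") if isinstance(message, dict) else None
        if not isinstance(kind, str):
            raise ValueError("every message must be a JSON object with a string 'type'")
        if kind == "welcome" and self.phase == "connecting":
            self.requires_auth = bool(message.get("requiresAuth"))
            self.phase = "welcomed"
        elif kind == "connected" and self.phase == "welcomed":
            self.phase = "connected"
        elif kind == "authenticated" and self.phase == "connected":
            self.phase = "authenticated"
        return self.phase

    def may_send(self, kind: str) -> bool:
        """Mirror the gateway rule: only authenticate is allowed before auth."""
        return self.phase == "authenticated" or kind == "authenticate"
```

Checking `may_send` locally avoids a round trip that would only end in a NOT_AUTHENTICATED error.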
Sequence Numbers
Events that belong to a session carry a seq field: a per-session, monotonically increasing integer. Sequence numbers serve three purposes:
- Ordering: clients can sort events by seq to reconstruct the correct order, even if events are delivered or processed out of order
- Deduplication: replayed events carry the same seq as the original; clients can skip events they have already processed
- Resumption: when reconnecting, clients pass afterSeq in the join_session message to receive only events they missed
seq: 1.
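The three uses of seq can be sketched with a small tracker. This is an illustrative sketch, not SDK code: the `SeqTracker` class, the `in_order` helper, and the `sessionId` field name are assumptions. Only seq, afterSeq, and join_session come from the protocol description.

```python
class SeqTracker:
    """Deduplicates session events and remembers where to resume from."""

    def __init__(self) -> None:
        self.seen = set()    # grows without bound; a real client would prune it
        self.last_seq = 0    # highest seq processed so far

    def accept(self, event: dict) -> bool:
        """Return True if the event is new and should be processed."""
        seq = event.get("seq")
        if seq is None:
            return True      # not a session event; nothing to deduplicate
        if seq in self.seen:
            return False     # replayed duplicate
        self.seen.add(seq)
        self.last_seq = max(self.last_seq, seq)
        return True

    def join_message(self, session_id: str) -> dict:
        """Build a join_session message that resumes after the last seen seq."""
        # "sessionId" is an assumed field name; afterSeq is from the protocol.
        return {"type": "join_session", "sessionId": session_id, "afterSeq": self.last_seq}


def in_order(events: list) -> list:
    """Sort buffered session events by seq to restore their original order."""
    return sorted(events, key=lambda e: e["seq"])
```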
Timestamps
Events include a ts field containing the Unix epoch time in milliseconds at which the gateway generated (or relayed) the event. Timestamps are server-authoritative: clients should not rely on their own wall clock for ordering.
The ping/pong mechanism exposes both clientTs (echoed back from the client’s ping) and serverTs (the gateway’s timestamp at pong time), enabling clients to compute round-trip latency and approximate clock skew.
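As a rough sketch of that computation, the function below derives round-trip time and an approximate clock skew from a pong. It assumes network delay is symmetric, which is only an approximation, and the function name and the `now_ms` parameter (the client's clock when the pong arrives) are illustrative.

```python
def estimate_latency(client_ts: int, server_ts: int, now_ms: int) -> tuple[int, float]:
    """Return (round_trip_ms, skew_ms) from a pong.

    client_ts: the client's timestamp echoed back from its ping
    server_ts: the gateway's timestamp at pong time
    now_ms:    the client's clock when the pong was received
    A positive skew means the gateway's clock is ahead of the client's.
    """
    rtt = now_ms - client_ts
    # Assume the pong was stamped halfway through the round trip.
    skew = server_ts - (client_ts + rtt / 2)
    return rtt, skew
```

For example, a ping sent at 1000 and answered with serverTs 1500, received at 1100 on the client's clock, gives a round trip of 100 ms and an estimated skew of 450 ms.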
Event Classification
Server events fall into two persistence categories, which determine whether they survive a gateway restart and are available for replay:
Persistent Events
Stored in the session’s SQLite database. Available for replay via get_events and join_session with afterSeq. Examples: turn_started, turn_complete, tool_call, tool_result, question_requested, session_state, sandbox_ready, sandbox_removed.
Ephemeral Events
Broadcast to currently-connected subscribers only. Not stored. Lost if no client is listening. Examples: text_delta, thinking.progress, terminal.stream, heartbeat, usage.update.
Persistent events (turn_started, turn_complete) capture the final, authoritative state.
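A client that needs to know which events it can expect to see again on replay could keep a lookup like the sketch below. The two sets contain only the example event types listed above, not the full set of 51 server events, so anything else is reported as unknown. The function name is illustrative.

```python
# Only the examples named in this section; the full event list is larger.
PERSISTENT_EXAMPLES = frozenset({
    "turn_started", "turn_complete", "tool_call", "tool_result",
    "question_requested", "session_state", "sandbox_ready", "sandbox_removed",
})
EPHEMERAL_EXAMPLES = frozenset({
    "text_delta", "thinking.progress", "terminal.stream", "heartbeat", "usage.update",
})


def persistence_of(event_type: str) -> str:
    """Classify an event type as 'persistent', 'ephemeral', or 'unknown'."""
    if event_type in PERSISTENT_EXAMPLES:
        return "persistent"
    if event_type in EPHEMERAL_EXAMPLES:
        return "ephemeral"
    return "unknown"
```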
Heartbeat
The gateway sends a heartbeat event every 30 seconds to all session topics with active subscribers. The heartbeat contains only a ts field.
The interval is reported in the connected event via the heartbeatIntervalMs field, so clients need not hardcode the value.
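One way a client might use the advertised interval is a simple watchdog, sketched below. The `HeartbeatWatchdog` class and the grace factor of two intervals are assumptions for illustration; the protocol description does not specify a timeout policy. Timestamps passed in are taken from the client's own clock, since the watchdog measures local elapsed time.

```python
class HeartbeatWatchdog:
    """Flags a connection as stale when heartbeats stop arriving."""

    def __init__(self, heartbeat_interval_ms: int, now_ms: int, grace_factor: float = 2.0) -> None:
        # heartbeat_interval_ms comes from heartbeatIntervalMs in the connected event.
        # grace_factor is an assumed policy: tolerate one missed heartbeat.
        self.timeout_ms = heartbeat_interval_ms * grace_factor
        self.last_beat_ms = now_ms

    def on_heartbeat(self, now_ms: int) -> None:
        """Record the local arrival time of a heartbeat event."""
        self.last_beat_ms = now_ms

    def is_stale(self, now_ms: int) -> bool:
        """True once more than timeout_ms has passed since the last heartbeat."""
        return now_ms - self.last_beat_ms > self.timeout_ms
```

Because heartbeats go only to session topics with active subscribers, a client would typically start the watchdog after joining a session.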
Reconnection
The protocol is designed for graceful recovery from disconnections. The reconnection procedure is:
1. Detect Disconnection
   Either the WebSocket close event fires, or the heartbeat timeout expires. SDKs with autoReconnect enabled handle this automatically.
2. Re-establish Connection
   Open a new WebSocket, authenticate, and receive the welcome / connected / authenticated sequence as normal.
3. Rejoin with afterSeq
   Send join_session with the afterSeq field set to the last seq received before disconnection. The gateway replays all persistent events after that sequence number.
4. Handle Gap Events
   If the gateway detects that some events between the client’s afterSeq and the current head are missing (e.g., because ephemeral events were not persisted), it sends a gap event indicating the range of missing sequence numbers.
5. Replay Complete
   After replaying all available events, the gateway sends replay_complete with the lastSeq value. The client can then resume normal real-time event processing.
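The replay phase of this procedure can be sketched as a function that consumes events until replay_complete arrives. This is an illustration under stated assumptions: `apply_replay` is a hypothetical helper, and the gap event's `fromSeq` and `toSeq` field names are guesses, since the text above says only that the event indicates a range. The lastSeq field comes from the protocol description.

```python
def apply_replay(events, after_seq: int):
    """Consume events received after rejoining, up to replay_complete.

    Returns (replayed_events, gaps, last_seq), where gaps is a list of
    (from_seq, to_seq) ranges the gateway reported as unavailable.
    """
    last_seq = after_seq
    replayed = []
    gaps = []
    for event in events:
        kind = event["type"]
        if kind == "gap":
            # Field names are assumed; the protocol only promises a range.
            gaps.append((event["fromSeq"], event["toSeq"]))
        elif kind == "replay_complete":
            last_seq = max(last_seq, event["lastSeq"])
            return replayed, gaps, last_seq
        elif event.get("seq", 0) > last_seq:
            # New session event; anything at or below last_seq is a duplicate.
            replayed.append(event)
            last_seq = event["seq"]
    raise ValueError("stream ended before replay_complete")
```

After the function returns, the client would treat subsequent events as live traffic and continue tracking `last_seq` for the next reconnection.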