Protocol Overview

The Diminuendo wire protocol defines a structured, bidirectional communication layer between frontend clients and the gateway. It is designed for real-time streaming of AI agent events — thinking blocks, tool calls, terminal output, file mutations — while maintaining the strict ordering and persistence guarantees required for reliable session replay.

Transport and Encoding

The protocol operates exclusively over WebSocket connections (RFC 6455). Every frame is a UTF-8-encoded JSON object. Binary frames are not used. Compression (perMessageDeflate) is disabled to minimize latency on the hot path: text deltas arrive at sub-millisecond intervals during active turns, and the per-message cost of compressing on the gateway and decompressing on the client is unacceptable at that frequency.
Client                          Gateway
  |                                |
  |  ── WebSocket upgrade ───────> |  GET /ws → 101 Switching Protocols
  |  <──── welcome ────────────── |  {"type":"welcome","protocolVersion":1,"requiresAuth":true}
  |  <──── connected ─────────── |  {"type":"connected","clientId":"...","heartbeatIntervalMs":30000,"ts":...}
  |  ── authenticate ────────────> |  {"type":"authenticate","token":"ey..."}
  |  <──── authenticated ──────── |  {"type":"authenticated","identity":{...}}
  |                                |
  |  (connection is now live)      |
The protocol version is currently 1. The welcome message includes a protocolVersion field that clients should validate. A ProtocolVersionMismatch error is raised if the client and gateway disagree on the version.

Protocol Version

All messages belong to protocol version 1. This version number is transmitted in the initial welcome event and is also available as a constant in every SDK:
SDK          Constant
TypeScript   PROTOCOL_VERSION (= 1)
Rust         PROTOCOL_VERSION: u32 (= 1)
Python       Implicit in wire format
The version will be incremented only when breaking changes to the wire format are introduced. Additive changes — new event types, new optional fields — do not increment the version.
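A client-side check of the welcome event could look like the sketch below. Only the welcome type, the protocolVersion field, and the value 1 come from this page; the exception class and the check_welcome helper are hypothetical names chosen for illustration, and the real SDK types may differ.

```python
import json

PROTOCOL_VERSION = 1  # value stated in this document


class ProtocolVersionMismatch(Exception):
    """Hypothetical client-side exception; the real SDK type may differ."""


def check_welcome(frame: str) -> dict:
    """Parse a welcome frame and verify the protocol version it advertises."""
    message = json.loads(frame)
    if message.get("type") != "welcome":
        raise ValueError(f"expected welcome, got {message.get('type')!r}")
    version = message.get("protocolVersion")
    if version != PROTOCOL_VERSION:
        raise ProtocolVersionMismatch(
            f"gateway speaks version {version!r}, client speaks {PROTOCOL_VERSION}"
        )
    return message


welcome = check_welcome('{"type":"welcome","protocolVersion":1,"requiresAuth":true}')
```

Because additive changes do not bump the version, the check compares for equality and ignores any fields it does not recognize.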

Connection Lifecycle

Every WebSocket connection progresses through a deterministic sequence of phases:
1. Connect

The client opens a WebSocket connection to ws(s)://host:port/ws. The gateway validates the Origin header against its allowlist (bypassed in dev mode) and performs a CSRF check for browser-origin connections.
2. Welcome

The gateway immediately sends a welcome event containing the protocol version and whether authentication is required. A connected event follows with the assigned clientId and the heartbeat interval.
3. Authenticate

If requiresAuth is true, the client must send an authenticate message with a valid JWT or API key. The gateway verifies the token (via Auth0 JWKS in production) and responds with authenticated containing the user’s identity. In dev mode, authentication is automatic — the gateway assigns a synthetic identity (developer@example.com) and sends authenticated without requiring a token.
4. Session Interaction

After authentication, the client may send any of the 21 message types: listing sessions, creating sessions, joining sessions, running turns, and so on. Messages sent before authentication (except authenticate itself) are rejected with a NOT_AUTHENTICATED error.
5. Join Session

To receive streaming events for a session, the client sends join_session. The gateway responds with a state_snapshot — a complete picture of the session’s current state — and subscribes the client to all future events for that session.
6. Event Stream

While joined, the client receives all events broadcast to the session topic: text_delta, tool_call, thinking.progress, terminal.stream, and so on. Events carry monotonically increasing sequence numbers for ordering and replay.
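The first three phases can be modelled as a small client-side state machine. The sketch below is transport-agnostic: it consumes parsed server messages and returns the message the client should send next, if any. The Phase and Handshake names are hypothetical, and it assumes that a gateway in dev mode reports requiresAuth as false before sending authenticated on its own.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    AWAIT_WELCOME = "await_welcome"
    AWAIT_CONNECTED = "await_connected"
    AWAIT_AUTHENTICATED = "await_authenticated"
    LIVE = "live"


class Handshake:
    """Tracks the welcome -> connected -> authenticated sequence."""

    def __init__(self, token: str) -> None:
        self.token = token
        self.phase = Phase.AWAIT_WELCOME
        self.requires_auth = True
        self.heartbeat_interval_ms: Optional[int] = None

    def on_message(self, msg: dict) -> Optional[dict]:
        """Consume one server message; return a message to send, if any."""
        kind = msg.get("type")
        if self.phase is Phase.AWAIT_WELCOME and kind == "welcome":
            self.requires_auth = bool(msg.get("requiresAuth", True))
            self.phase = Phase.AWAIT_CONNECTED
            return None
        if self.phase is Phase.AWAIT_CONNECTED and kind == "connected":
            self.heartbeat_interval_ms = msg.get("heartbeatIntervalMs")
            self.phase = Phase.AWAIT_AUTHENTICATED
            # Assumption: when auth is not required, authenticated arrives unprompted.
            if self.requires_auth:
                return {"type": "authenticate", "token": self.token}
            return None
        if self.phase is Phase.AWAIT_AUTHENTICATED and kind == "authenticated":
            self.phase = Phase.LIVE
            return None
        raise ValueError(f"unexpected {kind!r} in phase {self.phase.value}")


hs = Handshake(token="ey...")
hs.on_message({"type": "welcome", "protocolVersion": 1, "requiresAuth": True})
reply = hs.on_message(
    {"type": "connected", "clientId": "client-1", "heartbeatIntervalMs": 30000}
)
hs.on_message({"type": "authenticated", "identity": {}})
```

Once the phase is LIVE, the client can send join_session and the other message types; anything sent earlier would be rejected with NOT_AUTHENTICATED.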

Message Format

Every message — both client-to-server and server-to-client — is a JSON object with a type field that serves as the discriminator:
{
  "type": "text_delta",
  "sessionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "turnId": "turn-001",
  "text": "Let me analyze ",
  "seq": 42,
  "ts": 1709312400000
}
The type field is always a string. Client messages use 21 distinct types; server events use 51. The gateway’s Effect Schema parser validates every inbound message against the full union of client message schemas and rejects anything that does not match with an INVALID_MESSAGE error.
All field names use camelCase on the wire, regardless of the SDK’s native naming convention. The Rust SDK maps snake_case struct fields to camelCase via #[serde(rename)], and the Python SDK converts in its from_dict class methods.
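As an illustration of the camelCase-to-native mapping, a from_dict class method for the text_delta event shown above might be written as follows. The TextDelta class is a hypothetical stand-in, not the Python SDK's actual type; the field names are taken from the example message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextDelta:
    """snake_case view of a text_delta event (hypothetical class)."""

    session_id: str
    turn_id: str
    text: str
    seq: int
    ts: int

    @classmethod
    def from_dict(cls, data: dict) -> "TextDelta":
        # Check the discriminator first, then map wire names to Python names.
        if data.get("type") != "text_delta":
            raise ValueError(f"not a text_delta: {data.get('type')!r}")
        return cls(
            session_id=data["sessionId"],
            turn_id=data["turnId"],
            text=data["text"],
            seq=data["seq"],
            ts=data["ts"],
        )


event = TextDelta.from_dict(
    {
        "type": "text_delta",
        "sessionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
        "turnId": "turn-001",
        "text": "Let me analyze ",
        "seq": 42,
        "ts": 1709312400000,
    }
)
```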

Sequence Numbers

Events that belong to a session carry a seq field — a per-session, monotonically increasing integer. Sequence numbers serve three purposes:
  1. Ordering: clients can sort events by seq to reconstruct the correct order, for example when merging replayed events with live ones after a reconnect (within a single WebSocket connection, frames already arrive in the order they were sent)
  2. Deduplication: replayed events carry the same seq as the original; clients can skip events they have already processed
  3. Resumption: when reconnecting, clients pass afterSeq in the join_session message to receive only events they missed
Sequence numbers are scoped to a session. Different sessions have independent counters. The first event in a session has seq: 1.
Not all events carry a seq field. Connection-level events (welcome, authenticated, connected, heartbeat, pong, error) are not session-scoped and therefore have no sequence number. Session-level metadata events (session_list, session_created, session_state) also lack sequence numbers — they are not part of the replayable event stream.
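The deduplication and resumption roles of seq can be sketched with a small per-session tracker. SeqTracker is a hypothetical helper; it assumes every event that carries seq also carries sessionId, as in the text_delta example, and uses 0 to mean that nothing has been seen yet.

```python
class SeqTracker:
    """Remembers the highest seq processed for each session."""

    def __init__(self) -> None:
        self._last_seq: dict = {}

    def should_process(self, event: dict) -> bool:
        """Return True if the event is new; record its seq when it is."""
        seq = event.get("seq")
        if seq is None:
            # Connection-level and metadata events carry no seq.
            return True
        session_id = event["sessionId"]
        if seq <= self._last_seq.get(session_id, 0):
            return False  # duplicate, or already seen during replay
        self._last_seq[session_id] = seq
        return True

    def last_seq(self, session_id: str) -> int:
        """Highest seq seen for the session; 0 if none (seq starts at 1)."""
        return self._last_seq.get(session_id, 0)


tracker = SeqTracker()
first = {"type": "tool_call", "sessionId": "s1", "seq": 1}
accepted = tracker.should_process(first)
repeated = tracker.should_process(first)
```

Counters are kept per session because different sessions number their events independently.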

Timestamps

Events include a ts field containing the Unix epoch time in milliseconds at which the gateway generated (or relayed) the event. Timestamps are server-authoritative — clients should not rely on their own wall clock for ordering. The ping/pong mechanism exposes both clientTs (echoed back from the client’s ping) and serverTs (the gateway’s timestamp at pong time), enabling clients to compute round-trip latency and approximate clock skew.
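One common way to turn a ping/pong exchange into latency and skew estimates is shown below. It assumes the pong was stamped roughly halfway through the round trip, which is an approximation rather than something the protocol guarantees; the function name and the received_at parameter are illustrative.

```python
def latency_and_skew(client_ts: int, server_ts: int, received_at: int) -> tuple:
    """Estimate round-trip time and clock skew from one ping/pong exchange.

    All arguments are Unix epoch milliseconds. received_at is the client's
    own clock reading at the moment the pong arrived.
    """
    rtt_ms = received_at - client_ts
    # Assume the gateway stamped serverTs at the midpoint of the round trip.
    skew_ms = server_ts - (client_ts + rtt_ms / 2)
    return rtt_ms, skew_ms


pong = {"type": "pong", "clientTs": 1709312400000, "serverTs": 1709312400070}
rtt, skew = latency_and_skew(
    pong["clientTs"], pong["serverTs"], received_at=1709312400040
)
```

Here the round trip took 40 ms and the gateway clock reads about 50 ms ahead of the client. A positive skew means the server clock is ahead.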

Event Classification

Server events fall into two persistence categories, which determine whether they survive a gateway restart and are available for replay:

Persistent Events

Stored in the session’s SQLite database. Available for replay via get_events and join_session with afterSeq. Examples: turn_started, turn_complete, tool_call, tool_result, question_requested, session_state, sandbox_ready, sandbox_removed.

Ephemeral Events

Broadcast to currently-connected subscribers only. Not stored. Lost if no client is listening. Examples: text_delta, thinking.progress, terminal.stream, heartbeat, usage.update.
The distinction is deliberate. Ephemeral events represent intermediate streaming state — individual text tokens, thinking fragments, terminal bytes — that would be prohibitively expensive to persist at the rate they are produced (hundreds per second during active generation). The persistent events that bookend them (turn_started, turn_complete) capture the final, authoritative state.
When building a client that supports reconnection, persist the last received seq locally. On reconnect, pass it as afterSeq when re-joining the session. The gateway will replay all persistent events after that sequence number, followed by a replay_complete event. Ephemeral events (text deltas, thinking progress) from before the reconnection are permanently lost — the state_snapshot delivered on join provides the accumulated textSoFar to compensate.
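A minimal sketch of persisting the last seq and building the rejoin message follows. The JSON file layout and both helper names are hypothetical, and the sessionId field on join_session is an assumption based on the event examples; only join_session and afterSeq are named on this page.

```python
import json
import tempfile
from pathlib import Path


def save_last_seq(path: Path, session_id: str, seq: int) -> None:
    """Record the newest seq for a session in a small JSON file."""
    state = json.loads(path.read_text()) if path.exists() else {}
    state[session_id] = max(seq, state.get(session_id, 0))
    path.write_text(json.dumps(state))


def build_rejoin(path: Path, session_id: str) -> dict:
    """Build a join_session message that resumes after the stored seq."""
    state = json.loads(path.read_text()) if path.exists() else {}
    # Assumption: join_session identifies the session with a sessionId field.
    message = {"type": "join_session", "sessionId": session_id}
    if session_id in state:
        message["afterSeq"] = state[session_id]
    return message


store = Path(tempfile.mkdtemp()) / "last_seq.json"
session = "f47ac10b-58cc-4372-a567-0e02b2c3d479"
save_last_seq(store, session, 42)
save_last_seq(store, session, 40)  # an older seq never moves the marker back
rejoin = build_rejoin(store, session)
```

When no seq has been stored for a session, the sketch omits afterSeq entirely rather than guessing a value.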

Heartbeat

The gateway sends a heartbeat event every 30 seconds to all session topics with active subscribers. The heartbeat contains only a ts field:
{ "type": "heartbeat", "ts": 1709312430000 }
Clients should treat a silence of more than 35 seconds (heartbeat interval plus a 5-second tolerance) as evidence of a stale connection and initiate reconnection. The heartbeat interval is also communicated in the initial connected event via the heartbeatIntervalMs field, so clients need not hardcode the value.
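The staleness rule can be expressed as a small watchdog that takes its interval from heartbeatIntervalMs. HeartbeatWatchdog is a hypothetical helper; treating any inbound frame (not only heartbeat) as a sign of life is a design choice of this sketch, and time is passed in explicitly to keep it easy to test.

```python
from typing import Optional


class HeartbeatWatchdog:
    """Flags a connection as stale after too long without inbound traffic."""

    def __init__(self, heartbeat_interval_ms: int, tolerance_ms: int = 5000) -> None:
        self.timeout_ms = heartbeat_interval_ms + tolerance_ms
        self.last_seen_ms: Optional[int] = None

    def touch(self, now_ms: int) -> None:
        """Call whenever a frame arrives from the gateway."""
        self.last_seen_ms = now_ms

    def is_stale(self, now_ms: int) -> bool:
        """True once the silence exceeds interval plus tolerance."""
        if self.last_seen_ms is None:
            return False  # nothing received yet; not this helper's concern
        return now_ms - self.last_seen_ms > self.timeout_ms


watchdog = HeartbeatWatchdog(heartbeat_interval_ms=30000)
watchdog.touch(now_ms=0)
```

With the default 30-second interval, is_stale stays false at exactly 35 000 ms of silence and becomes true after that.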

Reconnection

The protocol is designed for graceful recovery from disconnections. The reconnection procedure is:
1. Detect Disconnection

Either the WebSocket close event fires, or the heartbeat timeout expires. SDKs with autoReconnect enabled handle this automatically.
2. Re-establish Connection

Open a new WebSocket, wait for the welcome and connected events, then send authenticate and wait for authenticated, exactly as on the first connection.
3. Rejoin with afterSeq

Send join_session with the afterSeq field set to the last seq received before disconnection. The gateway replays all persistent events after that sequence number.
4. Handle Gap Events

If the gateway detects that some events between the client’s afterSeq and the current head are missing (e.g., because ephemeral events were not persisted), it sends a gap event indicating the range of missing sequence numbers.
5. Replay Complete

After replaying all available events, the gateway sends replay_complete with the lastSeq value. The client can then resume normal real-time event processing.
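The last three steps can be sketched as a handler that tracks replay progress. ReplaySession is a hypothetical helper. The field layout of gap events is not specified on this page, so the sketch stores them untouched instead of reading any fields; lastSeq on replay_complete is taken from the text above.

```python
class ReplaySession:
    """Tracks replay progress after join_session with afterSeq."""

    def __init__(self, after_seq: int) -> None:
        self.last_seq = after_seq
        self.replaying = True
        self.gaps = []

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "gap":
            # Keep the raw event; its fields are not documented here.
            self.gaps.append(event)
        elif kind == "replay_complete":
            self.last_seq = max(self.last_seq, event["lastSeq"])
            self.replaying = False
        elif "seq" in event and event["seq"] > self.last_seq:
            self.last_seq = event["seq"]


replay = ReplaySession(after_seq=42)
for incoming in [
    {"type": "tool_call", "sessionId": "s1", "seq": 45},
    {"type": "gap"},
    {"type": "replay_complete", "lastSeq": 47},
]:
    replay.on_event(incoming)
```

After replay_complete, the client can render normally and treat any recorded gaps as ranges to fill from the state_snapshot rather than from individual events.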

Rate Limiting

The gateway enforces a per-connection rate limit of 60 messages per 10-second window using a sliding window counter. Messages that exceed this limit receive an immediate error response:
{ "type": "error", "code": "RATE_LIMITED", "message": "Too many messages -- slow down" }
Authentication attempts are rate-limited separately via a dedicated auth rate limiter that tracks attempts by IP address. Exceeding the auth rate limit returns:
{ "type": "error", "code": "AUTH_RATE_LIMITED", "message": "Too many auth attempts. Retry after 30s" }
Individual messages are also size-limited to 1 MB. Messages exceeding this threshold are rejected before parsing:
{ "type": "error", "code": "MESSAGE_TOO_LARGE", "message": "Message exceeds maximum allowed size (1MB)" }
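A client can avoid most of these errors by checking its own outbound traffic before sending. The sketch below is a client-side guard, not the gateway's implementation: it keeps a log of send timestamps (a sliding window log rather than the sliding window counter the gateway uses) and reads "1 MB" as 1 MiB, which is an assumption. OutboundGuard is a hypothetical name.

```python
import json
from collections import deque

MAX_MESSAGES = 60        # per-connection limit from this document
WINDOW_MS = 10_000       # 10-second window
MAX_BYTES = 1024 * 1024  # assumption: "1 MB" interpreted as 1 MiB


class OutboundGuard:
    """Decides locally whether a message is likely to be accepted."""

    def __init__(self) -> None:
        self._sent_at = deque()

    def try_send(self, message: dict, now_ms: int) -> bool:
        """Return True and record the send if no limit would be tripped."""
        payload = json.dumps(message).encode("utf-8")
        if len(payload) > MAX_BYTES:
            return False
        # Forget sends that have left the 10-second window.
        while self._sent_at and now_ms - self._sent_at[0] >= WINDOW_MS:
            self._sent_at.popleft()
        if len(self._sent_at) >= MAX_MESSAGES:
            return False
        self._sent_at.append(now_ms)
        return True


guard = OutboundGuard()
results = [guard.try_send({"type": "ping", "clientTs": t}, now_ms=t) for t in range(61)]
```

The first 60 calls succeed and the 61st is refused; once the oldest timestamp is 10 seconds old, sending is allowed again.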
