Testing Methodology

The Diminuendo test suites do not exist to demonstrate that the code works. They exist to demonstrate that aggressive, systematic attempts to break the code have failed. This distinction — between confirmation and refutation — is the foundation of the testing methodology.

Philosophy: Severe Testing

The testing approach draws from Karl Popper’s philosophy of science: a claim is only as credible as the severity of the tests it has survived. A test is severe when it has a high probability of detecting a defect if one exists. A test that passes trivially — because it checks a tautology, because it uses the same code path as the implementation, because it only exercises the happy path — provides no evidence of correctness. In practice, this means:

Tests are Adversarial

Every test is written from the perspective of an attacker trying to break the system. What input would cause a crash? What timing would cause a race condition? What encoding would corrupt deserialization?

Claims are Falsifiable

Each test asserts a specific, falsifiable claim. “The client handles text_delta events” is not falsifiable. “A text_delta event with a 1MB text field is delivered to the handler without truncation” is falsifiable — and tested.
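As an illustration, the falsifiable claim above could be written as a check along the following lines. This is a minimal Python sketch, not the SDK's code: `dispatch` is a hypothetical stand-in for the client's delivery path, and the event carries only the fields the claim needs.

```python
import json


def dispatch(raw: str, handler) -> None:
    """Hypothetical delivery path: parse one wire message and hand it to the handler."""
    event = json.loads(raw)
    if event.get("type") == "text_delta":
        handler(event)


def check_text_delta_1mb_is_not_truncated() -> int:
    payload = "x" * (1024 * 1024)  # 1 MiB of text
    received = []
    dispatch(json.dumps({"type": "text_delta", "text": payload}), received.append)
    # The claim is refuted if the event is dropped, duplicated, or shortened.
    assert len(received) == 1
    assert received[0]["text"] == payload
    return len(received[0]["text"])
```

The vague version of the claim would be satisfied by any handler call at all; this version fails as soon as a single character goes missing.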

No Confirmation Bias

Tests do not stop at feeding expected inputs and checking expected outputs. They probe boundaries: empty strings, null fields, maximum-size payloads, negative numbers, concurrent mutations.

Mock Oracles, Not Mock Implementations

Each SDK test suite runs against a mock gateway that speaks the actual wire protocol. The mock is an oracle — it knows the correct behavior and validates the client’s conformance — not a stub that returns canned responses.
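The difference between an oracle and a stub can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `REQUIRED_FIELDS` table, the `token` field, and the `ok`/`error` reply shapes are hypothetical and do not describe the real mock gateway. Only `join_session` and `sessionId` come from the protocol examples in this document.

```python
import json

# Hypothetical required fields per message type; the real protocol defines its own.
REQUIRED_FIELDS = {
    "authenticate": {"token"},
    "join_session": {"sessionId"},
}


def oracle_reply(raw: str) -> dict:
    """Validate an inbound client message before answering, as an oracle would."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        return {"type": "error", "message": "message must be a JSON object"}
    msg_type = msg.get("type")
    if msg_type not in REQUIRED_FIELDS:
        return {"type": "error", "message": f"unknown message type: {msg_type!r}"}
    missing = REQUIRED_FIELDS[msg_type] - set(msg)
    if missing:
        return {"type": "error", "message": f"missing fields: {sorted(missing)}"}
    return {"type": "ok", "for": msg_type}
```

A stub would return the canned reply regardless of what the client sent, so a client that serialized `session_id` instead of `sessionId` would still pass. The oracle rejects it.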

Test Architecture

Each SDK has a distinct test architecture, adapted to its language’s testing ecosystem, but all share the same structural pattern: a mock gateway server, full-protocol coverage, and adversarial edge cases.

TypeScript: Mock Gateway with Bun.serve

The TypeScript test suite runs a real Bun.serve WebSocket server that implements the Diminuendo wire protocol. The mock gateway:
  • Sends welcome, connected, and authenticated on connection
  • Responds to all 21 client message types with correctly-typed server events
  • Validates inbound message structure
  • Simulates streaming event sequences (multiple text_delta followed by turn_complete)
  • Supports concurrent connections
+----------------+          WebSocket           +------------------+
|   Test Suite   | <==========================> |   Mock Gateway   |
|  (Bun Test)    |     (localhost:random)       |  (Bun.serve WS)  |
|                |                              |                  |
| client.connect |  -->  welcome + auth         |  Speaks full     |
| client.method  |  -->  response event         |  protocol v1     |
| on("event")    |  <--  broadcast events       |                  |
+----------------+                              +------------------+
Test categories:
  • Connection lifecycle: connect, authenticate, reconnect, disconnect handling
  • Session CRUD: create, list, rename, archive, unarchive, delete — each verified against the mock’s response
  • Turn execution: runTurn fires events, stopTurn stops them, steer injects mid-turn
  • File access: list, read, history, iteration retrieval
  • Member management: list, set_role, remove with permission checks
  • Event handler typing: every event type dispatched through the handler system, verified with type-narrowed assertions
  • Wire format: raw JSON assertions for camelCase field names, optional field omission, null handling

Rust: Serde Round-Trip and Adversarial JSON

The Rust test suite relies on serde’s compile-time guarantees plus runtime JSON round-trip verification. Every variant of ClientMessage (21) and ServerEvent (51) is tested for exact wire format conformance.
// Exact JSON assertion — not just "it deserializes" but "it produces THIS exact JSON"
#[test]
fn test_serialize_join_session_with_after_seq() {
    let msg = ClientMessage::JoinSession {
        session_id: "sess-1".into(),
        after_seq: Some(42),
    };
    let v = serde_json::to_value(&msg).unwrap();
    assert_eq!(v["type"], "join_session");
    assert_eq!(v["sessionId"], "sess-1");   // camelCase on wire
    assert_eq!(v["afterSeq"], 42);          // camelCase on wire
}
Test categories:
  • ClientMessage serialization (21 tests): Every variant serialized to JSON, every field name verified as camelCase, every optional field verified to be absent when None
  • ServerEvent deserialization (51 tests): JSON payloads for every event type parsed into the correct Rust enum variant, every field value extracted and asserted
  • Round-trip (selected types): Serialize a ClientMessage, parse the JSON, verify field-by-field
  • Unknown event handling: Unrecognized type values deserialize to ServerEvent::Unknown instead of panicking
  • Adversarial JSON: Malformed JSON, wrong field types, missing required fields, extra fields, negative numbers, i64 overflow, empty strings, Unicode edge cases (emoji, CJK, RTL, combining characters)

Python: Mock websockets.serve Gateway

The Python test suite uses websockets.serve to run a mock gateway in the same event loop as the tests. The mock implements the protocol handshake and request-response cycle.
import json

async def mock_gateway(websocket):
    # Open the handshake with a welcome message as soon as the client connects
    await websocket.send(json.dumps({
        "type": "welcome", "protocolVersion": 1, "requiresAuth": True
    }))
    # Handle client messages one at a time until the connection closes
    async for raw in websocket:
        msg = json.loads(raw)
        if msg["type"] == "authenticate":
            # Reply with the identity the mock associates with this connection
            await websocket.send(json.dumps({
                "type": "authenticated",
                "identity": {"userId": "u1", "email": "test@test.com", "tenantId": "t1"}
            }))
        elif msg["type"] == "list_sessions":
            await websocket.send(json.dumps({
                "type": "session_list", "sessions": [...]  # session entries elided
            }))
        # ... all 21 message types handled
Test categories:
  • Connection lifecycle: connect, auto-authenticate, disconnect, reconnect semantics
  • Session methods: every method tested against mock responses, return types verified
  • Dataclass parsing: SessionMeta.from_dict, FileEntry.from_dict, IterationMeta.from_dict, FileContent.from_dict, StateSnapshot.from_dict — all tested with realistic wire payloads
  • ServerEvent properties: session_id, turn_id, text, final_text, error_code, error_message, seq, ts, tool properties, question properties
  • Category predicates: is_turn_event, is_tool_event, is_thinking_event, is_terminal_event, is_sandbox_event, is_interactive_event — each tested for correct inclusion and exclusion
  • Event handler system: registration, unsubscribe, wildcard handler, multiple handlers per event type

Swift: Codable Round-Trip and AsyncStream Delivery

The Swift test suite uses Swift Testing (@Test, #expect) with Codable round-trip verification. All 51 ServerEvent and 21 ClientMessage variants are tested for exact wire format conformance via CodingKeys.
Test categories:
  • ClientMessage serialization (21 tests): Every variant encoded to JSON, camelCase field names verified, optional fields omitted when nil
  • ServerEvent deserialization (51 tests): JSON payloads for every event type decoded into the correct enum case, field values extracted and asserted
  • Round-trip (selected types): Encode a ClientMessage, decode the JSON, verify field-by-field
  • Unknown event handling: Unrecognized type values decode to .unknown instead of throwing
  • Adversarial JSON: Missing fields, wrong types, 100KB strings, Unicode edge cases, deeply nested objects, Int64.max overflow, AnyCodable coverage

What Makes Tests Severe

The test suites go well beyond “call method, check result.” Here are the specific adversarial patterns tested across all four SDKs:

The server (or a man-in-the-middle) sends garbage. The client must not crash, must not corrupt its internal state, and should skip the bad message gracefully.
  • Invalid JSON: "{not json", "", "null", raw binary bytes
  • Wrong field types: {"type": "text_delta", "seq": "not-a-number"}, {"type": "welcome", "protocolVersion": null}
  • Missing required fields: {"type": "turn_started"} (no sessionId, turnId, seq, ts)
  • Extra unknown fields: {"type": "pong", "clientTs": 1, "serverTs": 2, "bonus": true} — must not fail
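The "skip the bad message" behavior for these inputs can be sketched as a tolerant parser. This is a minimal Python illustration under stated assumptions: `parse_event` is a hypothetical helper, it checks only the `type` and `seq` fields, and per-event required-field validation is left out.

```python
import json
from typing import Optional


def parse_event(raw) -> Optional[dict]:
    """Return the event dict, or None when the message should be skipped."""
    try:
        if isinstance(raw, (bytes, bytearray)):
            raw = raw.decode("utf-8")
        event = json.loads(raw)
    except (UnicodeDecodeError, ValueError, TypeError):
        return None  # invalid UTF-8, invalid JSON, or a non-string input
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        return None  # valid JSON, but not an event object
    seq = event.get("seq")
    # bool is a subclass of int in Python, so reject it explicitly.
    if seq is not None and (isinstance(seq, bool) or not isinstance(seq, int)):
        return None
    return event  # unknown extra fields are kept, not rejected
```

Returning `None` rather than raising keeps the receive loop alive, so one malformed frame cannot take down the connection or leave partially applied state behind.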

The server never responds. Pending promises/futures must reject with an actionable error, not hang indefinitely.
  • TypeScript: request() wraps every pending promise in a 10-second setTimeout
  • Python: asyncio.wait_for(future, timeout=10.0) with TimeoutError propagation
  • Rust: The application controls timeout semantics (no built-in timeout in the SDK, by design — Tauri apps manage their own lifecycle)
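The Python variant of this pattern can be sketched as follows. The `request` wrapper and the error message are illustrative assumptions. Only the use of `asyncio.wait_for` with a 10-second default comes from the description above. The demo uses a much shorter timeout so that it finishes quickly.

```python
import asyncio


async def request(pending: "asyncio.Future", timeout: float = 10.0):
    """Wait for a response future, failing loudly instead of hanging forever."""
    try:
        return await asyncio.wait_for(pending, timeout=timeout)
    except asyncio.TimeoutError:
        # Re-raise with a message that says what was being waited for.
        raise TimeoutError(f"no response from gateway within {timeout}s") from None


async def demo_silent_server() -> str:
    loop = asyncio.get_running_loop()
    never_resolved = loop.create_future()  # stands in for a mock that never answers
    try:
        await request(never_resolved, timeout=0.05)
    except TimeoutError as exc:
        return str(exc)
    return "unexpected response"
```

Calling `asyncio.run(demo_silent_server())` returns the timeout message instead of blocking, which is the property a silent-server test asserts.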

Events may carry unexpectedly large data: a text_delta with a 1 MB string, a file_content with a base64-encoded binary, a tool_call with deeply nested JSON args.
  • 1 MB text field in text_delta — must handle without panic or OOM
  • Deeply nested JSON in tool_call.args — serde must not stack-overflow
  • Large sessions array in session_list (thousands of entries) — must parse correctly
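The deep-nesting case has a direct analogue in Python, where the standard `json` module raises `RecursionError` once nesting gets deep enough. The sketch below is an illustration only: `parse_or_skip` and `nested_args` are hypothetical helpers, and treating such a message as skippable is an assumption, not a documented SDK policy.

```python
import json
from typing import Optional


def parse_or_skip(raw: str) -> Optional[dict]:
    """Parse a wire message; treat pathological nesting as a skippable message."""
    try:
        event = json.loads(raw)
    except (ValueError, RecursionError):
        return None
    return event if isinstance(event, dict) else None


def nested_args(depth: int) -> str:
    """Build a tool_call whose args are `depth` levels of nested arrays."""
    return '{"type": "tool_call", "args": ' + "[" * depth + "]" * depth + "}"
```

A shallow payload such as `nested_args(3)` parses normally, while `nested_args(100_000)` is rejected without crashing the process.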

Text fields may contain any valid Unicode, including sequences that commonly break naive string handling:
  • Emoji: \u{1F680} (multi-byte UTF-8)
  • CJK: \u4e16\u754c (Chinese characters)
  • RTL: Arabic and Hebrew text
  • Combining characters: e\u0301 (e + combining acute accent)
  • Zero-width characters: \u200b, \u200d
  • Surrogate pairs and supplementary plane characters
All must round-trip through serialize/deserialize without corruption.
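A round-trip check over these categories might look like the Python sketch below. The sample strings and the `round_trip` helper are illustrative assumptions. The check runs with `ensure_ascii` both on and off so that the escaped (`\uXXXX`) and raw UTF-8 encodings are both exercised.

```python
import json

SAMPLES = [
    "\U0001F680",                                               # emoji, outside the BMP
    "\u4e16\u754c",                                             # CJK
    "\u0645\u0631\u062d\u0628\u0627 \u05e9\u05dc\u05d5\u05dd",  # Arabic and Hebrew (RTL)
    "e\u0301",                                                  # e + combining acute accent
    "a\u200bb\u200dc",                                          # zero-width space and joiner
]


def round_trip(text: str, ensure_ascii: bool) -> str:
    """Serialize a text_delta, push it through UTF-8 bytes, and parse it back."""
    wire = json.dumps({"type": "text_delta", "text": text}, ensure_ascii=ensure_ascii)
    return json.loads(wire.encode("utf-8").decode("utf-8"))["text"]
```

Comparing with `==` also catches silent normalization: the combining sequence must come back as two code points, not as a single precomposed character.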

Sequence numbers, timestamps, and token counts are integers that may reach extreme values:
  • seq: 0, seq: -1, seq: 9007199254740991 (JS MAX_SAFE_INTEGER)
  • ts: 0, ts: -1
  • i64::MAX and i64::MIN in Rust (serde must not panic)
  • costMicroDollars: null (nullable fields must be handled)
  • inputTokens: null, outputTokens: null (SDK-specific null handling)
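The reason 9007199254740991 appears in this list is that larger integers no longer survive a trip through a 64-bit float, which is how JavaScript represents numbers. The Python sketch below illustrates that boundary, together with parsing of negative and null values. `parse_field` and `survives_double` are hypothetical helpers written for this example.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1   # 9007199254740991
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


def parse_field(raw: str, name: str):
    """Extract one field from a wire message; JSON null comes back as None."""
    return json.loads(raw).get(name)


def survives_double(n: int) -> bool:
    """True if n is unchanged after a trip through a 64-bit float."""
    return int(float(n)) == n
```

Python integers are arbitrary precision, so the parser itself keeps `i64::MAX` intact. The `survives_double` check shows which values would be altered in a runtime that stores them as doubles.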

The protocol is designed for forward compatibility. When the gateway adds new event types, existing clients must not crash.
  • Rust: The ServerEvent enum includes #[serde(other)] Unknown — any unrecognized type value deserializes to ServerEvent::Unknown instead of returning a deserialization error
  • TypeScript: The wildcard handler client.on("*", handler) receives all events regardless of type. Unrecognized types are dispatched through the handler system normally — they just have no type-specific handler registered
  • Python: The ServerEvent dataclass wraps any event — the type field is a plain string, so unknown types are represented naturally. The wildcard client.on("*", handler) catches them
  • Swift: The ServerEvent enum includes a .unknown(type:data:) case — any unrecognized type value decodes cleanly instead of throwing a DecodingError
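The string-typed approach described for Python can be sketched as follows. The `Emitter` class and the `data` field are hypothetical simplifications and do not reproduce the SDK's actual classes. Only the plain-string `type` and the `"*"` wildcard come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List


@dataclass
class ServerEvent:
    type: str                                  # plain string: unknown types need no special case
    data: dict = field(default_factory=dict)


class Emitter:
    """Hypothetical handler registry with "*" as the wildcard key."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[ServerEvent], None]]] = defaultdict(list)

    def on(self, event_type: str, handler: Callable[[ServerEvent], None]) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, raw: dict) -> None:
        event = ServerEvent(type=raw["type"], data=raw)
        # Type-specific handlers run first, then wildcard handlers.
        for handler in self._handlers.get(event.type, []) + self._handlers.get("*", []):
            handler(event)
```

An event type the client has never heard of reaches the wildcard handler and raises nothing, which is the behavior a forward-compatibility test asserts.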

Multiple requests in flight simultaneously must resolve to their correct responses, not cross-contaminate.
  • TypeScript: Request chaining by type:sessionId key ensures sequential resolution for same-target operations, while different targets resolve independently
  • Python: Multiple pending futures with predicate-based matching — each response is matched to the correct request by its predicate, not by arrival order
  • Rust: The channel-based architecture naturally handles concurrency — events arrive in order on the channel, and the application’s event loop dispatches them
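Predicate-based matching, as described for the Python SDK, can be illustrated with a small registry of pending futures. This is a sketch under assumptions: `PendingRequests` is a hypothetical class, and `session_updated` is a made-up event type used only to give the responses a shape.

```python
import asyncio
from typing import Callable, List, Tuple


class PendingRequests:
    """Hypothetical registry matching responses to waiters by predicate."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Callable[[dict], bool], asyncio.Future]] = []

    def expect(self, predicate: Callable[[dict], bool]) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self._pending.append((predicate, future))
        return future

    def resolve(self, event: dict) -> bool:
        """Resolve the first waiter whose predicate accepts the event."""
        for i, (predicate, future) in enumerate(self._pending):
            if predicate(event):
                del self._pending[i]
                future.set_result(event)
                return True
        return False  # nobody was waiting for this event


async def demo() -> Tuple[str, str]:
    pending = PendingRequests()
    a = pending.expect(lambda e: e.get("sessionId") == "sess-a")
    b = pending.expect(lambda e: e.get("sessionId") == "sess-b")
    # Responses arrive in the opposite order to the requests.
    pending.resolve({"type": "session_updated", "sessionId": "sess-b"})
    pending.resolve({"type": "session_updated", "sessionId": "sess-a"})
    return (await a)["sessionId"], (await b)["sessionId"]
```

Because each waiter is matched by what the response contains rather than by when it arrives, reordering the responses does not cross-contaminate the results.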

Anti-Patterns Avoided

The test suites deliberately avoid common testing anti-patterns that inflate test counts without providing evidence of correctness:
| Anti-Pattern | Why It’s Avoided |
| --- | --- |
| Testing defaults | Asserting that ClientOptions has default values tells you nothing about whether the client works. Defaults are implementation details, not observable behavior. |
| Constructor-existence tests | expect(new DiminuendoClient(opts)).toBeDefined() provides zero evidence that the client can connect, authenticate, or handle events. |
| Type-only checks | expect(typeof event.type).toBe("string") passes for any string, including wrong ones. Assert the specific expected value. |
| Trivial round-trips | Serializing {a: 1} and checking it deserializes to {a: 1} only tests the JSON library, not the application. Tests use realistic wire payloads. |
| Snapshot tests | Snapshot-based assertions are fragile (break on formatting changes) and opaque (reviewers cannot see what’s being asserted). All assertions are explicit. |
| Mocking the thing under test | Tests never mock the SDK client itself. They mock the server and test the client’s behavior against that mock. |
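The type-only anti-pattern is easy to demonstrate. In the Python sketch below, both helper functions and both sample events are invented for illustration. The point is that the weak check cannot tell a correct event from a defective one.

```python
def type_only_check(event: dict) -> bool:
    """Weak: passes for any string, including the wrong one."""
    return isinstance(event["type"], str)


def specific_check(event: dict) -> bool:
    """Severe: passes only for the exact expected value."""
    return event["type"] == "turn_complete"


wrong = {"type": "turn_started"}    # a defect: the wrong event was produced
right = {"type": "turn_complete"}
```

`type_only_check` returns `True` for both events, so a test built on it would stay green through the defect. `specific_check` separates them.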

Coverage Statistics

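| SDK | Tests | Assertions | Message Types | Event Types | Error Types |
| --- | --- | --- | --- | --- | --- |
| TypeScript | 137 | ~600 | 21/21 | 51/51 | All |
| Rust | 178 | ~800 | 21/21 | 51/51 | All |
| Python | 173 | ~800 | 21/21 | 51/51 | All |
| Swift | 193 | ~900 | 21/21 | 51/51 | All |
| Total | 681 | ~3,100 | 21/21 | 51/51 | All |
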
The gateway itself has an additional 9 integration tests that validate the server-side protocol handling, session lifecycle, event mapping, and error paths against an in-process gateway instance with SQLite in-memory databases and mock Podium connections.

Running the Tests

1. Gateway (9 tests)

   bun test

   Runs against an in-process gateway with no external dependencies. Uses in-memory SQLite.

2. TypeScript SDK (137 tests)

   cd sdk/typescript && bun test

   Starts a mock Bun.serve WebSocket gateway, runs all tests, and tears down the server.

3. Rust SDK (178 tests)

   cd sdk/rust && cargo test

   All tests are unit tests; no network I/O is required. Serde round-trip tests run in-process.

4. Python SDK (173 tests)

   cd sdk/python && python -m pytest tests/ -v

   Starts a mock websockets.serve gateway in the asyncio event loop alongside the tests.

5. Swift SDK (193 tests)

   cd sdk/swift && swift test

   Uses Swift Testing framework with Codable round-trip and adversarial JSON tests.

Run all tests across all SDKs in a single command:
bun test && \
(cd sdk/typescript && bun test) && \
(cd sdk/rust && cargo test) && \
(cd sdk/python && python -m pytest tests/ -v) && \
(cd sdk/swift && swift test)
This is the full test suite that must pass before any code is merged.