Testing Methodology

The Diminuendo test suites do not exist to demonstrate that the code works. They exist to demonstrate that aggressive, systematic attempts to break the code have failed. This distinction — between confirmation and refutation — is the foundation of the testing methodology.

Philosophy: Severe Testing

The testing approach draws from Karl Popper’s philosophy of science: a claim is only as credible as the severity of the tests it has survived. A test is severe when it has a high probability of detecting a defect if one exists. A test that passes trivially — because it checks a tautology, because it uses the same code path as the implementation, because it only exercises the happy path — provides no evidence of correctness. In practice, this means:

Tests are Adversarial

Every test is written from the perspective of an attacker trying to break the system. What input would cause a crash? What timing would cause a race condition? What encoding would corrupt deserialization?

Claims are Falsifiable

Each test asserts a specific, falsifiable claim. “The client handles text_delta events” is too vague to be refuted by any single test. “A text_delta event with a 1 MB text field is delivered to the handler without truncation” can fail, and it is tested.

No Confirmation Bias

Tests do not confirm expected behavior by feeding expected inputs and checking expected outputs. They probe boundaries: empty strings, null fields, maximum-size payloads, negative numbers, concurrent mutations.

Mock Oracles, Not Mock Implementations

Each SDK test suite runs against a mock gateway that speaks the actual wire protocol. The mock is an oracle — it knows the correct behavior and validates the client’s conformance — not a stub that returns canned responses.
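The difference between an oracle and a stub can be sketched in a few lines of Python. This is an illustrative sketch rather than the suite's actual mock: `REQUIRED_FIELDS`, `oracle_reply`, and the `error`/`ok` reply shapes are hypothetical, and only two of the 21 client message types are covered.

```python
import json

# Hypothetical schema table for two client message types named in this document.
# Required fields map to the type the oracle expects on the wire.
REQUIRED_FIELDS = {
    "join_session": {"sessionId": str},
    "list_sessions": {},
}


def oracle_reply(raw: str) -> dict:
    """Validate an inbound client frame before answering it.

    A stub would return a canned response whatever arrived; the oracle
    answers only frames that conform to the (assumed) schema.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return {"type": "error", "code": "INVALID_JSON"}
    msg_type = msg.get("type") if isinstance(msg, dict) else None
    if not isinstance(msg_type, str) or msg_type not in REQUIRED_FIELDS:
        return {"type": "error", "code": "UNKNOWN_TYPE"}
    for name, expected in REQUIRED_FIELDS[msg_type].items():
        if not isinstance(msg.get(name), expected):
            return {"type": "error", "code": "INVALID_FIELD", "field": name}
    return {"type": "ok", "inReplyTo": msg_type}
```

A client that sent `session_id` instead of `sessionId` would pass against a stub but is rejected here, which is the property that makes the mock a severe test of the client.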

Test Architecture

Each SDK has a distinct test architecture, adapted to its language’s testing ecosystem, but all share the same structural pattern: a mock gateway server, full-protocol coverage, and adversarial edge cases.

TypeScript: Mock Gateway with Bun.serve

The TypeScript test suite runs a real Bun.serve WebSocket server that implements the Diminuendo wire protocol. The mock gateway:

Sends welcome, connected, and authenticated on connection
Responds to all 21 client message types with correctly-typed server events
Validates inbound message structure
Simulates streaming event sequences (multiple text_delta followed by turn_complete)
Supports concurrent connections

+----------------+          WebSocket           +------------------+
|   Test Suite   | <==========================> |   Mock Gateway   |
|  (Bun Test)    |     (localhost:random)       |  (Bun.serve WS)  |
|                |                              |                  |
| client.connect |  -->  welcome + auth         |  Speaks full     |
| client.method  |  -->  response event         |  protocol v1     |
| on("event")    |  <--  broadcast events       |                  |
+----------------+                              +------------------+

Test categories:

Connection lifecycle: connect, authenticate, reconnect, disconnect handling
Session CRUD: create, list, rename, archive, unarchive, delete — each verified against the mock’s response
Turn execution: runTurn fires events, stopTurn stops them, steer injects mid-turn
File access: list, read, history, iteration retrieval
Member management: list, set_role, remove with permission checks
Event handler typing: every event type dispatched through the handler system, verified with type-narrowed assertions
Wire format: raw JSON assertions for camelCase field names, optional field omission, null handling

Rust: Serde Round-Trip and Adversarial JSON

The Rust test suite relies on serde’s compile-time guarantees plus runtime JSON round-trip verification. Every variant of ClientMessage (21) and ServerEvent (51) is tested for exact wire format conformance.

// Exact JSON assertion — not just "it deserializes" but "it produces THIS exact JSON"
#[test]
fn test_serialize_join_session_with_after_seq() {
    let msg = ClientMessage::JoinSession {
        session_id: "sess-1".into(),
        after_seq: Some(42),
    };
    let v = serde_json::to_value(&msg).unwrap();
    assert_eq!(v["type"], "join_session");
    assert_eq!(v["sessionId"], "sess-1");   // camelCase on wire
    assert_eq!(v["afterSeq"], 42);          // camelCase on wire
}

Test categories:

ClientMessage serialization (21 tests): Every variant serialized to JSON, every field name verified as camelCase, every optional field verified to be absent when None
ServerEvent deserialization (51 tests): JSON payloads for every event type parsed into the correct Rust enum variant, every field value extracted and asserted
Round-trip (selected types): Serialize a ClientMessage, parse the JSON, verify field-by-field
Unknown event handling: Unrecognized type values deserialize to ServerEvent::Unknown instead of panicking
Adversarial JSON: Malformed JSON, wrong field types, missing required fields, extra fields, negative numbers, i64 overflow, empty strings, Unicode edge cases (emoji, CJK, RTL, combining characters)

Python: Mock websockets.serve Gateway

The Python test suite uses websockets.serve to run a mock gateway in the same event loop as the tests. The mock implements the protocol handshake and request-response cycle.

import json

async def mock_gateway(websocket):
    # Open the handshake with welcome; authenticated is sent below,
    # in reply to the client's authenticate message
    await websocket.send(json.dumps({
        "type": "welcome", "protocolVersion": 1, "requiresAuth": True
    }))
    # Handle client messages until the connection closes
    async for raw in websocket:
        msg = json.loads(raw)
        if msg["type"] == "authenticate":
            await websocket.send(json.dumps({
                "type": "authenticated",
                "identity": {"userId": "u1", "email": "test@test.com", "tenantId": "t1"}
            }))
        elif msg["type"] == "list_sessions":
            # Session entries are elided in this excerpt
            await websocket.send(json.dumps({
                "type": "session_list", "sessions": [...]
            }))
        # ... all 21 message types handled

Test categories:

Connection lifecycle: connect, auto-authenticate, disconnect, reconnect semantics
Session methods: every method tested against mock responses, return types verified
Dataclass parsing: SessionMeta.from_dict, FileEntry.from_dict, IterationMeta.from_dict, FileContent.from_dict, StateSnapshot.from_dict — all tested with realistic wire payloads
ServerEvent properties: session_id, turn_id, text, final_text, error_code, error_message, seq, ts, tool properties, question properties
Category predicates: is_turn_event, is_tool_event, is_thinking_event, is_terminal_event, is_sandbox_event, is_interactive_event — each tested for correct inclusion and exclusion
Event handler system: registration, unsubscribe, wildcard handler, multiple handlers per event type

Swift: Codable Round-Trip and AsyncStream Delivery

The Swift test suite uses Swift Testing (@Test, #expect) with Codable round-trip verification. All 51 ServerEvent and 21 ClientMessage variants are tested for exact wire format conformance via CodingKeys.

Test categories:

ClientMessage serialization (21 tests): Every variant encoded to JSON, camelCase field names verified, optional fields omitted when nil
ServerEvent deserialization (51 tests): JSON payloads for every event type decoded into the correct enum case, field values extracted and asserted
Round-trip (selected types): Encode a ClientMessage, decode the JSON, verify field-by-field
Unknown event handling: Unrecognized type values decode to .unknown instead of throwing
Adversarial JSON: Missing fields, wrong types, 100KB strings, Unicode edge cases, deeply nested objects, Int64.max overflow, AnyCodable coverage

What Makes Tests Severe

The test suites go well beyond “call method, check result.” Here are the specific adversarial patterns tested across all four SDKs:

Malformed Server Data

The server (or a man-in-the-middle) sends garbage. The client must not crash, must not corrupt its internal state, and should skip the bad message gracefully.

Invalid JSON: "{not json", "", "null", raw binary bytes
Wrong field types: {"type": "text_delta", "seq": "not-a-number"}, {"type": "welcome", "protocolVersion": null}
Missing required fields: {"type": "turn_started"} (no sessionId, turnId, seq, ts)
Extra unknown fields: {"type": "pong", "clientTs": 1, "serverTs": 2, "bonus": true} — must not fail
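The cases above can be sketched as a tolerant decoder plus the assertions that try to break it. This is a minimal Python sketch, not the SDK's parser: `parse_event` and the `REQUIRED` table are hypothetical names, and only the `seq` field and the `turn_started` field list from the examples above are checked.

```python
import json
from typing import Optional, Union

# Required fields per event type; only turn_started is listed, taken from
# the example above. A real table would cover every event type.
REQUIRED = {"turn_started": ("sessionId", "turnId", "seq", "ts")}


def parse_event(raw: Union[str, bytes]) -> Optional[dict]:
    """Return the decoded event, or None when the frame should be skipped."""
    try:
        event = json.loads(raw)
    except (ValueError, RecursionError):
        # ValueError covers both JSONDecodeError and UnicodeDecodeError.
        return None
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        return None
    if any(name not in event for name in REQUIRED.get(event["type"], ())):
        return None
    seq = event.get("seq")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if seq is not None and (isinstance(seq, bool) or not isinstance(seq, int)):
        return None
    return event  # unknown extra fields are kept, not rejected


# Each garbage frame is skipped; the frame with an extra field survives.
for bad in ("{not json", "", "null", b"\x80\x81\x82",
            '{"type": "text_delta", "seq": "not-a-number"}',
            '{"type": "turn_started"}'):
    assert parse_event(bad) is None
```

Returning `None` instead of raising keeps the receive loop alive, so a single bad frame cannot take down the connection or leave half-applied state behind.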

Request Timeouts

The server never responds. Pending promises/futures must reject with an actionable error, not hang indefinitely.

TypeScript: request() wraps every pending promise in a 10-second setTimeout
Python: asyncio.wait_for(future, timeout=10.0) with TimeoutError propagation
Rust: The application controls timeout semantics (no built-in timeout in the SDK, by design — Tauri apps manage their own lifecycle)
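The Python pattern named above can be sketched as follows. The `request` and `demo` helpers and the error message are hypothetical; only `asyncio.wait_for` with a timeout comes from the description, and the demo shortens the timeout so it finishes quickly.

```python
import asyncio


async def request(pending: asyncio.Future, timeout: float = 10.0):
    """Await a pending response; fail with an actionable message on timeout."""
    try:
        return await asyncio.wait_for(pending, timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"no response from gateway within {timeout}s") from exc


async def demo() -> str:
    loop = asyncio.get_running_loop()
    never_resolved = loop.create_future()  # stands in for a silent gateway
    try:
        await request(never_resolved, timeout=0.05)
    except TimeoutError as exc:
        return str(exc)
    return "unexpectedly resolved"


print(asyncio.run(demo()))
```

The test is severe because the future is never resolved: if the timeout were missing, the test itself would hang instead of passing.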

Large Payloads

Events may carry unexpectedly large data: a text_delta with a 1 MB string, a file_content with a base64-encoded binary, or a tool_call with deeply nested JSON args.

1 MB text field in text_delta — must handle without panic or OOM
Deeply nested JSON in tool_call.args — serde must not stack-overflow
Large sessions array in session_list (thousands of entries) — must parse correctly
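The first two cases can be sketched in Python with the standard `json` module. `nested_args` and `parse_or_skip` are hypothetical helpers, and the nesting depths are arbitrary choices for illustration. Whether extreme nesting is rejected or parsed depends on the interpreter, so the sketch only requires that decoding ends in a value or a clean skip.

```python
import json

ONE_MB = 1024 * 1024

# Claim under test: a 1 MB text field survives encode/decode without truncation.
big_text = "x" * ONE_MB
frame = json.dumps({"type": "text_delta", "text": big_text})
decoded = json.loads(frame)
assert decoded["text"] == big_text


def nested_args(depth: int) -> str:
    """Build the JSON text directly so that creating the input never recurses."""
    return '{"type": "tool_call", "args": ' + "[" * depth + "]" * depth + "}"


def parse_or_skip(raw: str):
    """Decode a frame; treat malformed or too-deep input as a skipped frame."""
    try:
        return json.loads(raw)
    except (ValueError, RecursionError):
        return None


# Moderate nesting must parse; extreme nesting must not crash the process.
assert parse_or_skip(nested_args(50))["type"] == "tool_call"
extreme = parse_or_skip(nested_args(100_000))
assert extreme is None or isinstance(extreme, dict)
```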

Unicode Edge Cases

Text fields may contain any valid Unicode, including sequences that commonly break naive string handling:

Emoji: \u{1F680} (multi-byte UTF-8)
CJK: \u4e16\u754c (Chinese characters)
RTL: Arabic and Hebrew text
Combining characters: e\u0301 (e + combining acute accent)
Zero-width characters: \u200b, \u200d
Surrogate pairs and supplementary plane characters

All must round-trip through serialize/deserialize without corruption.
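A round-trip check over these categories might look like the Python sketch below. `SAMPLES` and `wire_round_trip` are hypothetical names, and the Arabic and Hebrew strings are sample greetings chosen for illustration. Both `ensure_ascii` modes are exercised because one escapes non-ASCII characters (emoji become a surrogate-pair escape) and the other emits raw UTF-8.

```python
import json

SAMPLES = {
    "emoji": "\U0001F680",
    "cjk": "\u4e16\u754c",
    "rtl": "\u0645\u0631\u062d\u0628\u0627 \u05e9\u05dc\u05d5\u05dd",
    "combining": "e\u0301",
    "zero_width": "a\u200bb\u200dc",
}


def wire_round_trip(text: str, ensure_ascii: bool) -> str:
    """Encode as a text_delta frame, pass through UTF-8 bytes, decode again."""
    frame = json.dumps({"type": "text_delta", "text": text},
                       ensure_ascii=ensure_ascii)
    return json.loads(frame.encode("utf-8").decode("utf-8"))["text"]


for name, text in SAMPLES.items():
    for ensure_ascii in (True, False):
        assert wire_round_trip(text, ensure_ascii) == text, name

# The combining sequence must stay two code points, not be normalized to one.
assert len(wire_round_trip("e\u0301", False)) == 2
```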

Numeric Edge Cases

Sequence numbers, timestamps, and token counts are integers that may reach extreme values:

seq: 0, seq: -1, seq: 9007199254740991 (JS MAX_SAFE_INTEGER)
ts: 0, ts: -1
i64::MAX and i64::MIN in Rust (serde must not panic)
costMicroDollars: null (nullable fields must be handled)
inputTokens: null, outputTokens: null (SDK-specific null handling)
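The boundary values above can be probed with a short Python sketch. `read_seq` and `is_js_safe` are hypothetical helpers, not SDK functions. Python integers are arbitrary precision, so the i64 limits parse exactly; the safe-integer check shows why values beyond 2^53 - 1 matter to a JavaScript client.

```python
import json

JS_MAX_SAFE_INTEGER = 2**53 - 1   # 9007199254740991
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


def read_seq(raw: str) -> int:
    """Extract seq from a frame; reject non-integers instead of coercing."""
    seq = json.loads(raw).get("seq")
    if isinstance(seq, bool) or not isinstance(seq, int):
        raise ValueError(f"seq must be an integer, got {seq!r}")
    return seq


def is_js_safe(n: int) -> bool:
    """True if n lies within JavaScript's safe-integer range."""
    return -JS_MAX_SAFE_INTEGER <= n <= JS_MAX_SAFE_INTEGER


for value in (0, -1, JS_MAX_SAFE_INTEGER, I64_MAX, I64_MIN):
    assert read_seq(json.dumps({"seq": value})) == value

assert is_js_safe(JS_MAX_SAFE_INTEGER) and not is_js_safe(I64_MAX)
# Nullable fields decode to None and must be handled, not assumed numeric.
assert json.loads('{"costMicroDollars": null}')["costMicroDollars"] is None
```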

Unknown Event Types

The protocol is designed for forward compatibility. When the gateway adds new event types, existing clients must not crash.

Rust: The ServerEvent enum includes #[serde(other)] Unknown — any unrecognized type value deserializes to ServerEvent::Unknown instead of returning a deserialization error
TypeScript: The wildcard handler client.on("*", handler) receives all events regardless of type. Unrecognized types are dispatched through the handler system normally — they just have no type-specific handler registered
Python: The ServerEvent dataclass wraps any event — the type field is a plain string, so unknown types are represented naturally. The wildcard client.on("*", handler) catches them
Swift: The ServerEvent enum includes a .unknown(type:data:) case — any unrecognized type value decodes cleanly instead of throwing a DecodingError
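The Python behaviour described above can be sketched with a small dispatcher. This is an assumption-laden illustration, not the SDK's implementation: the `Dispatcher` class, the `data` field, and the `event_from_the_future` type are hypothetical; only the plain-string `type` and the `"*"` wildcard come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServerEvent:
    type: str  # a plain string, so unknown types need no special case
    data: dict = field(default_factory=dict)


class Dispatcher:
    def __init__(self) -> None:
        self._handlers = {}  # event type -> list of handlers

    def on(self, event_type: str, handler: Callable[[ServerEvent], None]):
        """Register a handler and return a function that unsubscribes it."""
        self._handlers.setdefault(event_type, []).append(handler)
        return lambda: self._handlers[event_type].remove(handler)

    def dispatch(self, payload: dict) -> None:
        event = ServerEvent(type=payload["type"], data=payload)
        specific = self._handlers.get(event.type, [])
        wildcard = self._handlers.get("*", [])
        for handler in [*specific, *wildcard]:
            handler(event)


seen = []
dispatcher = Dispatcher()
dispatcher.on("*", lambda event: seen.append(event.type))
dispatcher.dispatch({"type": "text_delta", "text": "hi"})
dispatcher.dispatch({"type": "event_from_the_future", "x": 1})
assert seen == ["text_delta", "event_from_the_future"]
```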

Concurrent Operations

Multiple requests in flight simultaneously must resolve to their correct responses, not cross-contaminate.

TypeScript: Request chaining by type:sessionId key ensures sequential resolution for same-target operations, while different targets resolve independently
Python: Multiple pending futures with predicate-based matching — each response is matched to the correct request by its predicate, not by arrival order
Rust: The channel-based architecture naturally handles concurrency — events arrive in order on the channel, and the application’s event loop dispatches them
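Predicate-based matching, as described for Python, can be sketched as below. `PendingRequests`, `expect`, and `resolve` are hypothetical names and the predicates are illustrative. The demo delivers responses in the opposite order to the requests to check that neither future receives the other's event.

```python
import asyncio


class PendingRequests:
    def __init__(self) -> None:
        self._pending = []  # list of (predicate, future) pairs

    def expect(self, predicate) -> asyncio.Future:
        """Register a predicate; the returned future resolves on first match."""
        future = asyncio.get_running_loop().create_future()
        self._pending.append((predicate, future))
        return future

    def resolve(self, event: dict) -> bool:
        """Hand the event to the first pending request whose predicate accepts it."""
        for index, (predicate, future) in enumerate(self._pending):
            if predicate(event):
                del self._pending[index]
                future.set_result(event)
                return True
        return False  # no request was waiting for this event


async def demo():
    pending = PendingRequests()
    first = pending.expect(lambda e: e.get("sessionId") == "sess-a")
    second = pending.expect(lambda e: e.get("sessionId") == "sess-b")
    # Responses arrive in the opposite order to the requests.
    pending.resolve({"type": "turn_started", "sessionId": "sess-b"})
    pending.resolve({"type": "turn_started", "sessionId": "sess-a"})
    return (await first)["sessionId"], (await second)["sessionId"]
```

Running `asyncio.run(demo())` returns the session IDs in request order, showing that each response is matched by predicate and not by arrival order.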

Anti-Patterns Avoided

The test suites deliberately avoid common testing anti-patterns that inflate test counts without providing evidence of correctness:

Anti-Pattern	Why It’s Avoided
Testing defaults	Asserting that `ClientOptions` has default values tells you nothing about whether the client works. Defaults are implementation details, not observable behavior.
Constructor-existence tests	`expect(new DiminuendoClient(opts)).toBeDefined()` provides zero evidence that the client can connect, authenticate, or handle events.
Type-only checks	`expect(typeof event.type).toBe("string")` passes for any string, including wrong ones. Assert the specific expected value.
Trivial round-trips	Serializing `{a: 1}` and checking it deserializes to `{a: 1}` only tests the JSON library, not the application. Tests use realistic wire payloads.
Snapshot tests	Snapshot-based assertions are fragile (break on formatting changes) and opaque (reviewers cannot see what’s being asserted). All assertions are explicit.
Mocking the thing under test	Tests never mock the SDK client itself. They mock the server and test the client’s behavior against that mock.
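The gap between a type-only check and a specific assertion can be made concrete with a deliberately broken decoder. This Python sketch is illustrative only; `buggy_decode`, `weak_check`, and `severe_check` are hypothetical names.

```python
def buggy_decode(payload: dict) -> dict:
    """A deliberately broken decoder: it mislabels every event as text_delta."""
    return {**payload, "type": "text_delta"}


def weak_check(event: dict) -> bool:
    # Type-only check: passes for any string, including the wrong one.
    return isinstance(event["type"], str)


def severe_check(event: dict, expected_type: str) -> bool:
    # Asserts the specific expected value.
    return event["type"] == expected_type


decoded = buggy_decode({"type": "turn_complete", "seq": 7})
assert weak_check(decoded)                           # the defect slips through
assert not severe_check(decoded, "turn_complete")    # the defect is detected
```

A check that cannot fail in the presence of the defect provides no evidence that the defect is absent.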

Coverage Statistics

SDK	Tests	Assertions	Message Types	Event Types	Error Types
TypeScript	137	~600	21/21	51/51	All
Rust	178	~800	21/21	51/51	All
Python	173	~800	21/21	51/51	All
Swift	193	~900	21/21	51/51	All
Total	681	~3,100	21/21	51/51	All

The gateway itself has an additional 9 integration tests that validate the server-side protocol handling, session lifecycle, event mapping, and error paths against an in-process gateway instance with SQLite in-memory databases and mock Podium connections.

Running the Tests

Gateway (9 tests)

bun test

Runs against an in-process gateway with no external dependencies. Uses in-memory SQLite.

TypeScript SDK (137 tests)

cd sdk/typescript && bun test

Starts a mock Bun.serve WebSocket gateway, runs all tests, and tears down the server.

Rust SDK (178 tests)

cd sdk/rust && cargo test

All tests are unit tests — no network I/O required. Serde round-trip tests run in-process.

Python SDK (173 tests)

cd sdk/python && python -m pytest tests/ -v

Starts a mock websockets.serve gateway in the asyncio event loop alongside the tests.

Swift SDK (193 tests)

cd sdk/swift && swift test

Uses Swift Testing framework with Codable round-trip and adversarial JSON tests.

Run all tests across all SDKs in a single command:

bun test && \
(cd sdk/typescript && bun test) && \
(cd sdk/rust && cargo test) && \
(cd sdk/python && python -m pytest tests/ -v) && \
(cd sdk/swift && swift test)

This is the full test suite that must pass before any code is merged.
