Testing Methodology
The Diminuendo test suites do not exist to demonstrate that the code works. They exist to demonstrate that aggressive, systematic attempts to break the code have failed. This distinction between confirmation and refutation is the foundation of the testing methodology.
Philosophy: Severe Testing
The testing approach draws from Karl Popper’s philosophy of science: a claim is only as credible as the severity of the tests it has survived. A test is severe when it has a high probability of detecting a defect if one exists. A test that passes trivially (because it checks a tautology, because it uses the same code path as the implementation, or because it only exercises the happy path) provides no evidence of correctness. In practice, this means:
Tests are Adversarial
Every test is written from the perspective of an attacker trying to break the system. What input would cause a crash? What timing would cause a race condition? What encoding would corrupt deserialization?
Claims are Falsifiable
Each test asserts a specific, falsifiable claim. “The client handles `text_delta` events” is not falsifiable. “A `text_delta` event with a 1 MB text field is delivered to the handler without truncation” is falsifiable, and it is tested.
No Confirmation Bias
Tests do not confirm expected behavior by feeding expected inputs and checking expected outputs. They probe boundaries: empty strings, null fields, maximum-size payloads, negative numbers, concurrent mutations.
Mock Oracles, Not Mock Implementations
Each SDK test suite runs against a mock gateway that speaks the actual wire protocol. The mock is an oracle — it knows the correct behavior and validates the client’s conformance — not a stub that returns canned responses.
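
The difference between an oracle and a stub can be sketched in a few lines. This is an illustration only, not the project's mock gateway: the `REQUIRED_FIELDS` table, the `ping` and `rename_session` message shapes, the error codes, and both function names are assumptions made for the example.

```python
import json

# Hypothetical schema for illustration; the real protocol defines 21 client message types.
REQUIRED_FIELDS = {
    "ping": {"clientTs": int},
    "rename_session": {"sessionId": str, "name": str},
}


def stub_gateway(raw: str) -> dict:
    """A stub: returns the same canned reply whatever the client sent."""
    return {"type": "ok"}


def oracle_gateway(raw: str) -> dict:
    """An oracle: validates the inbound message before acknowledging it."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return {"type": "error", "code": "invalid_json"}
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        return {"type": "error", "code": "bad_envelope"}
    schema = REQUIRED_FIELDS.get(msg["type"])
    if schema is None:
        return {"type": "error", "code": "unknown_type"}
    for field, expected in schema.items():
        value = msg.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return {"type": "error", "code": f"bad_field:{field}"}
    return {"type": "ok"}


# A client that sends snake_case instead of camelCase passes the stub but fails the oracle.
bad_message = json.dumps({"type": "rename_session", "session_id": "s1", "name": "demo"})
stub_reply = stub_gateway(bad_message)
oracle_reply = oracle_gateway(bad_message)
```

Against the stub, a client with a wire-format bug still gets a green test. Against the oracle, the same bug produces a specific error naming the offending field, which is what makes the test capable of failing.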
Test Architecture
Each SDK has a distinct test architecture, adapted to its language’s testing ecosystem, but all share the same structural pattern: a mock gateway server, full-protocol coverage, and adversarial edge cases.
TypeScript: Mock Gateway with Bun.serve
The TypeScript test suite runs a real `Bun.serve` WebSocket server that implements the Diminuendo wire protocol. The mock gateway:
- Sends `welcome`, `connected`, and `authenticated` on connection
- Responds to all 21 client message types with correctly typed server events
- Validates inbound message structure
- Simulates streaming event sequences (multiple `text_delta` followed by `turn_complete`)
- Supports concurrent connections
Test categories:
- Connection lifecycle: connect, authenticate, reconnect, disconnect handling
- Session CRUD: create, list, rename, archive, unarchive, delete, each verified against the mock’s response
- Turn execution: `runTurn` fires events, `stopTurn` stops them, `steer` injects mid-turn
- File access: list, read, history, iteration retrieval
- Member management: list, `set_role`, remove with permission checks
- Event handler typing: every event type dispatched through the handler system, verified with type-narrowed assertions
- Wire format: raw JSON assertions for camelCase field names, optional field omission, null handling
Rust: Serde Round-Trip and Adversarial JSON
The Rust test suite relies on serde’s compile-time guarantees plus runtime JSON round-trip verification. Every variant of `ClientMessage` (21) and `ServerEvent` (51) is tested for exact wire format conformance.
- ClientMessage serialization (21 tests): Every variant serialized to JSON, every field name verified as camelCase, every optional field verified to be absent when `None`
- ServerEvent deserialization (51 tests): JSON payloads for every event type parsed into the correct Rust enum variant, every field value extracted and asserted
- Round-trip (selected types): Serialize a `ClientMessage`, parse the JSON, verify field-by-field
- Unknown event handling: Unrecognized `type` values deserialize to `ServerEvent::Unknown` instead of panicking
- Adversarial JSON: Malformed JSON, wrong field types, missing required fields, extra fields, negative numbers, i64 overflow, empty strings, Unicode edge cases (emoji, CJK, RTL, combining characters)
Python: Mock websockets.serve Gateway
The Python test suite uses `websockets.serve` to run a mock gateway in the same event loop as the tests. The mock implements the protocol handshake and request-response cycle.
- Connection lifecycle: connect, auto-authenticate, disconnect, reconnect semantics
- Session methods: every method tested against mock responses, return types verified
- Dataclass parsing: `SessionMeta.from_dict`, `FileEntry.from_dict`, `IterationMeta.from_dict`, `FileContent.from_dict`, `StateSnapshot.from_dict`, all tested with realistic wire payloads
- ServerEvent properties: `session_id`, `turn_id`, `text`, `final_text`, `error_code`, `error_message`, `seq`, `ts`, tool properties, question properties
- Category predicates: `is_turn_event`, `is_tool_event`, `is_thinking_event`, `is_terminal_event`, `is_sandbox_event`, `is_interactive_event`, each tested for correct inclusion and exclusion
- Event handler system: registration, unsubscribe, wildcard handler, multiple handlers per event type
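
The four handler-system behaviors in the last bullet (registration, unsubscribe, wildcard, multiple handlers per type) are easiest to see against a small model. The `HandlerRegistry` class below is a hypothetical stand-in written for this illustration, not the SDK's implementation; only the `"*"` wildcard convention is taken from this document.

```python
from typing import Callable

Handler = Callable[[dict], None]


class HandlerRegistry:
    """Hypothetical stand-in for an SDK's event handler system."""

    def __init__(self) -> None:
        self._handlers = {}  # event type -> list of handlers

    def on(self, event_type: str, handler: Handler) -> Callable[[], None]:
        """Register a handler and return a function that unsubscribes it."""
        handlers = self._handlers.setdefault(event_type, [])
        handlers.append(handler)

        def unsubscribe() -> None:
            if handler in handlers:
                handlers.remove(handler)

        return unsubscribe

    def dispatch(self, event: dict) -> None:
        """Call type-specific handlers first, then wildcard handlers."""
        specific = self._handlers.get(event["type"], [])
        wildcard = self._handlers.get("*", [])
        # Copy before iterating so a handler may unsubscribe during dispatch.
        for handler in list(specific) + list(wildcard):
            handler(event)


registry = HandlerRegistry()
seen = []
off_first = registry.on("text_delta", lambda e: seen.append("first"))
registry.on("text_delta", lambda e: seen.append("second"))
registry.on("*", lambda e: seen.append("wildcard:" + e["type"]))

registry.dispatch({"type": "text_delta", "text": "hi"})
off_first()  # the first handler must not fire again
registry.dispatch({"type": "text_delta", "text": "again"})
registry.dispatch({"type": "turn_complete"})
```

A severe test asserts the exact contents and order of `seen`, so that a handler firing after unsubscribe, or a wildcard handler missing an event, causes a failure rather than going unnoticed.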
Swift: Codable Round-Trip and AsyncStream Delivery
The Swift test suite uses Swift Testing (@Test, #expect) with Codable round-trip verification. All 51 ServerEvent and 21 ClientMessage variants are tested for exact wire format conformance via CodingKeys.
Test categories:
- ClientMessage serialization (21 tests): Every variant encoded to JSON, camelCase field names verified, optional fields omitted when nil
- ServerEvent deserialization (51 tests): JSON payloads for every event type decoded into the correct enum case, field values extracted and asserted
- Round-trip (selected types): Encode a `ClientMessage`, decode the JSON, verify field-by-field
- Unknown event handling: Unrecognized `type` values decode to `.unknown` instead of throwing
- Adversarial JSON: Missing fields, wrong types, 100KB strings, Unicode edge cases, deeply nested objects, `Int64.max` overflow, `AnyCodable` coverage
What Makes Tests Severe
The test suites go well beyond “call method, check result.” Here are the specific adversarial patterns tested across all four SDKs:
Malformed Server Data
The server (or a man-in-the-middle) sends garbage. The client must not crash, must not corrupt its internal state, and should skip the bad message gracefully.
- Invalid JSON: `"{not json"`, `""`, `"null"`, raw binary bytes
- Wrong field types: `{"type": "text_delta", "seq": "not-a-number"}`, `{"type": "welcome", "protocolVersion": null}`
- Missing required fields: `{"type": "turn_started"}` (no `sessionId`, `turnId`, `seq`, `ts`)
- Extra unknown fields: `{"type": "pong", "clientTs": 1, "serverTs": 2, "bonus": true}` must not fail
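
A minimal sketch of the "skip the bad message" behavior follows. The `parse_event` function and the `REQUIRED` table are hypothetical and cover only a fraction of the protocol: one required-field rule (for `turn_started`, as listed above) and one type check (on `seq`). A real client would validate every event type.

```python
import json
from typing import Optional, Union

# Required fields for one event type, taken from the list above; other types are omitted.
REQUIRED = {"turn_started": ("sessionId", "turnId", "seq", "ts")}


def parse_event(raw: Union[str, bytes]) -> Optional[dict]:
    """Return the decoded event, or None when the frame should be skipped."""
    if isinstance(raw, bytes):
        try:
            raw = raw.decode("utf-8")
        except UnicodeDecodeError:
            return None
    try:
        event = json.loads(raw)
    except (json.JSONDecodeError, RecursionError):
        return None
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        return None
    seq = event.get("seq")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if seq is not None and (isinstance(seq, bool) or not isinstance(seq, int)):
        return None
    if any(name not in event for name in REQUIRED.get(event["type"], ())):
        return None
    return event  # extra unknown fields are kept, not rejected


FRAMES = [
    '{not json',
    '',
    'null',
    b'\xff\xfe',
    '{"type": "text_delta", "seq": "not-a-number"}',
    '{"type": "turn_started"}',
    '{"type": "pong", "clientTs": 1, "serverTs": 2, "bonus": true}',
]
accepted = [event for event in map(parse_event, FRAMES) if event is not None]
```

Returning `None` instead of raising keeps the receive loop alive, so a single corrupt frame cannot take down the connection or leave handlers half-invoked.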
Request Timeouts
The server never responds. Pending promises/futures must reject with an actionable error, not hang indefinitely.
- TypeScript: `request()` wraps every pending promise in a 10-second `setTimeout`
- Python: `asyncio.wait_for(future, timeout=10.0)` with `TimeoutError` propagation
- Rust: The application controls timeout semantics (the SDK has no built-in timeout by design, since Tauri apps manage their own lifecycle)
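
The Python bullet can be sketched as follows. The `request` wrapper, its error message, and `demo_timeout` are assumptions for illustration; only the use of `asyncio.wait_for` with a 10-second default comes from the text above. A future that nobody resolves stands in for a server that never responds.

```python
import asyncio


async def request(future: "asyncio.Future", timeout: float = 10.0):
    """Wait for a response future, converting a timeout into an actionable error."""
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"gateway did not respond within {timeout}s") from exc


async def demo_timeout() -> str:
    # A future nobody resolves simulates a gateway that stays silent.
    never_answered = asyncio.get_running_loop().create_future()
    try:
        await request(never_answered, timeout=0.05)
    except TimeoutError as exc:
        return str(exc)
    return "unexpected response"
```

The test uses a short timeout so the suite stays fast while still exercising the same code path as the 10-second default.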
Large Payloads
Events may carry unexpectedly large data: a `text_delta` with a 1 MB string, a `file_content` with a base64-encoded binary, a `tool_call` with deeply nested JSON args.
- 1 MB `text` field in `text_delta`: must be handled without panic or OOM
- Deeply nested JSON in `tool_call.args`: serde must not stack-overflow
- Large `sessions` array in `session_list` (thousands of entries): must parse correctly
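
Payloads like these are usually generated rather than stored as fixtures. The generators below are a hypothetical sketch: the function names, the `child`/`leaf` nesting keys, and the `{"id": ...}` session entry shape are placeholders, not the real schema. In an actual test the frames would be sent through the mock gateway to the client under test; decoding them with `json.loads` here only confirms that the generators produce what they claim.

```python
import json


def make_text_delta(size: int) -> str:
    """Build a text_delta frame whose text field is `size` characters long."""
    return json.dumps({"type": "text_delta", "text": "x" * size})


def make_tool_call(depth: int) -> str:
    """Build a tool_call frame whose args are nested `depth` levels deep."""
    args = {"leaf": True}
    for _ in range(depth):
        args = {"child": args}
    return json.dumps({"type": "tool_call", "args": args})


def make_session_list(count: int) -> str:
    """Build a session_list frame with `count` placeholder entries."""
    sessions = [{"id": f"s{i}"} for i in range(count)]
    return json.dumps({"type": "session_list", "sessions": sessions})


def nesting_depth(value) -> int:
    """Count how many 'child' levels wrap the innermost object."""
    depth = 0
    while isinstance(value, dict) and "child" in value:
        value = value["child"]
        depth += 1
    return depth
```

Asserting on exact lengths and depths after delivery, rather than on "no exception raised", is what catches silent truncation.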
Unicode Edge Cases
Text fields may contain any valid Unicode, including sequences that commonly break naive string handling:
- Emoji: `\u{1F680}` (multi-byte UTF-8)
- CJK: `\u4e16\u754c` (Chinese characters)
- RTL: Arabic and Hebrew text
- Combining characters: `e\u0301` (e + combining acute accent)
- Zero-width characters: `\u200b`, `\u200d`
- Surrogate pairs and supplementary plane characters
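
One way to make these cases severe is to send each sample in both JSON encodings (escaped and raw UTF-8) and compare the delivered text for exact equality. The harness below is a sketch under stated assumptions: `corrupted_samples`, `wire_frame`, and both client functions are hypothetical, and `lossy_client` exists only to show that the check fails when text is mangled.

```python
import json
from typing import Callable

SAMPLES = {
    "emoji": "\U0001F680",
    "cjk": "\u4e16\u754c",
    "rtl": "\u0645\u0631\u062d\u0628\u0627 \u05e9\u05dc\u05d5\u05dd",
    "combining": "e\u0301",
    "zero_width": "a\u200bb\u200dc",
}


def wire_frame(text: str, ensure_ascii: bool) -> bytes:
    """Encode a text_delta frame, either with JSON escapes or as raw UTF-8."""
    payload = {"type": "text_delta", "text": text}
    return json.dumps(payload, ensure_ascii=ensure_ascii).encode("utf-8")


def corrupted_samples(deliver: Callable[[bytes], str]) -> list:
    """Return the names of samples whose text changed on the way through `deliver`."""
    failures = []
    for name, text in SAMPLES.items():
        for ensure_ascii in (True, False):
            if deliver(wire_frame(text, ensure_ascii)) != text:
                failures.append(name)
                break
    return failures


def stand_in_client(frame: bytes) -> str:
    """Stand-in for a correct client: decode and return the text unchanged."""
    return json.loads(frame.decode("utf-8"))["text"]


def lossy_client(frame: bytes) -> str:
    """Stand-in for a naive client that silently drops non-ASCII characters."""
    text = json.loads(frame.decode("utf-8"))["text"]
    return text.encode("ascii", "ignore").decode("ascii")
```

With `ensure_ascii=True`, the emoji travels as a surrogate pair of JSON escapes, so the same sample covers both the supplementary-plane and surrogate-pair bullets. The comparison is exact, which also catches a client that normalizes `e\u0301` into a single precomposed character.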
Numeric Edge Cases
Sequence numbers, timestamps, and token counts are integers that may reach extreme values:
- `seq: 0`, `seq: -1`, `seq: 9007199254740991` (JS `MAX_SAFE_INTEGER`)
- `ts: 0`, `ts: -1`
- `i64::MAX` and `i64::MIN` in Rust (serde must not panic)
- `costMicroDollars: null` (nullable fields must be handled)
- `inputTokens: null`, `outputTokens: null` (SDK-specific null handling)
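
The sketch below shows one possible policy for reading integer fields: `null` and absent values become `None`, non-integers are rejected, and values outside the signed 64-bit range are rejected so that all SDKs agree on what is representable. The `read_int` helper and this policy are assumptions for illustration; the document does not specify how each SDK treats out-of-range values.

```python
import json
from typing import Optional

JS_MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def read_int(event: dict, field: str) -> Optional[int]:
    """Return the field as an int, None if absent or null; raise ValueError otherwise."""
    value = event.get(field)
    if value is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{field} is not an integer: {value!r}")
    if not I64_MIN <= value <= I64_MAX:
        raise ValueError(f"{field} does not fit in a signed 64-bit integer")
    return value


event = json.loads(
    '{"seq": 9007199254740991, "ts": -1, "costMicroDollars": null,'
    ' "inputTokens": 9223372036854775808}'
)
```

Python integers are arbitrary precision, so `inputTokens` above (one past `i64::MAX`) parses without error. The explicit range check is what turns that silent acceptance into a detectable failure.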
Unknown Event Types
The protocol is designed for forward compatibility. When the gateway adds new event types, existing clients must not crash.
- Rust: The `ServerEvent` enum includes `#[serde(other)] Unknown`, so any unrecognized `type` value deserializes to `ServerEvent::Unknown` instead of returning a deserialization error
- TypeScript: The wildcard handler `client.on("*", handler)` receives all events regardless of type. Unrecognized types are dispatched through the handler system normally; they just have no type-specific handler registered
- Python: The `ServerEvent` dataclass wraps any event. The `type` field is a plain string, so unknown types are represented naturally, and the wildcard `client.on("*", handler)` catches them
- Swift: The `ServerEvent` enum includes a `.unknown(type:data:)` case, so any unrecognized `type` value decodes cleanly instead of throwing a `DecodingError`
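
The "plain string type" approach can be sketched with a small wrapper. The `Event` class, `wrap` function, and the partial `KNOWN_TYPES` set are hypothetical stand-ins, not the SDK's `ServerEvent`; the event name `hologram_rendered` is invented to represent a type added by a future gateway.

```python
from dataclasses import dataclass, field

# Partial, illustrative list; the real protocol defines 51 server event types.
KNOWN_TYPES = frozenset({"welcome", "text_delta", "turn_started", "turn_complete", "pong"})


@dataclass(frozen=True)
class Event:
    """Wraps any event; the type is a plain string, so unknown types need no special case."""

    type: str
    data: dict = field(default_factory=dict)

    @property
    def is_known(self) -> bool:
        return self.type in KNOWN_TYPES


def wrap(payload: dict) -> Event:
    """Wrap a decoded payload, rejecting only frames that lack a string type."""
    event_type = payload.get("type")
    if not isinstance(event_type, str):
        raise ValueError("event has no string 'type' field")
    return Event(type=event_type, data=payload)


future_event = wrap({"type": "hologram_rendered", "sessionId": "s1"})
```

The unknown event keeps its full payload, so an application with a wildcard handler can still log or forward it even though no typed accessor exists yet.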
Concurrent Operations
Multiple requests in flight simultaneously must resolve to their correct responses, not cross-contaminate.
- TypeScript: Request chaining by `type:sessionId` key ensures sequential resolution for same-target operations, while different targets resolve independently
- Python: Multiple pending futures with predicate-based matching: each response is matched to the correct request by its predicate, not by arrival order
- Rust: The channel-based architecture naturally handles concurrency: events arrive in order on the channel, and the application’s event loop dispatches them
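
Predicate-based matching, as described in the Python bullet, can be modeled in a few lines. The `PendingRequests` class and the `session_updated` event name are assumptions for this sketch, not the SDK's internals. The demo delivers responses in the opposite order from the requests to show that arrival order does not matter.

```python
import asyncio
from typing import Callable


class PendingRequests:
    """Hypothetical model of in-flight requests matched by predicate."""

    def __init__(self) -> None:
        self._pending = []  # list of (predicate, future) pairs

    def expect(self, predicate: Callable[[dict], bool]) -> "asyncio.Future":
        """Register interest in the first event that satisfies `predicate`."""
        future = asyncio.get_running_loop().create_future()
        self._pending.append((predicate, future))
        return future

    def resolve(self, event: dict) -> bool:
        """Deliver an event to the oldest matching request; report whether one matched."""
        for index, (predicate, future) in enumerate(self._pending):
            if predicate(event):
                del self._pending[index]
                if not future.done():
                    future.set_result(event)
                return True
        return False


async def demo_out_of_order():
    pending = PendingRequests()
    first = pending.expect(lambda e: e.get("sessionId") == "a")
    second = pending.expect(lambda e: e.get("sessionId") == "b")
    # Responses arrive in the reverse order of the requests.
    pending.resolve({"type": "session_updated", "sessionId": "b", "name": "beta"})
    pending.resolve({"type": "session_updated", "sessionId": "a", "name": "alpha"})
    return (await first)["name"], (await second)["name"]
```

A test built on this shape fails if responses are matched by position in a queue, which is exactly the cross-contamination bug the section describes.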
Anti-Patterns Avoided
The test suites deliberately avoid common testing anti-patterns that inflate test counts without providing evidence of correctness:
| Anti-Pattern | Why It’s Avoided |
|---|---|
| Testing defaults | Asserting that ClientOptions has default values tells you nothing about whether the client works. Defaults are implementation details, not observable behavior. |
| Constructor-existence tests | expect(new DiminuendoClient(opts)).toBeDefined() provides zero evidence that the client can connect, authenticate, or handle events. |
| Type-only checks | expect(typeof event.type).toBe("string") passes for any string, including wrong ones. Assert the specific expected value. |
| Trivial round-trips | Serializing {a: 1} and checking it deserializes to {a: 1} only tests the JSON library, not the application. Tests use realistic wire payloads. |
| Snapshot tests | Snapshot-based assertions are fragile (break on formatting changes) and opaque (reviewers cannot see what’s being asserted). All assertions are explicit. |
| Mocking the thing under test | Tests never mock the SDK client itself. They mock the server and test the client’s behavior against that mock. |
Coverage Statistics
| SDK | Tests | Assertions | Message Types | Event Types | Error Types |
|---|---|---|---|---|---|
| TypeScript | 137 | ~600 | 21/21 | 51/51 | All |
| Rust | 178 | ~800 | 21/21 | 51/51 | All |
| Python | 173 | ~800 | 21/21 | 51/51 | All |
| Swift | 193 | ~900 | 21/21 | 51/51 | All |
| Total | 681 | ~3,100 | 21/21 | 51/51 | All |
Running the Tests
1. Gateway (9 tests)
2. TypeScript SDK (137 tests)
3. Rust SDK (178 tests)
4. Python SDK (173 tests)
5. Swift SDK (193 tests)