Observability
Diminuendo provides three observability pillars: structured logging via Effect’s built-in logger, distributed tracing via OpenTelemetry, and deep health checks that probe upstream dependencies. Each is designed for zero-configuration local development and opt-in production instrumentation.

Logging
Diminuendo uses Effect’s built-in logging system, which integrates directly with the Effect runtime’s fiber scheduler. Every Effect.log* call captures the current fiber’s context (span, annotations) and routes through the configured logger implementation.
Logger Configuration
The logger is configured by two environment variables:

| Variable | Effect |
|---|---|
| LOG_LEVEL | Minimum severity: trace, debug, info, warning, error, fatal. Default: info |
| DEV_MODE / NODE_ENV | Format selection: pretty-print in dev, JSON in production |
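As a sketch of how this selection could work (the function name and exact precedence rules here are assumptions, not the gateway’s actual logic):

```typescript
// Illustrative sketch: derive logger settings from the two environment
// variables above. Precedence rules are assumptions for illustration.
type LogFormat = "pretty" | "json";
const LEVELS = ["trace", "debug", "info", "warning", "error", "fatal"] as const;
type Level = (typeof LEVELS)[number];

export function resolveLoggerConfig(env: Record<string, string | undefined>): {
  level: Level;
  format: LogFormat;
} {
  // LOG_LEVEL: fall back to "info" when unset or unrecognized.
  const raw = (env.LOG_LEVEL ?? "info").toLowerCase();
  const level = (LEVELS as readonly string[]).includes(raw) ? (raw as Level) : "info";
  // Production (JSON) when NODE_ENV=production or DEV_MODE is not set.
  const isProd = env.NODE_ENV === "production" || env.DEV_MODE === undefined;
  return { level, format: isProd ? "json" : "pretty" };
}
```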
Production: JSON Logger
In production (NODE_ENV=production or DEV_MODE not set), logs are emitted as structured JSON, one object per line. This format is optimized for ingestion by log aggregators (Datadog, Grafana Loki, CloudWatch Logs):
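The exact key set depends on the logger implementation; as an illustration only (key names assumed), a single log line might look like:

```json
{"timestamp":"2024-01-15T12:34:56.789Z","logLevel":"INFO","fiberId":"#42","message":"session created","annotations":{"sessionId":"abc123"}}
```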
Development: Pretty Logger
In development, logs use Effect’s prettyLoggerDefault, which renders human-readable output with color coding.
Log Level Recommendations
| Level | Use Case |
|---|---|
| error | Unrecoverable failures, data corruption, service crashes |
| warning | Recoverable issues: stale session recovery failures, missing optional config, degraded dependencies |
| info | Service lifecycle events: startup, shutdown, configuration summary, connection events |
| debug | Request/response details: Podium API calls, WebSocket frame details, SQL queries |
| trace | Fiber scheduling, Effect runtime internals (rarely needed) |
OpenTelemetry Tracing
Distributed tracing is opt-in. Set OTEL_EXPORTER_OTLP_ENDPOINT to enable it. If the variable is unset, the tracing subsystem is completely inert: no spans are created, no overhead is incurred.
Initialization
Tracing is initialized once at startup via initTracing(). The function is idempotent and safe to call multiple times:
The OpenTelemetry packages are optional dependencies:

- @opentelemetry/api
- @opentelemetry/sdk-trace-node
- @opentelemetry/exporter-trace-otlp-http
- @opentelemetry/sdk-trace-base

If they are not installed, initTracing() catches the import error and silently disables tracing. The gateway runs identically with or without these packages in node_modules.

Configuration
| Variable | Default | Description |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | OTLP HTTP endpoint (e.g., http://localhost:4318) |
| OTEL_SERVICE_NAME | diminuendo-gateway | Service name in trace metadata |
Spans are exported to {OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces using the OTLP HTTP protocol. A BatchSpanProcessor batches spans for efficient network transmission.
withSpan() Helper
The withSpan() function wraps any Effect in an OpenTelemetry span. If tracing is disabled, it passes the Effect through unchanged (zero overhead):
- On success: span status is set to OK and the span is ended
- On failure or interruption: span status is set to ERROR with a diagnostic message, and the span is ended
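The behavior above can be sketched as follows. This is an illustration only: the real withSpan() operates on Effects, while this sketch models the same pass-through and status logic with Promises, and all type and variable names are assumptions:

```typescript
// Illustrative sketch of withSpan() pass-through and status handling,
// modeled with Promises instead of Effects.
type SpanLike = {
  setStatus(status: { code: "OK" | "ERROR"; message?: string }): void;
  end(): void;
};
type Tracer = { startSpan(name: string): SpanLike };

let tracer: Tracer | null = null; // null when tracing is disabled

export async function withSpan<A>(name: string, work: () => Promise<A>): Promise<A> {
  if (tracer === null) return work(); // tracing disabled: zero-overhead pass-through
  const span = tracer.startSpan(name);
  try {
    const result = await work();
    span.setStatus({ code: "OK" });
    return result;
  } catch (err) {
    span.setStatus({ code: "ERROR", message: String(err) });
    throw err;
  } finally {
    span.end(); // the span is ended on success and on failure alike
  }
}
```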
Trace ID Propagation
The currentTraceId() function returns the active span’s trace ID if OTel is enabled, or a random 32-character hex string otherwise. This ID is propagated through event envelopes, enabling correlation between client-visible events and server-side traces:
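The fallback path can be sketched like this (the helper names and the optional-parameter composition are assumptions; only the 32-hex-character shape comes from the text above):

```typescript
import { randomBytes } from "node:crypto";

// Illustrative sketch: when OTel is disabled, synthesize a random trace ID
// with the same shape as a real one (32 lowercase hex characters).
export function randomTraceId(): string {
  return randomBytes(16).toString("hex"); // 16 bytes -> 32 hex chars
}

// Hypothetical composition: prefer the active span's trace ID when available.
export function currentTraceId(activeTraceId?: string): string {
  return activeTraceId ?? randomTraceId();
}
```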
Graceful Degradation
The tracing subsystem is designed for complete graceful degradation:

| Condition | Behavior |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT not set | Tracing disabled; withSpan() is a pass-through |
| OTel packages not installed | initTracing() catches import error; tracing disabled |
| Collector unreachable | BatchSpanProcessor buffers and retries; no impact on gateway |
| initTracing() called multiple times | Idempotent; second call is a no-op |
Health Endpoint
The gateway exposes a GET /health endpoint that performs deep health checks against upstream dependencies.
Response Format
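A representative payload, assuming the top-level fields described under Response Fields below (the shape of each DependencyStatus entry and all values shown are illustrative assumptions):

```json
{
  "status": "degraded",
  "uptime": 8640000,
  "connections": 12,
  "version": "1.4.2",
  "dependencies": [
    { "name": "podium", "status": "ok", "latencyMs": 14, "httpStatus": 200 },
    { "name": "ensemble", "status": "unhealthy", "error": "connect ETIMEDOUT" }
  ]
}
```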
Health Check Logic
The endpoint probes each configured upstream service by sending a GET request to {service_url}/health with a 2-second timeout:
1. Probe Dependencies: Podium and Ensemble (if configured) are probed in parallel. Each probe measures latency and captures the HTTP status.
2. Classify Each Dependency:
   - 200 OK with latency under the timeout: ok
   - Non-200 HTTP status: degraded (with error detail)
   - Timeout or connection error: unhealthy (with error message)
3. Compute Overall Status:
   - If Podium is unhealthy: overall status is unhealthy (Podium is critical)
   - If any dependency is not ok but Podium is available: overall status is degraded
   - If all dependencies are ok: overall status is ok
4. Return Response:
   - 200 for ok or degraded status
   - 503 for unhealthy status
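The aggregation in steps 3 and 4 can be sketched as a pure function (type and function names here are assumptions, not the gateway’s actual API):

```typescript
// Illustrative sketch of the overall-status computation described above.
type DepStatus = "ok" | "degraded" | "unhealthy";
type Dependency = { name: string; status: DepStatus };

export function overallStatus(deps: Dependency[]): DepStatus {
  const podium = deps.find((d) => d.name === "podium");
  // Podium is the only critical dependency: if it is down, the gateway is down.
  if (podium && podium.status === "unhealthy") return "unhealthy";
  // Any other non-ok dependency only degrades the gateway.
  if (deps.some((d) => d.status !== "ok")) return "degraded";
  return "ok";
}

export function httpStatusFor(status: DepStatus): number {
  return status === "unhealthy" ? 503 : 200; // ok and degraded both return 200
}
```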
Response Fields
| Field | Type | Description |
|---|---|---|
| status | "ok" \| "degraded" \| "unhealthy" | Overall gateway health |
| uptime | number | Milliseconds since gateway started |
| connections | number | Number of active session subscriptions |
| dependencies | DependencyStatus[] | Per-dependency health details |
| version | string | Gateway version |
Dependency Criticality
Podium is the only critical dependency. If Podium is unreachable, the gateway cannot create or manage agent sessions, so the overall status is unhealthy (503). Ensemble is non-critical — if it is unreachable, the gateway reports degraded (200) because agent sessions can still function without gateway-level inference.
Load Balancer Integration
Configure your load balancer to probe GET /health periodically:
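For example, a Kubernetes readiness probe could be configured as follows (port and interval values are illustrative assumptions):

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080        # assumed gateway port
  periodSeconds: 10
  timeoutSeconds: 3   # headroom over the 2-second upstream probe timeout
  failureThreshold: 3
```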
Instances reporting a degraded status should remain in the pool: they can still serve requests, but operators should investigate the degraded dependency.