Stream Coalescing
Stream Coalescing
Section titled “Stream Coalescing”LLM replies stream one token at a time. Edit-per-token is murder for chat apps — you’d hit rate limits in seconds and the conversation would shimmer like a stock ticker. The StreamCoalescer batches token deltas into a small number of message edits.
How it works
Section titled “How it works”Agent emits token deltas ↓Coalescer accumulates → flush window: gateway.coalesceMs (default 800ms) cap: ≤3 edits per ~250ms burst ↓Platform message edit → user sees the reply grow in chunks ↓Final-message flush → immediate; the closing edit happens with no delayDefaults
Section titled “Defaults”| Knob | Default | What it does |
|---|---|---|
gateway.coalesceMs | 800 | Time window per coalesced edit. Lower = more edits, more rate-limit risk |
| Per-burst edit cap | 3 | Hard ceiling per ~250ms — keeps you under Telegram/Slack/Discord rate limits |
Tuning
Section titled “Tuning”- Slow chats: bump
coalesceMsto1500for less flicker on fast generations - Latency-sensitive: drop to
400to feel snappier (still safe under platform limits) - Final flush is always immediate regardless of
coalesceMs— once the agent says it’s done, you see the final message right away