Field notes on building voice & chat agents.
Honest writeups on latency, evaluation, prompt design, and the surprisingly squishy parts of putting AI in front of real customers.
How we got voice latency under 300ms — and what we still got wrong
An honest breakdown of the latency budget for production voice agents: STT, VAD, reasoning, tool calls, TTS — what helps, what doesn't, and the part everyone gets wrong.
Open-sourcing our voice evaluation harness
Why benchmarks lie, what we actually measure, and the harness we wrote to keep ourselves honest.
The acknowledgement problem in voice agents
Why agents need to narrate tool calls — and how to do it without sounding scripted.
How Northwind cut tier-1 support cost by 94%
Full case study with metrics, prompt structure, and the migration timeline.
Long-term memory without leaking secrets
How we scope memory per-user, per-bot, per-workspace — and the audit trail that comes with it.
Hybrid retrieval ships in Hania v2.4
Semantic search plus BM25, with cross-encoder reranking. Recall up 6 percentage points on internal benchmarks.
No-code tools for non-engineers
Why the next wave of agent platforms will be built by ops teams, not just engineers.