Role of partial results
We enable shouldReportPartialResults on the speech request so latestTranscription updates continuously. That gives responsive UI text but would spam the LLM if we forwarded every delta. Instead, we treat partials as activity that resets a silence window, not as direct model input.
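To make the "partials as activity" idea concrete, here is a minimal sketch. The `PartialActivityTracker` type and its method names are hypothetical, not part of Apple's Speech framework; the only thing taken from the text above is that a changed partial restarts a silence window and is not forwarded to the model.

```swift
import Foundation

/// Hypothetical helper: remembers the latest partial transcript and reports
/// whether a new partial counts as activity (that is, the text changed).
struct PartialActivityTracker {
    private(set) var latestTranscription = ""
    private(set) var lastActivity: Date?

    /// Returns true when the partial changed the text, meaning the silence
    /// window should restart. Nothing is sent to the LLM from here.
    mutating func ingest(partial text: String, at now: Date = Date()) -> Bool {
        guard text != latestTranscription else { return false }
        latestTranscription = text
        lastActivity = now
        return true
    }

    /// True once the user has been quiet for at least `window` seconds.
    func silenceElapsed(window: TimeInterval, now: Date = Date()) -> Bool {
        guard let last = lastActivity else { return false }
        return now.timeIntervalSince(last) >= window
    }
}
```

In a real app, the recognition task's result handler would presumably call `ingest(partial:)` with `result.bestTranscription.formattedString` and use the returned flag to decide whether to restart the timers.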
Timer ladder
When the transcription content changes, we invalidate any pending timers and schedule three staggered one-shot timers: one for LLM start (default near half a second), a longer one for TTS staging, and a final gate before playout. Remote configuration can adjust those intervals without shipping a new binary, which is useful when marketing wants snappier replies or engineering needs fewer partial-triggered cancellations.
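The ladder can be sketched roughly as follows. This is an illustration under assumptions, not our production code: `TimerLadder`, `LadderIntervals`, and `LadderStage` are hypothetical names, the sketch uses `DispatchWorkItem` where an app might equally use `Timer`, and only the half-second LLM default comes from the description above (the other two values are placeholders).

```swift
import Foundation

/// Hypothetical interval set. Only the ~0.5 s LLM default is from the text;
/// the TTS and playout values are assumed placeholders.
struct LadderIntervals {
    var llmStart: TimeInterval = 0.5
    var ttsStaging: TimeInterval = 1.0
    var playoutGate: TimeInterval = 1.5
}

enum LadderStage { case llmStart, ttsStaging, playoutGate }

final class TimerLadder {
    var intervals = LadderIntervals()
    var onFire: ((LadderStage) -> Void)?
    private var pending: [DispatchWorkItem] = []
    private let queue: DispatchQueue

    init(queue: DispatchQueue = .main) { self.queue = queue }

    /// Call whenever the transcription text changes: cancels the previous
    /// ladder and schedules three fresh one-shot timers.
    func restart() {
        cancel()
        let plan: [(LadderStage, TimeInterval)] = [
            (.llmStart, intervals.llmStart),
            (.ttsStaging, intervals.ttsStaging),
            (.playoutGate, intervals.playoutGate),
        ]
        for (stage, delay) in plan {
            let item = DispatchWorkItem { [weak self] in self?.onFire?(stage) }
            pending.append(item)
            queue.asyncAfter(deadline: .now() + delay, execute: item)
        }
    }

    /// Invalidates every timer that has not fired yet.
    func cancel() {
        pending.forEach { $0.cancel() }
        pending.removeAll()
    }
}
```

Because `restart()` always cancels first, a burst of partials leaves exactly one live ladder, and the LLM stage only fires if the user stays quiet for the full `llmStart` interval. Values from remote configuration would be written into `intervals` before the next restart.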
The ladder effectively approximates turn detection: the user’s pause must outlast the LLM timer before we commit text and call upstream. This is not a neural endpointing model; it is honest, tunable heuristics paired with Apple’s recognizer.
Interaction with speaking state
While the assistant is speaking, recognition is suppressed. Timer logic and interruption handling must agree on when to re-enter listening; mismatches here show up as “double answers” or missed user input in QA.
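One way to keep the two code paths in agreement is to give the listening/speaking decision a single owner. The sketch below assumes a hypothetical `TurnGate` type; it is one possible shape for this, and the names are not taken from any framework.

```swift
import Foundation

enum TurnState: Equatable { case listening, assistantSpeaking }

/// Hypothetical single owner of the listening/speaking decision, so timer
/// logic and interruption handling cannot disagree about the current state.
struct TurnGate {
    private(set) var state: TurnState = .listening

    /// Partials only count as activity while we are listening.
    func shouldAccept(partial: String) -> Bool {
        state == .listening && !partial.isEmpty
    }

    mutating func assistantDidStartSpeaking() {
        state = .assistantSpeaking
    }

    /// Normal playback completion and user interruption both route here.
    /// Returns true only on the actual transition, so the caller restarts
    /// recognition exactly once.
    mutating func assistantDidStopSpeaking() -> Bool {
        guard state == .assistantSpeaking else { return false }
        state = .listening
        return true
    }
}
```

Routing both "playback finished" and "user interrupted" through the same method means a late completion callback arriving after an interruption returns false instead of re-entering listening a second time.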
Trade-offs
Shorter timers feel faster but increase false triggers on thinking pauses. Longer timers feel sluggish. Expose sane bounds in internal tooling and log timer fires when debugging production clips.
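As a sketch of what "sane bounds" and timer logging might look like, the helpers below clamp a remotely configured interval and format one log line per timer fire. The bounds, function names, and log format are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical bounds; the text only says the default sits near half a second.
let llmStartBounds: ClosedRange<TimeInterval> = 0.2...2.0

/// Clamps a remotely configured interval into the allowed range, falling
/// back to the default when the value is missing or not finite.
func sanitizedInterval(_ remote: Double?,
                       fallback: TimeInterval,
                       bounds: ClosedRange<TimeInterval>) -> TimeInterval {
    guard let value = remote, value.isFinite else { return fallback }
    return min(max(value, bounds.lowerBound), bounds.upperBound)
}

/// Builds one log line per timer fire so it can be lined up with an audio clip.
func timerFireLogLine(stage: String,
                      interval: TimeInterval,
                      transcriptLength: Int) -> String {
    let seconds = String(format: "%.2f", interval)
    return "timer_fire stage=\(stage) interval=\(seconds)s chars=\(transcriptLength)"
}
```

For example, `sanitizedInterval(0.05, fallback: 0.5, bounds: llmStartBounds)` yields `0.2`, so a mistyped config value cannot push the LLM timer below the floor.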
Relation to WebRTC
When assistant audio routes through WebRTC injection, the same timer and partial-result layer still gates the LLM; only the playback backend changes. Keeping turn-taking above the transport layer preserved our sanity across the playback migrations you can see in the git history.
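The layering can be illustrated with a small protocol boundary. Everything here is hypothetical (`AssistantPlayback`, `LocalPlayback`, `WebRTCInjectionPlayback`, `TurnController`), and the `play` bodies are left as stubs because the actual audio and WebRTC calls depend on the SDK in use.

```swift
import Foundation

/// Hypothetical playback abstraction; names are illustrative only.
protocol AssistantPlayback {
    var name: String { get }
    func play(_ audio: Data)
}

struct LocalPlayback: AssistantPlayback {
    let name = "local"
    func play(_ audio: Data) { /* stub: hand off to a local audio engine */ }
}

struct WebRTCInjectionPlayback: AssistantPlayback {
    let name = "webrtc"
    func play(_ audio: Data) { /* stub: inject frames into the outgoing track */ }
}

/// Turn-taking sits above the transport and never checks which backend is active.
final class TurnController {
    private let playback: AssistantPlayback
    private(set) var llmCalls = 0

    init(playback: AssistantPlayback) { self.playback = playback }

    /// Called when the LLM-start timer fires after a long enough pause.
    func commit(transcript: String) {
        guard !transcript.isEmpty else { return }
        llmCalls += 1            // stand-in for the upstream LLM request
        playback.play(Data())    // placeholder for synthesized audio
    }
}
```

Swapping `LocalPlayback()` for `WebRTCInjectionPlayback()` at construction time changes where audio goes without touching the commit path.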
Closing
For teams shipping voice without a dedicated cloud endpointing API, staggered timers plus partial results are a maintainable baseline—provided you document the defaults and test across Bluetooth and speaker routes.