feat(social-bot): Streaming TTS — Piper/XTTS integration with first-chunk streaming #85

Closed
opened 2026-03-01 22:28:42 -05:00 by sl-jetson · 0 comments
Collaborator

Summary

Text-to-speech output with streaming for low-latency voice responses.

Requirements

  • TTS engine: Piper (fast, CPU) or XTTS v2 (GPU, voice cloning capable)
  • First-chunk streaming: Begin audio playback before full synthesis completes
  • Voice selection: Multiple voice profiles, configurable per personality
  • ROS2: Subscribe /social/conversation/response, publish /social/tts/audio (audio_msgs)
  • Audio output: USB speaker or I2S DAC, volume control
  • SSML support: Basic pitch/rate/emphasis control
  • Latency: <200ms to first audio chunk
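
A minimal sketch of the first-chunk streaming requirement, to make the intent concrete. It is not tied to Piper or XTTS: `synthesize` and `play` are hypothetical callables standing in for the real engine and audio sink, `fake_synthesize` is a stub that only simulates synthesis time, and splitting on sentence boundaries is one assumed chunking strategy. The ROS2 wiring is left out on purpose.

```python
import queue
import re
import threading
import time
from typing import Callable, List, Optional


def split_into_chunks(text: str) -> List[str]:
    """Split a response into sentence-sized chunks so synthesis can start early."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_tts(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: text chunk -> PCM bytes
    play: Callable[[bytes], None],       # hypothetical: blocking audio sink
) -> Optional[float]:
    """Synthesize chunk by chunk in a worker thread and play chunks as they arrive.

    Returns the time to the first audio chunk in seconds, or None if the
    text produced no chunks.
    """
    audio_q: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=4)
    start = time.monotonic()

    def producer() -> None:
        for chunk in split_into_chunks(text):
            audio_q.put(synthesize(chunk))
        audio_q.put(None)  # sentinel: synthesis finished

    threading.Thread(target=producer, daemon=True).start()

    first_chunk_latency: Optional[float] = None
    while True:
        audio = audio_q.get()
        if audio is None:
            break
        if first_chunk_latency is None:
            first_chunk_latency = time.monotonic() - start
        play(audio)  # playback overlaps with synthesis of later chunks
    return first_chunk_latency


def fake_synthesize(chunk: str) -> bytes:
    """Stub engine (assumption): sleeps briefly and returns silent 16-bit samples."""
    time.sleep(0.01)
    return b"\x00\x00" * (len(chunk) * 100)
```

With this shape, time to first audio depends on how long the first chunk takes to synthesize rather than on the full response, which is the quantity the latency target constrains. Calling `stream_tts(response_text, fake_synthesize, sink)` with any `sink` that accepts bytes returns the measured first-chunk latency, so the same function could be used to check an engine against the target.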

Agent: sl-jetson

Labels: social-bot

Reference: seb/saltylab-firmware#85