feat(social-bot): Speech pipeline — wake word + VAD + Whisper STT + diarization #81

Closed
opened 2026-03-01 22:28:11 -05:00 by sl-jetson · 0 comments
Collaborator

Summary

Audio perception: wake word, VAD, speech-to-text, and speaker diarization.
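One way the stages could hand off to each other is a small gate: the wake word opens a listening window, VAD decides where the utterance starts and ends, and only the finished utterance goes on to STT and diarization. The sketch below is illustrative only. `SpeechGate`, `hangover_frames`, and the boolean `wake_detected` / `is_speech` inputs are hypothetical stand-ins for whatever the chosen wake-word and VAD libraries report per frame.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for the wake word
    LISTENING = auto()   # wake word heard, waiting for speech to start
    IN_SPEECH = auto()   # VAD reports speech; frames are being buffered


@dataclass
class SpeechGate:
    # Assumed tuning knob: consecutive silent frames that end an utterance.
    hangover_frames: int = 3
    state: State = State.IDLE
    _buffer: list = field(default_factory=list, init=False)
    _silence: int = field(default=0, init=False)

    def step(self, frame, wake_detected: bool, is_speech: bool):
        """Feed one frame; return a finished utterance (list of frames) or None."""
        if self.state is State.IDLE:
            if wake_detected:
                self.state = State.LISTENING
            return None

        if self.state is State.LISTENING:
            if is_speech:
                self.state = State.IN_SPEECH
                self._buffer = [frame]
                self._silence = 0
            return None

        # IN_SPEECH: keep buffering and count trailing silence.
        self._buffer.append(frame)
        self._silence = 0 if is_speech else self._silence + 1
        if self._silence >= self.hangover_frames:
            # Drop the trailing silent frames before handing off to STT.
            utterance = self._buffer[: len(self._buffer) - self._silence]
            self._buffer, self._silence = [], 0
            self.state = State.IDLE
            return utterance
        return None
```

Each returned utterance would then be passed to the STT and speaker-embedding stages. A real implementation would likely also need a timeout in `LISTENING` so the gate does not stay open indefinitely after a false wake.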

Requirements

  • Wake word: OpenWakeWord or Porcupine, with a custom keyword (e.g. "Hey Salty")
  • VAD: Silero VAD or WebRTC VAD to detect speech segments
  • STT: Whisper small/medium via faster-whisper on the Orin GPU
  • Speaker diarization: ECAPA-TDNN embeddings to identify who is speaking
  • ROS2 topics: /social/speech/transcript (with speaker_id), /social/speech/vad_state
  • Hardware: USB mic array (ReSpeaker) or I2S MEMS microphone
  • Latency: under 500 ms from wake word to first token, with streaming partial transcripts
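To make the speaker_id requirement concrete, here is a minimal sketch of matching an utterance embedding against enrolled speakers by cosine similarity. It assumes the embeddings have already been extracted (for example by an ECAPA-TDNN model) and are plain lists of floats. The function names, the `enrolled` dictionary, and the `0.6` threshold are hypothetical and would need tuning on real data. Strictly speaking this is speaker identification against known profiles, not full diarization of unknown speakers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.6):
    """Return the best-matching enrolled speaker_id, or "unknown" below the threshold.

    `enrolled` maps speaker_id -> reference embedding. The threshold is an
    assumed placeholder, not a recommended value.
    """
    best_id, best_score = "unknown", threshold
    for speaker_id, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id
```

The returned id is what would be attached to each message on /social/speech/transcript. Returning "unknown" instead of the nearest match keeps unenrolled voices from being mislabeled.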

Agent: sl-jetson

Labels: social-bot

Reference: seb/saltylab-firmware#81