feat: Audio pipeline — wake word + STT + TTS on Jabra SPEAK 810 (Issue #503) #543

Merged
sl-jetson merged 1 commit from sl-jetson/issue-503-audio-pipeline into main 2026-03-07 10:07:17 -05:00

Summary

Implements end-to-end audio pipeline for SaltyBot on Jetson (Issue #503).

  • Jabra SPEAK 810 USB speakerphone: auto-discovered via PyAudio by name
  • openWakeWord: "hey_salty" wake word detection, configurable threshold
  • Vosk STT (new): offline KaldiRecognizer backend — low-latency CPU alternative; add VoskSTT class to audio_utils.py
  • Whisper STT: faster-whisper with CUDA/CPU auto-detection and int8 fallback
  • Piper TTS: offline ONNX speech synthesis, sample-rate resampling to Jabra output
  • MQTT reporting: state + metrics published to saltybot/audio/{state,status}
  • Full-stack integration: audio pipeline at t=5s in full_stack.launch.py with enable_audio_pipeline and audio_stt_backend args
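
As a rough illustration of the "auto-discovered via PyAudio by name" step, the sketch below matches a device by a substring of its name. It is not the code from this PR: `find_device_index` and `sample_devices` are hypothetical names, and the function works on plain dicts that use the same keys PyAudio returns from `get_device_info_by_index` (`index`, `name`, `maxInputChannels`, `maxOutputChannels`), so it can be run without audio hardware.

```python
from typing import Iterable, Mapping, Optional


def find_device_index(
    devices: Iterable[Mapping[str, object]],
    name_fragment: str = "Jabra SPEAK 810",
    need_input: bool = True,
) -> Optional[int]:
    """Return the index of the first device whose name contains name_fragment.

    Hypothetical helper: the matching rule (case-insensitive substring plus a
    non-zero channel count) is an assumption, not the PR's actual logic.
    """
    wanted = name_fragment.lower()
    channel_key = "maxInputChannels" if need_input else "maxOutputChannels"
    for info in devices:
        name = str(info.get("name", "")).lower()
        # Skip devices that match by name but cannot capture (or play) audio.
        if wanted in name and int(info.get(channel_key, 0)) > 0:
            return int(info["index"])
    return None


# Made-up device list, for illustration only.
sample_devices = [
    {"index": 0, "name": "HDA Intel PCH", "maxInputChannels": 2, "maxOutputChannels": 2},
    {"index": 3, "name": "Jabra SPEAK 810: USB Audio", "maxInputChannels": 1, "maxOutputChannels": 2},
]

print(find_device_index(sample_devices))
```

With a real `pyaudio.PyAudio()` instance `pa`, the device list could be built as `[pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]` and passed in unchanged.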

Files changed

  • saltybot_audio_pipeline/audio_utils.py — add VoskSTT class
  • saltybot_audio_pipeline/audio_pipeline_node.py — stt_backend param (whisper/vosk), Vosk loading, CPU fallback
  • saltybot_audio_pipeline/config/audio_pipeline_params.yaml — add stt_backend, vosk_model_path
  • saltybot_audio_pipeline/test/test_audio_pipeline.py — 40 unit tests
  • saltybot_bringup/launch/full_stack.launch.py — audio pipeline at t=5s
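
The `stt_backend` selection with a Whisper fallback could look roughly like the sketch below. This is an assumption-laden illustration, not the node's code: `select_stt_backend` and the `loaders` mapping are hypothetical, and the loaders are plain callables so the fallback path can be exercised without Vosk or faster-whisper installed.

```python
from typing import Callable, Dict, Tuple


def select_stt_backend(
    requested: str,
    loaders: Dict[str, Callable[[], object]],
    fallback: str = "whisper",
) -> Tuple[str, object]:
    """Load the requested STT backend, falling back if loading fails.

    Hypothetical helper: returns (backend_name, backend_object).
    """
    if requested not in loaders:
        raise ValueError(f"unknown stt_backend: {requested!r}")
    try:
        return requested, loaders[requested]()
    except Exception as exc:
        if requested == fallback:
            # Nothing left to fall back to; surface the original error.
            raise
        print(f"[STT/{requested}] load failed ({exc}); falling back to {fallback}")
        return fallback, loaders[fallback]()


def _load_vosk_missing_model() -> object:
    # Stand-in for a Vosk load that fails, e.g. vosk_model_path not found.
    raise RuntimeError("model directory not found")


# Stand-in loaders for illustration; real ones would build the STT objects.
demo_loaders = {
    "vosk": _load_vosk_missing_model,
    "whisper": lambda: "whisper-backend",
}

name, backend = select_stt_backend("vosk", demo_loaders)
print(name, backend)
```

Keeping the loaders as injected callables also makes this path easy to cover in offline unit tests.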

Test plan

  • pytest jetson/ros2_ws/src/saltybot_audio_pipeline/test/ — all 40 tests pass offline
  • ros2 launch saltybot_audio_pipeline audio_pipeline.launch.py — verifies Jabra device opens
  • ros2 launch saltybot_bringup full_stack.launch.py audio_stt_backend:=vosk — Vosk backend
  • ros2 topic echo /saltybot/speech/transcribed_text — STT output after "hey salty"
  • ros2 topic echo /saltybot/audio/state — FSM transitions: idle→listening→wake_detected→processing→speaking→listening
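
When checking the last item by hand, the expected state sequence can be encoded as a small transition table. The sketch below is an assumption: the commit mentions an `AudioState` type, but its members, the `ALLOWED` table, and `is_valid_sequence` are hypothetical and only cover the happy path listed above. The real node may permit other transitions, such as returning to listening after an empty transcript.

```python
from enum import Enum
from typing import Iterable


class AudioState(Enum):
    # Member names and values are assumed from the state names in the test plan.
    IDLE = "idle"
    LISTENING = "listening"
    WAKE_DETECTED = "wake_detected"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Happy-path transitions only: idle→listening→wake_detected→processing→speaking→listening.
ALLOWED = {
    AudioState.IDLE: {AudioState.LISTENING},
    AudioState.LISTENING: {AudioState.WAKE_DETECTED},
    AudioState.WAKE_DETECTED: {AudioState.PROCESSING},
    AudioState.PROCESSING: {AudioState.SPEAKING},
    AudioState.SPEAKING: {AudioState.LISTENING},
}


def is_valid_sequence(states: Iterable[AudioState]) -> bool:
    """Return True if every consecutive pair of states is an allowed transition."""
    seq = list(states)
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(seq, seq[1:]))


observed = [
    AudioState(s)
    for s in ["idle", "listening", "wake_detected", "processing", "speaking", "listening"]
]
print(is_valid_sequence(observed))
```

Strings captured from `ros2 topic echo /saltybot/audio/state` could be mapped through `AudioState(...)` in the same way, provided the published values match the names assumed here.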

🤖 Generated with Claude Code

sl-jetson added 1 commit 2026-03-07 10:03:54 -05:00
- Add VoskSTT class to audio_utils.py: offline Vosk STT backend as
  low-latency CPU alternative to Whisper for Jetson deployments
- Update audio_pipeline_node.py: stt_backend param ("whisper"/"vosk"),
  Vosk loading with Whisper fallback, CPU auto-detection for Whisper,
  dual-backend _process_utterance dispatch, STT/<backend> log prefix
- Update audio_pipeline_params.yaml: add stt_backend and vosk_model_path
- Add test/test_audio_pipeline.py: 40 unit tests covering EnergyVAD,
  PCM conversion, AudioBuffer, UtteranceSegmenter, VoskSTT, JabraAudioDevice,
  AudioMetrics, AudioState
- Integrate into full_stack.launch.py: audio_pipeline at t=5s with
  enable_audio_pipeline and audio_stt_backend args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sl-jetson merged commit 6e09d13dfc into main 2026-03-07 10:07:17 -05:00
Reference: seb/saltylab-firmware#543