feat: Audio pipeline — wake word + STT + TTS on Jabra SPEAK 810 (Issue #503) #543

sl-jetson · 2026-03-07T10:03:53-05:00

Summary

Implements end-to-end audio pipeline for SaltyBot on Jetson (Issue #503).

Jabra SPEAK 810 USB speakerphone: auto-discovered via PyAudio by name
openWakeWord: "hey_salty" wake word detection, configurable threshold
Vosk STT (new): offline KaldiRecognizer backend, a low-latency CPU alternative to Whisper; adds the VoskSTT class to audio_utils.py
Whisper STT: faster-whisper with CUDA/CPU auto-detection and int8 fallback
Piper TTS: offline ONNX speech synthesis, sample-rate resampling to Jabra output
MQTT reporting: state + metrics published to saltybot/audio/{state,status}
Full-stack integration: audio pipeline launched at t=5s in full_stack.launch.py, configurable via the enable_audio_pipeline and audio_stt_backend launch args
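
As an illustration of the by-name device discovery mentioned above, here is a minimal sketch. It is not the code from this PR: `find_device_index` and the sample device table are hypothetical, and the dict keys (`index`, `name`, `maxInputChannels`, `maxOutputChannels`) are assumed to follow the shape of PyAudio's device-info dictionaries.

```python
from typing import Iterable, Mapping, Optional


def find_device_index(
    devices: Iterable[Mapping[str, object]],
    name_hint: str = "jabra",
    need_input: bool = True,
) -> Optional[int]:
    """Return the index of the first device whose name contains name_hint.

    Hypothetical helper: matching is case-insensitive, and a device only
    counts if it exposes at least one channel in the requested direction.
    """
    hint = name_hint.lower()
    channel_key = "maxInputChannels" if need_input else "maxOutputChannels"
    for info in devices:
        name = str(info.get("name", "")).lower()
        if hint in name and int(info.get(channel_key, 0)) > 0:
            return int(info["index"])
    return None


# Made-up device table, shaped like PyAudio device-info dicts (assumption).
devices = [
    {"index": 0, "name": "HDA NVidia: HDMI 0",
     "maxInputChannels": 0, "maxOutputChannels": 8},
    {"index": 1, "name": "Jabra SPEAK 810: USB Audio",
     "maxInputChannels": 1, "maxOutputChannels": 2},
]

mic_index = find_device_index(devices, "jabra", need_input=True)
```

With PyAudio installed, the `devices` list could be built from `pa.get_device_info_by_index(i)` for each `i` in `range(pa.get_device_count())`. Keeping the matching logic separate from the PyAudio calls makes it testable without audio hardware.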

Files changed

saltybot_audio_pipeline/audio_utils.py — add VoskSTT class
saltybot_audio_pipeline/audio_pipeline_node.py — stt_backend param (whisper/vosk), Vosk loading, CPU fallback
saltybot_audio_pipeline/config/audio_pipeline_params.yaml — add stt_backend, vosk_model_path
saltybot_audio_pipeline/test/test_audio_pipeline.py — 40 unit tests
saltybot_bringup/launch/full_stack.launch.py — audio pipeline at t=5s
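
The backend selection described for audio_pipeline_node.py (a stt_backend value of whisper or vosk, with Whisper as the fallback when Vosk cannot load) could look roughly like the sketch below. `select_stt_backend` and both loader functions are hypothetical stand-ins, not the node's actual code, and the model path is made up.

```python
from typing import Callable, Dict, List, Tuple


def select_stt_backend(
    requested: str,
    loaders: Dict[str, Callable[[], object]],
    fallback: str = "whisper",
) -> Tuple[str, object]:
    """Try the requested backend first, then the fallback.

    Returns (backend_name, loaded_model). Raises RuntimeError if neither
    backend can be loaded.
    """
    order: List[str] = [requested]
    if fallback != requested:
        order.append(fallback)

    errors: List[str] = []
    for name in order:
        loader = loaders.get(name)
        if loader is None:
            errors.append(f"{name}: unknown backend")
            continue
        try:
            return name, loader()
        except Exception as exc:  # a failed load moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no STT backend available (" + "; ".join(errors) + ")")


# Hypothetical loaders: Vosk fails (model directory missing), Whisper loads.
def load_vosk() -> object:
    raise FileNotFoundError("vosk model directory not found")


def load_whisper() -> object:
    return {"engine": "whisper", "device": "cpu"}


backend, model = select_stt_backend(
    "vosk", {"vosk": load_vosk, "whisper": load_whisper}
)
```

Returning the name alongside the model lets the caller tag log lines with the backend that actually loaded, in the spirit of the STT/<backend> log prefix.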

Test plan

pytest jetson/ros2_ws/src/saltybot_audio_pipeline/test/ — all 40 tests pass offline
ros2 launch saltybot_audio_pipeline audio_pipeline.launch.py — verifies Jabra device opens
ros2 launch saltybot_bringup full_stack.launch.py audio_stt_backend:=vosk — Vosk backend
ros2 topic echo /saltybot/speech/transcribed_text — STT output after "hey salty"
ros2 topic echo /saltybot/audio/state — FSM transitions: idle→listening→wake_detected→processing→speaking→listening
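
When checking the last item by hand, it can help to validate a captured state trace against the expected transitions. The sketch below encodes only the sequence listed above. The enum members, the `ALLOWED` table, and `is_valid_trace` are assumptions for illustration and may not match the AudioState type in the package.

```python
from enum import Enum
from typing import Dict, Sequence, Set


class AudioState(Enum):
    # Member values mirror the state names listed in the test plan.
    IDLE = "idle"
    LISTENING = "listening"
    WAKE_DETECTED = "wake_detected"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Allowed next states, taken from the listed sequence only (assumption:
# the real node may permit more transitions, e.g. for timeouts or errors).
ALLOWED: Dict[AudioState, Set[AudioState]] = {
    AudioState.IDLE: {AudioState.LISTENING},
    AudioState.LISTENING: {AudioState.WAKE_DETECTED},
    AudioState.WAKE_DETECTED: {AudioState.PROCESSING},
    AudioState.PROCESSING: {AudioState.SPEAKING},
    AudioState.SPEAKING: {AudioState.LISTENING},
}


def is_valid_trace(trace: Sequence[str]) -> bool:
    """Return True if every consecutive pair in the trace is an allowed step."""
    states = [AudioState(name) for name in trace]
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(states, states[1:]))


expected = ["idle", "listening", "wake_detected",
            "processing", "speaking", "listening"]
ok = is_valid_trace(expected)
```

A trace that skips a step, such as idle followed directly by processing, is rejected, and an unknown state name raises ValueError from the enum lookup.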

🤖 Generated with Claude Code

sl-jetson added 1 commit 2026-03-07 10:03:54 -05:00

feat: Audio pipeline end-to-end (Issue #503) 14164089dc

- Add VoskSTT class to audio_utils.py: offline Vosk STT backend as
  low-latency CPU alternative to Whisper for Jetson deployments
- Update audio_pipeline_node.py: stt_backend param ("whisper"/"vosk"),
  Vosk loading with Whisper fallback, CPU auto-detection for Whisper,
  dual-backend _process_utterance dispatch, STT/<backend> log prefix
- Update audio_pipeline_params.yaml: add stt_backend and vosk_model_path
- Add test/test_audio_pipeline.py: 40 unit tests covering EnergyVAD,
  PCM conversion, AudioBuffer, UtteranceSegmenter, VoskSTT, JabraAudioDevice,
  AudioMetrics, AudioState
- Integrate into full_stack.launch.py: audio_pipeline at t=5s with
  enable_audio_pipeline and audio_stt_backend args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sl-jetson merged commit 6e09d13dfc into main

2026-03-07 10:07:17 -05:00

sl-jetson referenced this issue from a commit

2026-03-07 10:07:19 -05:00

Merge pull request 'feat: Audio pipeline — wake word + STT + TTS on Jabra SPEAK 810 (Issue #503)' (#543) from sl-jetson/issue-503-audio-pipeline into main

Reference: seb/saltylab-firmware#543