feat(social): audio wake-word detector 'hey salty' (Issue #320) #317

Merged
sl-jetson merged 1 commit from sl-jetson/wake-word-detect into main 2026-03-03 00:41:23 -05:00

Summary

New ROS2 node: wake_word_node — detects the wake phrase 'hey salty' from raw audio using energy gating + log-mel cosine-similarity template matching.

  • Subscribes to /social/speech/audio_raw (UInt8MultiArray, PCM-16 LE mono; same feed as vad_node)
  • Publishes Bool(True) on /saltybot/wake_word_detected as a one-shot event on each detection
  • Pipeline: PCM-16 decode → AudioRingBuffer (1.5 s sliding window) → RMS energy gate → log-mel spectrogram → cosine similarity vs stored template → cooldown guard
  • Template: loaded from a .npy file (template_path param). When the path is empty the node runs in passive mode and never reports a detection. To enrol: record a few utterances of 'hey salty', compute the log-mel spectrogram of each, average them, and save the result as a NumPy array.
  • Parameters: energy_threshold=0.02, match_threshold=0.82, cooldown_s=2.0, window_s=1.5, hop_s=0.1, sample_rate=16000, n_fft=512, n_mels=40
  • Pure-numpy DSP — no ML framework dependency
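
For readers who want to see the shape of the pipeline, the following is a minimal sketch of the stages listed above, not the code merged in this PR. The function names pcm16_to_float, rms, mel_filterbank, compute_log_mel and cosine_sim are taken from the test plan; their signatures, the STFT hop of 160 samples, the Hann window, the flatten-and-truncate behaviour of cosine_sim, and the helpers detect and build_template are assumptions made for illustration.

```python
import numpy as np


def pcm16_to_float(data: bytes) -> np.ndarray:
    """Decode little-endian PCM-16 bytes to float32 in [-1, 1); a trailing odd byte is dropped."""
    usable = len(data) - (len(data) % 2)
    if usable == 0:
        return np.zeros(0, dtype=np.float32)
    return np.frombuffer(data[:usable], dtype="<i2").astype(np.float32) / 32768.0


def rms(x: np.ndarray) -> float:
    """Root-mean-square energy; 0.0 for an empty signal."""
    return float(np.sqrt(np.mean(np.square(x)))) if x.size else 0.0


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sample_rate=16000, n_fft=512, n_mels=40) -> np.ndarray:
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lo) / (mid - lo)
    falling = (hi - freqs) / (hi - mid)
    return np.maximum(0.0, np.minimum(rising, falling))


def compute_log_mel(x, sample_rate=16000, n_fft=512, hop=160, n_mels=40) -> np.ndarray:
    """Log-mel spectrogram, shape (n_frames, n_mels). `hop` is an assumed STFT hop."""
    if x.size < n_fft:  # zero-pad short input up to a single frame
        x = np.pad(x, (0, n_fft - x.size))
    n_frames = 1 + (x.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(sample_rate, n_fft, n_mels).T + 1e-10)


def cosine_sim(a, b) -> float:
    """Cosine similarity of the flattened inputs, truncated to the shorter one (assumption)."""
    a, b = np.ravel(a), np.ravel(b)
    n = min(a.size, b.size)
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def detect(window, template, energy_threshold=0.02, match_threshold=0.82) -> bool:
    """Energy gate, then template match. Passive (always False) without a template."""
    if template is None or rms(window) < energy_threshold:
        return False
    return cosine_sim(compute_log_mel(window), template) >= match_threshold


def build_template(utterances) -> np.ndarray:
    """Hypothetical enrolment helper: average the log-mel of equally long recordings."""
    return np.mean([compute_log_mel(u) for u in utterances], axis=0)
```

Under these assumptions, enrolment would amount to calling build_template on a few 1.5 s recordings and writing the result with np.save to the file named by template_path. The cooldown guard is left out of the sketch; it would only need to compare the current time against the time of the last published detection.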

Test plan

  • 91/91 offline unit tests pass (test_wake_word.py)
  • pcm16_to_float: zeros, positive, negative, empty, odd-byte
  • rms: zero signal, constant, sine (amp/√2), empty
  • mel_filterbank: shape, non-negative, rows sum positive
  • compute_log_mel: shape, finite, silence, short signal
  • cosine_sim: identical, orthogonal, opposite, zero, 2D, truncation, range
  • AudioRingBuffer: push, get_window, eviction, exact size
  • WakeWordDetector: no-template passive, energy gate, identical-signal sim≈1, white-noise low-sim, threshold boundaries, has_template
  • Node init: pub/sub/timer, custom topics, bad template path warns, window_n
  • _on_audio: buffer growth, multiple chunks, bad data no crash
  • _detection_cb: no data, insufficient buffer, detects+publishes, cooldown, silence
  • Source tags, config YAML, launch file, setup.py entry point
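
As an illustration of the AudioRingBuffer cases listed above (push, get_window, eviction, exact size), a buffer with the behaviour those test names suggest could look like the sketch below. Only the class name and the method names push and get_window come from the test plan; the constructor argument, the return of None until the buffer is full, and the internal layout are assumptions, and the merged implementation may differ.

```python
import numpy as np


class AudioRingBuffer:
    """Keeps only the most recent `capacity` samples (illustrative sketch)."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._buf = np.zeros(capacity, dtype=np.float32)
        self._count = 0  # number of valid samples, saturates at capacity

    def __len__(self) -> int:
        return self._count

    def push(self, samples) -> None:
        """Append samples, evicting the oldest ones once the buffer is full."""
        samples = np.asarray(samples, dtype=np.float32)[-self._capacity:]
        n = samples.size
        if n == 0:
            return
        self._buf = np.roll(self._buf, -n)  # shift old samples toward the front
        self._buf[-n:] = samples            # newest samples sit at the end
        self._count = min(self._capacity, self._count + n)

    def get_window(self):
        """Return a copy of the full window, or None while still filling."""
        if self._count < self._capacity:
            return None
        return self._buf.copy()
```

With the parameters from the summary, the capacity would be window_s × sample_rate = 1.5 × 16000 = 24000 samples.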

🤖 Generated with Claude Code

sl-webui added 1 commit 2026-03-03 00:27:28 -05:00
feat(social): audio wake-word detector 'hey salty' (Issue #320)
Some checks failed
social-bot integration tests / Lint (flake8 + pep257) (push) Failing after 2s
social-bot integration tests / Core integration tests (mock sensors, no GPU) (push) Has been skipped
social-bot integration tests / Lint (flake8 + pep257) (pull_request) Failing after 10s
social-bot integration tests / Core integration tests (mock sensors, no GPU) (pull_request) Has been skipped
social-bot integration tests / Latency profiling (GPU, Orin) (push) Has been cancelled
social-bot integration tests / Latency profiling (GPU, Orin) (pull_request) Has been cancelled
d6553ce3d6
Energy-gated log-mel + cosine-similarity wake-word node. Subscribes to
/social/speech/audio_raw (PCM-16 UInt8MultiArray), maintains a 1.5 s
sliding ring buffer, runs detection every 100 ms; fires Bool(True) on
/saltybot/wake_word_detected with 2 s cooldown. Template loaded from
.npy file; passive (no detections) when template_path is empty.
91/91 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sl-jetson merged commit b96c6b96d0 into main 2026-03-03 00:41:23 -05:00
Reference: seb/saltylab-firmware#317