feat(social): energy+ZCR voice activity detection node (Issue #242) #247

Merged
sl-jetson merged 1 commit from sl-jetson/issue-242-vad into main 2026-03-02 12:46:26 -05:00

Summary

New vad_node in saltybot_social:

  • Subscribes to /social/speech/audio_raw (UInt8MultiArray PCM-16 LE)
  • Computes RMS energy (dBFS) and zero-crossing rate (ZCR) per chunk
  • Combined decision: energy_db >= rms_threshold_db AND zcr_min <= zcr <= zcr_max
  • VadStateMachine provides onset/offset hysteresis to prevent chattering
  • Publishes /social/speech/is_speaking (Bool) and /social/speech/energy (Float32 linear RMS)
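
The per-chunk features and the combined decision listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the node's actual code: the function names mirror those in the test plan, but their signatures, the -120 dB silence floor, the `>= 0` sign convention, and the normalization of ZCR by the number of adjacent sample pairs are all assumptions. `is_active_frame` and `sine_pcm16` are hypothetical helpers added for the example.

```python
import math
import struct


def pcm16_bytes_to_float32(data):
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1.0, 1.0)."""
    n = len(data) // 2  # assumption: a trailing odd byte is ignored
    samples = struct.unpack("<%dh" % n, data[: 2 * n])
    return [s / 32768.0 for s in samples]


def rms_linear(x):
    """Root-mean-square amplitude of a chunk (0.0 for an empty chunk)."""
    if not x:
        return 0.0
    return math.sqrt(sum(v * v for v in x) / len(x))


def rms_db(x, floor_db=-120.0):
    """RMS level in dBFS; floor_db (assumed) is returned for digital silence."""
    r = rms_linear(x)
    return 20.0 * math.log10(r) if r > 0.0 else floor_db


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0)."""
    if len(x) < 2:
        return 0.0
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:]))
    return crossings / (len(x) - 1)


def is_active_frame(x, rms_threshold_db=-35.0, zcr_min=0.01, zcr_max=0.40):
    """Hypothetical helper: energy gate AND ZCR band, as in the summary."""
    return (rms_db(x) >= rms_threshold_db
            and zcr_min <= zero_crossing_rate(x) <= zcr_max)


def sine_pcm16(freq_hz, amplitude, n_samples, sample_rate_hz=16000):
    """Hypothetical test helper: a sine tone encoded as PCM-16 LE bytes."""
    ints = [
        int(round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)))
        for n in range(n_samples)
    ]
    return struct.pack("<%dh" % n_samples, *ints)


if __name__ == "__main__":
    chunk = pcm16_bytes_to_float32(sine_pcm16(300.0, 0.5, 512))
    print(rms_db(chunk), zero_crossing_rate(chunk), is_active_frame(chunk))
```

With these definitions a 300 Hz tone at half scale passes both gates, digital silence fails the energy gate, and a sample-by-sample alternating signal fails the ZCR upper bound.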

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| rms_threshold_db | -35.0 | Energy gate (dBFS) |
| zcr_min | 0.01 | ZCR lower bound; rejects DC/rumble |
| zcr_max | 0.40 | ZCR upper bound; rejects high-freq noise |
| onset_frames | 2 | Consecutive active frames before is_speaking=true |
| offset_frames | 8 | Consecutive silent frames before is_speaking=false |
| audio_topic | /social/speech/audio_raw | Source PCM-16 topic |
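
To illustrate how onset_frames and offset_frames interact, here is a rough sketch of onset/offset hysteresis. The class name `VadHysteresisSketch`, its `update` method, and its internal counters are hypothetical; the real VadStateMachine may be structured differently.

```python
class VadHysteresisSketch:
    """Hypothetical stand-in for the PR's VadStateMachine (details assumed)."""

    def __init__(self, onset_frames=2, offset_frames=8):
        self.onset_frames = onset_frames
        self.offset_frames = offset_frames
        self.reset()

    def reset(self):
        """Return to the non-speaking state and clear both run counters."""
        self.is_speaking = False
        self._active_run = 0
        self._silent_run = 0

    def update(self, frame_active):
        """Feed one per-frame decision and return the debounced state."""
        if frame_active:
            self._active_run += 1
            self._silent_run = 0
            # Switch on only after enough consecutive active frames.
            if not self.is_speaking and self._active_run >= self.onset_frames:
                self.is_speaking = True
        else:
            self._silent_run += 1
            self._active_run = 0
            # Switch off only after enough consecutive silent frames.
            if self.is_speaking and self._silent_run >= self.offset_frames:
                self.is_speaking = False
        return self.is_speaking


if __name__ == "__main__":
    sm = VadHysteresisSketch()
    frames = [True, False, True, True] + [False] * 8
    print([sm.update(f) for f in frames])
```

In this sketch an isolated active frame never raises is_speaking, and a pause shorter than eight frames does not drop it, which is the chatter suppression the summary describes.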

ZCR bands (16 kHz)

| Signal | ZCR range |
|--------|-----------|
| Silence / low-freq rumble | < 0.01 |
| Voiced speech | 0.01–0.20 |
| Unvoiced / sibilants | 0.20–0.40 |
| High-freq noise | > 0.40 |
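
As a rough guide to reading these bands: a pure tone crosses zero twice per period, so its ZCR (crossings per sample) is about 2f/fs. The helpers below are hypothetical and only apply this idealized relation. Real speech is not a pure tone, so the mapping is approximate.

```python
def tone_zcr(freq_hz, sample_rate_hz=16000):
    """Idealized ZCR of a pure tone: two zero crossings per period."""
    return 2.0 * freq_hz / sample_rate_hz


def zcr_to_tone_hz(zcr, sample_rate_hz=16000):
    """Inverse mapping: the pure-tone frequency that would give this ZCR."""
    return zcr * sample_rate_hz / 2.0


if __name__ == "__main__":
    print(tone_zcr(300.0))        # the 300 Hz integration-test tone
    print(zcr_to_tone_hz(0.01))   # tone equivalent of zcr_min
    print(zcr_to_tone_hz(0.40))   # tone equivalent of zcr_max
```

Under this approximation, at 16 kHz the default zcr_min of 0.01 corresponds to an 80 Hz tone and zcr_max of 0.40 to a 3200 Hz tone, while the 300 Hz tone from the integration test has a ZCR of 0.0375, inside the voiced-speech band.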

Test plan

  • 69/69 tests passing (test_vad_node.py)
  • pcm16_bytes_to_float32: roundtrip, edge cases
  • rms_linear / rms_db: silence, full-scale, sine RMS = A/√2
  • zero_crossing_rate: alternating=1.0, sine in range, higher freq → higher ZCR
  • VadStateMachine: onset/offset hysteresis, reset
  • Combined decision: speech passes, silence/noise rejected, boundary conditions
  • Integration: 300 Hz voiced sine passes; silence rejected
  • Node source structure, config, setup.py entry point
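
A few of the properties listed above can be checked in isolation. The sketch below is not taken from test_vad_node.py; the helper bodies and test names are assumptions, written as plain functions that a runner such as pytest could collect.

```python
import math


def rms_linear(x):
    """Root-mean-square amplitude (assumed implementation)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def zero_crossing_rate(x):
    """Sign changes per adjacent sample pair (assumed implementation)."""
    if len(x) < 2:
        return 0.0
    return sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:])) / (len(x) - 1)


def sine(freq_hz, amplitude, n_samples, sample_rate_hz=16000):
    """Hypothetical helper: float sine samples."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def test_sine_rms_is_a_over_sqrt2():
    # 1600 samples of 400 Hz at 16 kHz is exactly 40 whole periods,
    # so the RMS equals A / sqrt(2) up to floating-point error.
    x = sine(400.0, 0.8, 1600)
    assert math.isclose(rms_linear(x), 0.8 / math.sqrt(2), rel_tol=1e-9)


def test_alternating_signal_has_zcr_one():
    assert zero_crossing_rate([1.0, -1.0] * 100) == 1.0


def test_higher_frequency_gives_higher_zcr():
    low = zero_crossing_rate(sine(300.0, 0.5, 1600))
    high = zero_crossing_rate(sine(1000.0, 0.5, 1600))
    assert high > low
```

Using a whole number of periods in the RMS check avoids the small bias that a partial final period would otherwise introduce.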

Closes #242

🤖 Generated with Claude Code

sl-webui added 1 commit 2026-03-02 12:26:23 -05:00
feat(social): energy+ZCR voice activity detection node (Issue #242)
Some checks failed
social-bot integration tests / Lint (flake8 + pep257) (push) Failing after 2s
social-bot integration tests / Core integration tests (mock sensors, no GPU) (push) Has been skipped
social-bot integration tests / Lint (flake8 + pep257) (pull_request) Failing after 2s
social-bot integration tests / Core integration tests (mock sensors, no GPU) (pull_request) Has been skipped
social-bot integration tests / Latency profiling (GPU, Orin) (push) Has been cancelled
social-bot integration tests / Latency profiling (GPU, Orin) (pull_request) Has been cancelled
4919dc0bc6
Add vad_node to saltybot_social: subscribes to /social/speech/audio_raw
(UInt8MultiArray PCM-16), computes RMS energy (dBFS) and zero-crossing
rate per chunk, applies onset/offset hysteresis (VadStateMachine), and
publishes /social/speech/is_speaking (Bool) and /social/speech/energy
(Float32 linear RMS). All thresholds configurable via ROS params:
rms_threshold_db=-35.0, zcr_min=0.01, zcr_max=0.40, onset_frames=2,
offset_frames=8, audio_topic. 69/69 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sl-jetson merged commit 82e836ec3f into main 2026-03-02 12:46:26 -05:00
Reference: seb/saltylab-firmware#247