feat: Phone voice command interface (Issue #553) #554

Merged
sl-jetson merged 1 commit from sl-android/issue-553-voice-command into main 2026-03-14 11:36:29 -04:00

Summary

  • phone/voice_commander.py: Termux-based voice command listener for SaltyBot
  • Continuous wake word detection loop — records 1.5s clips, transcribes with local Whisper, checks for 'Hey Salty' (exact + fuzzy token overlap fallback)
  • On wake word: plays TTS 'Yes?', records 3s command clip, transcribes, parses intent
  • Supported commands: go_forward, go_back, go_left, go_right, stop, follow_me, go_home, look_at_me
  • Publishes JSON {"command": ..., "raw": ..., "ts": ...} to /saltybot/voice/cmd via ROS2 (rclpy) or rosbridge WebSocket fallback
  • TTS confirmation via termux-tts-speak after each command
  • Audio capture via termux-microphone-record (16 kHz mono AAC)
  • Flags: --host, --port, --model (tiny/base/small), --threshold, --record-sec, --no-tts, --debug
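
The wake-word check and intent parsing described above could look roughly like the sketch below. This is not the code from phone/voice_commander.py: every name here (normalize, is_wake_word, parse_intent, WAKE_TOKENS, COMMAND_PHRASES, min_ratio) is hypothetical, the fuzzy fallback uses difflib as a stand-in for whatever token-overlap measure the script really applies, and the meaning given to threshold is an assumption about --threshold.

```python
import re
from difflib import SequenceMatcher

# Hypothetical names throughout; the PR does not show the real implementation.
WAKE_TOKENS = ("hey", "salty")

# Spoken phrase -> published command, using only the phrases and
# command names listed in this PR.
COMMAND_PHRASES = {
    "go forward": "go_forward",
    "go back": "go_back",
    "go left": "go_left",
    "go right": "go_right",
    "stop": "stop",
    "follow me": "follow_me",
    "go home": "go_home",
    "look at me": "look_at_me",
}


def normalize(text: str) -> list[str]:
    """Lowercase a transcript and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def is_wake_word(transcript: str, threshold: float = 0.75, min_ratio: float = 0.75) -> bool:
    """Return True on an exact 'hey salty' match or a close-enough fuzzy match."""
    tokens = normalize(transcript)

    # Exact match: the two wake tokens appear next to each other.
    if any(tuple(tokens[i:i + 2]) == WAKE_TOKENS for i in range(len(tokens) - 1)):
        return True

    # Fuzzy fallback (assumed design): count wake tokens that have a
    # similar-looking token anywhere in the transcript.
    hits = sum(
        1
        for target in WAKE_TOKENS
        if any(SequenceMatcher(None, target, tok).ratio() >= min_ratio for tok in tokens)
    )
    return hits / len(WAKE_TOKENS) >= threshold


def parse_intent(transcript: str) -> str:
    """Map a command transcript to a command name, or 'unknown'."""
    # Pad with spaces so phrases only match on whole-word boundaries.
    text = f" {' '.join(normalize(transcript))} "
    for phrase, command in COMMAND_PHRASES.items():
        if f" {phrase} " in text:
            return command
    return "unknown"
```

With the default threshold of 0.75 both wake tokens need a fuzzy match, so a lone "hey" does not trigger the wake path, while a near-miss transcription such as "hey salti" still does. Matching on whole words keeps "stop" from firing inside longer words.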

Test plan

  • python3 -m py_compile phone/voice_commander.py — syntax clean
  • On Termux: pip install openai-whisper websocket-client then python3 phone/voice_commander.py --debug --no-tts
  • Say "Hey Salty" — verify wake detection in logs
  • Say "go forward" — verify /saltybot/voice/cmd receives {"command":"go_forward",...}
  • Say "follow me" — verify follow_me published with TTS confirmation
  • Unrecognised speech — verify unknown command and "Sorry, I didn't understand that" TTS
  • Pass --model tiny on a low-end phone and verify inference is faster
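
For checking what should arrive on /saltybot/voice/cmd in the steps above, here is a minimal sketch of the payload and the rosbridge frames that would carry it. The helper names build_payload and rosbridge_frames are hypothetical, and two details are assumptions not stated in this PR: that the topic uses std_msgs/String with the JSON serialized into its data field, and that ts is a Unix timestamp in seconds. The "advertise" and "publish" ops are part of the rosbridge protocol.

```python
import json
import time

TOPIC = "/saltybot/voice/cmd"


def build_payload(command, raw, ts=None):
    """Build the {"command", "raw", "ts"} dict described in the summary.

    ts defaults to the current Unix time in seconds (an assumption).
    """
    return {"command": command, "raw": raw, "ts": time.time() if ts is None else ts}


def rosbridge_frames(payload, topic=TOPIC):
    """Return the JSON text frames to send over a rosbridge WebSocket.

    Assumes the topic type is std_msgs/String, so the payload is
    serialized to a JSON string inside the message's "data" field.
    """
    advertise = {"op": "advertise", "topic": topic, "type": "std_msgs/String"}
    publish = {"op": "publish", "topic": topic, "msg": {"data": json.dumps(payload)}}
    return [json.dumps(advertise), json.dumps(publish)]


frames = rosbridge_frames(build_payload("go_forward", "go forward"))
```

Each frame can then be sent as a text message over an open rosbridge connection, for example with websocket-client, which the test plan already installs. On the robot side, echoing the topic should show the same JSON string.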

🤖 Generated with Claude Code

sl-jetson added 1 commit 2026-03-14 10:27:05 -04:00
Add phone/voice_commander.py — Termux-based voice command listener for SaltyBot:
- Continuous wake word detection ('Hey Salty') via Whisper STT on short audio clips
- Command recording after wake word, transcribed with local Whisper (tiny/base/small)
- Parses go forward/back/left/right, stop, follow me, go home, look at me
- Publishes JSON to /saltybot/voice/cmd via ROS2 (rclpy) or rosbridge WebSocket
- TTS confirmation via termux-tts-speak; 'Yes?' prompt on wake word
- Fuzzy token-overlap fallback for wake word matching
- Flags: --host, --port, --model, --threshold, --record-sec, --no-tts, --debug

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sl-jetson merged commit 80e3b23aec into main 2026-03-14 11:36:29 -04:00
Reference: seb/saltylab-firmware#554