saltylab-firmware

seb/saltylab-firmware

Fork 0

Commit Graph

Author	SHA1	Message	Date
sl-jetson	90c8b427fc	feat(social): multi-language support — Whisper LID + per-lang Piper TTS (Issue #167 ) Some checks failed social-bot integration tests / Lint (flake8 + pep257) (push) Failing after 2s Details social-bot integration tests / Core integration tests (mock sensors, no GPU) (push) Has been skipped Details social-bot integration tests / Lint (flake8 + pep257) (pull_request) Failing after 10s Details social-bot integration tests / Core integration tests (mock sensors, no GPU) (pull_request) Has been skipped Details social-bot integration tests / Latency profiling (GPU, Orin) (push) Has been cancelled Details social-bot integration tests / Latency profiling (GPU, Orin) (pull_request) Has been cancelled Details - Add SpeechTranscript.language (BCP-47), ConversationResponse.language fields - speech_pipeline_node: whisper_language param (""=auto-detect via Whisper LID); detected language published in every transcript - conversation_node: track per-speaker language; inject "[Please respond in X.]" hint for non-English speakers; propagate language to ConversationResponse. _LANG_NAMES: 24 BCP-47 codes -> English names. Also adds Issue #161 emotion context plumbing (co-located in same branch for clean merge) - tts_node: voice_map_json param (JSON BCP-47->ONNX path); lazy voice loading per language; playback queue now carries (text, lang) tuples for voice routing - speech_params.yaml, tts_params.yaml: new language params with docs - 47/47 tests pass (test_multilang.py) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-02 10:57:34 -05:00
sl-jetson	5043578934	feat(social): speech pipeline + LLM conversation + TTS + orchestrator (#81 #83 #85 #89 ) Issue #81 — Speech pipeline: - speech_pipeline_node.py: OpenWakeWord "hey_salty" → Silero VAD → faster-whisper STT (Orin GPU, <500ms wake-to-transcript) → ECAPA-TDNN speaker diarization - speech_utils.py: pcm16↔float32, EnergyVad, UtteranceSegmenter (pre-roll, max- duration), cosine speaker identification — all pure Python, no ROS2/GPU needed - Publishes /social/speech/transcript (SpeechTranscript) + /social/speech/vad_state Issue #83 — Conversation engine: - conversation_node.py: llama-cpp-python GGUF (Phi-3-mini Q4_K_M, 20 GPU layers), streaming token output, per-person sliding-window context (4K tokens), summary compression, SOUL.md system prompt, group mode - llm_context.py: PersonContext, ContextStore (JSON persistence), build_llama_prompt (ChatML format), context compression via LLM summarization - Publishes /social/conversation/response (ConversationResponse, partial + final) Issue #85 — Streaming TTS: - tts_node.py: Piper ONNX streaming synthesis, sentence-by-sentence first-chunk streaming (<200ms to first audio), sounddevice USB speaker playback, volume control - tts_utils.py: split_sentences, pcm16_to_wav_bytes, chunk_pcm, apply_volume, strip_ssml Issue #89 — Pipeline orchestrator: - orchestrator_node.py: IDLE→LISTENING→THINKING→SPEAKING state machine, GPU memory watchdog (throttle at <2GB free), rolling latency stats (p50/p95 per stage), VAD watchdog (alert if speech pipeline hangs), /social/orchestrator/state JSON pub - social_bot.launch.py: brings up all 4 nodes with TimerAction delays New messages: SpeechTranscript.msg, VadState.msg, ConversationResponse.msg Config YAMLs: speech_params, conversation_params, tts_params, orchestrator_params Tests: 58 tests (28 speech_utils + 30 llm_context/tts_utils), all passing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-02 08:23:19 -05:00

Author

SHA1

Message

Date

sl-jetson

90c8b427fc

feat(social): multi-language support — Whisper LID + per-lang Piper TTS (Issue #167 )

social-bot integration tests / Lint (flake8 + pep257) (push) Failing after 2s

Details

social-bot integration tests / Core integration tests (mock sensors, no GPU) (push) Has been skipped

Details

social-bot integration tests / Lint (flake8 + pep257) (pull_request) Failing after 10s

Details

social-bot integration tests / Core integration tests (mock sensors, no GPU) (pull_request) Has been skipped

Details

social-bot integration tests / Latency profiling (GPU, Orin) (push) Has been cancelled

Details

social-bot integration tests / Latency profiling (GPU, Orin) (pull_request) Has been cancelled

Details

- Add SpeechTranscript.language (BCP-47), ConversationResponse.language fields
- speech_pipeline_node: whisper_language param (""=auto-detect via Whisper LID);
  detected language published in every transcript
- conversation_node: track per-speaker language; inject "[Please respond in X.]"
  hint for non-English speakers; propagate language to ConversationResponse.
  _LANG_NAMES: 24 BCP-47 codes -> English names. Also adds Issue #161 emotion
  context plumbing (co-located in same branch for clean merge)
- tts_node: voice_map_json param (JSON BCP-47->ONNX path); lazy voice loading
  per language; playback queue now carries (text, lang) tuples for voice routing
- speech_params.yaml, tts_params.yaml: new language params with docs
- 47/47 tests pass (test_multilang.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-02 10:57:34 -05:00

sl-jetson

5043578934

feat(social): speech pipeline + LLM conversation + TTS + orchestrator (#81 #83 #85 #89 )

Issue #81 — Speech pipeline:
- speech_pipeline_node.py: OpenWakeWord "hey_salty" → Silero VAD → faster-whisper
  STT (Orin GPU, <500ms wake-to-transcript) → ECAPA-TDNN speaker diarization
- speech_utils.py: pcm16↔float32, EnergyVad, UtteranceSegmenter (pre-roll, max-
  duration), cosine speaker identification — all pure Python, no ROS2/GPU needed
- Publishes /social/speech/transcript (SpeechTranscript) + /social/speech/vad_state

Issue #83 — Conversation engine:
- conversation_node.py: llama-cpp-python GGUF (Phi-3-mini Q4_K_M, 20 GPU layers),
  streaming token output, per-person sliding-window context (4K tokens), summary
  compression, SOUL.md system prompt, group mode
- llm_context.py: PersonContext, ContextStore (JSON persistence), build_llama_prompt
  (ChatML format), context compression via LLM summarization
- Publishes /social/conversation/response (ConversationResponse, partial + final)

Issue #85 — Streaming TTS:
- tts_node.py: Piper ONNX streaming synthesis, sentence-by-sentence first-chunk
  streaming (<200ms to first audio), sounddevice USB speaker playback, volume control
- tts_utils.py: split_sentences, pcm16_to_wav_bytes, chunk_pcm, apply_volume, strip_ssml

Issue #89 — Pipeline orchestrator:
- orchestrator_node.py: IDLE→LISTENING→THINKING→SPEAKING state machine, GPU memory
  watchdog (throttle at <2GB free), rolling latency stats (p50/p95 per stage),
  VAD watchdog (alert if speech pipeline hangs), /social/orchestrator/state JSON pub
- social_bot.launch.py: brings up all 4 nodes with TimerAction delays

New messages: SpeechTranscript.msg, VadState.msg, ConversationResponse.msg
Config YAMLs: speech_params, conversation_params, tts_params, orchestrator_params
Tests: 58 tests (28 speech_utils + 30 llm_context/tts_utils), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-02 08:23:19 -05:00

2 Commits